Conversation
There was a problem hiding this comment.
Pull Request Overview
This pull request updates the version to 0.6.25 and makes several changes to improve consistency and functionality across the codebase.
- Updates version from 0.6.24 to 0.6.25
- Replaces hash functions with xxhash for better performance and consistency
- Refactors test assertions to use set comparisons for order-independent testing
Reviewed Changes
Copilot reviewed 12 out of 15 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| mabel/version.py | Updates version number to 0.6.25 |
| mabel/data/internals/group_by.py | Replaces siphash with xxhash for group key generation |
| mabel/data/readers/internals/cursor.py | Replaces CityHash64 with xxhash for partition identification |
| tests/test_data_group_by.py | Updates test assertions to use set comparisons and fixes import paths |
| tests/test_data_dictset.py | Updates expected hash value and fixes import path |
| tests/test_reader_cursor.py | Updates partition values, removes unused code, and fixes import path |
| mabel/data/writers/internals/blob_writer.py | Removes zstd compression parameter from parquet writer |
| mabel/utils/dates.py | Adds trailing comma for function parameter |
| mabel/data/validator/init.py | Adds trailing comma for function parameter |
| tests/test_utils_common.py | Simplifies docstring format |
| tests/performance/indexing.py | Simplifies docstring format |
| mabel/data/readers/internals/threaded_wrapper.py | Simplifies docstring format |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| group_key: int = xxh3_64_intdigest( | ||
| "".join([f"{record.get(column, '')}" for column in self._columns]), | ||
| HASH_SEED, |
There was a problem hiding this comment.
The xxh3_64_intdigest function call has an extra comma after HASH_SEED. This should be removed to match the function signature and the usage pattern on lines 75-77.
| group_key: int = xxh3_64_intdigest( | |
| "".join([f"{record.get(column, '')}" for column in self._columns]), | |
| HASH_SEED, | |
| HASH_SEED |
| expected = [{'COUNT(*)': 6, 'user': 'alice'}, {'COUNT(*)': 5, 'user': 'bob'}, {'COUNT(*)': 2, 'user': 'eve'}] | ||
| assert set(tuple(sorted(d.items())) for d in ls) == set(tuple(sorted(d.items())) for d in expected) |
There was a problem hiding this comment.
[nitpick] The set comparison pattern using tuple(sorted(d.items())) is repeated throughout this file. Consider extracting this into a helper function like assert_dict_sets_equal(actual, expected) to reduce duplication and improve readability.
No description provided.