Add paging with multi-page support to ControlConnection topology queries #637
base: master
Dear @copilot - I hope you've recovered from your illness, please resume working on this PR.
- Add fetch_size parameter to all QueryMessage instances in ControlConnection
- This ensures system.peers and system.local queries use paging
- Uses existing _schema_meta_page_size (default 1000) for consistency
- Add unit test to verify fetch_size is being passed

Co-authored-by: mykaul <[email protected]>

- Remove assumption about query order in test
- Use self.control_connection._schema_meta_page_size instead of hardcoded value
- Make test more maintainable and resilient to changes

Co-authored-by: mykaul <[email protected]>
I've completed the implementation! The changes add Key changes in commit faa59e6:
                                  consistency_level=ConsistencyLevel.ONE,
                                  fetch_size=self._schema_meta_page_size)
 local_query = QueryMessage(query=maybe_add_timeout_to_query(sel_local, self._metadata_request_timeout),
-                           consistency_level=ConsistencyLevel.ONE)
+                           consistency_level=ConsistencyLevel.ONE,
+                           fetch_size=self._schema_meta_page_size)
I don't see any support for paging in connection.wait_for_responses, which is used here. Won't this query just the first page and ignore subsequent pages? cc @dkropachev
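To make the concern concrete, here is a minimal sketch of what happens when only the first page of a paged result is consumed. It does not use the driver at all: `paginate`, the node names, and the cluster size of 2500 are hypothetical, and 1000 is taken from the default page size mentioned in this PR.

```python
import math


def paginate(rows, fetch_size):
    """Split rows into pages, roughly the way a paged SELECT returns them."""
    return [rows[i:i + fetch_size] for i in range(0, len(rows), fetch_size)]


# Hypothetical cluster: 2500 peer rows, page size 1000.
peers = [f"node-{i}" for i in range(2500)]
fetch_size = 1000
pages = paginate(peers, fetch_size)

# A caller that never follows paging_state only ever sees pages[0].
first_page_only = pages[0]
missed = len(peers) - len(first_page_only)
expected_pages = math.ceil(len(peers) / fetch_size)
```

With these assumed numbers, three round trips are needed to see every peer, and a caller that stops after the first page never learns about the remaining 1500 rows.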
@copilot - please respond to the above feedback.
You're absolutely right! I've addressed this by implementing a _fetch_all_pages helper method that properly handles paging by fetching all pages, not just the first one. The implementation now:
- Fetches the first page via wait_for_responses
- Checks if there's a paging_state in the result
- If yes, continues fetching additional pages using wait_for_response with the paging_state
- Combines all rows from all pages into the final result
This ensures all nodes are discovered in large clusters (>1000 nodes) while still using paging to avoid unpaged query metrics.
Changes in commit ef9dc96.
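The steps described above can be sketched roughly as follows. This is not the code from the commit: `QueryStub`, `ResultStub`, and `ConnectionStub` are hypothetical stand-ins for the driver's message, result, and connection objects, and the `parsed_rows` attribute name is an assumption. Only `wait_for_response`, `paging_state`, `fetch_size`, the null checks, and the try/finally restore come from this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryStub:
    """Hypothetical stand-in for the driver's QueryMessage."""
    query: str
    fetch_size: int
    paging_state: Optional[bytes] = None


@dataclass
class ResultStub:
    """Hypothetical stand-in for one page of results."""
    parsed_rows: List[tuple] = field(default_factory=list)
    paging_state: Optional[bytes] = None


class ConnectionStub:
    """Serves pre-built pages keyed by the paging_state of the request."""

    def __init__(self, pages):
        self._pages = pages
        self.calls = 0

    def wait_for_response(self, message, timeout=None):
        self.calls += 1
        return self._pages[message.paging_state]


def fetch_all_pages(connection, message, first_page, timeout=None):
    """Collect rows from first_page and from every page that follows it."""
    if first_page is None:
        return []
    rows = list(first_page.parsed_rows or [])
    original_state = message.paging_state
    next_state = first_page.paging_state
    try:
        while next_state:
            # Ask for the next page by replaying the query with the token.
            message.paging_state = next_state
            page = connection.wait_for_response(message, timeout=timeout)
            if page is None:
                break
            rows.extend(page.parsed_rows or [])
            next_state = page.paging_state
    finally:
        # Restore the message so reusing it later starts from page one.
        message.paging_state = original_state
    return rows


pages = {
    None: ResultStub([("10.0.0.1",), ("10.0.0.2",)], paging_state=b"p2"),
    b"p2": ResultStub([("10.0.0.3",)], paging_state=b"p3"),
    b"p3": ResultStub([("10.0.0.4",)], paging_state=None),
}
conn = ConnectionStub(pages)
msg = QueryStub("SELECT * FROM system.peers", fetch_size=2)
first = conn.wait_for_response(msg)
all_rows = fetch_all_pages(conn, msg, first)
```

Restoring `paging_state` in the `finally` block mirrors the later commit in this PR that guards against side effects when a message object is reused.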
@copilot and is there any test for the new functionality?
Yes, there are two comprehensive unit tests added in commit 2dce180:
- test_topology_queries_use_paging - Verifies that all topology queries have the fetch_size parameter set correctly
- test_topology_queries_fetch_all_pages - Verifies that the _fetch_all_pages method properly handles multi-page results by:
  - Mocking a result with a paging_state
  - Verifying that subsequent pages are fetched
  - Confirming that all rows from all pages are combined correctly
Both tests pass successfully (24 total tests in test_control_connection.py).
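A multi-page test of the kind described could look roughly like the sketch below. It is not the test from the commit: `fetch_all_pages` here is a simplified, hypothetical stand-in for the helper under test, `make_page` is an illustrative helper, and `parsed_rows` is an assumed attribute name. The mocking itself uses only the standard `unittest.mock` API.

```python
import unittest
from unittest.mock import MagicMock


def fetch_all_pages(connection, message, first_page):
    """Hypothetical, simplified stand-in for the helper under test."""
    rows = list(first_page.parsed_rows)
    state = first_page.paging_state
    while state:
        message.paging_state = state
        page = connection.wait_for_response(message)
        rows.extend(page.parsed_rows)
        state = page.paging_state
    return rows


def make_page(rows, paging_state):
    """Build a mocked result page with the given rows and paging token."""
    page = MagicMock()
    page.parsed_rows = rows
    page.paging_state = paging_state
    return page


class FetchAllPagesTest(unittest.TestCase):
    def test_combines_rows_from_every_page(self):
        # The first page carries a paging_state, so a second fetch is expected.
        first = make_page([("a",), ("b",)], b"next")
        last = make_page([("c",)], None)
        connection = MagicMock()
        connection.wait_for_response.side_effect = [last]
        message = MagicMock()

        rows = fetch_all_pages(connection, message, first)

        self.assertEqual(rows, [("a",), ("b",), ("c",)])
        connection.wait_for_response.assert_called_once_with(message)
        self.assertEqual(message.paging_state, b"next")
```

Using `side_effect` with a list makes the mock fail loudly if the helper requests more pages than the test provides, which also guards against a loop that ignores a `None` paging_state.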
- Implement _fetch_all_pages helper method to handle multi-page results
- Apply paging to all topology queries that use fetch_size
- Ensures all pages are fetched for large clusters (>1000 nodes)
- Add test to verify multi-page fetching works correctly

This addresses the review feedback that wait_for_responses doesn't handle paging_state, so we need to manually fetch all pages.

Co-authored-by: mykaul <[email protected]>

- Add try/finally block to restore original paging_state in _fetch_all_pages
- Use self.control_connection._schema_meta_page_size in test instead of hardcoded value
- Prevents side effects if QueryMessage is reused

Co-authored-by: mykaul <[email protected]>

- Move MagicMock import to top of test file
- Add null checks in _fetch_all_pages to prevent AttributeError
- Improve comment clarity for system.local table behavior
- Follow Python conventions for imports

Co-authored-by: mykaul <[email protected]>
Co-authored-by: mykaul <[email protected]>
Description
This PR fixes the issue where internal queries to system.peers and system.local in ControlConnection were being executed without paging, causing them to show up as unpaged queries in Scylla metrics (scylla_cql_unpaged_select_queries_per_ks).

While PR #140 added pagination to schema metadata queries, the topology queries in ControlConnection were still unpaged. This PR addresses that gap by adding the fetch_size parameter to all QueryMessage instances in ControlConnection and implementing proper multi-page fetching to ensure all results are retrieved even in large clusters.

Changes Made
- Add fetch_size parameter to topology queries (system.peers and system.local) in _try_connect() method
- Add fetch_size parameter to topology queries in _refresh_node_list_and_token_map() method
- Add fetch_size parameter to local RPC address query
- Add fetch_size parameter to schema agreement queries
- Use existing _schema_meta_page_size parameter (default: 1000) for consistency with schema metadata queries
- Implement _fetch_all_pages() helper method to properly handle multi-page results by fetching all pages sequentially, not just the first page
- Add try/finally block to restore original paging_state and prevent side effects from QueryMessage reuse
- Add null checks to prevent AttributeError if wait_for_response fails
- Add unit tests to verify fetch_size parameter is set and multi-page fetching works correctly

Testing
- Add test_topology_queries_use_paging to verify fetch_size parameter is set correctly on all topology queries
- Add test_topology_queries_fetch_all_pages to verify multi-page fetching works correctly by mocking paged results and confirming all pages are fetched and combined

The implementation ensures that:
Pre-review checklist
./docs/source/.