refactor: reduce provider sync/async duplication#153
Conversation
|
Hi @johnnichev, I've opened a PR for this issue. Changes included: Extracted shared helper logic in the Anthropic provider to reduce sync/async duplication. Please let me know if you'd like any additional cleanup or if there are other duplication points you'd prefer to be extracted before merge. |
johnnichev
left a comment
There was a problem hiding this comment.
Thanks for tackling #143. The direction is right and most of the extraction is faithful. A few things need to change before this can merge.
Blocking: behavior regression in gemini_provider.py stream() and astream()
The old tool-call branch set _yielded_any = True right before yielding. A helper cannot set the caller's local, so _toolcall_from_part drops that side effect. For a tool-call-only streamed response (no text, which is the normal function-calling case) _yielded_any now stays False and this guard fires at the end of the loop:
if tools and not _yielded_any:
_warn_empty_tool_response(model_name, _last_finish_reason)That is the Issue #66 safeguard. After this change it logs the "returned neither text nor a tool call" warning on every healthy tool stream, and it can no longer distinguish a genuinely empty response from a normal one. The suite passes because the ToolCall is still yielded correctly and nothing asserts on the warning.
Fix: set _yielded_any = True immediately before yield self._toolcall_from_part(part) in both stream() and astream().
Blocking: CI lint and format
ruff format --check fails on all 5 files (extra blank lines around the new helpers). ruff check also fails:
W291trailing whitespace:anthropic_provider.py:212andgemini_provider.py:171F401unusedfrom google.genai import types: thetypesusage moved into_build_config, but the import was left behind incomplete(),stream(),acomplete(), andastream()
Please run ruff format and ruff check --fix and confirm both are clean. The PR validation ran compileall and pytest but not ruff, which is why these slipped through.
Quality: incomplete refactor in anthropic_provider.py
_build_request_args already sets tools and timeout, but each call site still runs the old if tools: request_args["tools"] = self._tools_param(tools) (and the timeout block) afterward. So _tools_param runs twice and the duplication the helper was meant to remove is still present. Drop the redundant call-site blocks.
Types
Please annotate the new helpers. _build_config has no return type and an untyped tools, and _toolcall_from_part(self, part) has an untyped part. The library targets a fully typed surface.
Minor: embeddings ABC
Pulling dimension into EmbeddingProvider and returning self._dimension matches the issue and is fine. Just note it weakens the contract: a future provider that forgets to set self._dimension now hits a runtime AttributeError instead of failing at instantiation. A default in the ABC or a docstring note would cover that.
Verified and correct, no action needed
- The
call_namesubstitution is identical tostr(part.function_call.name or ""). - Anthropic
timeoutis preserved (the helper applies it, and the originalcomplete()/acomplete()did too). _build_configproduces the same config object on the stream and non-stream paths.
Once the _yielded_any fix and the ruff failures are in, this is good to go.
|
Thanks for jumping on this @vaishnavidesai09, nice first pass. You clearly got what #143 was after and the helper extraction reads well. I dropped a detailed review up top. Really just one thing that matters: the Gemini stream paths lost the Ping me when you've pushed and I'll take another look. |
|
Hi @johnnichev, I've addressed the requested changes:
Current results:
Please let me know if you'd like any additional cleanup before merge. Thanks for the review. |
CI runs neither ruff nor mypy, so three issues slipped through on this branch: - ruff format: removing the duplicated `dimension` properties left stray consecutive blank lines before `__stability__` in 4 embedding modules. Reformatted (whitespace only). - mypy [attr-defined]: the new shared `EmbeddingProvider.dimension` returns `self._dimension`, but the base class never declared it. Added a `_dimension: int` class annotation that documents the contract subclasses must satisfy. - mypy [union-attr]: `_toolcall_from_part` reads `part.function_call.name` outside the callers' `if part.function_call:` guards. Narrowed with a local + explicit guard. No behavior change. ruff format/check + mypy all clean on the 7 files. Claude-Session: https://claude.ai/code/session_011xRdXi1rCM9jbpHQJkNzof
Empty commit to re-fire the pull_request workflow (squashed on merge). Claude-Session: https://claude.ai/code/session_011xRdXi1rCM9jbpHQJkNzof
johnnichev
left a comment
There was a problem hiding this comment.
Re-reviewed after the updates. Every blocking item is resolved and verified:
- Gemini
_yielded_any(#66 safeguard): now set immediately beforeyield self._toolcall_from_part(part)in bothstream()andastream(). The empty-tool-response warning no longer fires on healthy tool streams. - Anthropic redundant blocks: gone —
_build_request_argsis the single source oftools/timeout, no double_tools_param. - Types:
_build_configand_toolcall_from_partare annotated. - #143 acceptance: the named helpers (
_build_config/_toolcall_from_part,_build_request_args/_usage_from_response/_message_from_content_blocks) and the embeddingsdimension→ABC move are all in, sync/async paths share them.
CI runs neither ruff nor mypy, so I pushed one small cleanup commit on top to get the branch fully green against the repo's local gates: ruff format on the embedding modules (stray blank lines after the dimension removal), a _dimension: int declaration on EmbeddingProvider so the shared property type-checks (also documents the subclass contract), and a narrowing guard in _toolcall_from_part for mypy union-attr. All whitespace/type-only, no behavior change. ruff format/check + mypy + full pre-commit all pass; CI matrix green.
Nice work on your first contribution — merging this into the next release. Thanks for sticking with the review.
Summary
Refactors embedding providers and provider implementations to reduce duplicated code.
Changes
Embedding providers
Centralized the
dimensionproperty implementation inEmbeddingProvider.Removed duplicate
dimensionproperty implementations from:OpenAIEmbeddingProviderAnthropicEmbeddingProviderCohereEmbeddingProviderGeminiEmbeddingProviderAll providers continue to initialize
_dimensionvia_get_model_dimension().Provider refactors
Added shared request-building helpers in:
anthropic_provider.pygemini_provider.pyConsolidated request argument construction used by sync and async execution paths.
Reduced duplicated logic between sync and async provider implementations.
Why
The previous implementation duplicated identical logic across multiple providers and between sync/async code paths. Centralizing this behavior improves maintainability and reduces the risk of future inconsistencies.
Validation
python3 -m compileall src/selectoolsNotes
This is a refactor-only change. No user-facing behavior is intended to change.
Related to #143