Skip to content

refactor: reduce provider sync/async duplication#153

Merged
johnnichev merged 4 commits into
johnnichev:mainfrom
vaishnavidesai09:fix-143-provider-sync-async-duplication
Jun 20, 2026
Merged

refactor: reduce provider sync/async duplication#153
johnnichev merged 4 commits into
johnnichev:mainfrom
vaishnavidesai09:fix-143-provider-sync-async-duplication

Conversation

@vaishnavidesai09

Copy link
Copy Markdown
Contributor

Summary

Refactors embedding providers and provider implementations to reduce duplicated code.

Changes

Embedding providers

  • Centralized the dimension property implementation in EmbeddingProvider.

  • Removed duplicate dimension property implementations from:

    • OpenAIEmbeddingProvider
    • AnthropicEmbeddingProvider
    • CohereEmbeddingProvider
    • GeminiEmbeddingProvider
  • All providers continue to initialize _dimension via _get_model_dimension().

Provider refactors

  • Added shared request-building helpers in:

    • anthropic_provider.py
    • gemini_provider.py
  • Consolidated request argument construction used by sync and async execution paths.

  • Reduced duplicated logic between sync and async provider implementations.

Why

The previous implementation duplicated identical logic across multiple providers and between sync/async code paths. Centralizing this behavior improves maintainability and reduces the risk of future inconsistencies.

Validation

  • python3 -m compileall src/selectools
  • Full test suite executed successfully:
7459 passed, 215 skipped

Notes

This is a refactor-only change. No user-facing behavior is intended to change.

Related to #143

@vaishnavidesai09

Copy link
Copy Markdown
Contributor Author

Hi @johnnichev,

I've opened a PR for this issue.

Changes included:

Extracted shared helper logic in the Anthropic provider to reduce sync/async duplication.
Refactored duplicated Gemini provider logic into shared private helpers.
Moved the common dimension property implementation into EmbeddingProvider and removed the duplicated implementations from individual embedding providers.
Preserved existing behavior and verified the test suite passes (7459 passed).

Please let me know if you'd like any additional cleanup or if there are other duplication points you'd prefer to be extracted before merge.

@johnnichev johnnichev left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling #143. The direction is right and most of the extraction is faithful. A few things need to change before this can merge.

Blocking: behavior regression in gemini_provider.py stream() and astream()

The old tool-call branch set _yielded_any = True right before yielding. A helper cannot set the caller's local, so _toolcall_from_part drops that side effect. For a tool-call-only streamed response (no text, which is the normal function-calling case) _yielded_any now stays False and this guard fires at the end of the loop:

if tools and not _yielded_any:
    _warn_empty_tool_response(model_name, _last_finish_reason)

That is the Issue #66 safeguard. After this change it logs the "returned neither text nor a tool call" warning on every healthy tool stream, and it can no longer distinguish a genuinely empty response from a normal one. The suite passes because the ToolCall is still yielded correctly and nothing asserts on the warning.

Fix: set _yielded_any = True immediately before yield self._toolcall_from_part(part) in both stream() and astream().

Blocking: CI lint and format

ruff format --check fails on all 5 files (extra blank lines around the new helpers). ruff check also fails:

  • W291 trailing whitespace: anthropic_provider.py:212 and gemini_provider.py:171
  • F401 unused from google.genai import types: the types usage moved into _build_config, but the import was left behind in complete(), stream(), acomplete(), and astream()

Please run ruff format and ruff check --fix and confirm both are clean. The PR validation ran compileall and pytest but not ruff, which is why these slipped through.

Quality: incomplete refactor in anthropic_provider.py

_build_request_args already sets tools and timeout, but each call site still runs the old if tools: request_args["tools"] = self._tools_param(tools) (and the timeout block) afterward. So _tools_param runs twice and the duplication the helper was meant to remove is still present. Drop the redundant call-site blocks.

Types

Please annotate the new helpers. _build_config has no return type and an untyped tools, and _toolcall_from_part(self, part) has an untyped part. The library targets a fully typed surface.

Minor: embeddings ABC

Pulling dimension into EmbeddingProvider and returning self._dimension matches the issue and is fine. Just note it weakens the contract: a future provider that forgets to set self._dimension now hits a runtime AttributeError instead of failing at instantiation. A default in the ABC or a docstring note would cover that.

Verified and correct, no action needed

  • The call_name substitution is identical to str(part.function_call.name or "").
  • Anthropic timeout is preserved (the helper applies it, and the original complete()/acomplete() did too).
  • _build_config produces the same config object on the stream and non-stream paths.

Once the _yielded_any fix and the ruff failures are in, this is good to go.

@johnnichev

Copy link
Copy Markdown
Owner

Thanks for jumping on this @vaishnavidesai09, nice first pass. You clearly got what #143 was after and the helper extraction reads well.

I dropped a detailed review up top. Really just one thing that matters: the Gemini stream paths lost the _yielded_any = True before the yield, which quietly trips the #66 empty-response warning on normal tool calls. Fix that, run ruff to clean up the format/lint bits, and I think we're there. Everything else is minor.

Ping me when you've pushed and I'll take another look.

@vaishnavidesai09

Copy link
Copy Markdown
Contributor Author

Hi @johnnichev,

I've addressed the requested changes:

  • Restored tools and timeout handling inside _build_request_args() and removed the resulting regression.
  • Fixed _yielded_any updates in Gemini stream() and astream() for tool-call-only streaming responses.
  • Added type annotations to the new Gemini helper methods.
  • Ran the full test suite again after the fixes.

Current results:

  • 7459 passed
  • 215 skipped

Please let me know if you'd like any additional cleanup before merge. Thanks for the review.

CI runs neither ruff nor mypy, so three issues slipped through on this
branch:

- ruff format: removing the duplicated `dimension` properties left
  stray consecutive blank lines before `__stability__` in 4 embedding
  modules. Reformatted (whitespace only).
- mypy [attr-defined]: the new shared `EmbeddingProvider.dimension`
  returns `self._dimension`, but the base class never declared it.
  Added a `_dimension: int` class annotation that documents the
  contract subclasses must satisfy.
- mypy [union-attr]: `_toolcall_from_part` reads `part.function_call.name`
  outside the callers' `if part.function_call:` guards. Narrowed with a
  local + explicit guard.

No behavior change. ruff format/check + mypy all clean on the 7 files.

Claude-Session: https://claude.ai/code/session_011xRdXi1rCM9jbpHQJkNzof
Empty commit to re-fire the pull_request workflow (squashed on merge).

Claude-Session: https://claude.ai/code/session_011xRdXi1rCM9jbpHQJkNzof

@johnnichev johnnichev left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed after the updates. Every blocking item is resolved and verified:

  • Gemini _yielded_any (#66 safeguard): now set immediately before yield self._toolcall_from_part(part) in both stream() and astream(). The empty-tool-response warning no longer fires on healthy tool streams.
  • Anthropic redundant blocks: gone — _build_request_args is the single source of tools/timeout, no double _tools_param.
  • Types: _build_config and _toolcall_from_part are annotated.
  • #143 acceptance: the named helpers (_build_config/_toolcall_from_part, _build_request_args/_usage_from_response/_message_from_content_blocks) and the embeddings dimension→ABC move are all in, sync/async paths share them.

CI runs neither ruff nor mypy, so I pushed one small cleanup commit on top to get the branch fully green against the repo's local gates: ruff format on the embedding modules (stray blank lines after the dimension removal), a _dimension: int declaration on EmbeddingProvider so the shared property type-checks (also documents the subclass contract), and a narrowing guard in _toolcall_from_part for mypy union-attr. All whitespace/type-only, no behavior change. ruff format/check + mypy + full pre-commit all pass; CI matrix green.

Nice work on your first contribution — merging this into the next release. Thanks for sticking with the review.

@johnnichev johnnichev merged commit 9b91960 into johnnichev:main Jun 20, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants