fix: append /v1 for OpenAI embedding api base#6910
fix: append /v1 for OpenAI embedding api base#6910idiotsj wants to merge 3 commits intoAstrBotDevs:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue where the OpenAI embedding client would fail to make requests to OpenAI-compatible endpoints if the configured Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
_normalize_embedding_api_basehelper assumes that any base URL not ending with/v1should have/v1appended, which will produce incorrect URLs for already-versioned or path-specific bases like.../v1-betaor.../v1/embeddings; consider narrowing the normalization condition (e.g., only when the path does not already start with/vor when the host matches OpenAI) or documenting that only root-style bases are supported. - When
embedding_api_baseis configured as whitespace only,strip()leaves an empty string that bypasses normalization and is passed through toAsyncOpenAI; consider treating an empty string the same as a missing value and falling back to the default base URL instead.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_normalize_embedding_api_base` helper assumes that any base URL not ending with `/v1` should have `/v1` appended, which will produce incorrect URLs for already-versioned or path-specific bases like `.../v1-beta` or `.../v1/embeddings`; consider narrowing the normalization condition (e.g., only when the path does not already start with `/v` or when the host matches OpenAI) or documenting that only root-style bases are supported.
- When `embedding_api_base` is configured as whitespace only, `strip()` leaves an empty string that bypasses normalization and is passed through to `AsyncOpenAI`; consider treating an empty string the same as a missing value and falling back to the default base URL instead.
## Individual Comments
### Comment 1
<location path="tests/test_openai_source.py" line_range="69-71" />
<code_context>
+ )
+
+
@pytest.mark.asyncio
async def test_handle_api_error_content_moderated_removes_images():
provider = _make_provider(
</code_context>
<issue_to_address>
**suggestion (testing):** Cover edge case where `embedding_api_base` ends with a trailing slash but no `/v1`
Given the current normalization (`rstrip('/')` then conditionally appending `/v1`), please add a test for `embedding_api_base="https://example.com/openai/"` to assert it becomes `https://example.com/openai/v1/` and not `https://example.com/openai//v1/`, so the trailing-slash handling is locked in.
Suggested implementation:
```python
finally:
await provider.terminate()
def test_embedding_api_base_trailing_slash_normalized():
provider = _make_provider(
overrides={"embedding_api_base": "https://example.com/openai/"}
)
# The provider should normalize the embedding API base by removing any
# trailing slash and then appending `/v1`, resulting in a single slash.
# This asserts we do *not* end up with `https://example.com/openai//v1/`.
base_url = str(provider.client._client.base_url)
assert base_url == "https://example.com/openai/v1/"
```
Depending on how `OpenAIEmbeddingProvider` exposes its underlying OpenAI client, you may need to adjust the attribute chain used to read the base URL:
- If the provider exposes the client as `provider._client` instead of `provider.client`, change `provider.client._client.base_url` to `provider._client.base_url` or `provider._client._client.base_url`.
- If the base URL is stored on a different attribute (e.g. `provider._client.base_url` or `provider.client.base_url`), update the test accordingly while keeping the assertion value `https://example.com/openai/v1/`.
The key behavior to lock in is that `"https://example.com/openai/"` normalizes to `"https://example.com/openai/v1/"` without a double slash.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @pytest.mark.asyncio | ||
| async def test_handle_api_error_content_moderated_removes_images(): | ||
| provider = _make_provider( |
There was a problem hiding this comment.
suggestion (testing): Cover edge case where embedding_api_base ends with a trailing slash but no /v1
Given the current normalization (rstrip('/') then conditionally appending /v1), please add a test for embedding_api_base="https://example.com/openai/" to assert it becomes https://example.com/openai/v1/ and not https://example.com/openai//v1/, so the trailing-slash handling is locked in.
Suggested implementation:
finally:
await provider.terminate()
def test_embedding_api_base_trailing_slash_normalized():
provider = _make_provider(
overrides={"embedding_api_base": "https://example.com/openai/"}
)
# The provider should normalize the embedding API base by removing any
# trailing slash and then appending `/v1`, resulting in a single slash.
# This asserts we do *not* end up with `https://example.com/openai//v1/`.
base_url = str(provider.client._client.base_url)
assert base_url == "https://example.com/openai/v1/"Depending on how OpenAIEmbeddingProvider exposes its underlying OpenAI client, you may need to adjust the attribute chain used to read the base URL:
- If the provider exposes the client as
provider._clientinstead ofprovider.client, changeprovider.client._client.base_urltoprovider._client.base_urlorprovider._client._client.base_url. - If the base URL is stored on a different attribute (e.g.
provider._client.base_urlorprovider.client.base_url), update the test accordingly while keeping the assertion valuehttps://example.com/openai/v1/.
The key behavior to lock in is that "https://example.com/openai/" normalizes to "https://example.com/openai/v1/" without a double slash.
There was a problem hiding this comment.
Code Review
This pull request introduces a new static method, _normalize_embedding_api_base, within the OpenAIEmbeddingProvider to ensure that API base URLs consistently end with /v1. This normalization is applied during the initialization of the AsyncOpenAI client. New unit tests have been added to validate this behavior. A review comment points out a potential issue where an empty or whitespace-only embedding_api_base configuration could result in an empty api_base being passed to the client, leading to an InvalidURL error. It suggests a more robust approach by falling back to the default URL before normalization in such cases.
| if api_base: | ||
| api_base = self._normalize_embedding_api_base(api_base) |
There was a problem hiding this comment.
While the current logic correctly normalizes a non-empty api_base, it doesn't handle cases where api_base becomes an empty string (e.g., if the configuration provides an empty or whitespace-only value). This will cause the AsyncOpenAI client to fail with an InvalidURL error. To make this more robust, we should ensure we fall back to the default URL if api_base is empty before normalizing.
api_base = self._normalize_embedding_api_base(api_base or "https://api.openai.com/v1")|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/openai_embedding_source.py" line_range="29" />
<code_context>
+ ``https://example.com`` or ``https://example.com/openai``. More specific
+ paths (for example ``/v1-beta`` or ``/v1/embeddings``) are preserved as-is.
+ """
+ normalized_api_base = api_base.rstrip("/")
+ parsed = urlsplit(normalized_api_base)
+ path_segments = [segment for segment in parsed.path.split("/") if segment]
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid applying `rstrip('/')` to the whole URL string to prevent altering query/fragment parts.
Because `api_base` may include query or fragment components (e.g. `https://example.com?next=/foo/`), calling `rstrip('/')` on the full string will incorrectly change those parts (`next=/foo/` → `next=/foo`). Instead, parse the URL (`parsed = urlsplit(api_base)`) and apply `rstrip('/')` only to `parsed.path`, then rebuild the URL with `parsed._replace(path=...)` so query and fragment remain intact.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
Summary
embedding_api_basefor the OpenAI embedding provider/v1automatically when the configured base URL omits it/v1suffixesProblem
When
embedding_api_baseis configured without/v1, the OpenAI embedding client uses the raw base URL and requests fail against OpenAI-compatible endpoints that expect the/v1prefix.Closes #6855
Testing
Summary by Sourcery
Normalize OpenAI embedding API base URL handling and add regression coverage for various base URL configurations.
Bug Fixes:
Tests: