
Feat/litellm-proxy-support #124

Closed

pavanputhra wants to merge 3 commits into main from feat/litellm-proxy-support

Conversation

@pavanputhra
Contributor

Summary

This PR adds LiteLLM proxy support and updates all OpenAI-dependent integrations so they can optionally route requests through a LiteLLM proxy while continuing to support direct OpenAI and Azure OpenAI configurations.

The implementation is fully backward compatible: existing deployments using OpenAI or Azure continue to work without any configuration changes.

Changes
• Added LiteLLM proxy integration support
• Updated all relevant links to support:
• LiteLLM proxy
• Direct OpenAI
• Azure OpenAI
• Updated storage components that rely on OpenAI APIs to support the new routing mechanism
• Maintained backward compatibility with existing configurations

Affected Components

Links
• analyze
• check_and_tag
• analyze_vcon
• analyze_and_label
• detect_engagement
• openai_transcribe
• deepgram

Storage
• milvus
• chatgpt_files

Notes
• No breaking changes expected
• Configuration can now switch between LiteLLM and direct provider usage
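As an illustration of that switch (hypothetical values; only the key names `LITELLM_PROXY_URL`, `LITELLM_MASTER_KEY`, and `api_key` come from the code in this PR), a link's opts selects the proxy path only when both LiteLLM keys are present:

```python
# Hypothetical opts dicts -- the values are placeholders, the key names are the PR's.
litellm_opts = {
    "LITELLM_PROXY_URL": "http://litellm:4000",  # assumed proxy address
    "LITELLM_MASTER_KEY": "sk-litellm-example",  # placeholder key
}
direct_opts = {
    "api_key": "sk-openai-example",  # existing direct-OpenAI configuration
}

def uses_litellm(opts: dict) -> bool:
    """Sketch of the selection rule: route via the proxy only when both
    LiteLLM keys are set and non-empty; otherwise fall back to the provider."""
    return bool((opts.get("LITELLM_PROXY_URL") or "").strip()) and bool(
        (opts.get("LITELLM_MASTER_KEY") or "").strip()
    )
```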

- Introduced a shared `get_openai_client` function to streamline OpenAI and Azure client initialization.
- Updated multiple modules to utilize the new client function, enhancing code maintainability and reducing redundancy.
- Added test cases to ensure proper mocking of the OpenAI client in various analysis and processing scenarios.
- Updated documentation in README.md to include instructions for running tests within the Docker environment.
- Introduced a new `transcribe_via_litellm` function to handle audio transcription through the LiteLLM proxy.
- Updated the `run` function to conditionally use LiteLLM proxy or Deepgram API based on provided options.
- Enhanced documentation for default options to include LiteLLM configuration.
- Added error handling for missing required keys for both transcription methods.
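A minimal sketch of that selection logic (not the PR's exact code: the key handling mirrors the reviewed excerpt from `server/lib/openai_client.py`, but returning a `(base_url, api_key)` tuple instead of a constructed OpenAI/Azure client is a simplification for illustration):

```python
from typing import Optional, Tuple

def resolve_client_target(opts: dict) -> Tuple[Optional[str], str]:
    """Decide where requests go, as (base_url, api_key).

    base_url is None for a direct OpenAI client; otherwise it is the
    LiteLLM proxy address. The real get_openai_client returns a
    constructed client object rather than a tuple.
    """
    litellm_url = (opts.get("LITELLM_PROXY_URL") or "").strip().rstrip("/")
    litellm_key = (opts.get("LITELLM_MASTER_KEY") or "").strip()
    if litellm_url and litellm_key:
        # LiteLLM exposes an OpenAI-compatible API, so the stock client
        # works against it once base_url points at the proxy.
        return litellm_url, litellm_key
    return None, opts["api_key"]
```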

- Introduced `get_async_openai_client` function for asynchronous OpenAI and Azure client initialization.
- Updated documentation to clarify usage of LiteLLM proxy for both chat and embeddings.
- Enhanced error handling for missing configuration keys in async client setup.
Copilot AI left a comment

Pull request overview

Adds a shared OpenAI/Azure/LiteLLM client factory and migrates OpenAI-dependent links/storage to optionally route requests through a LiteLLM proxy while keeping direct OpenAI/Azure support.

Changes:

  • Introduces lib.openai_client.get_openai_client() (and async variant) to centralize provider/proxy selection.
  • Updates multiple links and storage modules to use the shared client constructor.
  • Adds LiteLLM-proxy transcription path to the Deepgram link, and updates/extends tests + docs accordingly.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 7 comments.

Summary per file:

| File | Description |
| --- | --- |
| `server/lib/openai_client.py` | New shared OpenAI/Azure/LiteLLM client constructors used across the codebase. |
| `server/links/analyze/__init__.py` | Uses shared client creation instead of direct OpenAI/Azure client instantiation. |
| `server/links/analyze/tests/test_analyze.py` | Updates tests to patch `get_openai_client` and inject clients. |
| `server/links/analyze_and_label/__init__.py` | Switches to shared `get_openai_client`. |
| `server/links/analyze_and_label/tests/test_analyze_and_label.py` | Updates tests to patch `get_openai_client`. |
| `server/links/analyze_vcon/__init__.py` | Switches to shared `get_openai_client`. |
| `server/links/check_and_tag/__init__.py` | Switches to shared `get_openai_client`. |
| `server/links/detect_engagement/__init__.py` | Switches to shared `get_openai_client` and skips cleanly when credentials are absent. |
| `server/links/detect_engagement/tests/test_detect_engagement.py` | Updates tests to patch `get_openai_client`. |
| `server/links/deepgram_link/__init__.py` | Adds LiteLLM proxy transcription path (OpenAI-compatible audio transcriptions). |
| `server/links/openai_transcribe/__init__.py` | Switches to shared `get_openai_client`. |
| `server/storage/milvus/__init__.py` | Uses shared `get_openai_client` for embeddings generation. |
| `server/storage/milvus/test_milvus.py` | Updates fixture to patch `get_openai_client` instead of `OpenAI`. |
| `server/storage/chatgpt_files/__init__.py` | Uses shared `get_openai_client` for file/vector-store operations. |
| `README.md` | Adds guidance for running tests in Docker and OTEL env workaround. |
| `.gitignore` | Ignores `redis_data/` directory. |


```
    organization=opts["organization"] if opts["organization"] else None
)

# Initialize OpenAI client (supports LiteLLM proxy via LITELLM_PROXY_URL + LITELLM_MASTER_KEY in opts/env)
```

Copilot AI Mar 2, 2026


Comment mentions LiteLLM proxy credentials coming from "opts/env", but get_openai_client currently reads from opts only (it does not consult environment variables). Update the comment to avoid implying env-based configuration unless the caller explicitly injects env values into opts.

Suggested change:

```diff
-# Initialize OpenAI client (supports LiteLLM proxy via LITELLM_PROXY_URL + LITELLM_MASTER_KEY in opts/env)
+# Initialize OpenAI client (supports LiteLLM proxy via LITELLM_PROXY_URL + LITELLM_MASTER_KEY provided in opts)
```

```
    api_key=options["api_key"],
)
client = get_openai_client(options)
file = client.files.create(file=open(file_name, "rb"), purpose=options["purpose"])
```

Copilot AI Mar 2, 2026


client.files.create is called with file=open(file_name, "rb") without closing the file handle. This can leak descriptors in long-running workers. Use a context manager (open the file in a with-block) and pass the handle, or explicitly close it after the upload call.

Suggested change:

```diff
-file = client.files.create(file=open(file_name, "rb"), purpose=options["purpose"])
+with open(file_name, "rb") as upload_file:
+    file = client.files.create(file=upload_file, purpose=options["purpose"])
```

Comment on lines +36 to +50:

```python
litellm_url = (opts.get("LITELLM_PROXY_URL") or "").strip().rstrip("/")
litellm_key = (opts.get("LITELLM_MASTER_KEY") or "").strip()

if litellm_url and litellm_key:
    logger.info("Using LiteLLM proxy at %s", litellm_url)
    organization = opts.get("organization") or opts.get("organization_key")
    project = opts.get("project") or opts.get("project_key")
    return OpenAI(
        api_key=litellm_key,
        base_url=litellm_url,
        organization=organization if organization else None,
        project=project if project else None,
        timeout=120.0,
        max_retries=0,
    )
```

Copilot AI Mar 2, 2026


LiteLLM proxy base_url handling is inconsistent: sync client uses base_url=litellm_url (no "/v1"), but OpenAI clients typically expect the base URL to include the version prefix. This will likely send requests to the wrong paths unless callers include "/v1" themselves. Consider normalizing the proxy URL (append "/v1" exactly once) so both sync and async clients behave the same and configuration is unambiguous.
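One way to normalize as the reviewer suggests (a sketch, not code from the PR; the helper name is hypothetical):

```python
def normalize_litellm_base_url(url: str) -> str:
    """Strip trailing slashes, then append "/v1" exactly once, so that
    "http://litellm:4000", "http://litellm:4000/", and
    "http://litellm:4000/v1" all resolve to the same base URL."""
    url = url.strip().rstrip("/")
    return url if url.endswith("/v1") else url + "/v1"
```

Using this in both the sync and async constructors would make the two paths consistent and the configuration unambiguous.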

Comment on lines +99 to +102:

```python
logger.info("Using LiteLLM proxy at %s (async)", litellm_url)
return AsyncOpenAI(
    api_key=litellm_key,
    base_url=litellm_url + "/v1",
```

Copilot AI Mar 2, 2026


get_async_openai_client unconditionally appends "/v1" to LITELLM_PROXY_URL. If the configured URL already ends with "/v1", this produces "/v1/v1" and breaks requests. Normalize the URL similarly to the sync path (strip trailing slashes, then append "/v1" only if missing).

Suggested change:

```diff
-logger.info("Using LiteLLM proxy at %s (async)", litellm_url)
-return AsyncOpenAI(
-    api_key=litellm_key,
-    base_url=litellm_url + "/v1",
+# Normalize base URL: append "/v1" only if it's not already present
+litellm_base_url = (
+    litellm_url if litellm_url.endswith("/v1") else litellm_url + "/v1"
+)
+logger.info("Using LiteLLM proxy at %s (async)", litellm_url)
+return AsyncOpenAI(
+    api_key=litellm_key,
+    base_url=litellm_base_url,
```

Comment on lines +55 to +63:

```python
content = audio_response.content
if not content:
    logger.warning("Empty audio content from %s", url)
    return None
# OpenAI client expects a file-like object with optional .name
ext = os.path.splitext(url.split("?")[0])[-1] or ".mp3"
with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp:
    tmp.write(content)
    tmp_path = tmp.name
```

Copilot AI Mar 2, 2026


transcribe_via_litellm reads the entire audio into memory via audio_response.content before writing it to disk. For longer recordings this can cause high memory usage inside the worker container. Prefer streaming to the temp file using iter_content() (or shutil.copyfileobj from response.raw) to keep memory bounded.

Suggested change:

```diff
-content = audio_response.content
-if not content:
-    logger.warning("Empty audio content from %s", url)
-    return None
 # OpenAI client expects a file-like object with optional .name
 ext = os.path.splitext(url.split("?")[0])[-1] or ".mp3"
 with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp:
-    tmp.write(content)
-    tmp_path = tmp.name
+    bytes_written = 0
+    for chunk in audio_response.iter_content(chunk_size=8192):
+        if not chunk:
+            continue
+        tmp.write(chunk)
+        bytes_written += len(chunk)
+    tmp_path = tmp.name
+if bytes_written == 0:
+    logger.warning("Empty audio content from %s", url)
+    try:
+        os.unlink(tmp_path)
+    except OSError:
+        pass
+    return None
```

Comment on lines +71 to +76:

```python
# Shape expected by rest of link (confidence not provided by OpenAI-format response)
return {
    "transcript": text,
    "confidence": 1.0,
    "detected_language": "en",
}
```

Copilot AI Mar 2, 2026


transcribe_via_litellm fabricates values for "confidence" (1.0) and "detected_language" ("en"). Downstream logic uses confidence thresholds and metrics, so this can silently change behavior (e.g., always passing minimum_confidence). Consider returning None/omitting these fields and updating the caller to handle missing confidence/language (or explicitly documenting that confidence is unavailable in the LiteLLM path).

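A sketch of the caller-side handling the reviewer suggests (hypothetical helper, not code from the PR; `minimum_confidence` is the option name referenced in the review):

```python
def passes_confidence_gate(result: dict, minimum_confidence: float) -> bool:
    """Apply the threshold only when the backend actually reported a score.

    The LiteLLM path would omit "confidence" instead of fabricating 1.0,
    so a missing score is let through rather than silently passing the gate
    with a made-up value; callers can log or flag the absence upstream.
    """
    confidence = result.get("confidence")
    if confidence is None:
        return True  # no score available from this transcription backend
    return confidence >= minimum_confidence
```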
Comment on lines +347 to +352:

```diff
 @patch('server.links.analyze_and_label.get_openai_client')
 @patch('server.links.analyze_and_label.get_openai_client')
 @patch('server.links.analyze_and_label.generate_analysis_with_labels')
 @patch('server.links.analyze_and_label.is_included', return_value=True)
 @patch('server.links.analyze_and_label.randomly_execute_with_sampling', return_value=True)
-def test_run_analysis_exception(mock_sampling, mock_is_included, mock_generate_analysis, mock_redis_with_vcon, sample_vcon):
+def test_run_analysis_exception(mock_sampling, mock_is_included, mock_generate_analysis, mock_get_client, mock_redis_with_vcon, sample_vcon):
```

Copilot AI Mar 2, 2026


There are two identical @patch('server.links.analyze_and_label.get_openai_client') decorators stacked on this test. This will inject two mocks but the test function only accepts one, causing the test to error (argument mismatch) and making the patching behavior ambiguous. Remove the duplicate patch (or update the signature accordingly).
