
Feat/litellm-proxy-support #124

Closed

pavanputhra wants to merge 3 commits into main from feat/litellm-proxy-support

Conversation

@pavanputhra
Contributor

Summary

This PR adds LiteLLM proxy support and updates all OpenAI-dependent integrations so they can optionally route requests through a LiteLLM proxy while continuing to support direct OpenAI and Azure OpenAI configurations.

The implementation is fully backward compatible: existing deployments using OpenAI or Azure continue to work without any configuration changes.

Changes
• Added LiteLLM proxy integration support
• Updated all relevant links to support:
• LiteLLM proxy
• Direct OpenAI
• Azure OpenAI
• Updated storage components that rely on OpenAI APIs to support the new routing mechanism
• Maintained backward compatibility with existing configurations

Affected Components

Links
• analyze
• check_and_tag
• analyze_vcon
• analyze_and_label
• detect_engagement
• openai_transcribe
• deepgram

Storage
• milvus
• chatgpt_files

Notes
• No breaking changes expected
• Configuration can now switch between LiteLLM and direct provider usage
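As an illustration of that switch (hypothetical values; only the key names `LITELLM_PROXY_URL`, `LITELLM_MASTER_KEY`, and `api_key` come from the code in this PR), a link's opts selects the proxy path only when both LiteLLM keys are present:

```python
# Hypothetical opts dicts -- the values are placeholders, the key names are the PR's.
litellm_opts = {
    "LITELLM_PROXY_URL": "http://litellm:4000",  # assumed proxy address
    "LITELLM_MASTER_KEY": "sk-litellm-example",  # placeholder key
}
direct_opts = {
    "api_key": "sk-openai-example",  # existing direct-OpenAI configuration
}

def uses_litellm(opts: dict) -> bool:
    """Sketch of the selection rule: route via the proxy only when both
    LiteLLM keys are set and non-empty; otherwise fall back to the provider."""
    return bool((opts.get("LITELLM_PROXY_URL") or "").strip()) and bool(
        (opts.get("LITELLM_MASTER_KEY") or "").strip()
    )
```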

- Introduced a shared `get_openai_client` function to streamline OpenAI and Azure client initialization.
- Updated multiple modules to utilize the new client function, enhancing code maintainability and reducing redundancy.
- Added test cases to ensure proper mocking of the OpenAI client in various analysis and processing scenarios.
- Updated documentation in README.md to include instructions for running tests within the Docker environment.
- Introduced a new `transcribe_via_litellm` function to handle audio transcription through the LiteLLM proxy.
- Updated the `run` function to conditionally use LiteLLM proxy or Deepgram API based on provided options.
- Enhanced documentation for default options to include LiteLLM configuration.
- Added error handling for missing required keys for both transcription methods.
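A minimal sketch of that selection logic (not the PR's exact code: the key handling mirrors the reviewed excerpt from `server/lib/openai_client.py`, but returning a `(base_url, api_key)` tuple instead of a constructed OpenAI/Azure client is a simplification for illustration):

```python
from typing import Optional, Tuple

def resolve_client_target(opts: dict) -> Tuple[Optional[str], str]:
    """Decide where requests go, as (base_url, api_key).

    base_url is None for a direct OpenAI client; otherwise it is the
    LiteLLM proxy address. The real get_openai_client returns a
    constructed client object rather than a tuple.
    """
    litellm_url = (opts.get("LITELLM_PROXY_URL") or "").strip().rstrip("/")
    litellm_key = (opts.get("LITELLM_MASTER_KEY") or "").strip()
    if litellm_url and litellm_key:
        # LiteLLM exposes an OpenAI-compatible API, so the stock client
        # works against it once base_url points at the proxy.
        return litellm_url, litellm_key
    return None, opts["api_key"]
```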

- Introduced `get_async_openai_client` function for asynchronous OpenAI and Azure client initialization.
- Updated documentation to clarify usage of LiteLLM proxy for both chat and embeddings.
- Enhanced error handling for missing configuration keys in async client setup.
Copilot AI left a comment

Pull request overview

Adds a shared OpenAI/Azure/LiteLLM client factory and migrates OpenAI-dependent links/storage to optionally route requests through a LiteLLM proxy while keeping direct OpenAI/Azure support.

Changes:

  • Introduces lib.openai_client.get_openai_client() (and async variant) to centralize provider/proxy selection.
  • Updates multiple links and storage modules to use the shared client constructor.
  • Adds LiteLLM-proxy transcription path to the Deepgram link, and updates/extends tests + docs accordingly.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 7 comments.

Summary per file:

| File | Description |
| --- | --- |
| `server/lib/openai_client.py` | New shared OpenAI/Azure/LiteLLM client constructors used across the codebase. |
| `server/links/analyze/__init__.py` | Uses shared client creation instead of direct OpenAI/Azure client instantiation. |
| `server/links/analyze/tests/test_analyze.py` | Updates tests to patch `get_openai_client` and inject clients. |
| `server/links/analyze_and_label/__init__.py` | Switches to shared `get_openai_client`. |
| `server/links/analyze_and_label/tests/test_analyze_and_label.py` | Updates tests to patch `get_openai_client`. |
| `server/links/analyze_vcon/__init__.py` | Switches to shared `get_openai_client`. |
| `server/links/check_and_tag/__init__.py` | Switches to shared `get_openai_client`. |
| `server/links/detect_engagement/__init__.py` | Switches to shared `get_openai_client` and skips cleanly when credentials are absent. |
| `server/links/detect_engagement/tests/test_detect_engagement.py` | Updates tests to patch `get_openai_client`. |
| `server/links/deepgram_link/__init__.py` | Adds LiteLLM proxy transcription path (OpenAI-compatible audio transcriptions). |
| `server/links/openai_transcribe/__init__.py` | Switches to shared `get_openai_client`. |
| `server/storage/milvus/__init__.py` | Uses shared `get_openai_client` for embeddings generation. |
| `server/storage/milvus/test_milvus.py` | Updates fixture to patch `get_openai_client` instead of `OpenAI`. |
| `server/storage/chatgpt_files/__init__.py` | Uses shared `get_openai_client` for file/vector-store operations. |
| `README.md` | Adds guidance for running tests in Docker and OTEL env workaround. |
| `.gitignore` | Ignores `redis_data/` directory. |


```
    organization=opts["organization"] if opts["organization"] else None
)

# Initialize OpenAI client (supports LiteLLM proxy via LITELLM_PROXY_URL + LITELLM_MASTER_KEY in opts/env)
```

Copilot AI Mar 2, 2026


Comment mentions LiteLLM proxy credentials coming from "opts/env", but get_openai_client currently reads from opts only (it does not consult environment variables). Update the comment to avoid implying env-based configuration unless the caller explicitly injects env values into opts.

Suggested change:

```diff
-# Initialize OpenAI client (supports LiteLLM proxy via LITELLM_PROXY_URL + LITELLM_MASTER_KEY in opts/env)
+# Initialize OpenAI client (supports LiteLLM proxy via LITELLM_PROXY_URL + LITELLM_MASTER_KEY provided in opts)
```

```
    api_key=options["api_key"],
)
client = get_openai_client(options)
file = client.files.create(file=open(file_name, "rb"), purpose=options["purpose"])
```

Copilot AI Mar 2, 2026


client.files.create is called with file=open(file_name, "rb") without closing the file handle. This can leak descriptors in long-running workers. Use a context manager (open the file in a with-block) and pass the handle, or explicitly close it after the upload call.

Suggested change:

```diff
-file = client.files.create(file=open(file_name, "rb"), purpose=options["purpose"])
+with open(file_name, "rb") as upload_file:
+    file = client.files.create(file=upload_file, purpose=options["purpose"])
```

Comment on lines +36 to +50:

```python
litellm_url = (opts.get("LITELLM_PROXY_URL") or "").strip().rstrip("/")
litellm_key = (opts.get("LITELLM_MASTER_KEY") or "").strip()

if litellm_url and litellm_key:
    logger.info("Using LiteLLM proxy at %s", litellm_url)
    organization = opts.get("organization") or opts.get("organization_key")
    project = opts.get("project") or opts.get("project_key")
    return OpenAI(
        api_key=litellm_key,
        base_url=litellm_url,
        organization=organization if organization else None,
        project=project if project else None,
        timeout=120.0,
        max_retries=0,
    )
```

Copilot AI Mar 2, 2026


LiteLLM proxy base_url handling is inconsistent: sync client uses base_url=litellm_url (no "/v1"), but OpenAI clients typically expect the base URL to include the version prefix. This will likely send requests to the wrong paths unless callers include "/v1" themselves. Consider normalizing the proxy URL (append "/v1" exactly once) so both sync and async clients behave the same and configuration is unambiguous.
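One way to normalize as the reviewer suggests (a sketch, not code from the PR; the helper name is hypothetical):

```python
def normalize_litellm_base_url(url: str) -> str:
    """Strip trailing slashes, then append "/v1" exactly once, so that
    "http://litellm:4000", "http://litellm:4000/", and
    "http://litellm:4000/v1" all resolve to the same base URL."""
    url = url.strip().rstrip("/")
    return url if url.endswith("/v1") else url + "/v1"
```

Using this in both the sync and async constructors would make the two paths consistent and the configuration unambiguous.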

Comment on lines +99 to +102:

```python
logger.info("Using LiteLLM proxy at %s (async)", litellm_url)
return AsyncOpenAI(
    api_key=litellm_key,
    base_url=litellm_url + "/v1",
```

Copilot AI Mar 2, 2026


get_async_openai_client unconditionally appends "/v1" to LITELLM_PROXY_URL. If the configured URL already ends with "/v1", this produces "/v1/v1" and breaks requests. Normalize the URL similarly to the sync path (strip trailing slashes, then append "/v1" only if missing).

Suggested change:

```diff
-logger.info("Using LiteLLM proxy at %s (async)", litellm_url)
-return AsyncOpenAI(
-    api_key=litellm_key,
-    base_url=litellm_url + "/v1",
+# Normalize base URL: append "/v1" only if it's not already present
+litellm_base_url = (
+    litellm_url if litellm_url.endswith("/v1") else litellm_url + "/v1"
+)
+logger.info("Using LiteLLM proxy at %s (async)", litellm_url)
+return AsyncOpenAI(
+    api_key=litellm_key,
+    base_url=litellm_base_url,
```

Comment on lines +55 to +63:

```python
content = audio_response.content
if not content:
    logger.warning("Empty audio content from %s", url)
    return None
# OpenAI client expects a file-like object with optional .name
ext = os.path.splitext(url.split("?")[0])[-1] or ".mp3"
with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp:
    tmp.write(content)
    tmp_path = tmp.name
```

Copilot AI Mar 2, 2026


transcribe_via_litellm reads the entire audio into memory via audio_response.content before writing it to disk. For longer recordings this can cause high memory usage inside the worker container. Prefer streaming to the temp file using iter_content() (or shutil.copyfileobj from response.raw) to keep memory bounded.

Suggested change:

```diff
-content = audio_response.content
-if not content:
-    logger.warning("Empty audio content from %s", url)
-    return None
 # OpenAI client expects a file-like object with optional .name
 ext = os.path.splitext(url.split("?")[0])[-1] or ".mp3"
 with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp:
-    tmp.write(content)
-    tmp_path = tmp.name
+    bytes_written = 0
+    for chunk in audio_response.iter_content(chunk_size=8192):
+        if not chunk:
+            continue
+        tmp.write(chunk)
+        bytes_written += len(chunk)
+    tmp_path = tmp.name
+if bytes_written == 0:
+    logger.warning("Empty audio content from %s", url)
+    try:
+        os.unlink(tmp_path)
+    except OSError:
+        pass
+    return None
```

Comment on lines +71 to +76:

```python
# Shape expected by rest of link (confidence not provided by OpenAI-format response)
return {
    "transcript": text,
    "confidence": 1.0,
    "detected_language": "en",
}
```

Copilot AI Mar 2, 2026


transcribe_via_litellm fabricates values for "confidence" (1.0) and "detected_language" ("en"). Downstream logic uses confidence thresholds and metrics, so this can silently change behavior (e.g., always passing minimum_confidence). Consider returning None/omitting these fields and updating the caller to handle missing confidence/language (or explicitly documenting that confidence is unavailable in the LiteLLM path).

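A sketch of the caller-side handling the reviewer suggests (hypothetical helper, not code from the PR; `minimum_confidence` is the option name referenced in the review):

```python
def passes_confidence_gate(result: dict, minimum_confidence: float) -> bool:
    """Apply the threshold only when the backend actually reported a score.

    The LiteLLM path would omit "confidence" instead of fabricating 1.0,
    so a missing score is let through rather than silently passing the gate
    with a made-up value; callers can log or flag the absence upstream.
    """
    confidence = result.get("confidence")
    if confidence is None:
        return True  # no score available from this transcription backend
    return confidence >= minimum_confidence
```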
Comment on lines +347 to +352:

```diff
 @patch('server.links.analyze_and_label.get_openai_client')
 @patch('server.links.analyze_and_label.get_openai_client')
 @patch('server.links.analyze_and_label.generate_analysis_with_labels')
 @patch('server.links.analyze_and_label.is_included', return_value=True)
 @patch('server.links.analyze_and_label.randomly_execute_with_sampling', return_value=True)
-def test_run_analysis_exception(mock_sampling, mock_is_included, mock_generate_analysis, mock_redis_with_vcon, sample_vcon):
+def test_run_analysis_exception(mock_sampling, mock_is_included, mock_generate_analysis, mock_get_client, mock_redis_with_vcon, sample_vcon):
```

Copilot AI Mar 2, 2026


There are two identical @patch('server.links.analyze_and_label.get_openai_client') decorators stacked on this test. This will inject two mocks but the test function only accepts one, causing the test to error (argument mismatch) and making the patching behavior ambiguous. Remove the duplicate patch (or update the signature accordingly).
