
Universal LLM API Proxy & Resilience Library

One proxy. Any LLM provider. Zero code changes.

A self-hosted proxy that provides a single, OpenAI-compatible API endpoint for all your LLM providers. Works with any application that supports custom OpenAI base URLs—no code changes required in your existing tools.

This project consists of two components:

  1. The API Proxy — A FastAPI application providing a universal /v1/chat/completions endpoint
  2. The Resilience Library — A reusable Python library for intelligent API key management, rotation, and failover

Why Use This?

  • Universal Compatibility — Works with any app supporting OpenAI-compatible APIs: Opencode, Continue, Roo/Kilo Code, JanitorAI, SillyTavern, custom applications, and more
  • One Endpoint, Many Providers — Configure Gemini, OpenAI, Anthropic, and any LiteLLM-supported provider once. Access them all through a single API key
  • Built-in Resilience — Automatic key rotation, failover on errors, rate limit handling, and intelligent cooldowns
  • Exclusive Provider Support — Includes custom providers not available elsewhere: Antigravity (Gemini 3 + Claude Sonnet/Opus 4.5), Gemini CLI, Qwen Code, and iFlow

Quick Start

Windows

  1. Download the latest release from GitHub Releases
  2. Unzip the downloaded file
  3. Run proxy_app.exe — the interactive TUI launcher opens

macOS / Linux

# Download and extract the release for your platform
chmod +x proxy_app
./proxy_app

From Source

git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git
cd LLM-API-Key-Proxy
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
python src/proxy_app/main.py

Tip: Running with command-line arguments (e.g., --host 0.0.0.0 --port 8000) bypasses the TUI and starts the proxy directly.


Connecting to the Proxy

Once the proxy is running, configure your application with these settings:

Setting                 | Value
Base URL / API Endpoint | http://127.0.0.1:8000/v1
API Key                 | Your PROXY_API_KEY

Model Format: provider/model_name

Important: Models must be specified in the format provider/model_name. The provider/ prefix tells the proxy which backend to route the request to.

gemini/gemini-2.5-flash          ← Gemini API
openai/gpt-4o                    ← OpenAI API
anthropic/claude-3-5-sonnet      ← Anthropic API
openrouter/anthropic/claude-3-opus  ← OpenRouter
gemini_cli/gemini-2.5-pro        ← Gemini CLI (OAuth)
antigravity/gemini-3-pro-preview ← Antigravity (Gemini 3, Claude Opus 4.5)

Usage Examples

Python (OpenAI Library)
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="your-proxy-api-key"
)

response = client.chat.completions.create(
    model="gemini/gemini-2.5-flash",  # provider/model format
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

curl
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "gemini/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'

JanitorAI / SillyTavern / Other Chat UIs
  1. Go to API Settings
  2. Select "Proxy" or "Custom OpenAI" mode
  3. Configure:
    • API URL: http://127.0.0.1:8000/v1
    • API Key: Your PROXY_API_KEY
    • Model: provider/model_name (e.g., gemini/gemini-2.5-flash)
  4. Save and start chatting

Continue / Cursor / IDE Extensions

In your configuration file (e.g., config.json):

{
  "models": [{
    "title": "Gemini via Proxy",
    "provider": "openai",
    "model": "gemini/gemini-2.5-flash",
    "apiBase": "http://127.0.0.1:8000/v1",
    "apiKey": "your-proxy-api-key"
  }]
}

API Endpoints

Endpoint                  | Description
GET /                     | Status check — confirms the proxy is running
POST /v1/chat/completions | Chat completions (main endpoint)
POST /v1/embeddings       | Text embeddings
GET /v1/models            | List all available models with pricing & capabilities
GET /v1/models/{model_id} | Get details for a specific model
GET /v1/providers         | List configured providers
POST /v1/token-count      | Calculate token count for a payload
POST /v1/cost-estimate    | Estimate cost based on token counts

Tip: The /v1/models endpoint is useful for discovering available models in your client. Many apps can fetch this list automatically. Add ?enriched=false for a minimal response without pricing data.
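
For example, a quick check from the command line (assuming the proxy expects the same bearer token here as for chat completions):

curl http://127.0.0.1:8000/v1/models \
  -H "Authorization: Bearer your-proxy-api-key"

# Minimal listing without pricing data
curl "http://127.0.0.1:8000/v1/models?enriched=false" \
  -H "Authorization: Bearer your-proxy-api-key"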


Managing Credentials

The proxy includes an interactive tool for managing all your API keys and OAuth credentials.

Using the TUI

  1. Run the proxy without arguments to open the TUI
  2. Select "🔑 Manage Credentials"
  3. Choose to add API keys or OAuth credentials

Using the Command Line

python -m rotator_library.credential_tool

Credential Types

Type     | Providers                                                                     | How to Add
API Keys | Gemini, OpenAI, Anthropic, OpenRouter, Groq, Mistral, NVIDIA, Cohere, Chutes | Enter key in TUI or add to .env
OAuth    | Gemini CLI, Antigravity, Qwen Code, iFlow                                    | Interactive browser login via credential tool

The .env File

Credentials are stored in a .env file. You can edit it directly or use the TUI:

# Required: Authentication key for YOUR proxy
PROXY_API_KEY="your-secret-proxy-key"

# Provider API Keys (add multiple with _1, _2, etc.)
GEMINI_API_KEY_1="your-gemini-key"
GEMINI_API_KEY_2="another-gemini-key"
OPENAI_API_KEY_1="your-openai-key"
ANTHROPIC_API_KEY_1="your-anthropic-key"

Copy .env.example to .env as a starting point.
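
For example, from the repository root:

cp .env.example .env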


The Resilience Library

The proxy is powered by a standalone Python library that you can use directly in your own applications.

Key Features

  • Async-native with asyncio and httpx
  • Intelligent key selection with tiered, model-aware locking
  • Deadline-driven requests with configurable global timeout
  • Automatic failover between keys on errors
  • OAuth support for Gemini CLI, Antigravity, Qwen, iFlow
  • Stateless deployment ready — load credentials from environment variables

Basic Usage

from rotator_library import RotatingClient

client = RotatingClient(
    api_keys={"gemini": ["key1", "key2"], "openai": ["key3"]},
    global_timeout=30,
    max_retries=2
)

async with client:
    response = await client.acompletion(
        model="gemini/gemini-2.5-flash",
        messages=[{"role": "user", "content": "Hello!"}]
    )

Library Documentation

See the Library README for complete documentation including:

  • All initialization parameters
  • Streaming support
  • Error handling and cooldown strategies
  • Provider plugin system
  • Credential prioritization

Interactive TUI

The proxy includes a powerful text-based UI for configuration and management.

TUI Features

  • 🚀 Run Proxy — Start the server with saved settings
  • ⚙️ Configure Settings — Host, port, API key, request logging
  • 🔑 Manage Credentials — Add/edit API keys and OAuth credentials
  • 📊 View Status — See configured providers and credential counts
  • 🔧 Advanced Settings — Custom providers, model definitions, concurrency

Configuration Files

File                 | Contents
.env                 | All credentials and advanced settings
launcher_config.json | TUI-specific settings (host, port, logging)

Features

Core Capabilities

  • Universal OpenAI-compatible endpoint for all providers
  • Multi-provider support via LiteLLM fallback
  • Automatic key rotation and load balancing
  • Interactive TUI for easy configuration
  • Detailed request logging for debugging

🛡️ Resilience & High Availability

  • Global timeout with deadline-driven retries
  • Escalating cooldowns per model (10s → 30s → 60s → 120s)
  • Key-level lockouts for consistently failing keys
  • Stream error detection and graceful recovery
  • Batch embedding aggregation for improved throughput
  • Automatic daily resets for cooldowns and usage stats

🔑 Credential Management

  • Auto-discovery of API keys from environment variables
  • OAuth discovery from standard paths (~/.gemini/, ~/.qwen/, ~/.iflow/)
  • Duplicate detection — warns when the same account is added multiple times
  • Credential prioritization — paid tier used before free tier
  • Stateless deployment — export OAuth to environment variables
  • Local-first storage — credentials isolated in oauth_creds/ directory

⚙️ Advanced Configuration

  • Model whitelists/blacklists with wildcard support
  • Per-provider concurrency limits (MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>)
  • Rotation modes — balanced (distribute load) or sequential (use until exhausted)
  • Priority multipliers — higher concurrency for paid credentials
  • Model quota groups — shared cooldowns for related models
  • Temperature override — prevent tool hallucination issues
  • Weighted random rotation — unpredictable selection patterns

🔌 Provider-Specific Features

Gemini CLI:

  • Zero-config Google Cloud project discovery
  • Internal API access with higher rate limits
  • Automatic fallback to preview models on rate limit
  • Paid vs free tier detection

Antigravity:

  • Gemini 3 Pro with thinkingLevel support
  • Claude Opus 4.5 (thinking mode)
  • Claude Sonnet 4.5 (thinking and non-thinking)
  • Thought signature caching for multi-turn conversations
  • Tool hallucination prevention

Qwen Code:

  • Dual auth (API key + OAuth Device Flow)
  • <think> tag parsing as reasoning_content
  • Tool schema cleaning

iFlow:

  • Dual auth (API key + OAuth Authorization Code)
  • Hybrid auth with separate API key fetch
  • Tool schema cleaning

NVIDIA NIM:

  • Dynamic model discovery
  • DeepSeek thinking support

📝 Logging & Debugging

  • Per-request file logging with --enable-request-logging
  • Unique request directories with full transaction details
  • Streaming chunk capture for debugging
  • Performance metadata (duration, tokens, model used)
  • Provider-specific logs for Qwen, iFlow, Antigravity

Advanced Configuration

Environment Variables Reference

Proxy Settings

Variable               | Description                              | Default
PROXY_API_KEY          | Authentication key for your proxy        | Required
OAUTH_REFRESH_INTERVAL | Token refresh check interval (seconds)   | 600
SKIP_OAUTH_INIT_CHECK  | Skip interactive OAuth setup on startup  | false
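
A minimal .env sketch using these settings (values here are illustrative):

PROXY_API_KEY="your-secret-proxy-key"
OAUTH_REFRESH_INTERVAL=600     # check for expiring OAuth tokens every 10 minutes
SKIP_OAUTH_INIT_CHECK=true     # skip the interactive OAuth check on startup (useful for headless deployments)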

Per-Provider Settings

Pattern                                     | Description                              | Example
<PROVIDER>_API_KEY_<N>                      | API key for provider                     | GEMINI_API_KEY_1
MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>  | Concurrent request limit                 | MAX_CONCURRENT_REQUESTS_PER_KEY_OPENAI=3
ROTATION_MODE_<PROVIDER>                    | balanced or sequential                   | ROTATION_MODE_GEMINI=sequential
IGNORE_MODELS_<PROVIDER>                    | Blacklist (comma-separated, supports *)  | IGNORE_MODELS_OPENAI=*-preview*
WHITELIST_MODELS_<PROVIDER>                 | Whitelist (overrides blacklist)          | WHITELIST_MODELS_GEMINI=gemini-2.5-pro

Advanced Features

Variable                                        | Description
ROTATION_TOLERANCE                              | 0.0 = deterministic, 3.0 = weighted random (default)
CONCURRENCY_MULTIPLIER_<PROVIDER>_PRIORITY_<N>  | Concurrency multiplier per priority tier
QUOTA_GROUPS_<PROVIDER>_<GROUP>                 | Models sharing quota limits
OVERRIDE_TEMPERATURE_ZERO                       | Remove or set to prevent tool hallucination
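
The multiplier and quota-group variables get dedicated examples below; ROTATION_TOLERANCE is set directly in .env, for example:

# 0.0 disables the weighted randomness for fully deterministic key selection
ROTATION_TOLERANCE=0.0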

Model Filtering (Whitelists & Blacklists)

Control which models are exposed through your proxy.

Blacklist Only

# Hide all preview models
IGNORE_MODELS_OPENAI="*-preview*"

Pure Whitelist Mode

# Block all, then allow specific models
IGNORE_MODELS_GEMINI="*"
WHITELIST_MODELS_GEMINI="gemini-2.5-pro,gemini-2.5-flash"

Exemption Mode

# Block preview models, but allow one specific preview
IGNORE_MODELS_OPENAI="*-preview*"
WHITELIST_MODELS_OPENAI="gpt-4o-2024-08-06-preview"

Logic order: Whitelist check → Blacklist check → Default allow

Concurrency & Rotation Settings

Concurrency Limits

# Allow 3 concurrent requests per OpenAI key
MAX_CONCURRENT_REQUESTS_PER_KEY_OPENAI=3

# Default is 1 (no concurrency)
MAX_CONCURRENT_REQUESTS_PER_KEY_GEMINI=1

Rotation Modes

# balanced (default): Distribute load evenly - best for per-minute rate limits
ROTATION_MODE_OPENAI=balanced

# sequential: Use until exhausted - best for daily/weekly quotas
ROTATION_MODE_GEMINI=sequential

Priority Multipliers

Paid credentials can handle more concurrent requests:

# Priority 1 (paid ultra): 10x concurrency
CONCURRENCY_MULTIPLIER_ANTIGRAVITY_PRIORITY_1=10

# Priority 2 (standard paid): 3x
CONCURRENCY_MULTIPLIER_ANTIGRAVITY_PRIORITY_2=3

Model Quota Groups

Models sharing quota limits:

# Claude models share quota - when one hits limit, both cool down
QUOTA_GROUPS_ANTIGRAVITY_CLAUDE="claude-sonnet-4-5,claude-opus-4-5"

Timeout Configuration

Fine-grained control over HTTP timeouts:

TIMEOUT_CONNECT=30              # Connection establishment
TIMEOUT_WRITE=30                # Request body send
TIMEOUT_POOL=60                 # Connection pool acquisition
TIMEOUT_READ_STREAMING=180      # Between streaming chunks (3 min)
TIMEOUT_READ_NON_STREAMING=600  # Full response wait (10 min)

Recommendations:

  • Long thinking tasks: Increase TIMEOUT_READ_STREAMING to 300-360s
  • Unstable network: Increase TIMEOUT_CONNECT to 60s
  • Large outputs: Increase TIMEOUT_READ_NON_STREAMING to 900s+
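
Applied to .env, those recommendations look like:

TIMEOUT_READ_STREAMING=360      # allow up to 6 minutes between chunks for long thinking tasks
TIMEOUT_CONNECT=60              # more tolerant connection setup on unstable networks
TIMEOUT_READ_NON_STREAMING=900  # wait up to 15 minutes for very large non-streaming responses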

OAuth Providers

Gemini CLI

Uses Google OAuth to access internal Gemini endpoints with higher rate limits.

Setup:

  1. Run python -m rotator_library.credential_tool
  2. Select "Add OAuth Credential" → "Gemini CLI"
  3. Complete browser authentication
  4. Credentials saved to oauth_creds/gemini_cli_oauth_1.json

Features:

  • Zero-config project discovery
  • Automatic free-tier project onboarding
  • Paid vs free tier detection
  • Smart fallback on rate limits

Environment Variables (for stateless deployment):

GEMINI_CLI_ACCESS_TOKEN="ya29.your-access-token"
GEMINI_CLI_REFRESH_TOKEN="1//your-refresh-token"
GEMINI_CLI_EXPIRY_DATE="1234567890000"
GEMINI_CLI_EMAIL="[email protected]"
GEMINI_CLI_PROJECT_ID="your-gcp-project-id"  # Optional

Antigravity (Gemini 3 + Claude Opus 4.5)

Access Google's internal Antigravity API for cutting-edge models.

Supported Models:

  • Gemini 3 Pro — with thinkingLevel support (low/high)
  • Claude Opus 4.5 — Anthropic's most powerful model (thinking mode only)
  • Claude Sonnet 4.5 — supports both thinking and non-thinking modes
  • Gemini 2.5 Pro/Flash

Setup:

  1. Run python -m rotator_library.credential_tool
  2. Select "Add OAuth Credential" → "Antigravity"
  3. Complete browser authentication

Advanced Features:

  • Thought signature caching for multi-turn conversations
  • Tool hallucination prevention via parameter signature injection
  • Automatic thinking block sanitization for Claude
  • Credential prioritization (paid resets every 5 hours, free weekly)

Environment Variables:

ANTIGRAVITY_ACCESS_TOKEN="ya29.your-access-token"
ANTIGRAVITY_REFRESH_TOKEN="1//your-refresh-token"
ANTIGRAVITY_EXPIRY_DATE="1234567890000"
ANTIGRAVITY_EMAIL="[email protected]"

# Feature toggles
ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true
ANTIGRAVITY_GEMINI3_TOOL_FIX=true

Note: Gemini 3 models require a paid-tier Google Cloud project.

Qwen Code

Uses OAuth Device Flow for Qwen/Dashscope APIs.

Setup:

  1. Run the credential tool
  2. Select "Add OAuth Credential" → "Qwen Code"
  3. Enter the code displayed in your browser
  4. Or add API key directly: QWEN_CODE_API_KEY_1="your-key"

Features:

  • Dual auth (API key or OAuth)
  • <think> tag parsing as reasoning_content
  • Automatic tool schema cleaning
  • Custom models via QWEN_CODE_MODELS env var
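
A hypothetical example of the custom-models variable (the model names and the comma-separated format are assumptions; check your provider account for the actual model IDs):

QWEN_CODE_MODELS="qwen3-coder-plus,qwen3-coder-flash"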

iFlow

Uses OAuth Authorization Code flow with local callback server.

Setup:

  1. Run the credential tool
  2. Select "Add OAuth Credential" → "iFlow"
  3. Complete browser authentication (callback on port 11451)
  4. Or add API key directly: IFLOW_API_KEY_1="sk-your-key"

Features:

  • Dual auth (API key or OAuth)
  • Hybrid auth (OAuth token fetches separate API key)
  • Automatic tool schema cleaning
  • Custom models via IFLOW_MODELS env var

Stateless Deployment (Export to Environment Variables)

For platforms without file persistence (Railway, Render, Vercel):

  1. Set up credentials locally:

    python -m rotator_library.credential_tool
    # Complete OAuth flows
  2. Export to environment variables:

    python -m rotator_library.credential_tool
    # Select "Export [Provider] to .env"
  3. Copy the generated variables to your platform. The tool creates files like gemini_cli_credential_1.env containing all the necessary variables.

  4. Set SKIP_OAUTH_INIT_CHECK=true to skip interactive validation on startup.

OAuth Callback Port Configuration

Customize OAuth callback ports if defaults conflict:

Provider    | Default Port | Environment Variable
Gemini CLI  | 8085         | GEMINI_CLI_OAUTH_PORT
Antigravity | 51121        | ANTIGRAVITY_OAUTH_PORT
iFlow       | 11451        | IFLOW_OAUTH_PORT
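
For example, to move the callbacks away from the defaults in .env (the port numbers below are arbitrary examples):

GEMINI_CLI_OAUTH_PORT=18085
ANTIGRAVITY_OAUTH_PORT=51122
IFLOW_OAUTH_PORT=21451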

Deployment

Command-Line Arguments
python src/proxy_app/main.py [OPTIONS]

Options:
  --host TEXT                Host to bind (default: 0.0.0.0)
  --port INTEGER             Port to run on (default: 8000)
  --enable-request-logging   Enable detailed per-request logging
  --add-credential           Launch interactive credential setup tool

Examples:

# Run on custom port
python src/proxy_app/main.py --host 127.0.0.1 --port 9000

# Run with logging
python src/proxy_app/main.py --enable-request-logging

# Add credentials without starting proxy
python src/proxy_app/main.py --add-credential

Render / Railway / Vercel

See the Deployment Guide for complete instructions.

Quick Setup:

  1. Fork the repository
  2. Create a .env file with your credentials
  3. Create a new Web Service pointing to your repo
  4. Set build command: pip install -r requirements.txt
  5. Set start command: uvicorn src.proxy_app.main:app --host 0.0.0.0 --port $PORT
  6. Upload .env as a secret file

OAuth Credentials: Export OAuth credentials to environment variables using the credential tool, then add them to your platform's environment settings.

Custom VPS / Docker

Option 1: Authenticate locally, deploy credentials

  1. Complete OAuth flows on your local machine
  2. Export to environment variables
  3. Deploy .env to your server

Option 2: SSH Port Forwarding

# Forward callback ports through SSH
ssh -L 51121:localhost:51121 -L 8085:localhost:8085 user@your-vps

# Then run credential tool on the VPS

Systemd Service:

[Unit]
Description=LLM API Key Proxy
After=network.target

[Service]
Type=simple
WorkingDirectory=/path/to/LLM-API-Key-Proxy
ExecStart=/path/to/python -m uvicorn src.proxy_app.main:app --host 0.0.0.0 --port 8000
Restart=always

[Install]
WantedBy=multi-user.target
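
Docker: the repository is not assumed to ship an official image or Dockerfile, but a minimal sketch that runs the proxy from source in a stock Python container looks like this (image tag and paths are assumptions):

# Run from the repository root; .env must already contain your credentials
docker run -d --name llm-proxy \
  -p 8000:8000 \
  -v "$(pwd)":/app -w /app \
  --env-file .env \
  python:3.11-slim \
  sh -c "pip install -r requirements.txt && uvicorn src.proxy_app.main:app --host 0.0.0.0 --port 8000"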

See VPS Deployment for complete guide.


Troubleshooting

Issue                     | Solution
401 Unauthorized          | Verify PROXY_API_KEY matches your Authorization: Bearer header exactly
500 Internal Server Error | Check provider key validity; enable --enable-request-logging for details
All keys on cooldown      | All keys failed recently; check logs/detailed_logs/ for upstream errors
Model not found           | Verify format is provider/model_name (e.g., gemini/gemini-2.5-flash)
OAuth callback failed     | Ensure callback port (8085, 51121, 11451) isn't blocked by firewall
Streaming hangs           | Increase TIMEOUT_READ_STREAMING; check provider status
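
A quick way to confirm the proxy itself is reachable before digging further is the status endpoint (add your Authorization header if your deployment requires it):

curl http://127.0.0.1:8000/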

Detailed Logs:

When --enable-request-logging is enabled, check logs/detailed_logs/ for:

  • request.json — Exact request payload
  • final_response.json — Complete response or error
  • streaming_chunks.jsonl — All SSE chunks received
  • metadata.json — Performance metrics

Documentation

Document                | Description
Technical Documentation | Architecture, internals, provider implementations
Library README          | Using the resilience library directly
Deployment Guide        | Hosting on Render, Railway, VPS
.env.example            | Complete environment variable reference

License

This project is dual-licensed:

  • Proxy Application (src/proxy_app/) — MIT License
  • Resilience Library (src/rotator_library/) — LGPL-3.0
