
Universal LLM API Proxy & Resilience Library

One proxy. Any LLM provider. Zero code changes.

A self-hosted proxy that provides a single, OpenAI-compatible API endpoint for all your LLM providers. Works with any application that supports custom OpenAI base URLs—no code changes required in your existing tools.

This project consists of two components:

  1. The API Proxy — A FastAPI application providing a universal /v1/chat/completions endpoint
  2. The Resilience Library — A reusable Python library for intelligent API key management, rotation, and failover

Why Use This?

  • Universal Compatibility — Works with any app supporting OpenAI-compatible APIs: Opencode, Continue, Roo/Kilo Code, JanitorAI, SillyTavern, custom applications, and more
  • One Endpoint, Many Providers — Configure Gemini, OpenAI, Anthropic, and any LiteLLM-supported provider once. Access them all through a single API key
  • Built-in Resilience — Automatic key rotation, failover on errors, rate limit handling, and intelligent cooldowns
  • Exclusive Provider Support — Includes custom providers not available elsewhere: Antigravity (Gemini 3 + Claude Sonnet/Opus 4.5), Gemini CLI, Qwen Code, and iFlow

Quick Start

Windows

  1. Download the latest release from GitHub Releases
  2. Unzip the downloaded file
  3. Run proxy_app.exe — the interactive TUI launcher opens

macOS / Linux

# Download and extract the release for your platform
chmod +x proxy_app
./proxy_app

From Source

git clone https://github.com/Mirrowel/LLM-API-Key-Proxy.git
cd LLM-API-Key-Proxy
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
python src/proxy_app/main.py

Tip: Running with command-line arguments (e.g., --host 0.0.0.0 --port 8000) bypasses the TUI and starts the proxy directly.


Connecting to the Proxy

Once the proxy is running, configure your application with these settings:

Setting                 | Value
Base URL / API Endpoint | http://127.0.0.1:8000/v1
API Key                 | Your PROXY_API_KEY

Model Format: provider/model_name

Important: Models must be specified in the format provider/model_name. The provider/ prefix tells the proxy which backend to route the request to.

gemini/gemini-2.5-flash          ← Gemini API
openai/gpt-4o                    ← OpenAI API
anthropic/claude-3-5-sonnet      ← Anthropic API
openrouter/anthropic/claude-3-opus  ← OpenRouter
gemini_cli/gemini-2.5-pro        ← Gemini CLI (OAuth)
antigravity/gemini-3-pro-preview ← Antigravity (Gemini 3, Claude Opus 4.5)

Usage Examples

Python (OpenAI Library)
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="your-proxy-api-key"
)

response = client.chat.completions.create(
    model="gemini/gemini-2.5-flash",  # provider/model format
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

curl
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "gemini/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'

JanitorAI / SillyTavern / Other Chat UIs
  1. Go to API Settings
  2. Select "Proxy" or "Custom OpenAI" mode
  3. Configure:
    • API URL: http://127.0.0.1:8000/v1
    • API Key: Your PROXY_API_KEY
    • Model: provider/model_name (e.g., gemini/gemini-2.5-flash)
  4. Save and start chatting

Continue / Cursor / IDE Extensions

In your configuration file (e.g., config.json):

{
  "models": [{
    "title": "Gemini via Proxy",
    "provider": "openai",
    "model": "gemini/gemini-2.5-flash",
    "apiBase": "http://127.0.0.1:8000/v1",
    "apiKey": "your-proxy-api-key"
  }]
}

API Endpoints

Endpoint                  | Description
GET /                     | Status check — confirms the proxy is running
POST /v1/chat/completions | Chat completions (main endpoint)
POST /v1/embeddings       | Text embeddings
GET /v1/models            | List all available models with pricing & capabilities
GET /v1/models/{model_id} | Get details for a specific model
GET /v1/providers         | List configured providers
POST /v1/token-count      | Calculate token count for a payload
POST /v1/cost-estimate    | Estimate cost based on token counts

Tip: The /v1/models endpoint is useful for discovering available models in your client. Many apps can fetch this list automatically. Add ?enriched=false for a minimal response without pricing data.
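
For example, a quick check from the command line (assuming the proxy expects the same bearer token here as for chat completions):

curl http://127.0.0.1:8000/v1/models \
  -H "Authorization: Bearer your-proxy-api-key"

# Minimal listing without pricing data
curl "http://127.0.0.1:8000/v1/models?enriched=false" \
  -H "Authorization: Bearer your-proxy-api-key"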


Managing Credentials

The proxy includes an interactive tool for managing all your API keys and OAuth credentials.

Using the TUI

  1. Run the proxy without arguments to open the TUI
  2. Select "🔑 Manage Credentials"
  3. Choose to add API keys or OAuth credentials

Using the Command Line

python -m rotator_library.credential_tool

Credential Types

Type     | Providers                                                                     | How to Add
API Keys | Gemini, OpenAI, Anthropic, OpenRouter, Groq, Mistral, NVIDIA, Cohere, Chutes | Enter key in TUI or add to .env
OAuth    | Gemini CLI, Antigravity, Qwen Code, iFlow                                    | Interactive browser login via credential tool

The .env File

Credentials are stored in a .env file. You can edit it directly or use the TUI:

# Required: Authentication key for YOUR proxy
PROXY_API_KEY="your-secret-proxy-key"

# Provider API Keys (add multiple with _1, _2, etc.)
GEMINI_API_KEY_1="your-gemini-key"
GEMINI_API_KEY_2="another-gemini-key"
OPENAI_API_KEY_1="your-openai-key"
ANTHROPIC_API_KEY_1="your-anthropic-key"

Copy .env.example to .env as a starting point.
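
For example, from the repository root:

cp .env.example .env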


The Resilience Library

The proxy is powered by a standalone Python library that you can use directly in your own applications.

Key Features

  • Async-native with asyncio and httpx
  • Intelligent key selection with tiered, model-aware locking
  • Deadline-driven requests with configurable global timeout
  • Automatic failover between keys on errors
  • OAuth support for Gemini CLI, Antigravity, Qwen, iFlow
  • Stateless deployment ready — load credentials from environment variables

Basic Usage

from rotator_library import RotatingClient

client = RotatingClient(
    api_keys={"gemini": ["key1", "key2"], "openai": ["key3"]},
    global_timeout=30,
    max_retries=2
)

async with client:
    response = await client.acompletion(
        model="gemini/gemini-2.5-flash",
        messages=[{"role": "user", "content": "Hello!"}]
    )

Library Documentation

See the Library README for complete documentation including:

  • All initialization parameters
  • Streaming support
  • Error handling and cooldown strategies
  • Provider plugin system
  • Credential prioritization

Interactive TUI

The proxy includes a powerful text-based UI for configuration and management.

TUI Features

  • 🚀 Run Proxy — Start the server with saved settings
  • ⚙️ Configure Settings — Host, port, API key, request logging
  • 🔑 Manage Credentials — Add/edit API keys and OAuth credentials
  • 📊 View Status — See configured providers and credential counts
  • 🔧 Advanced Settings — Custom providers, model definitions, concurrency

Configuration Files

File                 | Contents
.env                 | All credentials and advanced settings
launcher_config.json | TUI-specific settings (host, port, logging)

Features

Core Capabilities

  • Universal OpenAI-compatible endpoint for all providers
  • Multi-provider support via LiteLLM fallback
  • Automatic key rotation and load balancing
  • Interactive TUI for easy configuration
  • Detailed request logging for debugging

🛡️ Resilience & High Availability

  • Global timeout with deadline-driven retries
  • Escalating cooldowns per model (10s → 30s → 60s → 120s)
  • Key-level lockouts for consistently failing keys
  • Stream error detection and graceful recovery
  • Batch embedding aggregation for improved throughput
  • Automatic daily resets for cooldowns and usage stats

🔑 Credential Management

  • Auto-discovery of API keys from environment variables
  • OAuth discovery from standard paths (~/.gemini/, ~/.qwen/, ~/.iflow/)
  • Duplicate detection — warns when the same account is added multiple times
  • Credential prioritization — paid tier used before free tier
  • Stateless deployment — export OAuth to environment variables
  • Local-first storage — credentials isolated in oauth_creds/ directory

⚙️ Advanced Configuration

  • Model whitelists/blacklists with wildcard support
  • Per-provider concurrency limits (MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>)
  • Rotation modes — balanced (distribute load) or sequential (use until exhausted)
  • Priority multipliers — higher concurrency for paid credentials
  • Model quota groups — shared cooldowns for related models
  • Temperature override — prevent tool hallucination issues
  • Weighted random rotation — unpredictable selection patterns

🔌 Provider-Specific Features

Gemini CLI:

  • Zero-config Google Cloud project discovery
  • Internal API access with higher rate limits
  • Automatic fallback to preview models on rate limit
  • Paid vs free tier detection

Antigravity:

  • Gemini 3 Pro with thinkingLevel support
  • Claude Opus 4.5 (thinking mode)
  • Claude Sonnet 4.5 (thinking and non-thinking)
  • Thought signature caching for multi-turn conversations
  • Tool hallucination prevention

Qwen Code:

  • Dual auth (API key + OAuth Device Flow)
  • <think> tag parsing as reasoning_content
  • Tool schema cleaning

iFlow:

  • Dual auth (API key + OAuth Authorization Code)
  • Hybrid auth with separate API key fetch
  • Tool schema cleaning

NVIDIA NIM:

  • Dynamic model discovery
  • DeepSeek thinking support

📝 Logging & Debugging

  • Per-request file logging with --enable-request-logging
  • Unique request directories with full transaction details
  • Streaming chunk capture for debugging
  • Performance metadata (duration, tokens, model used)
  • Provider-specific logs for Qwen, iFlow, Antigravity

Advanced Configuration

Environment Variables Reference

Proxy Settings

Variable               | Description                              | Default
PROXY_API_KEY          | Authentication key for your proxy        | Required
OAUTH_REFRESH_INTERVAL | Token refresh check interval (seconds)   | 600
SKIP_OAUTH_INIT_CHECK  | Skip interactive OAuth setup on startup  | false
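
A minimal .env sketch using these settings (values here are illustrative):

PROXY_API_KEY="your-secret-proxy-key"
OAUTH_REFRESH_INTERVAL=600     # check for expiring OAuth tokens every 10 minutes
SKIP_OAUTH_INIT_CHECK=true     # skip the interactive OAuth check on startup (useful for headless deployments)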

Per-Provider Settings

Pattern                                     | Description                              | Example
<PROVIDER>_API_KEY_<N>                      | API key for provider                     | GEMINI_API_KEY_1
MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>  | Concurrent request limit                 | MAX_CONCURRENT_REQUESTS_PER_KEY_OPENAI=3
ROTATION_MODE_<PROVIDER>                    | balanced or sequential                   | ROTATION_MODE_GEMINI=sequential
IGNORE_MODELS_<PROVIDER>                    | Blacklist (comma-separated, supports *)  | IGNORE_MODELS_OPENAI=*-preview*
WHITELIST_MODELS_<PROVIDER>                 | Whitelist (overrides blacklist)          | WHITELIST_MODELS_GEMINI=gemini-2.5-pro

Advanced Features

Variable                                        | Description
ROTATION_TOLERANCE                              | 0.0 = deterministic, 3.0 = weighted random (default)
CONCURRENCY_MULTIPLIER_<PROVIDER>_PRIORITY_<N>  | Concurrency multiplier per priority tier
QUOTA_GROUPS_<PROVIDER>_<GROUP>                 | Models sharing quota limits
OVERRIDE_TEMPERATURE_ZERO                       | Remove or set to prevent tool hallucination
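
The multiplier and quota-group variables get dedicated examples below; ROTATION_TOLERANCE is set directly in .env, for example:

# 0.0 disables the weighted randomness for fully deterministic key selection
ROTATION_TOLERANCE=0.0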

Model Filtering (Whitelists & Blacklists)

Control which models are exposed through your proxy.

Blacklist Only

# Hide all preview models
IGNORE_MODELS_OPENAI="*-preview*"

Pure Whitelist Mode

# Block all, then allow specific models
IGNORE_MODELS_GEMINI="*"
WHITELIST_MODELS_GEMINI="gemini-2.5-pro,gemini-2.5-flash"

Exemption Mode

# Block preview models, but allow one specific preview
IGNORE_MODELS_OPENAI="*-preview*"
WHITELIST_MODELS_OPENAI="gpt-4o-2024-08-06-preview"

Logic order: Whitelist check → Blacklist check → Default allow

Concurrency & Rotation Settings

Concurrency Limits

# Allow 3 concurrent requests per OpenAI key
MAX_CONCURRENT_REQUESTS_PER_KEY_OPENAI=3

# Default is 1 (no concurrency)
MAX_CONCURRENT_REQUESTS_PER_KEY_GEMINI=1

Rotation Modes

# balanced (default): Distribute load evenly - best for per-minute rate limits
ROTATION_MODE_OPENAI=balanced

# sequential: Use until exhausted - best for daily/weekly quotas
ROTATION_MODE_GEMINI=sequential

Priority Multipliers

Paid credentials can handle more concurrent requests:

# Priority 1 (paid ultra): 10x concurrency
CONCURRENCY_MULTIPLIER_ANTIGRAVITY_PRIORITY_1=10

# Priority 2 (standard paid): 3x
CONCURRENCY_MULTIPLIER_ANTIGRAVITY_PRIORITY_2=3

Model Quota Groups

Models sharing quota limits:

# Claude models share quota - when one hits limit, both cool down
QUOTA_GROUPS_ANTIGRAVITY_CLAUDE="claude-sonnet-4-5,claude-opus-4-5"

Timeout Configuration

Fine-grained control over HTTP timeouts:

TIMEOUT_CONNECT=30              # Connection establishment
TIMEOUT_WRITE=30                # Request body send
TIMEOUT_POOL=60                 # Connection pool acquisition
TIMEOUT_READ_STREAMING=180      # Between streaming chunks (3 min)
TIMEOUT_READ_NON_STREAMING=600  # Full response wait (10 min)

Recommendations:

  • Long thinking tasks: Increase TIMEOUT_READ_STREAMING to 300-360s
  • Unstable network: Increase TIMEOUT_CONNECT to 60s
  • Large outputs: Increase TIMEOUT_READ_NON_STREAMING to 900s+
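
Applied to .env, those recommendations look like:

TIMEOUT_READ_STREAMING=360      # allow up to 6 minutes between chunks for long thinking tasks
TIMEOUT_CONNECT=60              # more tolerant connection setup on unstable networks
TIMEOUT_READ_NON_STREAMING=900  # wait up to 15 minutes for very large non-streaming responses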

OAuth Providers

Gemini CLI

Uses Google OAuth to access internal Gemini endpoints with higher rate limits.

Setup:

  1. Run python -m rotator_library.credential_tool
  2. Select "Add OAuth Credential" → "Gemini CLI"
  3. Complete browser authentication
  4. Credentials saved to oauth_creds/gemini_cli_oauth_1.json

Features:

  • Zero-config project discovery
  • Automatic free-tier project onboarding
  • Paid vs free tier detection
  • Smart fallback on rate limits

Environment Variables (for stateless deployment):

GEMINI_CLI_ACCESS_TOKEN="ya29.your-access-token"
GEMINI_CLI_REFRESH_TOKEN="1//your-refresh-token"
GEMINI_CLI_EXPIRY_DATE="1234567890000"
GEMINI_CLI_EMAIL="[email protected]"
GEMINI_CLI_PROJECT_ID="your-gcp-project-id"  # Optional

Antigravity (Gemini 3 + Claude Opus 4.5)

Access Google's internal Antigravity API for cutting-edge models.

Supported Models:

  • Gemini 3 Pro — with thinkingLevel support (low/high)
  • Claude Opus 4.5 — Anthropic's most powerful model (thinking mode only)
  • Claude Sonnet 4.5 — supports both thinking and non-thinking modes
  • Gemini 2.5 Pro/Flash

Setup:

  1. Run python -m rotator_library.credential_tool
  2. Select "Add OAuth Credential" → "Antigravity"
  3. Complete browser authentication

Advanced Features:

  • Thought signature caching for multi-turn conversations
  • Tool hallucination prevention via parameter signature injection
  • Automatic thinking block sanitization for Claude
  • Credential prioritization (paid resets every 5 hours, free weekly)

Environment Variables:

ANTIGRAVITY_ACCESS_TOKEN="ya29.your-access-token"
ANTIGRAVITY_REFRESH_TOKEN="1//your-refresh-token"
ANTIGRAVITY_EXPIRY_DATE="1234567890000"
ANTIGRAVITY_EMAIL="[email protected]"

# Feature toggles
ANTIGRAVITY_ENABLE_SIGNATURE_CACHE=true
ANTIGRAVITY_GEMINI3_TOOL_FIX=true

Note: Gemini 3 models require a paid-tier Google Cloud project.

Qwen Code

Uses OAuth Device Flow for Qwen/Dashscope APIs.

Setup:

  1. Run the credential tool
  2. Select "Add OAuth Credential" → "Qwen Code"
  3. Enter the code displayed in your browser
  4. Or add API key directly: QWEN_CODE_API_KEY_1="your-key"

Features:

  • Dual auth (API key or OAuth)
  • <think> tag parsing as reasoning_content
  • Automatic tool schema cleaning
  • Custom models via QWEN_CODE_MODELS env var
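
A hypothetical example of the custom-models variable (the model names and the comma-separated format are assumptions; check your provider account for the actual model IDs):

QWEN_CODE_MODELS="qwen3-coder-plus,qwen3-coder-flash"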

iFlow

Uses OAuth Authorization Code flow with local callback server.

Setup:

  1. Run the credential tool
  2. Select "Add OAuth Credential" → "iFlow"
  3. Complete browser authentication (callback on port 11451)
  4. Or add API key directly: IFLOW_API_KEY_1="sk-your-key"

Features:

  • Dual auth (API key or OAuth)
  • Hybrid auth (OAuth token fetches separate API key)
  • Automatic tool schema cleaning
  • Custom models via IFLOW_MODELS env var

Stateless Deployment (Export to Environment Variables)

For platforms without file persistence (Railway, Render, Vercel):

  1. Set up credentials locally:

    python -m rotator_library.credential_tool
    # Complete OAuth flows
  2. Export to environment variables:

    python -m rotator_library.credential_tool
    # Select "Export [Provider] to .env"
  3. Copy the generated variables to your platform. The tool creates files like gemini_cli_credential_1.env containing all the necessary variables.

  4. Set SKIP_OAUTH_INIT_CHECK=true to skip interactive validation on startup.

OAuth Callback Port Configuration

Customize OAuth callback ports if defaults conflict:

Provider    | Default Port | Environment Variable
Gemini CLI  | 8085         | GEMINI_CLI_OAUTH_PORT
Antigravity | 51121        | ANTIGRAVITY_OAUTH_PORT
iFlow       | 11451        | IFLOW_OAUTH_PORT
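
For example, to move the callbacks away from the defaults in .env (the port numbers below are arbitrary examples):

GEMINI_CLI_OAUTH_PORT=18085
ANTIGRAVITY_OAUTH_PORT=51122
IFLOW_OAUTH_PORT=21451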

Deployment

Command-Line Arguments
python src/proxy_app/main.py [OPTIONS]

Options:
  --host TEXT                Host to bind (default: 0.0.0.0)
  --port INTEGER             Port to run on (default: 8000)
  --enable-request-logging   Enable detailed per-request logging
  --add-credential           Launch interactive credential setup tool

Examples:

# Run on custom port
python src/proxy_app/main.py --host 127.0.0.1 --port 9000

# Run with logging
python src/proxy_app/main.py --enable-request-logging

# Add credentials without starting proxy
python src/proxy_app/main.py --add-credential

Render / Railway / Vercel

See the Deployment Guide for complete instructions.

Quick Setup:

  1. Fork the repository
  2. Create a .env file with your credentials
  3. Create a new Web Service pointing to your repo
  4. Set build command: pip install -r requirements.txt
  5. Set start command: uvicorn src.proxy_app.main:app --host 0.0.0.0 --port $PORT
  6. Upload .env as a secret file

OAuth Credentials: Export OAuth credentials to environment variables using the credential tool, then add them to your platform's environment settings.

Custom VPS / Docker

Option 1: Authenticate locally, deploy credentials

  1. Complete OAuth flows on your local machine
  2. Export to environment variables
  3. Deploy .env to your server

Option 2: SSH Port Forwarding

# Forward callback ports through SSH
ssh -L 51121:localhost:51121 -L 8085:localhost:8085 user@your-vps

# Then run credential tool on the VPS

Systemd Service:

[Unit]
Description=LLM API Key Proxy
After=network.target

[Service]
Type=simple
WorkingDirectory=/path/to/LLM-API-Key-Proxy
ExecStart=/path/to/python -m uvicorn src.proxy_app.main:app --host 0.0.0.0 --port 8000
Restart=always

[Install]
WantedBy=multi-user.target
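
Docker: the repository is not assumed to ship an official image or Dockerfile, but a minimal sketch that runs the proxy from source in a stock Python container looks like this (image tag and paths are assumptions):

# Run from the repository root; .env must already contain your credentials
docker run -d --name llm-proxy \
  -p 8000:8000 \
  -v "$(pwd)":/app -w /app \
  --env-file .env \
  python:3.11-slim \
  sh -c "pip install -r requirements.txt && uvicorn src.proxy_app.main:app --host 0.0.0.0 --port 8000"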

See VPS Deployment for complete guide.


Troubleshooting

Issue                     | Solution
401 Unauthorized          | Verify PROXY_API_KEY matches your Authorization: Bearer header exactly
500 Internal Server Error | Check provider key validity; enable --enable-request-logging for details
All keys on cooldown      | All keys failed recently; check logs/detailed_logs/ for upstream errors
Model not found           | Verify format is provider/model_name (e.g., gemini/gemini-2.5-flash)
OAuth callback failed     | Ensure callback port (8085, 51121, 11451) isn't blocked by firewall
Streaming hangs           | Increase TIMEOUT_READ_STREAMING; check provider status
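
A quick way to confirm the proxy itself is reachable before digging further is the status endpoint (add your Authorization header if your deployment requires it):

curl http://127.0.0.1:8000/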

Detailed Logs:

When --enable-request-logging is enabled, check logs/detailed_logs/ for:

  • request.json — Exact request payload
  • final_response.json — Complete response or error
  • streaming_chunks.jsonl — All SSE chunks received
  • metadata.json — Performance metrics

Documentation

Document                | Description
Technical Documentation | Architecture, internals, provider implementations
Library README          | Using the resilience library directly
Deployment Guide        | Hosting on Render, Railway, VPS
.env.example            | Complete environment variable reference

License

This project is dual-licensed:

  • Proxy Application (src/proxy_app/) — MIT License
  • Resilience Library (src/rotator_library/) — LGPL-3.0
