
inference-gateway

Here are 26 public repositories matching this topic...

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across vLLM, TRT-LLM, TokenSpeed, SGLang, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

  • Updated Jun 26, 2026
  • Rust

An intelligent gateway for Claude APIs that dynamically routes requests to the most cost-efficient model, caches responses, and escalates based on confidence signals, reducing LLM spend without sacrificing quality.

  • Updated May 6, 2026
  • Python

An enterprise-grade, configuration-driven MLOps pipeline for credit risk underwriting. Built with XGBoost, strict data validation, MLflow, and CI/CD automation. Dockerized inference deployed via Render.

  • Updated Jun 24, 2026
  • Python
