# inference-gateway

Here are 26 public repositories matching this topic...

lightseekorg / smg

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across vLLM, TRT-LLM, TokenSpeed, SGLang, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

chat mcp routing gemini openai claude llm anthropic vllm sglang anthropic-api inference-gateway tokenspeed responses-api tensorrtllm trtllm lightseek

Updated Jun 26, 2026
Rust

api7 / aisix

Open-source AI gateway for LLMs & AI agents, built in Rust. One OpenAI-compatible API for OpenAI, Anthropic, Gemini, Bedrock & more — routing, guardrails, caching, rate limits, observability.

Updated Jun 26, 2026
Rust

inference-gateway / adk

An Agent Development Kit (ADK) allowing for seamless creation of A2A-compatible agents written in Go.

enterprise opensource adk openstandards a2a llm inference-gateway a2a-protocol a2a-server

Updated Jun 24, 2026
Go

inference-gateway / google-calendar-agent

A2A agent server enabling Google Calendar scheduling, retrieval, and automation

go golang google calendar google-calendar google-calendar-api a2a inference-gateway a2a-protocol

Updated Jun 25, 2026
Go

inference-gateway / rust-adk

An Agent Development Kit (ADK) allowing for seamless creation of A2A-compatible agents written in Rust.

rust opensource adk openstandards a2a inference-gateway a2a-protocol

Updated Jun 24, 2026
Rust

inference-gateway / adl-cli

A command-line tool to scaffold and manage enterprise-ready AI Agents powered by the A2A (Agent-to-Agent) protocol

docker ai docker-compose metrics utils production-ready scaffolding agents observability tls-support containerized a2a vendor-agnostic inference-gateway

Updated Jun 25, 2026
Go

inference-gateway / cli

A Git-first CLI coding agent that turns ideas, issues, and tasks into real code changes.

cli coding coder telegrambot coding-tools llms computer-use inference-gateway

Updated Jun 27, 2026
Go

inference-gateway / sdk

An SDK written in Go for the Inference Gateway.

sdk sdk-go inference-gateway

Updated Jun 24, 2026
Go

hec-ovi / openweight-inference-api

ROCm-first OpenAI-compatible inference gateway for open-weight reasoning models. Single active profile (gpt-oss-20b, deepseek-r1-distill, qwen3-4b), host-mounted weights, fail-closed contract. FastAPI + vLLM.

amd self-hosted inference-server rocm fastapi openai-api llm-serving vllm qwen deepseek inference-gateway openai-compatible responses-api gpt-oss

Updated Apr 26, 2026
Python

sunilp303 / claude-cost-gateway

An intelligent gateway for Claude APIs that dynamically routes requests to the most cost-efficient model, caches responses, and escalates based on confidence signals — reducing LLM spend without sacrificing quality.

api-proxy observability claude cost-optimization generative-ai llmops prompt-caching claude-api ai-gateway agentic-ai llm-gateway llm-routing inference-gateway semantic-caching llm-infrastructure

Updated May 6, 2026
Python

inference-gateway / rust-sdk

An SDK written in Rust for the Inference Gateway.

sdk rust-sdk inference-gateway

Updated Jun 25, 2026
Rust

ElliotOne / nl-ai-cost-engineering-local-inference

Deterministic local-first inference gateway that controls cost, caching, and routing for predictable AI execution.

caching csharp dotnet budgeting ai-systems cost-engineering llm local-inference ollama inference-gateway

Updated Apr 11, 2026
C#

inference-gateway / skills

Curated catalog of Agent Skills for the Inference Gateway ecosystem

cli ai skills ai-agents inference-gateway

Updated Jun 26, 2026
JavaScript

developertogo / velo-sentinel

Production-grade Java 25 Virtual Thread inference gateway bridging NVIDIA Triton → Dynamo with Earliest Deadline First (EDF) priority queuing, adaptive batching, and async shadow validation.

redis distributed-systems grpc priority-queues load-balancing model-serving triton-inference-server virtual-threads inference-gateway semantic-caching nvidia-dynamo disaggregated-serving

Updated Jun 13, 2026
Java

ihimanshu29 / credit-risk-mlops-pipeline

An enterprise-grade, configuration-driven MLOps pipeline for credit risk underwriting. Built with XGBoost, strict data validation, MLflow, and CI/CD automation. Dockerized inference deployed via Render.

Updated Jun 24, 2026
Python

inference-gateway / docs

Extensive documentation of the inference-gateway.

documentation gateway llms inference-gateway

Updated Jun 24, 2026
TypeScript

inference-gateway / typescript-sdk

An SDK written in TypeScript for the Inference Gateway.

sdk typescript-sdk inference-gateway

Updated Jun 25, 2026
TypeScript

inference-gateway / typescript-adk

An Agent Development Kit (ADK) allowing for seamless creation of A2A-compatible agents written in TypeScript.

typescript adk a2a inference-gateway a2a-protocol

Updated Jun 22, 2026
TypeScript

mostlydev / cllama

The blood-brain barrier for autonomous agents. A context-aware LLM governance proxy that enforces credential starvation — identity-verified, provider-routed, cost-tracked, and audit-logged.

ai-agents inference-api llm llm-inference llm-proxy inference-gateway

Updated Jun 27, 2026
Go

inference-gateway / operator

This project provides a Kubernetes Operator for managing the lifecycle of the inference-gateway and its related components. It simplifies deployment, configuration, and scaling of the gateway within Kubernetes clusters, enabling seamless integration of inference workflows.

kubernetes operator kubernetes-operator vendor-agnostic llm llm-inference inference-gateway

Updated Jun 23, 2026
Go
