Version: 1.41.0
Vector MCP is a production-grade agent and Model Context Protocol (MCP) server that integrates RAG into AI agents. It supports multiple vector database technologies.
- Consolidated Action-Routed MCP Tools: Minimizes token overhead and eliminates tool bloat in LLM contexts by grouping methods into optimized, togglable tool modules.
- Enterprise-Grade Security: Comprehensive support for Eunomia policies, OIDC token delegation, and granular execution context tracking.
- Integrated Graph Agent: Built-in Pydantic AI agent supporting the Agent Control Protocol (ACP) and standard Web interfaces (AG-UI).
- Native Telemetry & Tracing: Out-of-the-box OpenTelemetry exports and native Langfuse tracing.
This agent wraps the Vector MCP API for RAG over multiple vector database technologies. You can interact with it programmatically or via its integrated execution entrypoints.
Detailed instructions on how to use the underlying API wrappers, extended schema bindings, and developer SDK references are maintained in docs/index.md.
This server utilizes dynamic Action-Routed tools to optimize token overhead and maximize IDE compatibility.
Auto-generated from the live MCP server — do not edit by hand.
| MCP Tool | Toggle Env Var | Description |
|---|---|---|
| `vector_collection_management` | `COLLECTION_MANAGEMENTTOOL` | Manage collection management operations. |
| `vector_search` | `SEARCHTOOL` | Manage search operations. |

2 action-routed tools (default MCP_TOOL_MODE=condensed). Each is enabled unless its toggle is set false; set MCP_TOOL_MODE=verbose (or both) for the 1:1 per-operation surface. Auto-generated — do not edit.
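To make the action-routed idea concrete, here is a minimal Python sketch rather than the server's actual implementation. The action names (`list`, `create`), the handler functions, and the `toggle_enabled` helper are hypothetical. Only the toggle variable name and the "enabled unless set false" rule come from the table above.

```python
import os
from typing import Callable, Dict, Mapping


def toggle_enabled(name: str, env: Mapping[str, str] = os.environ) -> bool:
    """A tool stays enabled unless its toggle variable holds a false-like value."""
    # Assumption: "false", "0", and "no" (any case) are treated as false.
    return env.get(name, "True").strip().lower() not in {"false", "0", "no"}


# Hypothetical handlers standing in for the per-operation methods.
def _list_collections(**params) -> dict:
    return {"action": "list", "params": params}


def _create_collection(**params) -> dict:
    return {"action": "create", "params": params}


ACTIONS: Dict[str, Callable[..., dict]] = {
    "list": _list_collections,
    "create": _create_collection,
}


def vector_collection_management(
    action: str, env: Mapping[str, str] = os.environ, **params
) -> dict:
    """One consolidated tool: the `action` argument selects the handler."""
    if not toggle_enabled("COLLECTION_MANAGEMENTTOOL", env):
        raise PermissionError("vector_collection_management is disabled")
    try:
        handler = ACTIONS[action]
    except KeyError:
        raise ValueError(
            f"unknown action {action!r}; expected one of {sorted(ACTIONS)}"
        ) from None
    return handler(**params)
```

The LLM sees a single tool schema with an `action` argument instead of one schema per operation, which is where the token saving would come from under this design.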
Detailed tool schemas, parameter shapes, and validation constraints are preserved in docs/mcp.md.
This MCP server supports dynamic toolset selection and visibility filtering at runtime, so you can restrict the set of exposed tools and keep tool definitions from overflowing the LLM's context window.
You can configure tool filtering via multiple input channels:
- CLI Arguments: Pass `--tools` or `--toolsets` (or their disabled counterparts `--disabled-tools` and `--disabled-toolsets`) during startup.
- Environment Variables: Define standard environment variables:
  - `MCP_ENABLED_TOOLS` / `MCP_DISABLED_TOOLS`
  - `MCP_ENABLED_TAGS` / `MCP_DISABLED_TAGS`
- HTTP SSE Request Headers: Pass custom headers during transport initialization:
  - `x-mcp-enabled-tools` / `x-mcp-disabled-tools`
  - `x-mcp-enabled-tags` / `x-mcp-disabled-tags`
- HTTP SSE Request Query Parameters: Append query parameters directly to your transport connection URL:
  - `?tools=tool1,tool2`
  - `?tags=tag1`
When query strings or parameters are supplied, an LLM-free Knowledge Graph resolution layer (using DynamicToolOrchestrator) matches query intents against known tool tags, names, or descriptions, with safe fallback and automated 24-hour background cache refreshing.
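As an illustration of the header and query-parameter channels, the following Python sketch builds the values a client might send. The header names and the `tools` / `tags` query keys are taken from the list above. The helper names, the base URL, and the choice to combine both keys in one query string with `&` are assumptions made for this example.

```python
from typing import Dict, Iterable
from urllib.parse import urlencode


def _csv(values: Iterable[str]) -> str:
    """Join names into the comma-separated form used by the filters."""
    return ",".join(values)


def filter_headers(
    enabled_tools: Iterable[str] = (),
    disabled_tools: Iterable[str] = (),
    enabled_tags: Iterable[str] = (),
    disabled_tags: Iterable[str] = (),
) -> Dict[str, str]:
    """Build the x-mcp-* headers, omitting any filter that is empty."""
    candidates = {
        "x-mcp-enabled-tools": _csv(enabled_tools),
        "x-mcp-disabled-tools": _csv(disabled_tools),
        "x-mcp-enabled-tags": _csv(enabled_tags),
        "x-mcp-disabled-tags": _csv(disabled_tags),
    }
    return {name: value for name, value in candidates.items() if value}


def filter_url(
    base_url: str, tools: Iterable[str] = (), tags: Iterable[str] = ()
) -> str:
    """Append ?tools=... and/or tags=... to the transport URL."""
    params = {"tools": _csv(tools), "tags": _csv(tags)}
    params = {key: value for key, value in params.items() if value}
    if not params:
        return base_url
    # safe="," keeps the list separators readable instead of encoding them as %2C.
    return f"{base_url}?{urlencode(params, safe=',')}"


print(filter_url("http://localhost:8000/mcp", tools=["vector_search"]))
```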
Install the slim `[mcp]` extra. All examples below install `vector-mcp[mcp]`, the MCP-server extra that pulls only the FastMCP / FastAPI tooling (`agent-utilities[mcp]`). It deliberately excludes the heavy agent runtime (the epistemic-graph engine, `pydantic-ai`, `dspy`, `llama-index`, `tree-sitter`), so `uvx`/container installs are dramatically smaller and faster. Use the full `[agent]` extra only when you need the integrated Pydantic AI agent (see Installation).
Configure your IDE's mcp.json to launch the MCP server via uvx:
{
"mcpServers": {
"vector-mcp": {
"command": "uvx",
"args": [
"--from",
"vector-mcp[mcp]",
"vector-mcp"
],
"env": {
"VECTOR_URL": "your_vector_url_here",
"EMBEDDING_MODEL_ID": "your_embedding_model_id_here",
"CHUNK_SIZE": "your_chunk_size_here",
"VECTOR_API_KEY": "your_vector_api_key_here"
}
}
}
}

Configure your client's mcp.json to launch the Streamable-HTTP server via uvx with an explicit host and port:
{
"mcpServers": {
"vector-mcp": {
"command": "uvx",
"args": [
"--from",
"vector-mcp[mcp]",
"vector-mcp"
],
"env": {
"TRANSPORT": "streamable-http",
"HOST": "0.0.0.0",
"PORT": "8000",
"VECTOR_URL": "your_vector_url_here",
"EMBEDDING_MODEL_ID": "your_embedding_model_id_here",
"CHUNK_SIZE": "your_chunk_size_here",
"VECTOR_API_KEY": "your_vector_api_key_here"
}
}
}
}

Alternatively, connect to a pre-deployed remote or local Streamable-HTTP instance:
{
"mcpServers": {
"vector-mcp": {
"url": "http://localhost:8000/vector-mcp/mcp"
}
}
}

To deploy the Streamable-HTTP server via Docker:
docker run -d \
--name vector-mcp-mcp \
-p 8000:8000 \
-e TRANSPORT=streamable-http \
-e PORT=8000 \
-e VECTOR_URL="your_value" \
-e EMBEDDING_MODEL_ID="your_value" \
-e CHUNK_SIZE="your_value" \
-e VECTOR_API_KEY="your_value" \
knucklessg1/vector-mcp:mcp

The `:mcp` tag is the slim MCP-server image (built from `docker/Dockerfile --target mcp`, installing `vector-mcp[mcp]`). The default `:latest` tag is the full agent image (`--target agent`, `vector-mcp[agent]`), which also bundles the Pydantic AI agent and the epistemic-graph engine; use it when you run `vector-agent` (the agent), not just the MCP server. See Container images.
vector-mcp can also run as a local container (Docker / Podman / uv) or be consumed from a remote deployment. The Deployment guide has full, copy-paste `mcp_config.json` examples for all four transports (stdio, streamable-http, local container / uv, and remote URL):
- Local container / uv: launch the server from `mcp_config.json` via `uvx`, `docker run`, or `podman run`, or point at a local streamable-http container by `url`.
- Remote URL: connect to a server deployed behind Caddy at `http://vector-mcp.arpa/mcp` using the `"url"` key.
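If you generate client configuration from a script, the two entry shapes shown earlier (a launched `uvx` command versus a `"url"` key) can be produced as follows. This is a minimal sketch: the function names are hypothetical, and the Deployment guide remains the reference for the complete `mcp_config.json` files.

```python
import json
from typing import Dict, Mapping


def stdio_entry(env: Mapping[str, str]) -> dict:
    """Entry that launches the slim server through uvx, as in the examples above."""
    return {
        "command": "uvx",
        "args": ["--from", "vector-mcp[mcp]", "vector-mcp"],
        "env": dict(env),
    }


def remote_entry(url: str) -> dict:
    """Entry that connects to an already running streamable-http server."""
    return {"url": url}


def mcp_config(entry: dict, name: str = "vector-mcp") -> Dict[str, dict]:
    """Wrap one server entry in the top-level mcpServers object."""
    return {"mcpServers": {name: entry}}


# Remote deployment behind Caddy, using the URL quoted in the list above.
print(json.dumps(mcp_config(remote_entry("http://vector-mcp.arpa/mcp")), indent=2))
```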
| Variable | Example | Description |
|---|---|---|
| `HOST` | `0.0.0.0` | |
| `PORT` | `8000` | |
| `TRANSPORT` | `stdio` | options: `stdio`, `streamable-http`, `sse` |
| `ENABLE_OTEL` | `True` | |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://localhost:8080/api/public/otel` | |
| `OTEL_EXPORTER_OTLP_PUBLIC_KEY` | `pk-...` | |
| `OTEL_EXPORTER_OTLP_SECRET_KEY` | `sk-...` | |
| `OTEL_EXPORTER_OTLP_PROTOCOL` | `http/protobuf` | |
| `EUNOMIA_TYPE` | `none` | options: `none`, `embedded`, `remote` |
| `EUNOMIA_POLICY_FILE` | `mcp_policies.json` | |
| `EUNOMIA_REMOTE_URL` | `http://eunomia-server:8000` | |
| `VECTOR_URL` | `http://localhost:8000` | |
| `EMBEDDING_MODEL_ID` | `text-embedding-nomic-embed-text-v2-moe` | |
| `CHUNK_SIZE` | `512` | |
| `VECTOR_API_KEY` | `your_vector_api_key_here` | |
| `COLLECTION_MANAGEMENTTOOL` | `True` | |
| `SEARCHTOOL` | `True` | |

| Variable | Example | Description |
|---|---|---|
| `MCP_TOOL_MODE` | `condensed` | Tool surface: `condensed`, `verbose`, or `both` |
| `MCP_ENABLED_TOOLS` | - | Comma-separated tool allow-list |
| `MCP_DISABLED_TOOLS` | - | Comma-separated tool deny-list |
| `MCP_ENABLED_TAGS` | - | Comma-separated tag allow-list |
| `MCP_DISABLED_TAGS` | - | Comma-separated tag deny-list |
| `MCP_CLIENT_AUTH` | - | Outbound MCP auth (`oidc-client-credentials` for fleet calls) |
| `OIDC_CLIENT_ID` | - | OIDC client id (service-account auth) |
| `OIDC_CLIENT_SECRET` | - | OIDC client secret (service-account auth) |
| `DEBUG` | `False` | Verbose logging |
| `PYTHONUNBUFFERED` | `1` | Unbuffered stdout (recommended in containers) |
| `MCP_URL` | `http://localhost:8000/mcp` | URL of the MCP server the agent connects to |
| `PROVIDER` | `openai` | LLM provider for the agent |
| `MODEL_ID` | `gpt-4o` | Model id for the agent |
| `ENABLE_WEB_UI` | `True` | Serve the AG-UI web interface |

17 package + 14 inherited variable(s). Auto-generated from .env.example + the shared agent-utilities set — do not edit.
Every variable the server reads, grouped by purpose.
| Variable | Description | Default |
|---|---|---|
| `VECTOR_URL` | Base URL of the vector database / embedding endpoint | `http://localhost:8000` |
| `VECTOR_API_KEY` | API key for the vector database / embedding provider | - |
| `EMBEDDING_MODEL_ID` | Embedding model id used for indexing & search | `text-embedding-nomic-embed-text-v2-moe` |
| `CHUNK_SIZE` | Document chunk size for ingestion | `512` |

| Variable | Description | Default |
|---|---|---|
| `TRANSPORT` | `stdio`, `streamable-http`, or `sse` | `stdio` |
| `HOST` | Bind host (HTTP transports) | `0.0.0.0` |
| `PORT` | Bind port (HTTP transports) | `8000` |
| `MCP_TOOL_MODE` | Tool surface: `condensed`, `verbose`, or `both` | `condensed` |
| `MCP_ENABLED_TOOLS` / `MCP_DISABLED_TOOLS` | Comma-separated tool allow/deny list | - |
| `MCP_ENABLED_TAGS` / `MCP_DISABLED_TAGS` | Comma-separated tag allow/deny list | - |
| `PYTHONUNBUFFERED` | Unbuffered stdout (recommended in containers) | `1` |

Each action-routed tool can be disabled individually via its toggle env var (set to false).
The full list is in the Available MCP Tools table above.
| Variable | Description | Default |
|---|---|---|
| `COLLECTION_MANAGEMENTTOOL` | Enable the collection-management tool | `True` |
| `SEARCHTOOL` | Enable the search tool | `True` |

| Variable | Description | Default |
|---|---|---|
| `ENABLE_OTEL` | Enable OpenTelemetry export | `True` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP collector endpoint | - |
| `OTEL_EXPORTER_OTLP_PUBLIC_KEY` / `OTEL_EXPORTER_OTLP_SECRET_KEY` | OTLP auth keys | - |
| `OTEL_EXPORTER_OTLP_PROTOCOL` | OTLP protocol (e.g. `http/protobuf`) | - |
| `EUNOMIA_TYPE` | Authorization mode: `none`, `embedded`, `remote` | `none` |
| `EUNOMIA_POLICY_FILE` | Embedded policy file | `mcp_policies.json` |
| `EUNOMIA_REMOTE_URL` | Remote Eunomia server URL | - |

| Variable | Description | Default |
|---|---|---|
| `MCP_URL` | URL of the MCP server the agent connects to | `http://localhost:8000/mcp` |
| `PROVIDER` | LLM provider (e.g. `openai`) | `openai` |
| `MODEL_ID` | Model id (e.g. `gpt-4o`) | `gpt-4o` |
| `ENABLE_WEB_UI` | Serve the AG-UI web interface | `True` |

See .env.example for a copy-paste starting point.
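For scripts that need the same values, the sketch below reads a subset of these variables with the defaults listed in the tables above. It is illustrative only: the `VectorSettings` class and `load_settings` function are hypothetical names, and the server's own settings loader may validate or parse values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Transport names as listed in the TRANSPORT row above.
VALID_TRANSPORTS = {"stdio", "streamable-http", "sse"}


@dataclass(frozen=True)
class VectorSettings:
    vector_url: str
    vector_api_key: Optional[str]
    embedding_model_id: str
    chunk_size: int
    transport: str
    host: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> VectorSettings:
    """Read the documented variables, falling back to the documented defaults."""
    transport = env.get("TRANSPORT", "stdio")
    if transport not in VALID_TRANSPORTS:
        raise ValueError(f"TRANSPORT must be one of {sorted(VALID_TRANSPORTS)}")
    return VectorSettings(
        vector_url=env.get("VECTOR_URL", "http://localhost:8000"),
        vector_api_key=env.get("VECTOR_API_KEY"),  # no default: stays None if unset
        embedding_model_id=env.get(
            "EMBEDDING_MODEL_ID", "text-embedding-nomic-embed-text-v2-moe"
        ),
        chunk_size=int(env.get("CHUNK_SIZE", "512")),
        transport=transport,
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"TRANSPORT": "sse", "PORT": "9000"})`.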
This repository features a fully integrated Pydantic AI Graph Agent. It communicates over the Agent Control Protocol (ACP) and interacts seamlessly with the Agent Web UI (AG-UI) and Terminal interface.
To start the interactive command-line agent:
# Set credentials
export VECTOR_URL="your_value"
export EMBEDDING_MODEL_ID="your_value"
export CHUNK_SIZE="your_value"
export VECTOR_API_KEY="your_value"
# Run the agent server
vector-agent --provider openai --model-id gpt-4o

The following `docker/agent.compose.yml` configures the Agent, Web UI, and Terminal Interface together:
version: '3.8'
services:
  vector-mcp-mcp:
    image: knucklessg1/vector-mcp:mcp
    container_name: vector-mcp-mcp
    hostname: vector-mcp-mcp
    restart: always
    env_file:
      - ../.env
    environment:
      - PYTHONUNBUFFERED=1
      - HOST=0.0.0.0
      - PORT=8000
      - TRANSPORT=streamable-http
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
  vector-mcp-agent:
    image: knucklessg1/vector-mcp:latest
    container_name: vector-mcp-agent
    hostname: vector-mcp-agent
    restart: always
    depends_on:
      - vector-mcp-mcp
    env_file:
      - ../.env
    command: [ "vector-agent" ]
    environment:
      - PYTHONUNBUFFERED=1
      - HOST=0.0.0.0
      - PORT=9023
      - MCP_URL=http://vector-mcp-mcp:8000/mcp
      - PROVIDER=${PROVIDER:-openai}
      - MODEL_ID=${MODEL_ID:-gpt-4o}
      - ENABLE_WEB_UI=True
      - ENABLE_OTEL=True
    ports:
      - "9023:9023"
    healthcheck:
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:9023/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
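The healthchecks in the compose file call `urllib.request.urlopen` against each service's `/health` path. The same probe can be run from the host as a small function, sketched below. The `is_healthy` name and the timeout default are choices made for this example; the URLs in the usage comment assume the port mappings from the compose file.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True when the endpoint answers with a 2xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # URLError/OSError: connection refused, DNS failure, timeout, or an HTTP
        # error status (HTTPError subclasses URLError). ValueError: malformed URL.
        return False


# Example usage once the stack is up:
#   is_healthy("http://localhost:8000/health")  # MCP server
#   is_healthy("http://localhost:9023/health")  # agent
```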
Detailed graph node architecture explanations, custom skill configurations, and agentic trace guides are available in docs/agent.md.
Built directly on the enterprise-ready agent-utilities core, the server fully supports the standard security parameters:
- Eunomia Policies: Fine-grained, policy-driven tool authorization. Supports `none`, local `embedded` (`mcp_policies.json`), or centralized `remote` modes.
- OIDC Token Delegation: Compliant with RFC 8693 token exchange, which passes the authenticated user's credentials from Web UI / ACP → Agent → MCP.
- Scoped Credentials: Execution context runs restricted to the specific caller identity.
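For readers unfamiliar with RFC 8693, the sketch below builds the form body of a token-exchange request, the step in which an intermediary trades the user's token for one scoped to the next hop. The parameter names and URN values are those defined by RFC 8693. The function name and the example `audience` value are hypothetical, and this README does not show how agent-utilities performs the exchange internally.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def token_exchange_form(
    subject_token: str,
    audience: Optional[str] = None,
    scope: Optional[str] = None,
) -> Dict[str, str]:
    """Form fields for exchanging the caller's access token for a downstream one."""
    form = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
    }
    if audience:
        form["audience"] = audience  # the service the new token is meant for
    if scope:
        form["scope"] = scope  # space-separated scopes, optional
    return form


# The body is sent as application/x-www-form-urlencoded to the token endpoint.
body = urlencode(token_exchange_form("user-access-token", audience="vector-mcp"))
```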
| Feature | Functionality | Enablement |
|---|---|---|
| Tool Guard | Sensitivity inspection with human-in-the-loop validation | Enabled by default |
| Prompt Injection Defense | Input scanning, repetition monitoring, and recursive loop blocks | Enabled by default |
| Context Safety Guard | Stuck-loop detectors and contextual overflow preemptive alerts | Enabled by default |
Pick the extra that matches what you want to run:
| Extra | Installs | Use when |
|---|---|---|
| `vector-mcp[mcp]` | Slim MCP server only (`agent-utilities[mcp]`: FastMCP/FastAPI) | You only run the MCP server (smallest install / image) |
| `vector-mcp[agent]` | Full agent runtime (`agent-utilities[agent,logfire]`: Pydantic AI + the epistemic-graph engine) | You run the integrated agent |
| `vector-mcp[all]` | Everything (mcp + all vector backends + agent) | Development / both surfaces |

# MCP server only (recommended for tool hosting — slim deps)
uv pip install "vector-mcp[mcp]"
# Full agent runtime (Pydantic AI + epistemic-graph engine)
uv pip install "vector-mcp[agent]"
# Everything (development)
uv pip install "vector-mcp[all]"   # or: python -m pip install "vector-mcp[all]"

One multi-stage `docker/Dockerfile` builds two right-sized images, selected by `--target`:
| Image tag | Build target | Contents | Entrypoint |
|---|---|---|---|
| `knucklessg1/vector-mcp:mcp` | `--target mcp` | `vector-mcp[mcp]`: slim, no engine/pydantic-ai/dspy/llama-index/tree-sitter | `vector-mcp` |
| `knucklessg1/vector-mcp:latest` | `--target agent` (default) | `vector-mcp[agent]`: full agent runtime + epistemic-graph engine | `vector-agent` |

docker build --target mcp -t knucklessg1/vector-mcp:mcp docker/ # slim MCP server
docker build --target agent -t knucklessg1/vector-mcp:latest docker/   # full agent

`docker/mcp.compose.yml` runs the slim `:mcp` server; `docker/agent.compose.yml` runs the agent (`:latest`) with a co-located `:mcp` sidecar.
The full agent ([agent] / :latest) embeds the epistemic-graph engine (pulled in
transitively via agent-utilities[agent]). For production — or to share one knowledge graph
across multiple agents — run epistemic-graph as its own database container and point the
agent at it instead of embedding it. Deployment recipes (single-node + Raft HA), connection
config, and the full database architecture (with diagrams) are documented in the
epistemic-graph deployment guide.
The slim [mcp] server does not require the database.
Contributions are welcome! Please ensure code quality by executing local checks before submitting pull requests:
- Format code using `ruff format .`
- Lint code using `ruff check .`
- Validate type-safety with `mypy .`
- Execute test suites using `pytest`
This package can be provisioned for you — skill-guided — by the agent-os-genesis
universal skill (its single-package deploy mode): it picks your install method, seeds
secrets to OpenBao/Vault (or .env), trusts your enterprise CA, registers the MCP
server, and verifies it — the same machinery that stands up the whole Agent OS, narrowed
to just this package. Ask your agent to "deploy vector-mcp with agent-os-genesis".
| Install mode | Command |
|---|---|
| Bare-metal, prod (PyPI) | uvx vector-mcp · or uv tool install vector-mcp |
| Bare-metal, dev (editable) | uv pip install -e ".[all]" · or pip install -e ".[all]" |
| Container, prod | deploy knucklessg1/vector-mcp:latest via docker-compose / swarm / podman / podman-compose / kubernetes |
| Container, dev (editable) | deploy docker/compose.dev.yml (source-mounted at /src; edits live on restart) |
Secrets are read-existing + seeded via vault_sync — you are only prompted for what's missing.