Status: Draft (v1)
cllama is an open standard and reference architecture for a context-aware, bidirectional Large Language Model (LLM) governance proxy. It is designed to run as a shared pod-level service managed by Clawdapus, serving multiple autonomous agents (Claws) within the same pod.
This document defines the contract between Clawdapus (the orchestrator) and a cllama proxy (the policy enforcer). Any container image that adheres to this specification can be used as a cllama proxy.
- Bidirectional Interception: `cllama` intercepts outbound prompts (agent → provider) and inbound responses (provider → agent).
- Multi-Agent Identity: A single proxy serves multiple agents. Identity is established via unique per-agent Bearer Tokens supplied in the `Authorization` header.
- Intelligent Authorization: The proxy is context-aware. It uses the bearer token to load the specific agent's identity, active rules (`enforce`), and available tools to make dynamic allow/deny/amend decisions.
- Credential Starvation: The proxy acts as a secure firewall. Agent containers are provisioned with unique dummy tokens. The proxy holds the real provider API keys, preventing agents from bypassing governance.
- Conversational Upgradability: While not strictly required for v1, the proxy architecture is designed to eventually support natural language configuration (updating rules dynamically via conversation).
A cllama sidecar MUST expose a canonical ingress surface matrix for runner traffic.
Minimum required surfaces:
| Surface | Path | Payload family | Default use |
|---|---|---|---|
| OpenAI Chat Completions | `POST /v1/chat/completions` | OpenAI-compatible chat/completions | All non-Anthropic providers unless an explicit exception is documented |
| Anthropic Messages | `POST /v1/messages` | Anthropic Messages | Anthropic-family providers and explicit Anthropic-wire exceptions |
- Listen Port: The proxy MUST listen on `0.0.0.0:8080`.
- Base URL Replacement: Clawdapus configures the agent's runner (e.g., OpenClaw, Claude Code) to use `http://cllama-<type>:8080/v1` as its LLM base URL (first proxy in chain when chaining is enabled). The runner then targets one of the canonical ingress paths beneath that base URL.
- Provider Identity vs Transport: Operator-facing model refs keep provider identity (`google/gemini-*`, `anthropic/*`, etc.). The proxy ingress surface is a transport contract selected by infrastructure; runners MUST NOT invent synthetic provider prefixes such as `cllama/google`, and the shared ingress contract rejects them when compiling cllama-facing config.
- Vendor-Native Extensions: Additional vendor-native ingress surfaces MAY exist, but only as explicit, documented exceptions when a concrete runner cannot target the canonical surfaces. They are not the default contract.
- Implementation Scope (Phase 4): The wire protocol supports chained proxies, but runtime enforcement currently allows only one proxy type per pod. Declaring multiple proxy types fails fast until Phase 5 chain execution is implemented.
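The mapping from an operator-facing model ref to a canonical ingress surface can be sketched as follows. This is an illustration only, not the reference implementation: the function names, the `<provider>/<model>` split on the first slash, and the assumption that `anthropic` is the only Anthropic-family provider prefix are all hypothetical.

```python
CHAT_COMPLETIONS_PATH = "/v1/chat/completions"
MESSAGES_PATH = "/v1/messages"

# Assumption: only the "anthropic" prefix maps to the Anthropic Messages surface.
ANTHROPIC_WIRE_PROVIDERS = {"anthropic"}


def ingress_path(model_ref: str) -> str:
    """Pick the canonical ingress path for a model ref such as 'google/gemini-*'."""
    provider, sep, model = model_ref.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected <provider>/<model>, got {model_ref!r}")
    if provider == "cllama":
        # Synthetic provider prefixes such as 'cllama/google' are rejected.
        raise ValueError(f"synthetic provider prefix is not allowed: {model_ref!r}")
    if provider in ANTHROPIC_WIRE_PROVIDERS:
        return MESSAGES_PATH
    return CHAT_COMPLETIONS_PATH


def ingress_url(proxy_type: str, model_ref: str) -> str:
    """Build the full URL a runner would call for the given proxy type."""
    return f"http://cllama-{proxy_type}:8080{ingress_path(model_ref)}"
```

Keeping provider identity in the model ref and choosing the transport separately means a runner never has to encode the proxy into the model name.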
Clawdapus injects the pod's operational context into the cllama container at startup. Because a single proxy serves multiple agents, context is provided through a combination of global environment variables and a Shared Context Mount.
| Variable | Description |
|---|---|
| `CLAW_POD` | The name of the pod (e.g., `crypto-ops`). |
| `PROVIDER_API_KEY_*` | The real provider keys (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) supplied securely by the operator. |
| `CLAW_CONTEXT_ROOT` | The path to the shared context directory (defaults to `/claw/context`). |
Clawdapus bind-mounts a shared directory into the proxy (at CLAW_CONTEXT_ROOT) containing subdirectories for every agent in the pod. The directory name matches the agent's ID.
/claw/context/
├── crypto-crusher-0/
│ ├── AGENTS.md # Compiled contract (includes, enforce, guide)
│ ├── CLAWDAPUS.md # Infrastructure map
│ ├── metadata.json # Identity, handles, and active policy modules
│ └── context-blocks.json # Optional operator-authored context blocks
├── crypto-crusher-1/
│ └── ...
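A proxy might read one agent's directory from the shared mount roughly like this. It is a minimal sketch: the function name, the returned dictionary keys, and the path-traversal guard are assumptions made for illustration, and the internal schema of `metadata.json` is left opaque.

```python
import json
import os
from pathlib import Path
from typing import Optional


def load_agent_context(agent_id: str, root: Optional[Path] = None) -> dict:
    """Load the per-agent files from <CLAW_CONTEXT_ROOT>/<agent-id>/."""
    if root is None:
        root = Path(os.environ.get("CLAW_CONTEXT_ROOT", "/claw/context"))
    # Assumption: agent IDs are plain directory names, never paths.
    if agent_id in ("", ".", "..") or Path(agent_id).name != agent_id:
        raise ValueError(f"invalid agent id: {agent_id!r}")
    agent_dir = Path(root) / agent_id
    if not agent_dir.is_dir():
        raise FileNotFoundError(f"no context directory for agent {agent_id!r}")

    blocks_path = agent_dir / "context-blocks.json"
    return {
        "agents_md": (agent_dir / "AGENTS.md").read_text(encoding="utf-8"),
        "clawdapus_md": (agent_dir / "CLAWDAPUS.md").read_text(encoding="utf-8"),
        "metadata": json.loads((agent_dir / "metadata.json").read_text(encoding="utf-8")),
        # context-blocks.json is optional, so a missing file yields None.
        "context_blocks": (
            json.loads(blocks_path.read_text(encoding="utf-8"))
            if blocks_path.exists()
            else None
        ),
    }
```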
When an agent makes a request to the proxy, it MUST include a unique Bearer Token in the Authorization header:
Authorization: Bearer <agent-id>:<secure-secret>
Agents SHOULD also include X-Claw-Consumer-Session-Epoch when the runner can provide a process-stable restart identifier. The value is opaque to cllama; it must stay stable for the lifetime of the consumer process and change when that process restarts. cllama uses this optional header to decide whether channel-context cursors represent the current consumer session or a previous one. Missing or blank values preserve legacy cursor behavior.
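Parsing these two headers could look like the sketch below. The names `AgentCredentials` and `parse_agent_headers` are hypothetical, and the code assumes agent IDs never contain a colon, so the token is split on the first `:`.

```python
from typing import Mapping, NamedTuple, Optional


class AgentCredentials(NamedTuple):
    agent_id: str
    secret: str
    session_epoch: Optional[str]  # None means legacy cursor behavior


def parse_agent_headers(headers: Mapping[str, str]) -> AgentCredentials:
    """Extract <agent-id>, <secure-secret>, and the optional session epoch."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise ValueError("missing or malformed Authorization header")

    agent_id, sep, secret = token.strip().partition(":")
    if not sep or not agent_id or not secret:
        raise ValueError("bearer token must look like <agent-id>:<secure-secret>")

    # Missing or blank epoch values are treated the same way.
    epoch = normalized.get("x-claw-consumer-session-epoch", "").strip() or None
    return AgentCredentials(agent_id, secret, epoch)
```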
The proxy SHOULD execute the following pipeline:
- Identity Resolution: The proxy uses the `<agent-id>` portion (e.g., `crypto-crusher-0`) to resolve the agent's context from the corresponding subdirectory in `CLAW_CONTEXT_ROOT`.
- Authentication: The proxy MUST validate the `<secure-secret>` before processing the request.
- Model Validation: Ensure the requested `model` is within the `CLAW_ALLOWED_MODELS` list (parsed from `metadata.json`).
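The authentication and model-validation steps can be illustrated with a small check. This sketch makes several assumptions that the text above does not settle: where the expected secret is stored, the exact shape of the allowed-models list, and the HTTP status codes returned on rejection (401 and 403 here) are all hypothetical.

```python
import hmac
from typing import Iterable


class RequestRejected(Exception):
    """Raised when a request fails authentication or model validation."""

    def __init__(self, status_code: int, reason: str) -> None:
        super().__init__(reason)
        self.status_code = status_code
        self.reason = reason


def authorize(
    presented_secret: str,
    expected_secret: str,
    requested_model: str,
    allowed_models: Iterable[str],
) -> None:
    """Validate the secret first, then the requested model."""
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(presented_secret.encode(), expected_secret.encode()):
        raise RequestRejected(401, "invalid_secret")  # status code is an assumption
    if requested_model not in set(allowed_models):
        raise RequestRejected(403, "model_not_allowed")  # status code is an assumption
```

Checking the secret before the model keeps unauthenticated callers from probing which models an agent is allowed to use.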
- Context Aggregation: The proxy loads the agent-specific compiled context from `CLAW_CONTEXT_ROOT` and MAY inject infrastructure-owned runtime context such as context blocks, feeds, memory recall, time context, and channel deltas.
- Tool Scoping: If the agent's request contains `tools`, the proxy evaluates the request against the compiled tool manifest for that agent. The reference implementation only exposes tools declared for that agent; policy-plane implementations MAY further filter or deny tools.
- Prompt Decoration (Pre-Prompting): Policy-plane implementations MAY modify the outbound `messages` array, injecting specific rules, priorities, or warnings based on the compiled context. The passthrough reference does not perform policy prompt decoration.
- Policy Blocking: If the outbound prompt violates a loaded policy module, a policy-plane implementation MAY short-circuit the request and return an error or a mock response. The passthrough reference does not perform policy blocking.
- Forced Model Routing, Budgets & Compute Metering: Even if the agent requests a specific model (e.g., `gpt-4o`), the proxy MAY seamlessly rewrite the request to use a different, operator-approved model (e.g., `claude-3-haiku-20240307`) or provider. The reference implementation meters usage, records cost telemetry, and enforces compiled per-agent budget/request caps before provider dispatch. On cap breach it returns `429 Too Many Requests` and emits an `intervention` such as `budget_exceeded` or `rate_limited`. If the budget ledger is unavailable, the reference implementation logs `budget_check_unavailable` and defaults to fail-open unless `CLLAMA_BUDGET_FAIL_MODE=closed` is configured.
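The budget gate and its fail-open/fail-closed behavior might be modeled like this. It is a sketch under stated assumptions: the ledger is represented by a hypothetical callable returning spend so far, a cap is treated as breached when spend reaches it, and the status code for a fail-closed denial is left unset because the text above does not specify one.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class BudgetDecision:
    allow: bool
    status_code: Optional[int] = None
    intervention: Optional[str] = None


def check_budget(
    read_spent_usd: Callable[[], float],  # hypothetical ledger accessor
    cap_usd: float,
    fail_mode: str = "open",  # mirrors CLLAMA_BUDGET_FAIL_MODE
) -> BudgetDecision:
    """Decide whether a request may be dispatched to the provider."""
    try:
        spent = read_spent_usd()
    except Exception:
        # Ledger unavailable: allow by default, deny only in closed mode.
        return BudgetDecision(
            allow=(fail_mode != "closed"),
            intervention="budget_check_unavailable",
        )
    if spent >= cap_usd:
        return BudgetDecision(allow=False, status_code=429, intervention="budget_exceeded")
    return BudgetDecision(allow=True)
```

Fail-open favors availability when the ledger is down, while the closed mode trades availability for a hard guarantee that no unmetered spend occurs.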
The proxy strips the dummy token, attaches the real PROVIDER_API_KEY, and forwards the decorated request to the upstream LLM provider.
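The credential swap can be sketched as a header rewrite. The function name and the `provider_keys` mapping are hypothetical. The header conventions are those of the public provider APIs: Anthropic expects the key in `x-api-key`, and OpenAI-compatible endpoints expect `Authorization: Bearer <key>`. Treating every non-Anthropic provider as OpenAI-compatible is an assumption.

```python
from typing import Dict, Mapping


def swap_credentials(
    headers: Mapping[str, str],
    provider: str,
    provider_keys: Mapping[str, str],  # hypothetical: provider name -> real API key
) -> Dict[str, str]:
    """Return upstream headers with the agent's dummy token replaced."""
    # Drop any credential header supplied by the agent, whatever its casing.
    upstream = {
        name: value
        for name, value in headers.items()
        if name.lower() not in ("authorization", "x-api-key")
    }
    real_key = provider_keys[provider]
    if provider == "anthropic":
        upstream["x-api-key"] = real_key
    else:
        # Assumption: all other providers accept an OpenAI-style bearer key.
        upstream["Authorization"] = f"Bearer {real_key}"
    return upstream
```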
- Response Evaluation: Policy-plane implementations MAY evaluate the provider's response against the compiled contract and active policy modules. The passthrough reference forwards provider responses without policy evaluation.
- Amendment: If the response contains restricted information (e.g., PII leakage) or violates the tone/instructions of the contract, a policy-plane implementation MAY rewrite the content. The passthrough reference does not amend responses.
- Drift Scoring: Behavioral drift is an organization-specific policy metric. A policy-plane implementation MAY emit drift or score telemetry, but the passthrough reference does not define or emit a built-in `drift_score`.
The (potentially amended) response is returned to the agent container.
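As an illustration of what a policy-plane amendment step could do (the passthrough reference does nothing of the kind), the sketch below redacts email addresses from an OpenAI-style chat completion body. The redaction rule, the function name, and the `pii_redacted` intervention label are all hypothetical.

```python
import copy
import re
from typing import Optional, Tuple

# Deliberately simple pattern; a real DLP policy would be far more thorough.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def amend_chat_response(body: dict) -> Tuple[dict, Optional[str]]:
    """Return (possibly amended body, intervention label or None)."""
    amended = copy.deepcopy(body)  # never mutate the provider's original response
    changed = False
    for choice in amended.get("choices", []):
        message = choice.get("message") or {}
        content = message.get("content")
        if isinstance(content, str):
            redacted, count = EMAIL_RE.subn("[redacted]", content)
            if count:
                message["content"] = redacted
                changed = True
    return amended, ("pii_redacted" if changed else None)
```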
The cllama proxy MUST emit structured JSON logs to stdout. Clawdapus collects these logs for the claw audit command.
Logs must contain the following fields:
- `ts`: ISO-8601 UTC timestamp.
- `claw_id`: The calling agent.
- `type`: one of `request`, `response`, `error`, `intervention`, `feed_fetch`, `feed_injection`, `context_block`, `memory_op`, `channel_context_op`, or `provider_pool`.
- `intervention`: If the proxy modified routing, mediation, or other request handling, it describes why. In the reference logger this field is present on every event and is `null` when no intervention occurred.
Event-specific fields may also be present:
- `status_code`, `latency_ms`, `tokens_in`, `tokens_out`, `cost_usd` for request/response/error events
- `feed_name`, `feed_url` for feed fetch events
- `feed_name`, `source`, `feed_status`, and byte-budget fields for feed injection events
- `context_block_id`, `context_block_kind`, `context_block_status`, `context_block_cadence`, `context_block_placement`, and `context_block_reason` for context block events
- `kind`, `channels`, `retained`, `returned`, `omitted`, byte counts, and source/status fields for channel context operations
- `provider`, `key_id`, `action`, `reason`, `cooldown_until` for provider-pool events
- `memory_service`, `memory_op`, `memory_status`, `memory_blocks`, `memory_bytes`, `memory_removed` for memory telemetry events
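A logger that produces events in this shape might look like the following. It is a minimal sketch, not the reference logger: the function names are hypothetical, and the choice of millisecond precision with a trailing `Z` is one valid ISO-8601 UTC rendering among several.

```python
import json
import sys
from datetime import datetime, timezone
from typing import Any, Optional

EVENT_TYPES = {
    "request", "response", "error", "intervention", "feed_fetch",
    "feed_injection", "context_block", "memory_op", "channel_context_op",
    "provider_pool",
}


def build_event(claw_id: str, event_type: str,
                intervention: Optional[str] = None, **fields: Any) -> dict:
    """Build one audit event; `intervention` is always present (None -> null)."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    if "ts" in fields or "type" in fields:
        raise ValueError("event-specific fields must not override required fields")
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    event = {
        "ts": now.replace("+00:00", "Z"),
        "claw_id": claw_id,
        "type": event_type,
        "intervention": intervention,
    }
    event.update(fields)  # e.g. status_code, latency_ms, tokens_in
    return event


def emit(event: dict, stream=None) -> None:
    """Write the event as a single JSON line (stdout by default)."""
    out = stream if stream is not None else sys.stdout
    out.write(json.dumps(event, separators=(",", ":")) + "\n")
    out.flush()
```

For example, `emit(build_event("crypto-crusher-0", "request", status_code=200, latency_ms=412))` writes one line containing the four required fields plus the two event-specific ones.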
cllama writes a durable JSONL session history at the proxy boundary. This is an infrastructure-owned record of every completed inference transaction — written by the proxy, not by agents. It is distinct from /claw/memory, which is runner-owned and agent-writable.
| Variable | Default | Description |
|---|---|---|
| `CLAW_SESSION_HISTORY_DIR` | `/claw/session-history` | Host-side base directory for per-agent JSONL history files. When set, cllama writes one file per agent at `<dir>/<agent-id>/history.jsonl`. |
When orchestrated by Clawdapus, claw up automatically bind-mounts .claw-session-history/ (relative to the pod file) into the cllama container at /claw/session-history whenever cllama is enabled for any service in the pod.
/claw/session-history/
├── crypto-crusher-0/
│ └── history.jsonl
├── crypto-crusher-1/
│ └── history.jsonl
One JSONL file per agent. Each line is a single entry. Entries are appended on every successful upstream completion (HTTP 2xx only). Non-2xx responses are not recorded in session history; they appear only in structured audit logs (see §5).
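The append rule can be sketched in a few lines. The function name is hypothetical, and the sketch assumes the entry dictionary already carries a `claw_id` key identifying the agent; it does not address concurrent writers or fsync durability.

```python
import json
import os
from pathlib import Path
from typing import Optional


def append_history(entry: dict, status_code: int,
                   base_dir: Optional[str] = None) -> bool:
    """Append one entry to <dir>/<agent-id>/history.jsonl; return True if written."""
    # Only successful upstream completions are recorded in session history.
    if not 200 <= status_code < 300:
        return False
    base = Path(
        base_dir or os.environ.get("CLAW_SESSION_HISTORY_DIR", "/claw/session-history")
    )
    path = base / entry["claw_id"] / "history.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        # One JSON object per line, no embedded newlines.
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return True
```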
Each line is a JSON object with the following fields:
| Field | Type | Description |
|---|---|---|
| `version` | integer | Schema version. Currently `1`. |
| `id` | string | Stable source-event ID for replay and deduplication. |
| `ts` | string | RFC3339 timestamp of when the response was received. |
| `claw_id` | string | Agent ID that issued the request. |
| `path` | string | Request path (e.g., `/v1/chat/completions`). |
| `requested_model` | string | Model string as sent by the agent. |
| `effective_provider` | string | Provider name after routing (e.g., `anthropic`). |
| `effective_model` | string | Model name forwarded to the upstream provider. |
| `status_code` | integer | HTTP status code returned by the upstream. |
| `stream` | boolean | Whether the response was streamed (SSE). |
| `request_original` | object | The request body as received from the agent, before any proxy modification. |
| `request_effective` | object | The request body as forwarded to the upstream provider, after credential swap and any model rewrite. |
| `response` | object | See response payload below. |
| `usage` | object | Token counts extracted from the response: `prompt_tokens` (integer), `completion_tokens` (integer). |
| `usage.reported_cost_usd` | number (float) | Cost in USD reported by the provider; omitted when not available. |
Response payload (response field):
| Field | Type | Description |
|---|---|---|
| `format` | string | `"json"` for standard JSON responses; `"sse"` for Server-Sent Events streams. |
| `json` | object | Present when `format` is `"json"`. The parsed response body. |
| `text` | string | Present when `format` is `"sse"`. The raw event stream text. |
Phase 1 is retention only. cllama writes the history; no read API exists. Agents cannot query their own history programmatically. No prompt injection, no retrieval, no summarization. The JSONL files are accessible to operators via the host filesystem mount for offline analysis, auditing, and future tooling.
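Since the files are plain JSONL, an operator can analyze them offline with a short script. The sketch below totals token usage per `effective_model` for one agent's file; the function name is hypothetical, and entries without a `usage` object are counted as zero tokens.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, Union


def token_totals(history_file: Union[str, Path]) -> Dict[str, Dict[str, int]]:
    """Sum prompt and completion tokens per effective model."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"prompt_tokens": 0, "completion_tokens": 0}
    )
    with Path(history_file).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            usage = entry.get("usage") or {}
            bucket = totals[entry["effective_model"]]
            bucket["prompt_tokens"] += usage.get("prompt_tokens", 0)
            bucket["completion_tokens"] += usage.get("completion_tokens", 0)
    return dict(totals)
```

An operator would point it at a path such as `.claw-session-history/crypto-crusher-0/history.jsonl` on the host.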
Clawdapus provides a reference image: ghcr.io/mostlydev/cllama.
The passthrough reference:
- Adheres to the v1 ingress surface matrix and Listen Port.
- Validates the environment (`CLAW_POD`, `CLAW_CONTEXT_ROOT`, provider credentials), bearer-token identity resolution, and mounts.
- Acts as a pure, transparent proxy (no decoration, no amendment).
- Emits structured logs of all traffic and budget/rate interventions.
This image is used for testing network integration and serves as the boilerplate for operators to build proprietary cllama policy engines (e.g., incorporating advanced DLP, RAG-based context injection, or conversational configuration).
Tools like ClawRouter act as specialized instances of a cllama proxy focused on forced model routing, provider availability, rate limiting, and compute metering. The reference passthrough establishes the identity, per-agent budget/rate caps, routing, and telemetry contract; specialized engines can layer richer provider selection and organization-specific cost policy on that contract.