
feat(providers): optional cheaper model for compress() via AGENTMEMORY_COMPRESS_MODEL#900

Open
somedev7 wants to merge 2 commits into
rohitg00:mainfrom
somedev7:feat/compress-model

Conversation

@somedev7

@somedev7 somedev7 commented Jun 11, 2026


Closes #899

What

Adds an optional AGENTMEMORY_COMPRESS_MODEL env var that routes compress()-side work (per-observation compression, graph extraction, query expansion) to a dedicated — typically cheaper — model, while summarize() callers (session summaries, consolidation synthesis, reflection, crystallize) stay on the main model.

Why

Per-observation compression dominates background LLM volume (one call per tool use under AGENTMEMORY_AUTO_COMPRESS=true; ~87% of calls on a measured active day — table in the linked issue), but its quality bar is the loosest of the four LLM pipelines. With a single shared model setting, the only way to cut compression cost is to downgrade summaries and lessons too — exactly the outputs that get injected into future sessions.

How

  • ProviderConfig gains optional compressModel; detectProvider() reads AGENTMEMORY_COMPRESS_MODEL and attaches it for the openai / minimax / anthropic / gemini / openrouter branches.
  • Each HTTP provider's call() accepts an optional model override; compress() passes this.compressModel, summarize() does not. describeImage() stays on the main model.
  • createBaseProvider() threads the value into the four provider constructors. FALLBACK_PROVIDERS chains deliberately do NOT inherit it: model names are provider-specific, the same reasoning as in #778 (Bug: FALLBACK_PROVIDERS passes the primary provider's model to fallback providers → cross-provider failover always 404s and trips the circuit breaker (v0.9.24)). A comment is added at the exclusion site.
  • AgentMemoryConfig.compressionModel now reports compressModel ?? model instead of always mirroring provider.model.
  • Docs: .env.example entry + README "Cost-aware model selection" and Environment Variables sections.
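
The routing described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SketchProvider`, `Transport`, and `recordingTransport` are hypothetical names, and the injected transport stands in for the real HTTP call so the sketch runs without network access. The only behavior taken from the description is that compress() passes the override, summarize() does not, and call() resolves `modelOverride ?? this.model`.

```typescript
// Illustrative sketch only: SketchProvider, Transport and recordingTransport
// are hypothetical names, not the PR's actual classes or helpers.
type ChatRequest = {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
};

// The transport is injected so the sketch runs without network access.
type Transport = (request: ChatRequest) => Promise<string>;

class SketchProvider {
  private readonly model: string;
  private readonly compressModel: string | undefined;
  private readonly transport: Transport;

  constructor(model: string, transport: Transport, compressModel?: string) {
    this.model = model;
    this.transport = transport;
    this.compressModel = compressModel;
  }

  // High-volume path: passes the override, which may be undefined.
  compress(systemPrompt: string, userPrompt: string): Promise<string> {
    return this.call(systemPrompt, userPrompt, this.compressModel);
  }

  // Quality-sensitive path: never passes an override.
  summarize(systemPrompt: string, userPrompt: string): Promise<string> {
    return this.call(systemPrompt, userPrompt);
  }

  // An undefined override falls back to the main model.
  private call(
    systemPrompt: string,
    userPrompt: string,
    modelOverride?: string,
  ): Promise<string> {
    return this.transport({
      model: modelOverride ?? this.model,
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: userPrompt },
      ],
    });
  }
}

// Records which model each request would have been sent to.
function recordingTransport(sentModels: string[]): Transport {
  return async (request) => {
    sentModels.push(request.model);
    return "ok";
  };
}
```

Constructing `new SketchProvider("main-model", recordingTransport(sent), "cheap-model")` and calling compress() then summarize() records the cheap model first and the main model second. Omitting the third argument makes both calls record the main model, which is the fallback case the tests in this PR assert.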

No MCP tools, REST endpoints, or version fields touched, so the AGENTS.md consistency checklists don't apply.

Field test

Ran on a real instance (OpenAI-compatible provider) for ~20 hours with the main model on Qwen/Qwen3-30B-A3B-Instruct-2507 and AGENTMEMORY_COMPRESS_MODEL=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B: ~500 observation compressions, average qualityScore 93.7 (vs 96.6 on the main model), retry rate 1.2%, zero LLM errors or circuit-breaker trips; session summaries kept flowing through the main model. Compression cost dropped ~40% on input / 20% on output.

How to verify

  • npm run build — clean.
  • npm test — 1421 tests pass (1415 baseline + 6 new in test/compress-model.test.ts: per-provider assertions that compress() sends the override and summarize() sends the main model, plus loadConfig() pickup/default tests).
  • Manual: set AGENTMEMORY_COMPRESS_MODEL to a cheaper model of your provider, restart, and watch model in outbound compression requests (or provider dashboard) while session summaries keep using the main model.
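
For the manual check, one way to watch the `model` field of outbound requests is to wrap whatever fetch-like function the process uses. The sketch below is an assumption-laden debugging aid, not part of the PR: `FetchLike`, `extractModel`, and `withModelLog` are hypothetical names, and it only handles JSON request bodies that carry a top-level `model` string, as OpenAI-style chat completion requests do.

```typescript
// Hypothetical debugging helper; none of these names come from the PR.
// Minimal fetch-like signature so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init?: { method?: string; body?: string },
) => Promise<unknown>;

// Pulls the top-level "model" field out of a JSON request body, if there is one.
function extractModel(body: string | undefined): string | undefined {
  if (body === undefined) return undefined;
  try {
    const parsed: unknown = JSON.parse(body);
    if (typeof parsed === "object" && parsed !== null && "model" in parsed) {
      const model = (parsed as { model: unknown }).model;
      return typeof model === "string" ? model : undefined;
    }
  } catch {
    // Body is not JSON; nothing to report.
  }
  return undefined;
}

// Wraps a fetch-like function and logs the model of every request that names one.
function withModelLog(
  baseFetch: FetchLike,
  log: (line: string) => void,
): FetchLike {
  return (url, init) => {
    const model = extractModel(init?.body);
    if (model !== undefined) log(`${url} -> model=${model}`);
    return baseFetch(url, init);
  };
}
```

Whether the providers in this repository accept an injected fetch is not stated in the PR, so how the wrapper gets installed is left open. Checking the provider dashboard, as the bullet suggests, needs no code at all.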

Summary by CodeRabbit

  • New Features

    • Optional environment variable to route compression-related memory work to a separate, potentially cheaper model while summaries/reflection remain on the main model; supported across providers and ignored for noop/agent-sdk and fallback-provider flows.
  • Documentation

    • Updated README and .env example to describe the compression-model setting and provider compatibility.
  • Tests

    • Added tests verifying compress() uses the compression model (or falls back) while summarize() uses the main model.

@vercel

vercel Bot commented Jun 11, 2026


@Slavik47 is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c1cb98b3-c947-470e-9e30-c706ed688774

📥 Commits

Reviewing files that changed from the base of the PR and between 82f83f4 and e9ab1d2.

📒 Files selected for processing (2)
  • .env.example
  • test/compress-model.test.ts
✅ Files skipped from review due to trivial changes (1)
  • .env.example
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/compress-model.test.ts

📝 Walkthrough

Walkthrough

This PR adds AGENTMEMORY_COMPRESS_MODEL and wires an optional per-provider compressModel so compress() calls can use a cheaper provider-valid model while summarize() continues using the primary model; changes include types, provider constructors/calls, config propagation, docs, and tests.

Changes

Compression Model Override

| Layer / File(s) | Summary |
| --- | --- |
| Type contract and configuration detection: src/types.ts, src/config.ts | ProviderConfig adds optional compressModel; detectProvider() propagates compressModel in provider branches; loadConfig() sets compressionModel to provider.compressModel ?? provider.model. |
| Provider implementation: model override pattern: src/providers/openai.ts, src/providers/anthropic.ts, src/providers/minimax.ts, src/providers/openrouter.ts | Providers accept and store compressModel, compress() calls call(..., modelOverride) with the override, summarize() uses this.model; call() chooses modelOverride ?? this.model. |
| Configuration propagation and provider instantiation: src/config.ts, src/providers/index.ts | Spreads compressModel through detectProvider branches and passes config.compressModel into provider constructors; updates fallback-provider docs to exclude AGENTMEMORY_COMPRESS_MODEL. |
| User-facing documentation: .env.example, README.md | Adds README and .env example entries for AGENTMEMORY_COMPRESS_MODEL, documenting compress()-only scope and provider-valid model-name constraint. |
| Test coverage: provider routing and config propagation: test/compress-model.test.ts | Vitest suite mocks fetch/SDK fetch to assert compress() uses compressModel (or falls back) while summarize() uses main model; tests loadConfig() propagation and defaults. |
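
The configuration side summarized above can be sketched as follows. This is a rough illustration under stated assumptions: `SketchProviderConfig`, `detectSketchProvider`, and `effectiveCompressionModel` are hypothetical names, the main model is passed in rather than read from an environment variable (the PR text does not name that variable), and treating a blank value as unset is an assumption of this sketch rather than documented behavior.

```typescript
// Hypothetical shapes for illustration; the PR's real ProviderConfig has more fields.
type ProviderName =
  | "openai"
  | "anthropic"
  | "gemini"
  | "openrouter"
  | "minimax"
  | "agent-sdk"
  | "noop";

interface SketchProviderConfig {
  provider: ProviderName;
  model: string;
  compressModel?: string;
}

type Env = Record<string, string | undefined>;

// Providers that, per the PR description, ignore the override.
const IGNORES_COMPRESS_MODEL: ReadonlySet<ProviderName> = new Set<ProviderName>([
  "agent-sdk",
  "noop",
]);

function detectSketchProvider(
  env: Env,
  provider: ProviderName,
  model: string,
): SketchProviderConfig {
  // Assumption of this sketch: a blank value is treated as unset.
  const raw = env["AGENTMEMORY_COMPRESS_MODEL"]?.trim();
  const compressModel = raw ? raw : undefined;
  if (compressModel === undefined || IGNORES_COMPRESS_MODEL.has(provider)) {
    return { provider, model };
  }
  return { provider, model, compressModel };
}

// Mirrors the reported value: the override when present, else the main model.
function effectiveCompressionModel(config: SketchProviderConfig): string {
  return config.compressModel ?? config.model;
}
```

With `AGENTMEMORY_COMPRESS_MODEL=cheap-model` and the openai provider, `effectiveCompressionModel` returns the cheap model. With the variable unset, or with noop or agent-sdk, it returns the main model. Fallback chains are not modeled here; per the PR they build their own configuration and never receive compressModel.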

Sequence Diagram

sequenceDiagram
  participant Config
  participant OpenAIProvider
  participant OpenAIAPI as OpenAI API
  Config->>OpenAIProvider: new OpenAIProvider(apiKey, model, maxTokens, baseURL, compressModel)
  OpenAIProvider->>OpenAIProvider: this.compressModel = compressModel
  rect rgba(100, 150, 255, 0.5)
  Note over OpenAIProvider,OpenAIAPI: compress() flow
  OpenAIProvider->>OpenAIProvider: compress(systemPrompt, userPrompt)
  OpenAIProvider->>OpenAIProvider: call(systemPrompt, userPrompt, this.compressModel)
  OpenAIProvider->>OpenAIAPI: POST {model: compressModel ?? model}
  end
  rect rgba(150, 200, 100, 0.5)
  Note over OpenAIProvider,OpenAIAPI: summarize() flow
  OpenAIProvider->>OpenAIProvider: summarize(systemPrompt, userPrompt)
  OpenAIProvider->>OpenAIProvider: call(systemPrompt, userPrompt)
  OpenAIProvider->>OpenAIAPI: POST {model: model}
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibbled code beneath the moon,
Two models hum a thriftier tune,
Compress hops light, summaries stand tall,
Small hops save carrots for us all,
Hoppity review — give it a boon ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: introducing an optional AGENTMEMORY_COMPRESS_MODEL environment variable to route compress() work to a cheaper model. |
| Linked Issues check | ✅ Passed | All core objectives from #899 are met: single provider-agnostic env var, compress()/summarize() routing, ignored for agent-sdk/noop/FALLBACK_PROVIDERS, and compressionModel configurability across providers. |
| Out of Scope Changes check | ✅ Passed | All changes align with issue #899 scope: documentation, provider routing, configuration, and tests for compress/summarize model selection; no unrelated modifications. |


@coderabbitai coderabbitai Bot left a comment

Contributor


🧹 Nitpick comments (3)
.env.example (1)

48-56: 💤 Low value

Optional: unify spelling to match repository convention.

Line 50 uses "Summarisation" (British English), but the README and codebase consistently use "summarization" (American English). For consistency, consider changing to match the predominant style.

Spelling alignment
 # Optional cheaper model for compress() work only: per-observation compression,
 # graph extraction, query expansion — the bulk of background LLM volume (one
-# call per tool use under AGENTMEMORY_AUTO_COMPRESS=true). Summarisation,
+# call per tool use under AGENTMEMORY_AUTO_COMPRESS=true). Summarization,
 # consolidation synthesis, and reflection stay on the main model above.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.env.example around lines 48 - 56, Change the British spelling
"Summarisation" to American "Summarization" in the .env example comment so it
matches the repository convention (ensure the line describing
AGENTMEMORY_COMPRESS_MODEL and surrounding explanatory text uses "summarization"
consistently).
test/compress-model.test.ts (2)

67-98: ⚡ Quick win

Consider adding fallback tests for OpenRouter and Minimax to match the OpenAI pattern.

Lines 51-65 establish a pattern of testing both WITH and WITHOUT compressModel for OpenAIProvider, verifying that the fallback to main-model works correctly. OpenRouterProvider (lines 67-82) and MinimaxProvider (lines 84-98) only test the WITH case. The layer description states "with and without compressModel set," which implies both cases should be covered for each provider.

While the fallback logic (modelOverride ?? this.model) is identical across providers based on the code snippets, testing each provider's fallback behavior guards against future implementation divergence and provides complete coverage.

📋 Example fallback test for OpenRouterProvider
+  it("OpenRouterProvider without compressModel uses the main model for both", async () => {
+    const fetched = mockFetch(openAiStyleResponse);
+    const provider = new OpenRouterProvider(
+      "test-key",
+      "main-model",
+      4096,
+      "https://openrouter.ai/api/v1/chat/completions",
+    );
+
+    await provider.compress("sys", "user");
+    await provider.summarize("sys", "user");
+
+    expect(fetched.sentModel(0)).toBe("main-model");
+    expect(fetched.sentModel(1)).toBe("main-model");
+  });

Apply a similar pattern for MinimaxProvider.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/compress-model.test.ts` around lines 67 - 98, Add tests that mirror the
OpenAIProvider fallback pattern for OpenRouterProvider and MinimaxProvider by
adding "without compressModel set" cases: create each provider instance without
supplying a compressModel, call provider.compress("sys","user") and
provider.summarize("sys","user"), and assert using the same mockFetch helper
that fetched.sentModel(0) equals the main model and fetched.sentModel(1) equals
the main model; reference the OpenRouterProvider and MinimaxProvider
constructors and the provider.compress/provider.summarize calls and use
fetched.sentModel to verify the fallback behavior.

101-146: Add compress/summarize routing tests for Anthropic and Gemini

  • test/compress-model.test.ts covers OpenAIProvider, OpenRouterProvider (OpenRouter base URL), and MinimaxProvider for compressModel vs main model behavior.
  • There are no equivalent assertions for AnthropicProvider or for the gemini path (which is implemented via OpenRouterProvider when using the generativelanguage.googleapis.com base URL).
  • test/fallback-model-resolution.test.ts verifies env-driven default model selection (including mocked anthropic/gemini) but does not test compress()/summarize() routing with AGENTMEMORY_COMPRESS_MODEL.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/compress-model.test.ts` around lines 101 - 146, Add tests in
test/compress-model.test.ts that mirror the OpenAI/OpenRouter/Minimax cases for
AnthropicProvider and the Gemini path (OpenRouterProvider with base URL
generativelanguage.googleapis.com): set
process.env["AGENTMEMORY_COMPRESS_MODEL"]="cheap-model" and appropriate envs to
select Anthropic or the Gemini OpenRouter route, call loadConfig() to obtain the
provider instance (or the provider factory used in existing tests), then assert
that provider.compressModel is "cheap-model" and that
compressionModel/config.compressionModel is "cheap-model"; also add the
counterpart test deleting AGENTMEMORY_COMPRESS_MODEL to assert
provider.compressModel is undefined and compressionModel falls back to the main
model. Ensure you reference AnthropicProvider and OpenRouterProvider
(generativelanguage.googleapis.com) and reuse loadConfig() as in the existing
tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3d1c0780-a3d9-483a-8913-106af793f7df

📥 Commits

Reviewing files that changed from the base of the PR and between f3dc7f8 and 86adfa1.

📒 Files selected for processing (10)
  • .env.example
  • README.md
  • src/config.ts
  • src/providers/anthropic.ts
  • src/providers/index.ts
  • src/providers/minimax.ts
  • src/providers/openai.ts
  • src/providers/openrouter.ts
  • src/types.ts
  • test/compress-model.test.ts

…Y_COMPRESS_MODEL

Per-observation compression dominates background LLM volume (one call
per tool use under AGENTMEMORY_AUTO_COMPRESS=true), but the model is
shared with summarization, consolidation, and reflection — so the only
way to cut compression cost today is to downgrade everything.

AGENTMEMORY_COMPRESS_MODEL routes compress()-side work (observation
compression, graph extraction, query expansion) to a dedicated model
while summarize() callers stay on the main one. Applies to the openai,
anthropic, gemini, openrouter, and minimax providers; ignored by
agent-sdk/noop and by FALLBACK_PROVIDERS chains, where model names are
provider-specific (same reasoning as rohitg00#778).

Also wires AgentMemoryConfig.compressionModel to reflect the override
instead of always mirroring provider.model.

Signed-off-by: somedev7 <58051362+somedev7@users.noreply.github.com>
@somedev7 somedev7 force-pushed the feat/compress-model branch from 86adfa1 to 82f83f4 Compare June 11, 2026 00:37
…ck for all providers

Address CodeRabbit review: add without-compressModel fallback cases for
OpenRouter and Minimax, routing tests for AnthropicProvider (real
Response objects — the SDK consumes the body stream) and for the gemini
path via OpenRouterProvider with the Google base URL. Also align
.env.example spelling with the repository's American English convention.

Signed-off-by: somedev7 <58051362+somedev7@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

feat: allow a cheaper model for compression than for summarization

1 participant