feat(providers): optional cheaper model for compress() via AGENTMEMORY_COMPRESS_MODEL by somedev7 · Pull Request #900 · rohitg00/agentmemory

somedev7 · 2026-06-11T00:19:49Z

Closes #899

What

Adds an optional AGENTMEMORY_COMPRESS_MODEL env var that routes compress()-side work (per-observation compression, graph extraction, query expansion) to a dedicated — typically cheaper — model, while summarize() callers (session summaries, consolidation synthesis, reflection, crystallize) stay on the main model.
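The routing rule can be sketched as follows. This is a minimal, hypothetical illustration, not the agentmemory source. `SketchProvider` and its injected `PostFn` transport are invented for the example. Only the `compress()`/`summarize()`/`call()` split and the `modelOverride ?? this.model` fallback come from the PR description and walkthrough.

```typescript
// Hypothetical sketch of the override pattern; not the agentmemory source.
type PostFn = (body: { model: string; system: string; user: string }) => Promise<string>;

class SketchProvider {
  constructor(
    private readonly post: PostFn,          // stands in for the provider's HTTP request
    private readonly model: string,         // main model
    private readonly compressModel?: string, // optional AGENTMEMORY_COMPRESS_MODEL value
  ) {}

  // Single request path: an explicit override wins, otherwise the main model.
  private call(system: string, user: string, modelOverride?: string): Promise<string> {
    return this.post({ model: modelOverride ?? this.model, system, user });
  }

  // compress()-side work (observation compression, graph extraction, query expansion).
  compress(system: string, user: string): Promise<string> {
    return this.call(system, user, this.compressModel);
  }

  // summarize() callers never pass an override, so they stay on the main model.
  summarize(system: string, user: string): Promise<string> {
    return this.call(system, user);
  }
}
```

When `compressModel` is left undefined, both methods resolve to the main model. This is why the variable is safe to leave unset.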

Why

Per-observation compression dominates background LLM volume (one call per tool use under AGENTMEMORY_AUTO_COMPRESS=true; ~87% of calls on a measured active day — table in the linked issue), but its quality bar is the loosest of the four LLM pipelines. With a single shared model setting, the only way to cut compression cost is to downgrade summaries and lessons too — exactly the outputs that get injected into future sessions.

How

- ProviderConfig gains an optional compressModel; detectProvider() reads AGENTMEMORY_COMPRESS_MODEL and attaches it in the openai / minimax / anthropic / gemini / openrouter branches.
- Each HTTP provider's call() accepts an optional model override; compress() passes this.compressModel, summarize() does not. describeImage() stays on the main model.
- createBaseProvider() threads the value into the four provider constructors. FALLBACK_PROVIDERS chains deliberately do NOT inherit it, because model names are provider-specific. This mirrors the reasoning in #778 ("Bug: FALLBACK_PROVIDERS passes the primary provider's model to fallback providers → cross-provider failover always 404s and trips the circuit breaker (v0.9.24)"), and a comment explaining it is added at the exclusion site.
- AgentMemoryConfig.compressionModel now reports compressModel ?? model instead of always mirroring provider.model.
- Docs: .env.example entry, plus the README "Cost-aware model selection" and "Environment Variables" sections.
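The configuration side can be sketched as follows, under stated assumptions. `withCompressModel()` and `effectiveCompressionModel()` are hypothetical helpers that stand in for the branch logic in `detectProvider()` and `loadConfig()`. Treating a blank value as unset is an assumption, not documented behaviour.

```typescript
// Hypothetical sketch; not the agentmemory source.
interface ProviderConfigSketch {
  provider: string;
  model: string;
  compressModel?: string;
}

// Branches that attach the override, per the PR description.
// agent-sdk and noop are absent, so the variable is ignored for them.
const SUPPORTS_COMPRESS_MODEL = new Set(["openai", "minimax", "anthropic", "gemini", "openrouter"]);

function withCompressModel(
  base: ProviderConfigSketch,
  env: Record<string, string | undefined>,
): ProviderConfigSketch {
  // Assumption: a blank value behaves like an unset one.
  const override = env["AGENTMEMORY_COMPRESS_MODEL"]?.trim();
  if (!override || !SUPPORTS_COMPRESS_MODEL.has(base.provider)) return base;
  return { ...base, compressModel: override };
}

// Mirrors the PR's rule: compressionModel = compressModel ?? model.
function effectiveCompressionModel(cfg: ProviderConfigSketch): string {
  return cfg.compressModel ?? cfg.model;
}
```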

No MCP tools, REST endpoints, or version fields touched, so the AGENTS.md consistency checklists don't apply.

Field test

Ran on a real instance (OpenAI-compatible provider) for ~20 hours with the main model on Qwen/Qwen3-30B-A3B-Instruct-2507 and AGENTMEMORY_COMPRESS_MODEL=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B. Results: ~500 observation compressions, average qualityScore 93.7 (vs 96.6 on the main model), retry rate 1.2%, and zero LLM errors or circuit-breaker trips. Session summaries kept flowing through the main model. Compression cost dropped by ~40% on input and ~20% on output.

How to verify

- npm run build: clean.
- npm test: 1421 tests pass (1415 baseline + 6 new in test/compress-model.test.ts). The new tests are per-provider assertions that compress() sends the override and summarize() sends the main model, plus loadConfig() pickup/default tests.
- Manual: set AGENTMEMORY_COMPRESS_MODEL to a cheaper model from your provider and restart. Then check the model field in outbound compression requests (or the provider dashboard), and confirm that session summaries keep using the main model.

Summary by CodeRabbit

New Features
- Optional environment variable to route compression-related memory work to a separate, potentially cheaper model while summaries/reflection remain on the main model; supported across providers and ignored for noop/agent-sdk and fallback-provider flows.
Documentation
- Updated README and .env example to describe the compression-model setting and provider compatibility.
Tests
- Added tests verifying compress() uses the compression model (or falls back) while summarize() uses the main model.

vercel · 2026-06-11T00:19:53Z

@Slavik47 is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-06-11T00:20:02Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c1cb98b3-c947-470e-9e30-c706ed688774

📥 Commits

Reviewing files that changed from the base of the PR and between 82f83f4 and e9ab1d2.

📒 Files selected for processing (2)

.env.example
test/compress-model.test.ts

✅ Files skipped from review due to trivial changes (1)

.env.example

🚧 Files skipped from review as they are similar to previous changes (1)

test/compress-model.test.ts

📝 Walkthrough


This PR adds AGENTMEMORY_COMPRESS_MODEL and wires an optional per-provider compressModel so compress() calls can use a cheaper provider-valid model while summarize() continues using the primary model; changes include types, provider constructors/calls, config propagation, docs, and tests.

Changes

Compression Model Override

| Layer | File(s) | Summary |
|---|---|---|
| Type contract and configuration detection | `src/types.ts`, `src/config.ts` | `ProviderConfig` adds optional `compressModel`; `detectProvider()` propagates `compressModel` in provider branches; `loadConfig()` sets `compressionModel` to `provider.compressModel ?? provider.model`. |
| Provider implementation: model override pattern | `src/providers/openai.ts`, `src/providers/anthropic.ts`, `src/providers/minimax.ts`, `src/providers/openrouter.ts` | Providers accept and store `compressModel`; `compress()` calls `call(..., modelOverride)` with the override; `summarize()` uses `this.model`; `call()` chooses `modelOverride ?? this.model`. |
| Configuration propagation and provider instantiation | `src/config.ts`, `src/providers/index.ts` | Spreads `compressModel` through `detectProvider` branches and passes `config.compressModel` into provider constructors; updates fallback-provider docs to exclude `AGENTMEMORY_COMPRESS_MODEL`. |
| User-facing documentation | `.env.example`, `README.md` | Adds README and `.env` example entries for `AGENTMEMORY_COMPRESS_MODEL`, documenting the compress()-only scope and the provider-valid model-name constraint. |
| Test coverage: provider routing and config propagation | `test/compress-model.test.ts` | Vitest suite mocks fetch/SDK fetch to assert that `compress()` uses `compressModel` (or falls back) while `summarize()` uses the main model; tests `loadConfig()` propagation and defaults. |
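The fallback exclusion mentioned above can be illustrated with a hypothetical chain builder. `buildProviderChain` and `defaultModelFor` are invented names. The assumption is that each fallback resolves its own default model, in the spirit of #778, and never inherits anything model-specific from the primary.

```typescript
// Hypothetical sketch of the exclusion rule; not agentmemory's implementation.
interface ProviderEntry {
  provider: string;
  model: string;
  compressModel?: string;
}

// Assumed lookup: each fallback provider's own default model.
type DefaultModelFor = (provider: string) => string;

function buildProviderChain(
  primary: ProviderEntry,
  fallbackProviders: string[],
  defaultModelFor: DefaultModelFor,
): ProviderEntry[] {
  const fallbacks = fallbackProviders.map((provider) => ({
    provider,
    // Never reuse primary.model: model names are provider-specific.
    model: defaultModelFor(provider),
    // compressModel is deliberately NOT copied from primary for the same
    // reason, so a fallback's compress() runs on that fallback's main model.
  }));
  return [primary, ...fallbacks];
}
```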

Sequence Diagram

```mermaid
sequenceDiagram
  participant Config
  participant OpenAIProvider
  participant OpenAIAPI as OpenAI API
  Config->>OpenAIProvider: new OpenAIProvider(apiKey, model, maxTokens, baseURL, compressModel)
  OpenAIProvider->>OpenAIProvider: this.compressModel = compressModel
  rect rgba(100, 150, 255, 0.5)
  Note over OpenAIProvider,OpenAIAPI: compress() flow
  OpenAIProvider->>OpenAIProvider: compress(systemPrompt, userPrompt)
  OpenAIProvider->>OpenAIProvider: call(systemPrompt, userPrompt, this.compressModel)
  OpenAIProvider->>OpenAIAPI: POST {model: compressModel ?? model}
  end
  rect rgba(150, 200, 100, 0.5)
  Note over OpenAIProvider,OpenAIAPI: summarize() flow
  OpenAIProvider->>OpenAIProvider: summarize(systemPrompt, userPrompt)
  OpenAIProvider->>OpenAIProvider: call(systemPrompt, userPrompt)
  OpenAIProvider->>OpenAIAPI: POST {model: model}
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibbled code beneath the moon,
Two models hum a thriftier tune,
Compress hops light, summaries stand tall,
Small hops save carrots for us all,
Hoppity review — give it a boon ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: introducing an optional AGENTMEMORY_COMPRESS_MODEL environment variable to route compress() work to a cheaper model. |
| Linked Issues check | ✅ Passed | All core objectives from `#899` are met: single provider-agnostic env var, compress()/summarize() routing, ignored for agent-sdk/noop/FALLBACK_PROVIDERS, and compressionModel configurability across providers. |
| Out of Scope Changes check | ✅ Passed | All changes align with issue `#899` scope: documentation, provider routing, configuration, and tests for compress/summarize model selection, with no unrelated modifications. |



coderabbitai

🧹 Nitpick comments (3)

.env.example (1)
48-56: 💤 Low value

Optional: unify spelling to match repository convention.

Line 50 uses "Summarisation" (British English), but the README and codebase consistently use "summarization" (American English). For consistency, consider changing to match the predominant style.
Spelling alignment

```diff
 # Optional cheaper model for compress() work only: per-observation compression,
 # graph extraction, query expansion — the bulk of background LLM volume (one
-# call per tool use under AGENTMEMORY_AUTO_COMPRESS=true). Summarisation,
+# call per tool use under AGENTMEMORY_AUTO_COMPRESS=true). Summarization,
 # consolidation synthesis, and reflection stay on the main model above.
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.env.example around lines 48 - 56, Change the British spelling
"Summarisation" to American "Summarization" in the .env example comment so it
matches the repository convention (ensure the line describing
AGENTMEMORY_COMPRESS_MODEL and surrounding explanatory text uses "summarization"
consistently).
test/compress-model.test.ts (2)
67-98: ⚡ Quick win

Consider adding fallback tests for OpenRouter and Minimax to match the OpenAI pattern.

Lines 51-65 establish a pattern of testing both WITH and WITHOUT compressModel for OpenAIProvider, verifying that the fallback to main-model works correctly. OpenRouterProvider (lines 67-82) and MinimaxProvider (lines 84-98) only test the WITH case. The layer description states "with and without compressModel set," which implies both cases should be covered for each provider.

While the fallback logic (modelOverride ?? this.model) is identical across providers based on the code snippets, testing each provider's fallback behavior guards against future implementation divergence and provides complete coverage.
📋 Example fallback test for OpenRouterProvider
```diff
+  it("OpenRouterProvider without compressModel uses the main model for both", async () => {
+    const fetched = mockFetch(openAiStyleResponse);
+    const provider = new OpenRouterProvider(
+      "test-key",
+      "main-model",
+      4096,
+      "https://openrouter.ai/api/v1/chat/completions",
+    );
+
+    await provider.compress("sys", "user");
+    await provider.summarize("sys", "user");
+
+    expect(fetched.sentModel(0)).toBe("main-model");
+    expect(fetched.sentModel(1)).toBe("main-model");
+  });
```
Apply a similar pattern for MinimaxProvider.
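The suggested test relies on a `mockFetch(openAiStyleResponse)` helper whose `sentModel(i)` returns the `model` field of the i-th outbound request. The review does not show its implementation. A hypothetical version is sketched below. It returns a `fetchImpl` for injection, and whether the real helper stubs the global `fetch` instead is not shown. The response is a minimal OpenAI-style chat-completion body, and the `{ ok, status, json() }` object is a plain stand-in rather than a real `Response`.

```typescript
// Hypothetical helper; not the one in test/compress-model.test.ts.
type FetchInit = { method?: string; body?: unknown };
type JsonResponseLike = { ok: boolean; status: number; json(): Promise<unknown> };
type FetchLike = (input: string, init?: FetchInit) => Promise<JsonResponseLike>;

function mockFetch(responseBody: unknown) {
  const bodies: Array<{ model?: string }> = [];
  const fetchImpl: FetchLike = async (_input, init) => {
    const body = init?.body;
    // Record each request body so tests can inspect the model it asked for.
    bodies.push(typeof body === "string" ? JSON.parse(body) : {});
    return { ok: true, status: 200, json: async () => responseBody };
  };
  return {
    fetchImpl,
    // The `model` field of the i-th outbound request, or undefined.
    sentModel: (i: number): string | undefined => bodies[i]?.model,
  };
}

// Minimal OpenAI-style chat completion body.
const openAiStyleResponse = { choices: [{ message: { role: "assistant", content: "ok" } }] };
```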
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/compress-model.test.ts` around lines 67 - 98, Add tests that mirror the
OpenAIProvider fallback pattern for OpenRouterProvider and MinimaxProvider by
adding "without compressModel set" cases: create each provider instance without
supplying a compressModel, call provider.compress("sys","user") and
provider.summarize("sys","user"), and assert using the same mockFetch helper
that fetched.sentModel(0) equals the main model and fetched.sentModel(1) equals
the main model; reference the OpenRouterProvider and MinimaxProvider
constructors and the provider.compress/provider.summarize calls and use
fetched.sentModel to verify the fallback behavior.
101-146: Add compress/summarize routing tests for Anthropic and Gemini

test/compress-model.test.ts covers OpenAIProvider, OpenRouterProvider (OpenRouter base URL), and MinimaxProvider for compressModel vs main model behavior.

There are no equivalent assertions for AnthropicProvider or for the gemini path (which is implemented via OpenRouterProvider when using the generativelanguage.googleapis.com base URL).

test/fallback-model-resolution.test.ts verifies env-driven default model selection (including mocked anthropic/gemini) but does not test compress()/summarize() routing with AGENTMEMORY_COMPRESS_MODEL.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/compress-model.test.ts` around lines 101 - 146, Add tests in
test/compress-model.test.ts that mirror the OpenAI/OpenRouter/Minimax cases for
AnthropicProvider and the Gemini path (OpenRouterProvider with base URL
generativelanguage.googleapis.com): set
process.env["AGENTMEMORY_COMPRESS_MODEL"]="cheap-model" and appropriate envs to
select Anthropic or the Gemini OpenRouter route, call loadConfig() to obtain the
provider instance (or the provider factory used in existing tests), then assert
that provider.compressModel is "cheap-model" and that
compressionModel/config.compressionModel is "cheap-model"; also add the
counterpart test deleting AGENTMEMORY_COMPRESS_MODEL to assert
provider.compressModel is undefined and compressionModel falls back to the main
model. Ensure you reference AnthropicProvider and OpenRouterProvider
(generativelanguage.googleapis.com) and reuse loadConfig() as in the existing
tests.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3d1c0780-a3d9-483a-8913-106af793f7df

📥 Commits

Reviewing files that changed from the base of the PR and between f3dc7f8 and 86adfa1.

📒 Files selected for processing (10)

.env.example
README.md
src/config.ts
src/providers/anthropic.ts
src/providers/index.ts
src/providers/minimax.ts
src/providers/openai.ts
src/providers/openrouter.ts
src/types.ts
test/compress-model.test.ts

…Y_COMPRESS_MODEL

Per-observation compression dominates background LLM volume (one call per tool use under AGENTMEMORY_AUTO_COMPRESS=true), but the model is shared with summarization, consolidation, and reflection, so the only way to cut compression cost today is to downgrade everything.

AGENTMEMORY_COMPRESS_MODEL routes compress()-side work (observation compression, graph extraction, query expansion) to a dedicated model while summarize() callers stay on the main one. Applies to the openai, anthropic, gemini, openrouter, and minimax providers; ignored by agent-sdk/noop and by FALLBACK_PROVIDERS chains, where model names are provider-specific (same reasoning as rohitg00#778). Also wires AgentMemoryConfig.compressionModel to reflect the override instead of always mirroring provider.model.

Signed-off-by: somedev7 <58051362+somedev7@users.noreply.github.com>

…ck for all providers

Address CodeRabbit review: add without-compressModel fallback cases for OpenRouter and Minimax, routing tests for AnthropicProvider (real Response objects, since the SDK consumes the body stream), and routing tests for the gemini path via OpenRouterProvider with the Google base URL. Also align .env.example spelling with the repository's American English convention.

Signed-off-by: somedev7 <58051362+somedev7@users.noreply.github.com>
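The note that the Anthropic tests need real `Response` objects comes down to a Fetch API rule: a body stream can be read only once. A hypothetical stub that respects this is sketched below. `anthropicStubFetch` is an invented name. The JSON follows the public Anthropic Messages API response shape, and the sketch assumes a runtime with a global `Response` (Node 18+).

```typescript
// Hypothetical test helper; not the one in test/compress-model.test.ts.
function anthropicStubFetch(text: string, sentModels: string[]) {
  return async (_input: unknown, init?: { body?: unknown }): Promise<Response> => {
    const body = init?.body;
    if (typeof body === "string") {
      // Record which model each outbound request asked for.
      sentModels.push(JSON.parse(body).model);
    }
    // Build a fresh Response on every call: once a body is consumed,
    // a shared instance could not be read again on the next request.
    return new Response(
      JSON.stringify({
        id: "msg_test",
        type: "message",
        role: "assistant",
        model: "stub-model",
        content: [{ type: "text", text }],
        stop_reason: "end_turn",
        stop_sequence: null,
        usage: { input_tokens: 1, output_tokens: 1 },
      }),
      { status: 200, headers: { "content-type": "application/json" } },
    );
  };
}
```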

coderabbitai Bot reviewed Jun 11, 2026


somedev7 force-pushed the feat/compress-model branch from 86adfa1 to 82f83f4 on June 11, 2026 at 00:37.

somedev7 wants to merge 2 commits into rohitg00:main from somedev7:feat/compress-model.
