feat(providers): optional cheaper model for compress() via AGENTMEMORY_COMPRESS_MODEL #900
somedev7 wants to merge 2 commits into
@Slavik47 is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it.
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough
This PR adds AGENTMEMORY_COMPRESS_MODEL and wires an optional per-provider compressModel so compress() calls can use a cheaper provider-valid model while summarize() continues using the primary model; changes include types, provider constructors/calls, config propagation, docs, and tests.
Changes
Compression Model Override
Sequence Diagram
sequenceDiagram
participant Config
participant OpenAIProvider
participant OpenAIAPI as OpenAI API
Config->>OpenAIProvider: new OpenAIProvider(apiKey, model, maxTokens, baseURL, compressModel)
OpenAIProvider->>OpenAIProvider: this.compressModel = compressModel
rect rgba(100, 150, 255, 0.5)
Note over OpenAIProvider,OpenAIAPI: compress() flow
OpenAIProvider->>OpenAIProvider: compress(systemPrompt, userPrompt)
OpenAIProvider->>OpenAIProvider: call(systemPrompt, userPrompt, this.compressModel)
OpenAIProvider->>OpenAIAPI: POST {model: compressModel ?? model}
end
rect rgba(150, 200, 100, 0.5)
Note over OpenAIProvider,OpenAIAPI: summarize() flow
OpenAIProvider->>OpenAIProvider: summarize(systemPrompt, userPrompt)
OpenAIProvider->>OpenAIProvider: call(systemPrompt, userPrompt)
OpenAIProvider->>OpenAIAPI: POST {model: model}
end
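The routing in the diagram can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the PR's implementation: `SketchProvider`, `Transport`, and `recordingTransport` are hypothetical names, and the transport stands in for the real HTTP request to the provider API.

```typescript
// Hypothetical transport: stands in for the real POST to the provider API.
type Transport = (body: { model: string; system: string; user: string }) => Promise<string>;

class SketchProvider {
  private readonly model: string;
  private readonly transport: Transport;
  private readonly compressModel: string | undefined;

  constructor(model: string, transport: Transport, compressModel?: string) {
    this.model = model;
    this.transport = transport;
    this.compressModel = compressModel;
  }

  // compress() passes the optional override; undefined falls back to the main model.
  compress(system: string, user: string): Promise<string> {
    return this.call(system, user, this.compressModel);
  }

  // summarize() never passes an override, so it always uses the main model.
  summarize(system: string, user: string): Promise<string> {
    return this.call(system, user);
  }

  private call(system: string, user: string, modelOverride?: string): Promise<string> {
    return this.transport({ model: modelOverride ?? this.model, system, user });
  }
}

// Record which model each call would send.
const sent: string[] = [];
const recordingTransport: Transport = async (body) => {
  sent.push(body.model);
  return "ok";
};

const provider = new SketchProvider("main-model", recordingTransport, "cheap-model");
void provider.compress("sys", "user");
void provider.summarize("sys", "user");
// sent is now ["cheap-model", "main-model"]
```

Keeping the override as an optional argument to a single `call()` means the fallback to the main model lives in one place (`modelOverride ?? this.model`).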
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
🧹 Nitpick comments (3)
.env.example (1)
48-56: 💤 Low value. Optional: unify spelling to match repository convention.
Line 50 uses "Summarisation" (British English), but the README and codebase consistently use "summarization" (American English). For consistency, consider changing to match the predominant style.
Spelling alignment
 # Optional cheaper model for compress() work only: per-observation compression,
 # graph extraction, query expansion — the bulk of background LLM volume (one
-# call per tool use under AGENTMEMORY_AUTO_COMPRESS=true). Summarisation,
+# call per tool use under AGENTMEMORY_AUTO_COMPRESS=true). Summarization,
 # consolidation synthesis, and reflection stay on the main model above.
test/compress-model.test.ts (2)
67-98: ⚡ Quick win. Consider adding fallback tests for OpenRouter and Minimax to match the OpenAI pattern.
Lines 51-65 establish a pattern of testing both WITH and WITHOUT `compressModel` for OpenAIProvider, verifying that the fallback to `main-model` works correctly. OpenRouterProvider (lines 67-82) and MinimaxProvider (lines 84-98) only test the WITH case. The layer description states "with and without compressModel set," which implies both cases should be covered for each provider.
While the fallback logic (`modelOverride ?? this.model`) is identical across providers based on the code snippets, testing each provider's fallback behavior guards against future implementation divergence and provides complete coverage.
📋 Example fallback test for OpenRouterProvider
+  it("OpenRouterProvider without compressModel uses the main model for both", async () => {
+    const fetched = mockFetch(openAiStyleResponse);
+    const provider = new OpenRouterProvider(
+      "test-key",
+      "main-model",
+      4096,
+      "https://openrouter.ai/api/v1/chat/completions",
+    );
+
+    await provider.compress("sys", "user");
+    await provider.summarize("sys", "user");
+
+    expect(fetched.sentModel(0)).toBe("main-model");
+    expect(fetched.sentModel(1)).toBe("main-model");
+  });
Apply a similar pattern for MinimaxProvider.
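The suggested tests rely on a helper that records each outbound request and exposes the model it carried. The review does not show that helper, so the following is only a guess at its shape: `makeRecordingFetch`, `SketchFetch`, and `SketchRequest` are hypothetical names, and the real `mockFetch` in the test file may intercept the global `fetch` differently.

```typescript
// Hypothetical minimal request shape; the real tests may stub the global fetch instead.
interface SketchRequest {
  body: string;
}
type SketchFetch = (url: string, init: SketchRequest) => Promise<{ json(): Promise<unknown> }>;

function makeRecordingFetch(response: unknown) {
  const bodies: string[] = [];

  // Records the raw request body, then resolves with the canned response.
  const fetchImpl: SketchFetch = async (_url, init) => {
    bodies.push(init.body);
    return { json: async () => response };
  };

  return {
    fetchImpl,
    // Returns the "model" field of the i-th recorded request body.
    sentModel(i: number): unknown {
      const raw = bodies[i];
      if (raw === undefined) {
        throw new RangeError(`no request recorded at index ${i}`);
      }
      const parsed = JSON.parse(raw) as { model?: unknown };
      return parsed.model;
    },
  };
}

const recorded = makeRecordingFetch({ choices: [] });
void recorded.fetchImpl("https://example.invalid/v1/chat/completions", {
  body: JSON.stringify({ model: "main-model", messages: [] }),
});
// recorded.sentModel(0) is "main-model"
```

Asserting on the serialized request body, rather than on provider internals, checks what the API would actually receive.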
101-146: Add compress/summarize routing tests for Anthropic and Gemini
- `test/compress-model.test.ts` covers `OpenAIProvider`, `OpenRouterProvider` (OpenRouter base URL), and `MinimaxProvider` for `compressModel` vs main model behavior.
- There are no equivalent assertions for `AnthropicProvider` or for the `gemini` path (which is implemented via `OpenRouterProvider` when using the `generativelanguage.googleapis.com` base URL).
- `test/fallback-model-resolution.test.ts` verifies env-driven default model selection (including mocked `anthropic`/`gemini`) but does not test `compress()`/`summarize()` routing with `AGENTMEMORY_COMPRESS_MODEL`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.env.example:
- Around line 48-56: Change the British spelling "Summarisation" to American
"Summarization" in the .env example comment so it matches the repository
convention (ensure the line describing AGENTMEMORY_COMPRESS_MODEL and
surrounding explanatory text uses "summarization" consistently).
In `@test/compress-model.test.ts`:
- Around line 67-98: Add tests that mirror the OpenAIProvider fallback pattern
for OpenRouterProvider and MinimaxProvider by adding "without compressModel set"
cases: create each provider instance without supplying a compressModel, call
provider.compress("sys","user") and provider.summarize("sys","user"), and assert
using the same mockFetch helper that fetched.sentModel(0) equals the main model
and fetched.sentModel(1) equals the main model; reference the OpenRouterProvider
and MinimaxProvider constructors and the provider.compress/provider.summarize
calls and use fetched.sentModel to verify the fallback behavior.
- Around line 101-146: Add tests in test/compress-model.test.ts that mirror the
OpenAI/OpenRouter/Minimax cases for AnthropicProvider and the Gemini path
(OpenRouterProvider with base URL generativelanguage.googleapis.com): set
process.env["AGENTMEMORY_COMPRESS_MODEL"]="cheap-model" and appropriate envs to
select Anthropic or the Gemini OpenRouter route, call loadConfig() to obtain the
provider instance (or the provider factory used in existing tests), then assert
that provider.compressModel is "cheap-model" and that
compressionModel/config.compressionModel is "cheap-model"; also add the
counterpart test deleting AGENTMEMORY_COMPRESS_MODEL to assert
provider.compressModel is undefined and compressionModel falls back to the main
model. Ensure you reference AnthropicProvider and OpenRouterProvider
(generativelanguage.googleapis.com) and reuse loadConfig() as in the existing
tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3d1c0780-a3d9-483a-8913-106af793f7df
📒 Files selected for processing (10)
- .env.example
- README.md
- src/config.ts
- src/providers/anthropic.ts
- src/providers/index.ts
- src/providers/minimax.ts
- src/providers/openai.ts
- src/providers/openrouter.ts
- src/types.ts
- test/compress-model.test.ts
…Y_COMPRESS_MODEL

Per-observation compression dominates background LLM volume (one call per tool use under AGENTMEMORY_AUTO_COMPRESS=true), but the model is shared with summarization, consolidation, and reflection, so the only way to cut compression cost today is to downgrade everything.

AGENTMEMORY_COMPRESS_MODEL routes compress()-side work (observation compression, graph extraction, query expansion) to a dedicated model while summarize() callers stay on the main one. Applies to the openai, anthropic, gemini, openrouter, and minimax providers; ignored by agent-sdk/noop and by FALLBACK_PROVIDERS chains, where model names are provider-specific (same reasoning as rohitg00#778).

Also wires AgentMemoryConfig.compressionModel to reflect the override instead of always mirroring provider.model.

Signed-off-by: somedev7 <58051362+somedev7@users.noreply.github.com>
86adfa1 to 82f83f4
…ck for all providers

Address CodeRabbit review: add without-compressModel fallback cases for OpenRouter and Minimax, routing tests for AnthropicProvider (real Response objects, since the SDK consumes the body stream) and for the gemini path via OpenRouterProvider with the Google base URL. Also align .env.example spelling with the repository's American English convention.

Signed-off-by: somedev7 <58051362+somedev7@users.noreply.github.com>
Closes #899
What
Adds an optional `AGENTMEMORY_COMPRESS_MODEL` env var that routes `compress()`-side work (per-observation compression, graph extraction, query expansion) to a dedicated, typically cheaper, model, while `summarize()` callers (session summaries, consolidation synthesis, reflection, crystallize) stay on the main model.

Why
Per-observation compression dominates background LLM volume (one call per tool use under `AGENTMEMORY_AUTO_COMPRESS=true`; ~87% of calls on a measured active day, see the table in the linked issue), but its quality bar is the loosest of the four LLM pipelines. With a single shared model setting, the only way to cut compression cost is to downgrade summaries and lessons too, which are exactly the outputs that get injected into future sessions.

How
- `ProviderConfig` gains optional `compressModel`; `detectProvider()` reads `AGENTMEMORY_COMPRESS_MODEL` and attaches it for the openai / minimax / anthropic / gemini / openrouter branches.
- `call()` accepts an optional model override; `compress()` passes `this.compressModel`, `summarize()` does not. `describeImage()` stays on the main model.
- `createBaseProvider()` threads the value into the four provider constructors. `FALLBACK_PROVIDERS` chains deliberately do NOT inherit it: model names are provider-specific, mirroring the reasoning in #778 ("Bug: FALLBACK_PROVIDERS passes the primary provider's model to fallback providers → cross-provider failover always 404s and trips the circuit breaker (v0.9.24)"); a comment was added at the exclusion site.
- `AgentMemoryConfig.compressionModel` now reports `compressModel ?? model` instead of always mirroring `provider.model`.
- `.env.example` entry + README "Cost-aware model selection" and Environment Variables sections.

No MCP tools, REST endpoints, or version fields touched, so the AGENTS.md consistency checklists don't apply.
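The config side of the change can be illustrated with a small sketch. Everything here is an assumption made for illustration: `resolvePrimary`, `resolveFallback`, `compressionModelOf`, and `SketchProviderConfig` are hypothetical names, the environment is passed in as a plain object rather than read from the process, and treating an empty value as unset is a guess. The actual `detectProvider()` and `createBaseProvider()` are not shown in this PR page.

```typescript
type Env = Record<string, string | undefined>;

interface SketchProviderConfig {
  provider: string;
  model: string;
  compressModel?: string;
}

// Providers the PR description lists as honoring the override.
const OVERRIDE_PROVIDERS = new Set(["openai", "anthropic", "gemini", "openrouter", "minimax"]);

// Primary provider: attach the override only for supported providers.
// Assumption: an unset or empty variable means "no override".
function resolvePrimary(provider: string, model: string, env: Env): SketchProviderConfig {
  const override = env["AGENTMEMORY_COMPRESS_MODEL"];
  if (override && OVERRIDE_PROVIDERS.has(provider)) {
    return { provider, model, compressModel: override };
  }
  return { provider, model };
}

// Fallback entries never receive the override, since model names are provider-specific.
function resolveFallback(provider: string, model: string): SketchProviderConfig {
  return { provider, model };
}

// Reported compression model: the override when present, else the main model.
function compressionModelOf(config: SketchProviderConfig): string {
  return config.compressModel ?? config.model;
}

const env: Env = { AGENTMEMORY_COMPRESS_MODEL: "cheap-model" };
const primary = resolvePrimary("openai", "main-model", env);
const fallback = resolveFallback("anthropic", "fallback-model");
// compressionModelOf(primary) is "cheap-model"; compressionModelOf(fallback) is "fallback-model"
```

Leaving the fallback path without access to the environment variable makes the exclusion structural: a fallback provider cannot accidentally receive a model name that only the primary provider understands.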
Field test
Ran on a real instance (OpenAI-compatible provider) for ~20 hours with the main model on `Qwen/Qwen3-30B-A3B-Instruct-2507` and `AGENTMEMORY_COMPRESS_MODEL=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B`: ~500 observation compressions, average qualityScore 93.7 (vs 96.6 on the main model), retry rate 1.2%, zero LLM errors or circuit-breaker trips; session summaries kept flowing through the main model. Compression cost dropped ~40% on input / 20% on output.
How to verify
- `npm run build`: clean.
- `npm test`: 1421 tests pass (1415 baseline + 6 new in `test/compress-model.test.ts`: per-provider assertions that `compress()` sends the override and `summarize()` sends the main model, plus `loadConfig()` pickup/default tests).
- Set `AGENTMEMORY_COMPRESS_MODEL` to a cheaper model of your provider, restart, and watch `model` in outbound compression requests (or the provider dashboard) while session summaries keep using the main model.

Summary by CodeRabbit
New Features
Documentation
Tests