Skip to content

Compact decision-history format: 12-21% fewer tokens#2

Open
xMKx wants to merge 1 commit into
mainfrom
optim/compact-decision-history
Open

Compact decision-history format: 12-21% fewer tokens#2
xMKx wants to merge 1 commit into
mainfrom
optim/compact-decision-history

Conversation

@xMKx

@xMKx xMKx commented May 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

Switches ContextBuilder.buildDecisionContext from a numbered prose list to a single-line Prior decisions: Q=A; Q=A. form. Saves 12-21% tokens per fork on the decision-history slice of the user prompt.

Token measurements

Tokenizer: gpt-tokenizer cl100k_base (OpenAI), used as an offline Claude proxy. Relative deltas should hold against the Anthropic tokenizer; absolute counts will vary slightly.

Path depth Prose (current) Compact (this PR) Delta
1 17 tok 15 tok -12%
3 34 tok 27 tok -21%
5 52 tok 41 tok -21%

These savings hit every non-fork child node (i.e. every fork on Anthropic/OpenAI providers, since neither natively resumes sessions). In a width=4 depth=3 tree that's 84 nodes each paying the delta.

Format choice

I tested four candidate formats before picking this one:

  • JSON block ({"task":"...","resolved_decisions":[...]}): cost +4% to +35% tokens vs prose because of structural overhead ({, ", key names, brackets). The "structured data is more efficient" intuition was wrong here.
  • Raw KV ([Q=A; Q=A] task): saved ~30% but dropped the "Prior decisions:" semantic anchor, which I judged risky — the LLM might read those as constraints rather than resolved state.
  • Newline KV (Prior decisions:\n- Q: A): only ~9% savings, didn't justify the format churn.
  • Inline KV with anchor (this PR): saves 12-21% AND keeps the "Prior decisions:" framing. Best balance.

Behavioural risk

Compact KV may be harder for the LLM to use than the numbered prose, especially if an answer text contains = or ;. None of the existing tests exercise such answers; real-world answers like "JWT-based authentication" or "PostgreSQL with JSONB" should also be safe. If anyone hits a regression, the fix is to escape =/; in answer text — happy to follow up.

Test plan

  • All 405 unit tests pass (npm test)
  • TypeScript build clean (npm run build)
  • Updated context-builder tests to match new format + added a regression check that the compact form is always shorter than the prose form
  • Run a real explore against the Anthropic API provider to confirm projected savings translate to actual cost reduction (in progress, separate validation pass)

Followups (separate PRs)

The other big token win is enabling prompt caching in the Anthropic API provider — the system prompt is currently sent without cache_control on every fork, paying full input price 85 times per tree. That PR is more involved because the current ~255-token system prompt is below the 1024-token caching minimum, so it needs to be restructured alongside the caching flag. Separate PR incoming.

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

Switch buildDecisionContext from a numbered prose list to a single-line
"Prior decisions: Q=A; Q=A." form.

Token deltas (cl100k_base proxy):
  depth=1  17 -> 15 tok  (-12%)
  depth=3  34 -> 27 tok  (-21%)
  depth=5  52 -> 41 tok  (-21%)

The "Prior decisions:" anchor is preserved so the LLM still recognises
this as resolved state rather than negotiable constraints. The raw
[Q=A; Q=A] form saved ~30% but lost that framing; this is the safer
trade between size and behavioural risk.

Note: an earlier experiment converting the history to a JSON block went
the wrong way (+4-35% tokens) because of structural overhead from keys
and quoting. Compact KV is the form that actually wins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant