Compact decision-history format: 12-21% fewer tokens by xMKx · Pull Request #2 · PQCWorld/llmception

xMKx · 2026-05-26T08:55:03Z

Summary

Switches ContextBuilder.buildDecisionContext from a numbered prose list to a single-line "Prior decisions: Q=A; Q=A." form. Saves 12-21% of the tokens in the decision-history slice of the user prompt, on every fork.

Token measurements

Tokenizer: gpt-tokenizer cl100k_base (OpenAI), used as an offline Claude proxy. Relative deltas should hold against the Anthropic tokenizer; absolute counts will vary slightly.

Path depth	Prose (current)	Compact (this PR)	Delta
1	17 tok	15 tok	-12%
3	34 tok	27 tok	-21%
5	52 tok	41 tok	-21%
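The Delta column can be reproduced from the two token counts. A minimal sketch, assuming the percentages are rounded to the nearest whole number (percentDelta is an illustrative helper, not something in the repo):

```typescript
// Percentage change from the prose token count to the compact one,
// rounded to the nearest whole percent (assumed to match the table).
function percentDelta(proseTokens: number, compactTokens: number): number {
  return Math.round(((compactTokens - proseTokens) / proseTokens) * 100);
}

// Token counts copied from the table above.
const measurements = [
  { depth: 1, prose: 17, compact: 15 },
  { depth: 3, prose: 34, compact: 27 },
  { depth: 5, prose: 52, compact: 41 },
];

for (const m of measurements) {
  console.log(`depth=${m.depth}: ${percentDelta(m.prose, m.compact)}%`);
}
// depth=1: -12%
// depth=3: -21%
// depth=5: -21%
```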

These savings hit every child node that is not created by a native session fork (i.e. every fork on the Anthropic/OpenAI providers, since neither natively resumes sessions). In a width=4 depth=3 tree that's 4 + 16 + 64 = 84 nodes each paying the delta.
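The 84 figure is the number of non-root nodes in a full tree. A small sketch of that count, assuming every node branches exactly width times down to the given depth (countChildNodes is a hypothetical helper for illustration):

```typescript
// Number of non-root nodes in a full tree: width^1 + width^2 + ... + width^depth.
function countChildNodes(width: number, depth: number): number {
  let total = 0;
  let levelSize = 1;
  for (let level = 1; level <= depth; level++) {
    levelSize *= width; // nodes at this level
    total += levelSize;
  }
  return total;
}

console.log(countChildNodes(4, 3)); // 84 (4 + 16 + 64)
console.log(countChildNodes(4, 3) + 1); // 85 once the root node is included
```

The second number is the one that matters for per-node costs such as the system prompt, which the root pays as well.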

Format choice

I tested four candidate formats before picking this one:

JSON block ({"task":"...","resolved_decisions":[...]}): cost 4% to 35% more tokens than prose because of structural overhead ({, ", key names, brackets). The "structured data is more efficient" intuition was wrong here.
Raw KV ([Q=A; Q=A] task): saved ~30% but dropped the "Prior decisions:" semantic anchor, which I judged risky: without it the LLM might read the pairs as negotiable constraints rather than resolved state.
Newline KV (Prior decisions:\n- Q: A): saved only ~9%, which didn't justify the format churn.
Inline KV with anchor (this PR): saves 12-21% AND keeps the "Prior decisions:" framing. Best balance.
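For reference, the four candidates can be sketched as plain string renderers. This is illustrative only: the Decision shape, the function names, and the q/a key names inside the JSON block are assumptions, not the repo's actual types.

```typescript
// Hypothetical shape of one resolved decision.
interface Decision {
  question: string;
  answer: string;
}

// JSON block: structural characters and key names repeat for every decision.
function asJsonBlock(task: string, decisions: Decision[]): string {
  return JSON.stringify({
    task,
    resolved_decisions: decisions.map((d) => ({ q: d.question, a: d.answer })),
  });
}

// Raw KV: shortest, but with no "Prior decisions:" anchor.
function asRawKv(task: string, decisions: Decision[]): string {
  return `[${decisions.map((d) => `${d.question}=${d.answer}`).join("; ")}] ${task}`;
}

// Newline KV: anchor plus one bullet per decision.
function asNewlineKv(decisions: Decision[]): string {
  return ["Prior decisions:", ...decisions.map((d) => `- ${d.question}: ${d.answer}`)].join("\n");
}

// Inline KV with anchor: the form this PR adopts.
function asInlineKv(decisions: Decision[]): string {
  return `Prior decisions: ${decisions.map((d) => `${d.question}=${d.answer}`).join("; ")}.`;
}

const sample: Decision[] = [
  { question: "Auth", answer: "JWT" },
  { question: "DB", answer: "PostgreSQL" },
];
console.log(asInlineKv(sample)); // Prior decisions: Auth=JWT; DB=PostgreSQL.
```

Character length is not a reliable stand-in for token count, so the percentages above still need a tokenizer to reproduce.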

Behavioural risk

Compact KV may be harder for the LLM to parse than the numbered prose, especially if an answer contains = or ;, which double as the separators of the compact form. None of the existing tests exercise such answers, and typical real-world answers like "JWT-based authentication" or "PostgreSQL with JSONB" contain neither character, so they should be safe. If anyone hits a regression, the fix is to escape = and ; in answer text; happy to follow up.
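One possible shape for that follow-up, as a hedged sketch: backslash-escape the separators in the answer before joining. escapeAnswer and formatPair are hypothetical names, and escaping the backslash itself is an extra assumption added here so the escaping stays unambiguous.

```typescript
// Backslash-escape the characters that carry meaning in the compact form.
// The backslash is escaped too, so an escaped answer can be decoded unambiguously.
function escapeAnswer(answer: string): string {
  return answer.replace(/[\\=;]/g, (ch) => `\\${ch}`);
}

// Render one Q=A pair with the answer escaped. The same treatment could be
// applied to question text if questions ever contain = or ;.
function formatPair(question: string, answer: string): string {
  return `${question}=${escapeAnswer(answer)}`;
}

console.log(formatPair("Retry policy", "timeout=30; retries=3"));
// Retry policy=timeout\=30\; retries\=3
console.log(formatPair("Auth", "JWT-based authentication"));
// Auth=JWT-based authentication
```

Whether the LLM reads escaped answers as reliably as unescaped ones would need to be checked against real runs.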

Test plan

All 405 unit tests pass (npm test)
TypeScript build clean (npm run build)
Updated context-builder tests to match new format + added a regression check that the compact form is always shorter than the prose form
Still to do: run a real explore against the Anthropic API provider to confirm projected savings translate to actual cost reduction (in progress, separate validation pass)
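The "always shorter" regression check could look roughly like the sketch below. Everything here is a stand-in: the prose wording is a guess at the old numbered list rather than the repo's actual text, the helper names are hypothetical, and character length is used as the comparison where the real test may measure something else.

```typescript
interface ResolvedDecision {
  question: string;
  answer: string;
}

// Hypothetical stand-in for the old numbered prose list.
function buildProse(decisions: ResolvedDecision[]): string {
  const lines = decisions.map((d, i) => `${i + 1}. ${d.question}: ${d.answer}`);
  return ["Decisions made so far:", ...lines].join("\n");
}

// Compact single-line form described in this PR.
function buildCompact(decisions: ResolvedDecision[]): string {
  return `Prior decisions: ${decisions.map((d) => `${d.question}=${d.answer}`).join("; ")}.`;
}

// Grow the history one decision at a time and fail if compact is ever not shorter.
function assertCompactIsShorter(maxDepth: number): void {
  const history: ResolvedDecision[] = [];
  for (let depth = 1; depth <= maxDepth; depth++) {
    history.push({ question: `Question ${depth}`, answer: `Answer ${depth}` });
    const prose = buildProse(history);
    const compact = buildCompact(history);
    if (compact.length >= prose.length) {
      throw new Error(`compact form is not shorter at depth ${depth}`);
    }
  }
}

assertCompactIsShorter(5);
```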

Followups (separate PRs)

The other big token win is enabling prompt caching in the Anthropic API provider — the system prompt is currently sent without cache_control on every fork, paying full input price 85 times per tree. That PR is more involved because the current ~255-token system prompt is below the 1024-token caching minimum, so it needs to be restructured alongside the caching flag. Separate PR incoming.
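As a rough illustration of what that follow-up involves: the Anthropic Messages API accepts the system prompt as a list of text blocks, and a block can carry cache_control of type "ephemeral". The sketch below only builds that part of the request body; buildSystemBlocks, the token estimate argument, and the decision to skip the marker below the minimum are assumptions for illustration, not the provider's actual code.

```typescript
// One text block of the "system" field in a Messages API request body.
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Caching minimum quoted in this PR.
const MIN_CACHEABLE_TOKENS = 1024;

// Attach cache_control only when the prompt is long enough to be cached.
// estimatedTokens is supplied by the caller (e.g. from an offline tokenizer).
function buildSystemBlocks(systemPrompt: string, estimatedTokens: number): SystemTextBlock[] {
  const block: SystemTextBlock = { type: "text", text: systemPrompt };
  if (estimatedTokens >= MIN_CACHEABLE_TOKENS) {
    block.cache_control = { type: "ephemeral" };
  }
  return [block];
}

// At ~255 tokens the current prompt falls below the minimum, so no marker is added.
console.log(buildSystemBlocks("current system prompt", 255));
```

This is why the restructuring has to come first: until the cached prefix reaches the minimum, adding the flag changes nothing.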

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Switch buildDecisionContext from a numbered prose list to a single-line "Prior decisions: Q=A; Q=A." form.

Token deltas (cl100k_base proxy):
depth=1 17 -> 15 tok (-12%)
depth=3 34 -> 27 tok (-21%)
depth=5 52 -> 41 tok (-21%)

The "Prior decisions:" anchor is preserved so the LLM still recognises this as resolved state rather than negotiable constraints. The raw [Q=A; Q=A] form saved ~30% but lost that framing; this is the safer trade between size and behavioural risk.

Note: an earlier experiment converting the history to a JSON block went the wrong way (+4-35% tokens) because of structural overhead from keys and quoting. Compact KV is the form that actually wins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

xMKx wants to merge 1 commit into main from optim/compact-decision-history