feat: add cache compounding savings to rtk gain #261

Open
sahilmgandhi wants to merge 6 commits into rtk-ai:master from sahilmgandhi:feat/cache-compounding-savings

@sahilmgandhi sahilmgandhi commented Feb 23, 2026

I think that, more important than rtk's immediate savings, it also offers compounding savings in the form of less cache thrashing. This helps both monetarily (cache writes and reads cost money) and in time (fewer tokens read from cache => less thinking => potentially less time to output).

Summary

  • Adds a "Max Theoretical Savings from Caching" section to rtk gain showing how direct token savings compound through Claude Code's prompt caching
  • Scans JSONL session files to compute average session length, then applies the multiplier: 1.25 + 0.1 × avg_remaining_turns
  • Results cached in SQLite for 24h (with spinner on first computation) so rtk gain stays instant
  • Includes optional dollar amounts when ccusage is available (detects ccusage, npx, pnpx, pnpm dlx)
  • Adds cache_compounding field to JSON/CSV export
  • Labels output as "Theoretical max" with assumption note to set correct expectations

How it works

When RTK removes tokens from command output, those tokens:

  1. Never get written to cache (avoids 1.25x input cost)
  2. Never get re-read from cache on every subsequent turn (avoids 0.1x per turn)

For a session with 336 avg turns (median position = 168 remaining), the multiplier is 1.25 + 0.1 × 168 = 18.05x.
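
The arithmetic above can be sketched in Rust as follows. This is a minimal illustration, not the code in src/session_stats.rs: the constant and function names are hypothetical, and the midpoint rule (remaining turns = half the average session length) is taken from the example in this section.

```rust
/// Relative cost of writing one token to the prompt cache (from the PR description).
const CACHE_WRITE_WEIGHT: f64 = 1.25;
/// Relative cost of re-reading one cached token on each later turn (from the PR description).
const CACHE_READ_WEIGHT: f64 = 0.1;

/// Turns left after a token saved at the median position of an average session.
/// Hypothetical helper: assumes savings land, on average, halfway through a session.
fn avg_remaining_turns(avg_session_turns: f64) -> f64 {
    avg_session_turns / 2.0
}

/// Multiplier applied to direct savings: one avoided cache write
/// plus one avoided cache read per remaining turn.
fn cache_multiplier(avg_remaining_turns: f64) -> f64 {
    CACHE_WRITE_WEIGHT + CACHE_READ_WEIGHT * avg_remaining_turns
}

/// Upper bound on tokens saved once the compounding effect is included.
fn theoretical_max_tokens(direct_saved: u64, multiplier: f64) -> u64 {
    (direct_saved as f64 * multiplier).round() as u64
}
```

Calling `cache_multiplier(avg_remaining_turns(336.0))` reproduces the worked example for a 336-turn average session.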

Example output

Cache Compounding Effect
target/release/rtk gain
RTK Token Savings (Global Scope)
════════════════════════════════════════════════════════════

Total commands:    684
Input tokens:      2.0M
Output tokens:     255.4K
Tokens saved:      1.7M (87.1%)
Total exec time:   16m2s (avg 1.4s)
Efficiency meter: █████████████████████░░░ 87.1%

By Command
────────────────────────────────────────────────────────────────────────
  #  Command                   Count   Saved    Avg%    Time  Impact
────────────────────────────────────────────────────────────────────────
 1.  rtk go test ./server/...      2  866.3K  100.0%   41.9s  ██████████
 2.  rtk go test ./server/...      1  393.4K  100.0%   27.3s  █████░░░░░
 3.  rtk cargo clippy --al...     26  144.2K   92.9%   693ms  ██░░░░░░░░
 4.  rtk cargo test                8   62.2K   99.6%    2.3s  █░░░░░░░░░
 5.  rtk cargo test --all          5   38.2K   99.8%    1.6s  ░░░░░░░░░░
 6.  rtk find                    167   30.9K   79.1%     1ms  ░░░░░░░░░░
 7.  rtk git diff b0378a62...      4   26.7K   76.6%    53ms  ░░░░░░░░░░
 8.  rtk go test -run Test...      2   13.7K   99.9%   15.1s  ░░░░░░░░░░
 9.  rtk cargo test -- --n...      1    8.6K   99.9%    1.1s  ░░░░░░░░░░
10.  rtk ls                       34    8.3K   61.5%     7ms  ░░░░░░░░░░
────────────────────────────────────────────────────────────────────────

 Max Theoretical Savings from Caching
  ──────────────────────────────────────────────────────────────
  Direct savings:    14.4M
  Avg session turns: 254 (from 106 sessions)
  Avg remaining:     127
  Cache multiplier:  13.94x  (1.25 + 0.1 x 127)
    ┌─────────────────────────────────────────────────────────┐
    │ Theoretical max:     201.0M tokens  ($940.13)           │
    └─────────────────────────────────────────────────────────┘
  Assumes every saved token avoids 1.25x cache write + 0.1x
  cache read per subsequent turn (prompt cache pricing model).

Graceful degradation

  • No Claude Code sessions found → uses fallback estimate of 20 turns with "(model estimate)" label
  • No ccusage installed → omits dollar amount, shows install tip
  • Any failure → section silently omitted
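
One way to express this degradation chain is with `Option` and `Result`, sketched below. All names here are hypothetical and the rendering is simplified; the sketch also assumes the 20-turn fallback stands in for the average session length, which the description does not state explicitly.

```rust
/// Where the average-turn figure came from.
#[derive(Debug, PartialEq)]
enum TurnSource {
    /// Computed from this many scanned sessions.
    Sessions(usize),
    /// No sessions found; shown with a "(model estimate)" label.
    ModelEstimate,
}

/// Fallback used when no session data exists (assumed to mean session length).
const FALLBACK_TURNS: f64 = 20.0;

/// Average turns per session, or the fallback when nothing was scanned.
fn resolve_avg_turns(session_turns: &[u32]) -> (f64, TurnSource) {
    if session_turns.is_empty() {
        return (FALLBACK_TURNS, TurnSource::ModelEstimate);
    }
    let total: u64 = session_turns.iter().map(|&t| u64::from(t)).sum();
    let avg = total as f64 / session_turns.len() as f64;
    (avg, TurnSource::Sessions(session_turns.len()))
}

/// Renders the headline line, or `None` so the caller prints nothing.
/// `dollars` is `None` when no ccusage pricing is available.
fn render_section(tokens: Result<u64, String>, dollars: Option<f64>) -> Option<String> {
    let tokens = tokens.ok()?; // any failure: omit the section silently
    Some(match dollars {
        Some(d) => format!("Theoretical max: {tokens} tokens (${d:.2})"),
        None => format!("Theoretical max: {tokens} tokens"),
    })
}
```

Returning `Option<String>` keeps the "silently omitted" case in the type, so the display code cannot accidentally print a half-filled section.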

Files changed

File                  Change
────────────────────────────────────────────────────────────────────────
src/session_stats.rs  NEW: session stats + compounding logic (8 unit tests)
src/cc_economics.rs   Export WEIGHT_* constants as pub(crate), remove dead BILLION
src/main.rs           Register mod session_stats
src/gain.rs           Add display + JSON/CSV export, consolidate color helpers

Test plan

  • cargo fmt --all --check — clean
  • cargo clippy --all-targets — no new warnings
  • cargo test --all — 717 passed, 2 ignored
  • Manual: rtk gain shows new section with real session data
  • Manual: rtk gain --format json includes cache_compounding field
  • Manual: second rtk gain invocation hits 24h cache (instant, no spinner)
  • Manual: no stall when ccusage not installed (stdin null fix)
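
For context on the stdin item, a launcher probe that cannot block on input might look like the sketch below. It is an assumption-laden illustration rather than the PR's code: the function names are hypothetical, the candidate order simply mirrors the list in the summary, and passing `--version` as the probe argument is an assumption about how ccusage can be queried.

```rust
use std::process::{Command, Stdio};

/// Candidate ways to launch ccusage, in the order listed in the summary.
const CCUSAGE_CANDIDATES: &[&[&str]] = &[
    &["ccusage"],
    &["npx", "ccusage"],
    &["pnpx", "ccusage"],
    &["pnpm", "dlx", "ccusage"],
];

/// Returns true if `argv` plus `probe_arg` runs and exits successfully.
/// stdin is set to null so a launcher that prompts for input cannot stall the caller.
fn responds(argv: &[&str], probe_arg: &str) -> bool {
    let Some((program, rest)) = argv.split_first() else {
        return false;
    };
    Command::new(program)
        .args(rest)
        .arg(probe_arg)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false) // a missing binary is a spawn error: treat as "not available"
}

/// First candidate that responds, if any (`--version` is an assumed probe flag).
fn find_ccusage() -> Option<&'static [&'static str]> {
    CCUSAGE_CANDIDATES
        .iter()
        .copied()
        .find(|argv| responds(argv, "--version"))
}
```

A `None` from `find_ccusage()` would correspond to the "omit dollar amount, show install tip" path.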

@sahilmgandhi sahilmgandhi force-pushed the feat/cache-compounding-savings branch from f0bf678 to 4aef192 Compare February 23, 2026 18:36
Show how direct token savings compound through Claude Code's prompt
caching. Saved tokens avoid a 1.25x cache write plus 0.1x cache read
on every subsequent turn, producing a multiplier based on average
session length (scanned from JSONL session files).

New section appears after the "By Command" table with multiplier,
effective savings, and optional dollar amounts (when ccusage installed).
Gracefully degrades: falls back to 20-turn estimate when no session
data, omits section entirely on failure.

- New module: src/session_stats.rs (8 unit tests)
- Export WEIGHT_* constants from cc_economics.rs
- Add cache_compounding field to JSON/CSV export
- Remove dead BILLION constant, consolidate color helpers
@sahilmgandhi sahilmgandhi force-pushed the feat/cache-compounding-savings branch from 4aef192 to fc2f85e Compare February 23, 2026 18:39
@pszymkowiak
Collaborator

Hi, this PR has conflicts with master. Could you rebase on current master? Thanks!

Resolve conflicts in src/gain.rs: keep both cache compounding
functions (branch) and project scope functions (master).
@sahilmgandhi
Contributor Author

Rebased and verified all tests passed.

@FlorianBruniaux
Collaborator

Hi @sahilmgandhi, the graceful degradation chain is excellent throughout: missing sessions fall back to defaults, missing ccusage skips the dollar amount, any error silently skips the section. That's the right design. Two things to discuss before we move forward:

  1. The formula direct_saved × (1.25 + 0.1 × avg_remaining_turns) represents the theoretical maximum assuming every saved token is re-cached and re-read on every subsequent turn for the rest of the session. For a 300-turn session that produces a ~31x multiplier, which is plausible in theory but far from what most users will observe. We'd like to label the output something like "Max theoretical savings (prompt cache model)" rather than "Effective savings", with a brief note explaining the assumption. The underlying insight is real and worth surfacing, but the current framing will surprise users who compare the number to their actual billing.

  2. rtk gain now scans every JSONL session file from the last 90 days on every invocation. For users with hundreds of sessions this could add a second or two to what should be an instant command. A hard limit (scan at most the 50 most recent files) or a 24h cache of the computed stats in the SQLite tracking database would keep it fast.

Happy to discuss the right framing for the multiplier.
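
The hard-limit option from point 2 could be sketched as below: sort candidate files by modification time and keep only the newest 50. The function name is hypothetical, and the sketch takes already-collected `(path, mtime)` pairs rather than walking the session directory.

```rust
use std::path::PathBuf;
use std::time::SystemTime;

/// Upper bound on session files scanned per invocation (value from the review comment).
const MAX_SESSION_FILES: usize = 50;

/// Keeps the `limit` most recently modified files, newest first.
fn most_recent(mut files: Vec<(PathBuf, SystemTime)>, limit: usize) -> Vec<PathBuf> {
    // Sort descending by modification time so the newest files come first.
    files.sort_by(|a, b| b.1.cmp(&a.1));
    files.into_iter().take(limit).map(|(path, _)| path).collect()
}
```

The trade-off is that sessions older than the newest 50 no longer contribute to the average.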

@sahilmgandhi
Contributor Author

> Hi @sahilmgandhi, the graceful degradation chain is excellent throughout: missing sessions fall back to defaults, missing ccusage skips the dollar amount, any error silently skips the section. That's the right design. Two things to discuss before we move forward:
>
>   1. The formula direct_saved × (1.25 + 0.1 × avg_remaining_turns) represents the theoretical maximum assuming every saved token is re-cached and re-read on every subsequent turn for the rest of the session. For a 300-turn session that produces a ~31x multiplier, which is plausible in theory but far from what most users will observe. We'd like to label the output something like "Max theoretical savings (prompt cache model)" rather than "Effective savings", with a brief note explaining the assumption. The underlying insight is real and worth surfacing, but the current framing will surprise users who compare the number to their actual billing.

This makes sense: it isn't the true cache savings, only the theoretical max. Let me change the label to say that.

>   2. rtk gain now scans every JSONL session file from the last 90 days on every invocation. For users with hundreds of sessions this could add a second or two to what should be an instant command. A hard limit (scan at most the 50 most recent files) or a 24h cache of the computed stats in the SQLite tracking database would keep it fast.
>
> Happy to discuss the right framing for the multiplier.

Let me think about this one a bit. Limiting the scan to 50 files would mean some savings won't show up.
A 24h cache in SQLite would keep it fast, but the first invocation of the command each day would still be slow.

Maybe what I can do is something like:

  1. Add a little spinner letting the user know that rtk is computing the average session length. This will happen once a day.
  2. Cache the result for 24 hours (?) and hit the cache on every subsequent invocation. Over a 24-hour period, the average session length probably won't change much (unless you have only just started using these AI tools).
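
A freshness check for such a cache could be as small as the sketch below. The struct, field names, and Unix-seconds timestamp are hypothetical; reading and writing the row in the SQLite tracking database is left out.

```rust
use std::time::Duration;

/// How long computed session stats stay valid (24 hours, as proposed above).
const CACHE_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical cached row: the stat plus when it was computed (Unix seconds).
struct CachedStats {
    avg_session_turns: f64,
    computed_at_unix: u64,
}

/// Returns the cached average if it is younger than the TTL, else `None`
/// so the caller shows the spinner and rescans the session files.
fn fresh_stats(cached: Option<&CachedStats>, now_unix: u64) -> Option<f64> {
    let entry = cached?;
    // If the clock moved backwards, checked_sub yields None and the entry counts as stale.
    let age_secs = now_unix.checked_sub(entry.computed_at_unix)?;
    (age_secs < CACHE_TTL.as_secs()).then_some(entry.avg_session_turns)
}
```

With this shape, only the `None` path pays the scan cost, which matches the once-a-day spinner idea.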

@sahilmgandhi sahilmgandhi force-pushed the feat/cache-compounding-savings branch from 8911446 to a8fe88a Compare March 6, 2026 23:26
Update test_try_parse_git_with_dash_c to expect success since
master now supports git global options (-C, -c, etc.).
@sahilmgandhi sahilmgandhi force-pushed the feat/cache-compounding-savings branch from a8fe88a to 71b7a20 Compare March 6, 2026 23:29