Skip to content

Prototype shorter cost comparison periods#1816

Draft
steipete wants to merge 1 commit into
mainfrom
codex/cost-window-comparisons-decision
Draft

Prototype shorter cost comparison periods#1816
steipete wants to merge 1 commit into
mainfrom
codex/cost-window-comparisons-decision

Conversation

@steipete

@steipete steipete commented Jul 1, 2026

Copy link
Copy Markdown
Owner

Summary

Product decision

Recommended: merge with the setting defaulting off. This gives #1500 a bounded implementation without increasing existing menu density or scan cost.

Alternatives:

  1. default the setting on;
  2. retain only the tested aggregation model and defer presentation.

This draft intentionally does not claim a lifetime total. Local logs are incomplete and mutable, so #1708 needs an approved retention, ownership, deletion, and completeness contract before implementation.

Proof

  • focused cost/settings/model suite: 83 tests passed
  • localization catalog suite: 18 tests passed
  • make check
  • make test: all 44 shards passed
  • ./Scripts/package_app.sh
  • packaged General settings UI verified locally with 90-day history and the comparison toggle enabled
  • autoreview: clean, no accepted/actionable findings

Refs #1500
Refs #1708

@clawsweeper

clawsweeper Bot commented Jul 1, 2026

Copy link
Copy Markdown

Codex review: needs maintainer review before merge. Reviewed July 1, 2026, 6:07 AM ET / 10:07 UTC.

Summary
The PR adds a default-off setting that renders 7/30/90-day cost comparison rows from the existing daily snapshot, plus persistence, localization, docs, and focused tests.

Reproducibility: not applicable. this is a feature PR rather than a bug report. Source review of current main shows only the single cost-history window, while the PR adds a default-off comparison toggle and tested aggregation.

Review metrics: 2 noteworthy metrics.

  • Changed surface: 27 files, 370 additions, 7 deletions. The patch spans model, menu UI, settings persistence, localization, docs, and tests, so review needs to consider more than the core aggregation helper.
  • Focused coverage: 2 test files added, 2 existing test files extended. The new tests directly cover aggregation, opt-in rendering, preference smoke behavior, and settings persistence for the feature.

Root-cause cluster
Relationship: fixed_by_candidate
Canonical: #1500
Summary: This PR is a candidate implementation for the multi-period cost-window feature request; the lifetime-ledger request is related cost-history product work but remains a distinct data-contract problem.

Members:

Proposal only: this assessment does not dispatch repair, suppress jobs, mutate sibling items, close, or merge anything.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Refresh the branch against current main before merge so the recent cost fixes are included in the final review context.

Risk before merge

  • [P1] The PR is still a draft product prototype, so maintainers still need to decide whether the default-off comparison setting is the accepted product shape before merge.
  • [P1] The PR base is behind current main; I did not see a concrete merge-result regression, but the branch should be refreshed/reviewed against the latest cost fixes before landing.
  • [P1] This read-only review did not run tests or package the app; it relies on source inspection and the PR body's reported validation.

Maintainer options:

  1. Decide the mitigation before merge
    Land the default-off comparison-period feature only after the owner/maintainer decides the draft product shape is acceptable and the branch is refreshed against current main.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P2] Human review should decide whether to graduate the draft default-off product shape; no narrow automated repair is needed from this review.

Security
Cleared: No concrete security or supply-chain concern was found; the diff stays within app model/UI/settings, localization, docs, and tests without new dependencies, scripts, auth, or secret handling.

Review details

Best possible solution:

Land the default-off comparison-period feature only after the owner/maintainer decides the draft product shape is acceptable and the branch is refreshed against current main.

Do we have a high-confidence way to reproduce the issue?

Not applicable; this is a feature PR rather than a bug report. Source review of current main shows only the single cost-history window, while the PR adds a default-off comparison toggle and tested aggregation.

Is this the best way to solve the issue?

Yes as a bounded prototype, pending maintainer product approval. Deriving rows from the existing daily snapshot avoids extra scans/storage, and keeping the setting default-off preserves current behavior for existing users.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 306cf96787f0.

Label changes

Label changes:

  • add P3: This is a low-risk, default-off cost-history UX feature/prototype rather than an urgent broken workflow or security issue.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external-contributor proof gate does not apply to this owner-authored PR; the body still reports local packaged UI verification and full check/test runs.

Label justifications:

  • P3: This is a low-risk, default-off cost-history UX feature/prototype rather than an urgent broken workflow or security issue.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external-contributor proof gate does not apply to this owner-authored PR; the body still reports local packaged UI verification and full check/test runs.
Evidence reviewed

What I checked:

  • Repository policy read: AGENTS.md was read fully; its guidance on focused settings/cost tests, avoiding live keychain/provider validation, and conservative maintainer handling applied to this review. (AGENTS.md:1, 306cf96787f0)
  • Current main lacks the new feature: Searching current main found the existing token usage section but no costComparisonPeriodsEnabled, comparisonSummaries, or CostUsageWindowSummary implementation. (306cf96787f0)
  • PR aggregates shorter windows from existing daily rows: The PR adds CostUsageWindowSummary plus summary(forLastDays:) and comparisonSummaries(periods:) that filter existing daily entries by calendar day and only include periods shorter than the selected history window. (Sources/CodexBarCore/CostUsageModels.swift:70, 5d6ca31a913b)
  • PR gates visible rows behind the new setting: The token usage section only maps comparison summaries into menu lines when comparisonPeriodsEnabled is true, preserving the default menu output when the setting is off. (Sources/CodexBar/MenuCardView+Costs.swift:135, 5d6ca31a913b)
  • PR stores the setting default-off: SettingsStore loads costComparisonPeriodsEnabled from UserDefaults with a false default and persists user changes under its own key. (Sources/CodexBar/SettingsStore.swift:414, 5d6ca31a913b)
  • PR exposes the setting in General preferences: The General pane adds a PreferenceToggleRow bound to settings.costComparisonPeriodsEnabled directly below the cost history days editor. (Sources/CodexBar/PreferencesGeneralPane.swift:173, 5d6ca31a913b)

Likely related people:

  • steipete: Peter Steinberger owns the PR and current history also points to his prior cost usage/model and release-baseline work in the relevant files. (role: feature owner and recent area contributor; confidence: high; commits: f380287041b8, 5d6ca31a913b; files: Sources/CodexBarCore/CostUsageModels.swift, Sources/CodexBar/MenuCardView+Costs.swift, Sources/CodexBar/PreferencesGeneralPane.swift)
  • Zihao Qi: Recent commits changed the cost summary display and CostUsageTokenSnapshot/current-day behavior adjacent to this PR's model and menu surface. (role: recent adjacent contributor; confidence: medium; commits: 8eb852493103, 929d55aaf1d7; files: Sources/CodexBarCore/CostUsageModels.swift, Sources/CodexBar/MenuCardView+ModelInput.swift, Sources/CodexBar/PreferencesGeneralPane.swift)
  • kiranmagic7: Authored the merged change that made the existing single cost-history days setting editable, which is the setting surface this PR extends. (role: recent adjacent contributor; confidence: medium; commits: a93744328de1; files: Sources/CodexBar/PreferencesGeneralPane.swift, Tests/CodexBarTests/PreferencesPaneSmokeTests.swift)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P3 Low-risk cleanup, docs, polish, ergonomics, or speculative feature. labels Jul 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P3 Low-risk cleanup, docs, polish, ergonomics, or speculative feature. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant