feat(cache): project mode prompts per request #2801

Merged: Hmbown merged 1 commit into codex/v0.9.0-stewardship from codex/harvest-2687-runtime-prompt-metadata on Jun 5, 2026

@Hmbown (Owner) commented on Jun 5, 2026

Harvests PR #2687 by @LeoAlex0 for the v0.9.0 stewardship branch.

This keeps the stable system prompt mode-agnostic and projects mode, approval policy, and tool taxonomy as request-time user-role runtime metadata. That preserves stored history/prefix-cache bytes and avoids appending extra system messages for strict chat-template providers.
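
A minimal sketch of the idea, assuming simplified `Message` and `Session` types (both hypothetical; the real CodeWhale types are not shown in this thread). The stored history is cloned and the runtime metadata is appended only to the per-request copy, so switching modes changes the final message and leaves the shared prefix identical:

```rust
// Hypothetical, simplified types for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

pub struct Session {
    /// Stable, mode-agnostic system prompt (never rewritten on mode changes).
    pub system_prompt: String,
    /// Stored conversation history; its bytes form the cacheable prefix.
    pub messages: Vec<Message>,
}

/// Build the message list for a single request.
/// `runtime_text` stands in for the rendered mode/approval/tool metadata.
pub fn messages_for_request(session: &Session, runtime_text: &str) -> Vec<Message> {
    // Clone so the stored history is left untouched.
    let mut messages = session.messages.clone();
    // Project the runtime metadata as a transient user-role message.
    messages.push(Message {
        role: "user".to_string(),
        content: runtime_text.to_string(),
    });
    messages
}
```

Two requests built from the same session with different runtime text share every message except the last, which is the property a prefix cache depends on.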

Stewardship polish from the source PR:

  • Preserves the existing turn-metadata cache tests from the stewardship branch.
  • Keeps the replan replay guard at the real <= 2 behavior.
  • Tightens cache-inspect coverage to require both deduplicated=false and deduplicated=true tool-result metadata.
  • Adds changelog credit for @LeoAlex0.

Issue #2722 remains open as the broader v0.9 harvest tracker.

Verification:

  • cargo fmt --all -- --check
  • git diff --check origin/codex/v0.9.0-stewardship..HEAD
  • python3 scripts/check-coauthor-trailers.py --range origin/codex/v0.9.0-stewardship..HEAD --check-authors
  • cargo test -p codewhale-tui --locked --bin codewhale-tui runtime_prompt -- --nocapture
  • cargo test -p codewhale-tui --locked --bin codewhale-tui turn_metadata -- --nocapture
  • cargo test -p codewhale-tui --locked --bin codewhale-tui cache_inspect -- --nocapture
  • cargo test -p codewhale-tui --locked --bin codewhale-tui prompt -- --nocapture
  • cargo test -p codewhale-tui --locked --bin codewhale-tui capacity -- --nocapture
  • cargo test -p codewhale-tui --locked --bin codewhale-tui post_tool_replay_invoked_when_high_non_severe_risk -- --nocapture
  • cargo test -p codewhale-tui --locked --bin codewhale-tui error_escalation_triggers_replan_when_severe_or_repeated_failures -- --nocapture
  • cargo clippy -p codewhale-tui --locked --all-targets --all-features -- -D warnings

Keep the stable system prompt mode-agnostic and project mode, approval policy, and tool taxonomy as request-time runtime metadata. This avoids mutating stored history while preserving provider chat-template compatibility.

Harvested from PR #2687 with stewardship turn-metadata cache tests preserved. The replan replay guard remains <= 2, and cache inspect now asserts tool-result budget metadata for both deduplicated=false and deduplicated=true.

(cherry picked from commit 7794330)

@greptile-apps (bot, Contributor) left a comment

Hmbown has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Hmbown Hmbown merged commit fbe8d9e into codex/v0.9.0-stewardship Jun 5, 2026
2 checks passed

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request refactors prompt handling by projecting mode, approval, and tool-taxonomy metadata as transient, request-time runtime prompts instead of mutating the stored system prompt, thereby keeping provider prefix caches byte-stable. Feedback on these changes highlights a potential issue where appending the runtime prompt as a separate user message can create consecutive user messages, which violates the alternating role requirements of strict API providers like Anthropic. To resolve this, it is recommended to merge the runtime prompt content into the last message if it is also a user message.

Comment on lines +2261 to +2263
        let mut messages = self.session.messages.clone();
        messages.push(self.runtime_prompt_message());
        messages

Priority: high

Appending the runtime prompt as a separate user message can result in consecutive user messages in the message list (e.g., when the last message in the session history is also a user message).

Strict API providers (such as Anthropic Claude) and strict Jinja chat templates (used by many open-source models via Ollama, vLLM, Hugging Face, etc.) enforce alternating user/assistant roles. Sending consecutive user messages can cause validation errors (e.g., 400 Bad Request or TemplateError) and crash the application.

To prevent this, we should check if the last message in the history is a user message (and not a tool result), and if so, merge the runtime prompt's content into that last message's content array. This preserves the prefix cache of all previous turns (since the current turn's message is at the end and is new anyway) while ensuring perfect compatibility with strict providers and templates.

        // Clone the stored history so the session itself is never mutated.
        let mut messages = self.session.messages.clone();
        let runtime_msg = self.runtime_prompt_message();
        // Merge only when the trailing message is a plain user turn (no tool results).
        let can_merge = messages.last().is_some_and(|last_msg| {
            last_msg.role == "user"
                && !last_msg
                    .content
                    .iter()
                    .any(|block| matches!(block, ContentBlock::ToolResult { .. }))
        });
        if can_merge {
            // The two branches are exclusive, so runtime_msg is moved at most once.
            if let Some(last_msg) = messages.last_mut() {
                last_msg.content.extend(runtime_msg.content);
            }
        } else {
            messages.push(runtime_msg);
        }
        messages
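
The merge rule in this comment can be exercised in isolation. The following is a sketch under stated assumptions: `Message`, `ContentBlock`, and the free function `with_runtime_prompt` are hypothetical stand-ins, and the real types in the codebase may differ in shape (for example, `role` may not be a `String`).

```rust
// Hypothetical stand-ins for the project's message types.
#[derive(Clone, Debug, PartialEq)]
pub enum ContentBlock {
    Text { text: String },
    ToolResult { tool_use_id: String, content: String },
}

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: Vec<ContentBlock>,
}

/// Return a per-request copy of `history` with `runtime_msg` projected onto it.
/// If the last message is a plain user turn, the runtime blocks are merged into
/// it; otherwise the runtime message is appended as its own message.
pub fn with_runtime_prompt(history: &[Message], runtime_msg: Message) -> Vec<Message> {
    let mut messages = history.to_vec();
    let can_merge = messages.last().is_some_and(|last| {
        last.role == "user"
            && !last
                .content
                .iter()
                .any(|block| matches!(block, ContentBlock::ToolResult { .. }))
    });
    if can_merge {
        if let Some(last) = messages.last_mut() {
            last.content.extend(runtime_msg.content);
        }
    } else {
        messages.push(runtime_msg);
    }
    messages
}
```

With this rule, a history ending in a plain user turn keeps its length and gains the runtime blocks, while a history that is empty or ends in an assistant turn or a tool result grows by one message. Earlier messages are never modified, so the cached prefix of previous turns is unaffected.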

LeoAlex0 added a commit to LeoAlex0/CodeWhale that referenced this pull request Jun 7, 2026
…ions to system prompt

- Add render_runtime_policy_reference() in prompts.rs containing all
  mode and approval policy descriptions in the frozen system-prompt
  prefix (sent once per session, cache-hit thereafter).
- Simplify runtime_prompt_text() from a ~500-token XML block to a ~16-token
  self-closing tag (<runtime_prompt visibility="internal" mode="..." approval="..."/>).
- Fix markdown heading hierarchy in all prompts/modes/*.md and
  prompts/approvals/*.md (## → #####) to nest correctly under ####.
- Remove now-unused legacy functions: mode_prompt(),
  approval_prompt_for_mode(), mode_change_runtime_message().
- Simplify Op::ChangeMode: no longer persists a mode_change event
  (next turn tag carries the current mode).
- Update and rename affected tests.
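
As an illustration of the compact tag described in the list above, here is a minimal sketch. The function names `runtime_prompt_tag` and `escape_attr`, and the example values `plan` and `on-request`, are assumptions made for this example; only the tag shape comes from the commit message.

```rust
/// Escape characters that are unsafe inside a double-quoted XML attribute.
fn escape_attr(value: &str) -> String {
    value
        .replace('&', "&amp;") // must run first so later entities are not re-escaped
        .replace('"', "&quot;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Render the self-closing runtime tag for the current mode and approval policy.
/// Hypothetical helper; the commit's own runtime_prompt_text() is not shown here.
pub fn runtime_prompt_tag(mode: &str, approval: &str) -> String {
    format!(
        "<runtime_prompt visibility=\"internal\" mode=\"{}\" approval=\"{}\"/>",
        escape_attr(mode),
        escape_attr(approval)
    )
}
```

Because the tag carries only the two current values, the longer descriptions of each mode and policy can live once in the frozen system-prompt prefix.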

Builds on Hmbown#2801. Reduces per-request runtime prompt overhead by 97%
(~471 tokens saved per API call). System prompt grows by ~1325 tokens
in the frozen prefix (one-time miss cost); break-even at 3 API calls.
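
The break-even figure can be checked with a small calculation. This sketch uses the token counts quoted above and treats every token as equal cost, which is a simplifying assumption (it ignores any price difference between cached and uncached tokens). The 487-token "before" size is also an assumption, derived as 471 saved plus 16 remaining.

```rust
/// Number of API calls after which the per-call savings cover the one-time
/// prefix growth. Returns None when there are no savings to amortize against.
pub fn break_even_calls(prefix_growth_tokens: u32, saved_per_call_tokens: u32) -> Option<u32> {
    if saved_per_call_tokens == 0 {
        return None;
    }
    // Ceiling division: a partial call still counts as a full call.
    Some((prefix_growth_tokens + saved_per_call_tokens - 1) / saved_per_call_tokens)
}

/// Percentage reduction when going from `before` tokens to `after` tokens.
pub fn reduction_percent(before: u32, after: u32) -> f64 {
    if before == 0 {
        return 0.0;
    }
    (f64::from(before) - f64::from(after)) / f64::from(before) * 100.0
}
```

With 1325 tokens of growth and 471 tokens saved per call, 1325 / 471 is about 2.8, so the third call is the first at which cumulative savings exceed the added prefix cost. A drop from 487 to 16 tokens is about a 96.7% reduction, consistent with the quoted 97%.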