feat(cache): project mode prompts per request #2801
Keep the stable system prompt mode-agnostic and project mode, approval policy, and tool taxonomy as request-time runtime metadata. This avoids mutating stored history while preserving provider chat-template compatibility. Harvested from PR #2687 with stewardship turn-metadata cache tests preserved. The replan replay guard remains <= 2, and cache inspect now asserts tool-result budget metadata for both deduplicated=false and deduplicated=true. (cherry picked from commit 7794330)
Code Review
This pull request refactors prompt handling by projecting mode, approval, and tool-taxonomy metadata as transient, request-time runtime prompts instead of mutating the stored system prompt, thereby keeping provider prefix caches byte-stable. Feedback on these changes highlights a potential issue where appending the runtime prompt as a separate user message can create consecutive user messages, which violates the alternating role requirements of strict API providers like Anthropic. To resolve this, it is recommended to merge the runtime prompt content into the last message if it is also a user message.
let mut messages = self.session.messages.clone();
messages.push(self.runtime_prompt_message());
messages
Appending the runtime prompt as a separate user message can result in consecutive user messages in the message list (e.g., when the last message in the session history is also a user message).
API providers (such as Anthropic Claude) and Jinja chat templates (used by many open-source models via Ollama, vLLM, Hugging Face, etc.) that enforce alternating user/assistant roles can reject consecutive user messages with validation errors (e.g., 400 Bad Request or TemplateError), which fails the request.
To prevent this, we should check if the last message in the history is a user message (and not a tool result), and if so, merge the runtime prompt's content into that last message's content array. This preserves the prefix cache of all previous turns (since the current turn's message is at the end and is new anyway) while ensuring perfect compatibility with strict providers and templates.
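let mut messages = self.session.messages.clone();
let runtime_msg = self.runtime_prompt_message();
// Decide first, then move `runtime_msg` on exactly one branch. Moving
// `runtime_msg.content` conditionally and pushing `runtime_msg` afterwards
// would be rejected as a use of a partially moved value.
let can_merge = messages.last().is_some_and(|last_msg| {
    let is_tool_result = last_msg.content.iter().any(|block| {
        matches!(block, ContentBlock::ToolResult { .. })
    });
    last_msg.role == "user" && !is_tool_result
});
if can_merge {
    // Merge into the trailing user message so roles keep alternating.
    if let Some(last_msg) = messages.last_mut() {
        last_msg.content.extend(runtime_msg.content);
    }
} else {
    messages.push(runtime_msg);
}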
messages

…ions to system prompt

- Add render_runtime_policy_reference() in prompts.rs containing all mode and approval policy descriptions in the frozen system-prompt prefix (sent once per session, cache-hit thereafter).
- Simplify runtime_prompt_text() from ~500-token XML block to a ~16-token self-closing tag (<runtime_prompt visibility="internal" mode="..." approval="..."/>).
- Fix markdown heading hierarchy in all prompts/modes/*.md and prompts/approvals/*.md (## → #####) to nest correctly under ####.
- Remove now-unused legacy functions: mode_prompt(), approval_prompt_for_mode(), mode_change_runtime_message().
- Simplify Op::ChangeMode: no longer persists a mode_change event (next turn tag carries the current mode).
- Update and rename affected tests.

Builds on Hmbown#2801. Reduces per-request runtime prompt overhead by 97% (~471 tokens saved per API call). System prompt grows by ~1325 tokens in the frozen prefix (one-time miss cost); break-even at 3 API calls.
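The savings and break-even figures quoted in this commit message can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, the token counts are the approximate figures from the commit message, and it assumes every token is weighted equally (cached-token pricing is ignored).

```rust
// Approximate token counts quoted in the commit message.
const TOKENS_SAVED_PER_CALL: u32 = 471; // runtime prompt shrinkage per request
const TAG_TOKENS: u32 = 16; // size of the compact self-closing tag
const PREFIX_GROWTH_TOKENS: u32 = 1325; // one-time growth of the frozen prefix

/// Smallest number of API calls whose cumulative savings cover the one-time cost.
/// Hypothetical helper; assumes all tokens cost the same.
fn break_even_calls(one_time_cost: u32, saved_per_call: u32) -> u32 {
    one_time_cost.div_ceil(saved_per_call)
}

/// Percentage reduction of the per-request runtime prompt.
fn reduction_percent(saved: u32, remaining: u32) -> f64 {
    f64::from(saved) / f64::from(saved + remaining) * 100.0
}

fn main() {
    let calls = break_even_calls(PREFIX_GROWTH_TOKENS, TOKENS_SAVED_PER_CALL);
    let pct = reduction_percent(TOKENS_SAVED_PER_CALL, TAG_TOKENS);
    // 2 calls save 942 tokens (< 1325); 3 calls save 1413 tokens (>= 1325).
    println!("break-even after {calls} API calls, about {pct:.0}% smaller per request");
}
```

Under a pricing model where cached prefix tokens are billed at a discount, the real break-even point could differ from this equal-weight estimate.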
Harvests PR #2687 by @LeoAlex0 for the v0.9.0 stewardship branch.
This keeps the stable system prompt mode-agnostic and projects mode, approval policy, and tool taxonomy as request-time user-role runtime metadata. That preserves stored history/prefix-cache bytes and avoids appending extra system messages for strict chat-template providers.
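A minimal sketch of the request-time projection described above, assuming simplified stand-in types. Only runtime_prompt_message() is a name taken from this PR; Message, Session, Engine, messages_for_request(), the mode and approval values, and the placeholder prompt text are hypothetical, and message content is reduced to a plain string.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: String,
    content: String, // simplified: a plain string instead of content blocks
}

struct Session {
    messages: Vec<Message>, // stored history; never mutated by projection
}

struct Engine {
    session: Session,
    mode: String,     // hypothetical values such as "plan" or "agent"
    approval: String, // hypothetical value such as "on-request"
}

impl Engine {
    /// Builds the transient user-role runtime metadata message.
    /// The text format here is a placeholder, not the project's real prompt.
    fn runtime_prompt_message(&self) -> Message {
        Message {
            role: "user".to_string(),
            content: format!("[runtime] mode={} approval={}", self.mode, self.approval),
        }
    }

    /// Clones the stored history and appends the runtime message for one request.
    fn messages_for_request(&self) -> Vec<Message> {
        let mut messages = self.session.messages.clone();
        messages.push(self.runtime_prompt_message());
        messages
    }
}

fn main() {
    let mut engine = Engine {
        session: Session {
            messages: vec![Message {
                role: "assistant".to_string(),
                content: "Ready.".to_string(),
            }],
        },
        mode: "plan".to_string(),
        approval: "on-request".to_string(),
    };
    let first = engine.messages_for_request();
    engine.mode = "agent".to_string(); // mode change between requests
    let second = engine.messages_for_request();
    // Stored history is untouched; only the trailing projected message differs.
    println!("stored: {}", engine.session.messages.len());
    println!("first tail:  {}", first.last().unwrap().content);
    println!("second tail: {}", second.last().unwrap().content);
}
```

Because the mode lives only in the appended message, switching modes changes the tail of the request while the shared prefix stays identical across requests.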
Stewardship polish from the source PR:
- The replan replay guard keeps its <= 2 behavior.
- Cache inspect asserts tool-result budget metadata for both deduplicated=false and deduplicated=true.

Issue #2722 remains open as the broader v0.9 harvest tracker.
Verification:
- cargo fmt --all -- --check
- git diff --check origin/codex/v0.9.0-stewardship..HEAD
- python3 scripts/check-coauthor-trailers.py --range origin/codex/v0.9.0-stewardship..HEAD --check-authors
- cargo test -p codewhale-tui --locked --bin codewhale-tui runtime_prompt -- --nocapture
- cargo test -p codewhale-tui --locked --bin codewhale-tui turn_metadata -- --nocapture
- cargo test -p codewhale-tui --locked --bin codewhale-tui cache_inspect -- --nocapture
- cargo test -p codewhale-tui --locked --bin codewhale-tui prompt -- --nocapture
- cargo test -p codewhale-tui --locked --bin codewhale-tui capacity -- --nocapture
- cargo test -p codewhale-tui --locked --bin codewhale-tui post_tool_replay_invoked_when_high_non_severe_risk -- --nocapture
- cargo test -p codewhale-tui --locked --bin codewhale-tui error_escalation_triggers_replan_when_severe_or_repeated_failures -- --nocapture
- cargo clippy -p codewhale-tui --locked --all-targets --all-features -- -D warnings