feat(cache): project mode prompts per request by Hmbown · Pull Request #2801 · Hmbown/CodeWhale

Hmbown · 2026-06-05T16:24:12Z

Harvests PR #2687 by @LeoAlex0 for the v0.9.0 stewardship branch.

This keeps the stable system prompt mode-agnostic and projects mode, approval policy, and tool taxonomy as request-time user-role runtime metadata. That preserves stored history/prefix-cache bytes and avoids appending extra system messages for strict chat-template providers.
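A minimal sketch of the projection idea, assuming a simplified `Message` type and a hypothetical `project_request` helper (neither is taken from the CodeWhale source, and the runtime text format here is a placeholder). It only illustrates that the stored history is cloned and left untouched, so switching modes changes the final request-time message and nothing before it:

```rust
// Hypothetical stand-in type; the real CodeWhale message type is not shown here.
#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Build the message list for one request without mutating stored history.
/// The runtime metadata is appended as a transient user-role message.
pub fn project_request(stored: &[Message], mode: &str, approval: &str) -> Vec<Message> {
    // Clone the stored history so the persisted prefix stays identical across requests.
    let mut messages = stored.to_vec();
    messages.push(Message {
        role: "user".to_string(),
        // Placeholder format, assumed for illustration only.
        content: format!("[runtime] mode={mode} approval={approval}"),
    });
    messages
}
```

Two requests built from the same stored history with different modes share an identical prefix and differ only in the last element, which is the property a prefix cache depends on.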

Stewardship polish from the source PR:

- Preserves the existing turn-metadata cache tests from the stewardship branch.
- Keeps the replan replay guard at the real <= 2 behavior.
- Tightens cache-inspect coverage to require both deduplicated=false and deduplicated=true tool-result metadata.
- Adds changelog credit for @LeoAlex0.

Issue #2722 remains open as the broader v0.9 harvest tracker.

Verification:

cargo fmt --all -- --check
git diff --check origin/codex/v0.9.0-stewardship..HEAD
python3 scripts/check-coauthor-trailers.py --range origin/codex/v0.9.0-stewardship..HEAD --check-authors
cargo test -p codewhale-tui --locked --bin codewhale-tui runtime_prompt -- --nocapture
cargo test -p codewhale-tui --locked --bin codewhale-tui turn_metadata -- --nocapture
cargo test -p codewhale-tui --locked --bin codewhale-tui cache_inspect -- --nocapture
cargo test -p codewhale-tui --locked --bin codewhale-tui prompt -- --nocapture
cargo test -p codewhale-tui --locked --bin codewhale-tui capacity -- --nocapture
cargo test -p codewhale-tui --locked --bin codewhale-tui post_tool_replay_invoked_when_high_non_severe_risk -- --nocapture
cargo test -p codewhale-tui --locked --bin codewhale-tui error_escalation_triggers_replan_when_severe_or_repeated_failures -- --nocapture
cargo clippy -p codewhale-tui --locked --all-targets --all-features -- -D warnings

Keep the stable system prompt mode-agnostic and project mode, approval policy, and tool taxonomy as request-time runtime metadata. This avoids mutating stored history while preserving provider chat-template compatibility. Harvested from PR #2687 with stewardship turn-metadata cache tests preserved. The replan replay guard remains <= 2, and cache inspect now asserts tool-result budget metadata for both deduplicated=false and deduplicated=true. (cherry picked from commit 7794330)

gemini-code-assist

Code Review

This pull request refactors prompt handling by projecting mode, approval, and tool-taxonomy metadata as transient, request-time runtime prompts instead of mutating the stored system prompt, thereby keeping provider prefix caches byte-stable. Feedback on these changes highlights a potential issue where appending the runtime prompt as a separate user message can create consecutive user messages, which violates the alternating role requirements of strict API providers like Anthropic. To resolve this, it is recommended to merge the runtime prompt content into the last message if it is also a user message.

gemini-code-assist · 2026-06-05T16:26:26Z

+        let mut messages = self.session.messages.clone();
+        messages.push(self.runtime_prompt_message());
+        messages


Appending the runtime prompt as a separate user message can result in consecutive user messages in the message list (e.g., when the last message in the session history is also a user message).

Strict API providers (such as Anthropic Claude) and strict Jinja chat templates (used by many open-source models via Ollama, vLLM, Hugging Face, etc.) enforce alternating user/assistant roles. Sending consecutive user messages can trigger validation errors (e.g., 400 Bad Request or TemplateError) and fail the request.

To prevent this, we should check if the last message in the history is a user message (and not a tool result), and if so, merge the runtime prompt's content into that last message's content array. This preserves the prefix cache of all previous turns (since the current turn's message is at the end and is new anyway) while ensuring perfect compatibility with strict providers and templates.

    let mut messages = self.session.messages.clone();
    let runtime_msg = self.runtime_prompt_message();
    // Decide first, then mutate, so runtime_msg is moved on exactly one path.
    let can_merge = messages.last().is_some_and(|last_msg| {
        last_msg.role == "user"
            && !last_msg
                .content
                .iter()
                .any(|block| matches!(block, ContentBlock::ToolResult { .. }))
    });
    if can_merge {
        // Extend the trailing plain user message instead of adding a second user turn.
        if let Some(last_msg) = messages.last_mut() {
            last_msg.content.extend(runtime_msg.content);
        }
    } else {
        // Last message is an assistant turn, a tool result, or missing: append as-is.
        messages.push(runtime_msg);
    }
    messages
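The suggestion depends on `self.session` and the project's own types, so it cannot be compiled in isolation. The following is a self-contained sketch of the same merge rule, assuming stand-in `Message` and `ContentBlock` types and two hypothetical helpers, `with_runtime_prompt` and `roles_alternate`; none of these names or field layouts are confirmed by the PR:

```rust
// Hypothetical stand-ins for the project's content and message types.
#[derive(Clone, Debug, PartialEq)]
pub enum ContentBlock {
    Text { text: String },
    ToolResult { tool_use_id: String, content: String },
}

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: Vec<ContentBlock>,
}

/// Return a request-time copy of `history` that carries the runtime prompt.
/// The runtime blocks are merged into a trailing plain user message when one
/// exists; otherwise the runtime message is appended as a separate message.
pub fn with_runtime_prompt(history: &[Message], runtime_msg: Message) -> Vec<Message> {
    let mut messages = history.to_vec();
    let can_merge = messages.last().is_some_and(|last| {
        last.role == "user"
            && !last
                .content
                .iter()
                .any(|block| matches!(block, ContentBlock::ToolResult { .. }))
    });
    if can_merge {
        if let Some(last) = messages.last_mut() {
            last.content.extend(runtime_msg.content);
        }
    } else {
        messages.push(runtime_msg);
    }
    messages
}

/// True when no two adjacent messages share a role.
pub fn roles_alternate(messages: &[Message]) -> bool {
    messages.windows(2).all(|pair| pair[0].role != pair[1].role)
}
```

With a history ending in a plain user message, the result keeps the same length and `roles_alternate` holds. When the history ends in a user-role tool result, this rule still appends a second user message, so `roles_alternate` is false in that case; whether a given provider accepts that sequence is not established by this sketch.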

…ions to system prompt

- Add render_runtime_policy_reference() in prompts.rs containing all mode and approval policy descriptions in the frozen system-prompt prefix (sent once per session, cache-hit thereafter).
- Simplify runtime_prompt_text() from ~500-token XML block to a ~16-token self-closing tag (<runtime_prompt visibility="internal" mode="..." approval="..."/>).
- Fix markdown heading hierarchy in all prompts/modes/*.md and prompts/approvals/*.md (## → #####) to nest correctly under ####.
- Remove now-unused legacy functions: mode_prompt(), approval_prompt_for_mode(), mode_change_runtime_message().
- Simplify Op::ChangeMode: no longer persists a mode_change event (next turn tag carries the current mode).
- Update and rename affected tests.

Builds on Hmbown#2801. Reduces per-request runtime prompt overhead by 97% (~471 tokens saved per API call). System prompt grows by ~1325 tokens in the frozen prefix (one-time miss cost); break-even at 3 API calls.
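The token figures in that commit message can be checked with a short calculation. This sketch uses hypothetical helper names and a simplified cost model, assuming the one-time prefix growth is paid once and every later call saves a fixed number of tokens:

```rust
/// Smallest number of API calls after which cumulative per-call savings
/// cover a one-time token cost (ceiling division). Returns `None` when
/// nothing is saved per call, since break-even is then never reached.
pub fn break_even_calls(one_time_cost_tokens: u32, saved_per_call_tokens: u32) -> Option<u32> {
    if saved_per_call_tokens == 0 {
        return None;
    }
    Some(one_time_cost_tokens.div_ceil(saved_per_call_tokens))
}

/// Percentage reduction when a per-request payload shrinks from `before`
/// tokens to `after` tokens. Returns 0.0 for an empty starting payload.
pub fn percent_reduction(before: u32, after: u32) -> f64 {
    if before == 0 {
        return 0.0;
    }
    f64::from(before.saturating_sub(after)) / f64::from(before) * 100.0
}
```

With the quoted figures, `break_even_calls(1325, 471)` returns `Some(3)`, because two calls save 942 tokens and three save 1413. `percent_reduction(500, 16)` is about 96.8, consistent with the rounded 97% for the ~500-token to ~16-token change.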

greptile-apps Bot reviewed Jun 5, 2026


Hmbown merged commit fbe8d9e into codex/v0.9.0-stewardship Jun 5, 2026
2 checks passed

gemini-code-assist Bot reviewed Jun 5, 2026


This was referenced Jun 5, 2026

feat(engine): project mode prompts per request #2687

Closed

feat(prompts): add static prompt composer override for embedders #2786

Closed

feat(client): add cross-session prompt base section disk cache #2520

Closed

LeoAlex0 mentioned this pull request Jun 7, 2026

feat(cache): slim runtime_prompt to minimal tag, move policy descriptions to system prompt #2874

Merged

Hmbown merged 1 commit into codex/v0.9.0-stewardship from codex/harvest-2687-runtime-prompt-metadata
