v1.26.4.0 fix: GSTACK REVIEW REPORT delete-then-append (no more mid-file leftovers) #1335
Merged
Replaces contradictory "replace it entirely" + "always last section / move if mid-file" bullets in scripts/resolvers/review.ts with a single delete-then-append rule. Adds Read-tool verification step so the agent self-checks before continuing. Affected SKILL.md files (regenerated): plan-ceo-review, plan-design-review, plan-devex-review, plan-eng-review, codex, devex-review.
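The rule in this PR is prompt text that an agent follows, not code, but the intended file operation can be sketched in TypeScript. This is a minimal illustration, assuming a report section runs from its heading to the next level-1 or level-2 heading. The helper names (`deleteReportSections`, `writeReport`, `reportIsLastSection`) are hypothetical and do not come from scripts/resolvers/review.ts.

```typescript
// Hypothetical helpers: an illustration of delete-then-append,
// not code from scripts/resolvers/review.ts.
const REPORT_HEADING = "## GSTACK REVIEW REPORT";

/** Step 1: delete every existing report section, wherever it sits. */
function deleteReportSections(markdown: string): string {
  const kept: string[] = [];
  let insideReport = false;
  for (const line of markdown.split("\n")) {
    if (line.trim() === REPORT_HEADING) {
      insideReport = true; // start skipping at the report heading
      continue;
    }
    // Assumption: a report section ends at the next "# " or "## " heading.
    if (insideReport && /^#{1,2} /.test(line)) {
      insideReport = false;
    }
    if (!insideReport) kept.push(line);
  }
  return kept.join("\n").trimEnd();
}

/** Step 2: append a fresh report at the very end of the file. */
function writeReport(markdown: string, reportBody: string): string {
  const cleaned = deleteReportSections(markdown);
  const prefix = cleaned.length > 0 ? cleaned + "\n\n" : "";
  return `${prefix}${REPORT_HEADING}\n\n${reportBody.trim()}\n`;
}

/** Step 3: verify there is exactly one report and it is the last section. */
function reportIsLastSection(markdown: string): boolean {
  const headings = markdown.split("\n").filter((l) => /^#{1,2} /.test(l));
  const count = headings.filter((h) => h.trim() === REPORT_HEADING).length;
  const last = headings[headings.length - 1];
  return count === 1 && last !== undefined && last.trim() === REPORT_HEADING;
}
```

Because deletion always happens before the append, a stale copy in the middle of the file cannot survive, which is the case the old "replace in place" wording left open. The sketch does not handle heading-like lines inside fenced code blocks.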
…plan E2E shape
5 new static tests in test/gen-skill-docs.test.ts (4 plan-review SKILL.md files + 1 source resolver) verify the new prompt language is present and the old contradictory bullets are absent. Synthetic regression check confirmed all 5 fail when the prompt fix is reverted. The autoplan E2E (skill-e2e-autoplan-auto-mode.test.ts) reverts to its original AUQ-blocked-gate-surface shape. The mid-file regression scenario the plan briefly proposed isn't reachable in the current PTY harness because --disallowedTools AskUserQuestion makes autoplan bail at the Phase 1 premise gate before any review-write code path runs. Static prompt-text verification covers the load-bearing change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
E2E Evals: ✅ PASS
14/14 tests passed | $2.68 total cost | 12 parallel runners
12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite
Summary
Fixes the bug where the "## GSTACK REVIEW REPORT" section would land mid-file in plan-mode plans whenever a stale copy from a prior /autoplan run was already there. The contradictory write rule in scripts/resolvers/review.ts ("replace it entirely (in place)" vs "always last section / move if mid-file") is collapsed into a single delete-then-append flow with explicit Read-tool verification.
Performance / behavior
Coverage
5 new static tests in test/gen-skill-docs.test.ts lock the prompt change against drift across all 4 plan-review SKILL.md files + the source resolver. Synthetic regression check (revert → 5 fails, restore → 5 passes) confirms the tests are bound to the prompt change.
Infrastructure
Regenerated SKILL.md files (plan-ceo-review, plan-design-review, plan-devex-review, plan-eng-review, codex, devex-review).
test/skill-e2e-autoplan-auto-mode.test.ts reverts to its original AUQ-blocked-gate-surface shape after a paid run revealed that --disallowedTools AskUserQuestion makes autoplan bail at the Phase 1 premise gate via the plan-file fallback before any report-write path runs. The PTY harness can't drive autoplan through its review phases without auto-progression of AskUserQuestions, so static prompt-text verification carries the regression coverage.
Test Coverage
Tests: 374 → 379 (+5 new static assertions).
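A static prompt-text check of the kind described here can be sketched as a pure function that reports drift. The phrase lists below are assumptions for illustration: the exact strings asserted in test/gen-skill-docs.test.ts are not shown in this PR description, and `findPromptDrift` is a hypothetical name.

```typescript
// Hypothetical phrase lists: placeholders for whatever strings the real
// tests in test/gen-skill-docs.test.ts assert on.
const REQUIRED_PHRASES = ["delete-then-append"];
const FORBIDDEN_PHRASES = ["replace it entirely"];

/** Return one message per missing required phrase or surviving old phrase. */
function findPromptDrift(name: string, text: string): string[] {
  const problems: string[] = [];
  for (const phrase of REQUIRED_PHRASES) {
    if (!text.includes(phrase)) {
      problems.push(`${name}: missing "${phrase}"`);
    }
  }
  for (const phrase of FORBIDDEN_PHRASES) {
    if (text.includes(phrase)) {
      problems.push(`${name}: still contains "${phrase}"`);
    }
  }
  return problems;
}
```

In a real suite the function would be called once per generated SKILL.md file plus once for the source resolver, each expecting an empty result. Checking both presence and absence is what makes a revert of the prompt fix fail the tests rather than pass silently.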
Pre-Landing Review
No issues — diff is prompt text + tests only, no logic paths, no SQL, no LLM trust boundary changes, no error handling.
Eval Results
bun run eval:select against the diff shows no prompt-builder pattern matches in the eval gating set (the patterns are Rails-style app/services/*_prompt_builder.rb, etc.; gstack's resolver-based prompts use a different gating system). The free bun test suite exits 0 (379 pass, 0 fail in test/gen-skill-docs.test.ts). A targeted EVALS=1 EVALS_TIER=gate bun test test/skill-e2e-autoplan-auto-mode.test.ts ran during development: 1 pass, 0 fail in 106s.
Plan Completion
Plan at ~/.claude/plans/system-instruction-you-are-working-pure-swing.md. All items DONE or addressed:
scripts/resolvers/review.ts: delete-then-append flow + verify step
gen:skill-docs --host all
--disallowedTools AskUserQuestion
TODOS
No matching TODOs in TODOS.md to mark complete (1655 lines, none referenced this bug).
Test plan
bun test suite passes (exit 0)
bun test test/gen-skill-docs.test.ts: 379/379 pass after fix
🤖 Generated with Claude Code