v1.26.4.0 fix: GSTACK REVIEW REPORT delete-then-append (no more mid-file leftovers) #1335

Merged
garrytan merged 3 commits into main from garrytan/report-at-bottom
May 6, 2026


@garrytan garrytan commented May 6, 2026

Summary

Fixes the bug where ## GSTACK REVIEW REPORT would land mid-file in plan-mode plans whenever a stale copy from a prior /autoplan run was already there. The contradictory write rule in scripts/resolvers/review.ts ("replace it entirely (in place)" vs "always last section / move if mid-file") is collapsed into a single delete-then-append flow with explicit Read-tool verification.

Performance / behavior

  • Single rule eliminates the reconciliation step the agent had to perform; the new prompt has no internally inconsistent bullets to weigh.
  • Read-tool verification step runs after the append and self-corrects with one retry if the report didn't end up last.
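
The change in this PR is prompt text, not code, but the behavior the prompt asks the agent to carry out can be sketched as a small function. The following is a minimal illustration, assuming a report section ends at the next level-1 or level-2 heading and that headings never appear inside fenced code blocks. The names `deleteReportSections`, `deleteThenAppend`, `reportIsLast`, and `writeReport` are hypothetical and do not exist in gstack.

```typescript
// Illustrative sketch only: the PR edits prompt text in scripts/resolvers/review.ts.
// Every function name below is hypothetical.

const REPORT_HEADING = "## GSTACK REVIEW REPORT";
const SECTION_HEADING = /^#{1,2} /; // assumption: a report ends at the next H1/H2

/** Delete every existing report section, wherever it sits in the plan. */
function deleteReportSections(plan: string): string {
  const kept: string[] = [];
  let insideReport = false;
  for (const line of plan.split("\n")) {
    if (line.trim() === REPORT_HEADING) {
      insideReport = true; // start skipping a stale report
      continue;
    }
    if (insideReport && SECTION_HEADING.test(line)) {
      insideReport = false; // the next section begins, so stop skipping
    }
    if (!insideReport) kept.push(line);
  }
  return kept.join("\n").trimEnd();
}

/** Single rule: delete any old copy first, then append the new report at the end. */
function deleteThenAppend(plan: string, reportBody: string): string {
  const stripped = deleteReportSections(plan);
  const prefix = stripped.length > 0 ? stripped + "\n\n" : "";
  return `${prefix}${REPORT_HEADING}\n\n${reportBody.trim()}\n`;
}

/** Verification: exactly one report heading, with no H1/H2 heading after it. */
function reportIsLast(plan: string): boolean {
  const lines = plan.split("\n");
  let reportIndex = -1;
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === REPORT_HEADING) {
      if (reportIndex !== -1) return false; // a second copy is a leftover
      reportIndex = i;
    }
  }
  if (reportIndex === -1) return false;
  return !lines.slice(reportIndex + 1).some((l) => SECTION_HEADING.test(l));
}

/** Write, verify, and redo the write once if the check fails. */
function writeReport(plan: string, reportBody: string): string {
  let next = deleteThenAppend(plan, reportBody);
  if (!reportIsLast(next)) next = deleteThenAppend(next, reportBody);
  return next;
}
```

In this deterministic sketch the retry never fires. It is included only to mirror the verify-then-retry shape the prompt describes for an agent whose first edit may not land where intended.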

Coverage

  • 5 new static assertions in test/gen-skill-docs.test.ts lock the prompt change against drift across all 4 plan-review SKILL.md files + the source resolver. Synthetic regression check (revert → 5 fails, restore → 5 passes) confirms the tests are bound to the prompt change.
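
A static assertion of this kind boils down to two checks on the generated text: the new wording is present and the old wording is absent. The sketch below shows that shape under stated assumptions. The phrase lists are placeholders based on the bullets paraphrased in the summary, not the exact strings in review.ts, and `promptViolations` is a hypothetical helper rather than code from test/gen-skill-docs.test.ts.

```typescript
// Hypothetical phrase lists: the real assertions may match different strings.
const REQUIRED_PHRASES = ["delete-then-append"]; // assumed marker for the new rule
const FORBIDDEN_PHRASES = ["replace it entirely", "always last section"]; // old bullets, as paraphrased in the PR

/** Return a list of problems; an empty list means the prompt text passes. */
function promptViolations(text: string): string[] {
  const problems: string[] = [];
  for (const phrase of REQUIRED_PHRASES) {
    if (!text.includes(phrase)) problems.push(`missing new wording: ${phrase}`);
  }
  for (const phrase of FORBIDDEN_PHRASES) {
    if (text.includes(phrase)) problems.push(`stale wording present: ${phrase}`);
  }
  return problems;
}
```

Checking for both presence and absence is what makes the revert experiment meaningful: restoring the old prompt removes the required phrase and reintroduces the forbidden ones, so the check fails either way.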

Infrastructure

  • 6 generated SKILL.md files refreshed (plan-ceo-review, plan-design-review, plan-devex-review, plan-eng-review, codex, devex-review).
  • test/skill-e2e-autoplan-auto-mode.test.ts reverts to its original AUQ-blocked-gate-surface shape. A paid run revealed that --disallowedTools AskUserQuestion makes autoplan bail at the Phase 1 premise gate via the plan-file fallback before any report-write path runs. The PTY harness can't drive autoplan through its review phases without auto-progression of AskUserQuestions, so static prompt-text verification carries the regression coverage.

Test Coverage

[+] scripts/resolvers/review.ts (prompt text, no new code paths)
  └── delete-then-append flow              [TESTED] 5 static template assertions

[+] test/gen-skill-docs.test.ts (new tests)
  ├── 4 SKILL.md target tests              [★★★ TESTED] verify new + reject old
  └── 1 source resolver test               [★★★ TESTED] same checks vs review.ts source

[+] test/skill-e2e-autoplan-auto-mode.test.ts (revert to original)
  └── existing AUQ-blocked smoke test      [★★★ TESTED] passed in paid E2E run

COVERAGE: 100% — no new code paths introduced; only prompt text + tests.
SYNTHETIC REGRESSION: stash prompt fix → 5 tests fail; restore → 5 pass.

Tests: 374 → 379 (+5 new static assertions).

Pre-Landing Review

No issues — diff is prompt text + tests only, no logic paths, no SQL, no LLM trust boundary changes, no error handling.

Eval Results

bun run eval:select against the diff shows no prompt-builder pattern matches in the eval gating set (the patterns are Rails-style app/services/*_prompt_builder.rb, etc. — gstack's resolver-based prompts use a different gating system). Free bun test suite exits 0 (379 pass, 0 fail in test/gen-skill-docs.test.ts). Targeted EVALS=1 EVALS_TIER=gate bun test test/skill-e2e-autoplan-auto-mode.test.ts ran during development: 1 pass, 0 fail in 106s.

Plan Completion

Plan at ~/.claude/plans/system-instruction-you-are-working-pure-swing.md. All items DONE or addressed:

  • [DONE] Prompt fix in scripts/resolvers/review.ts — delete-then-append flow + verify step
  • [DONE] Generated SKILL.md regenerations — 6 files refreshed via gen:skill-docs --host all
  • [CHANGED] E2E regression test approach — pivoted to 5 static template assertions after paid E2E run revealed the harness can't reach the bug code path with --disallowedTools AskUserQuestion
  • [DONE] Synthetic regression check — verified by stash/restore cycle
  • [DONE] Free test suite — exit 0
  • [DROPPED] Optional SDK-harness ExitPlanMode-input check — marked optional in approved plan

TODOS

No matching TODOs in TODOS.md to mark complete (1655 lines, none referenced this bug).

Test plan

  • Free bun test suite passes (exit 0)
  • bun test test/gen-skill-docs.test.ts — 379/379 pass after fix
  • Synthetic regression: 5 new tests fail when prompt fix reverted
  • Targeted gate-tier autoplan E2E: 1/1 pass in 106s
  • In real plan-mode usage with /autoplan, the report should land at the bottom even when an older copy was mid-file

🤖 Generated with Claude Code


garrytan and others added 3 commits May 5, 2026 17:29
Replaces contradictory "replace it entirely" + "always last section / move
if mid-file" bullets in scripts/resolvers/review.ts with a single
delete-then-append rule. Adds Read-tool verification step so the agent
self-checks before continuing.

Affected SKILL.md files (regenerated): plan-ceo-review, plan-design-review,
plan-devex-review, plan-eng-review, codex, devex-review.
…plan E2E shape

5 new static tests in test/gen-skill-docs.test.ts (4 plan-review SKILL.md
files + 1 source resolver) verify the new prompt language is present and
the old contradictory bullets are absent. Synthetic regression check
confirmed all 5 fail when the prompt fix is reverted.

The autoplan E2E (skill-e2e-autoplan-auto-mode.test.ts) reverts to its
original AUQ-blocked-gate-surface shape. The mid-file regression scenario
the plan briefly proposed isn't reachable in the current PTY harness because
--disallowedTools AskUserQuestion makes autoplan bail at the Phase 1
premise gate before any review-write code path runs. Static prompt-text
verification covers the load-bearing change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot commented May 6, 2026

E2E Evals: ✅ PASS

14/14 tests passed | $2.68 total cost | 12 parallel runners

Suite        Result   Status   Cost
e2e-design   2/2               $0.41
e2e-plan     6/6               $1.45
e2e-review   1/1               $0.35
llm-judge    3/3               $0.06
e2e-design   2/2               $0.41

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

@garrytan garrytan merged commit 19e699a into main May 6, 2026
23 of 24 checks passed