v1.26.4.0 fix: GSTACK REVIEW REPORT delete-then-append (no more mid-file leftovers) by garrytan · Pull Request #1335 · garrytan/gstack

garrytan · 2026-05-06T00:30:27Z

Summary

Fixes the bug where ## GSTACK REVIEW REPORT would land mid-file in plan-mode plans whenever a stale copy from a prior /autoplan run was already there. The contradictory write rule in scripts/resolvers/review.ts ("replace it entirely (in place)" vs "always last section / move if mid-file") is collapsed into a single delete-then-append flow with explicit Read-tool verification.
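The fix is prompt text that instructs the agent, not code. Still, the rule it encodes can be written as a plain string transform. The sketch below is hypothetical: the helper names `deleteReportSections` and `writeReport` are not from gstack. It assumes a report section runs from its `## GSTACK REVIEW REPORT` heading to the next level-1 or level-2 heading, and it does not special-case `#` lines inside fenced code blocks.

```typescript
// Hypothetical sketch of the delete-then-append rule; helper names are illustrative, not gstack's.
const REPORT_HEADING = "## GSTACK REVIEW REPORT";

// Assumption: a section ends at the next level-1 or level-2 heading ("# " or "## ").
const SECTION_BREAK = /^#{1,2}\s/;

// Step 1 (delete): drop every existing report section, wherever it sits in the plan.
function deleteReportSections(plan: string): string {
  const kept: string[] = [];
  let inReport = false;
  for (const line of plan.split("\n")) {
    if (line.trim() === REPORT_HEADING) {
      inReport = true; // start skipping a stale copy
      continue;
    }
    if (inReport && SECTION_BREAK.test(line)) {
      inReport = false; // the next top-level section ends the stale copy
    }
    if (!inReport) kept.push(line);
  }
  return kept.join("\n");
}

// Step 2 (append): write the fresh report as the final section of the file.
function writeReport(plan: string, reportBody: string): string {
  const base = deleteReportSections(plan).replace(/\s+$/, ""); // strip trailing whitespace
  const prefix = base.length > 0 ? `${base}\n\n` : "";
  return `${prefix}${REPORT_HEADING}\n\n${reportBody.trim()}\n`;
}

// Usage: a stale mid-file report is removed and the new one lands last.
const plan = "# Plan\n\n## GSTACK REVIEW REPORT\n\nold\n\n## Steps\n\n1. do it\n";
console.log(writeReport(plan, "new"));
```

Deleting first and then appending means the write step never has to decide whether to edit in place or move the section. This is the ambiguity the old pair of bullets left to the agent.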

Performance / behavior

A single rule removes the reconciliation step the agent previously had to perform; the new prompt has no internally inconsistent bullets to weigh.
A Read-tool verification step runs after the append and, if the report did not end up last, self-corrects with one retry.
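The Read-tool check reduces to a predicate on the file's contents plus a bounded retry. What follows is a minimal, hypothetical sketch. `isReportLast` and `verifyWithOneRetry` are illustrative names. The `read` and `rewrite` callbacks stand in for the Read tool and for a second delete-then-append pass. The sketch assumes a section ends at the next `#` or `##` heading.

```typescript
// Hypothetical sketch of the post-append check; names are illustrative, not gstack's.
const REPORT_HEADING = "## GSTACK REVIEW REPORT";
const SECTION_BREAK = /^#{1,2}\s/; // assumption: "# " or "## " starts a new section

// The report is "last" when it appears exactly once and no top-level heading follows it.
function isReportLast(plan: string): boolean {
  const trimmed = plan.split("\n").map((l) => l.trim());
  const hits = trimmed.filter((l) => l === REPORT_HEADING).length;
  if (hits !== 1) return false; // missing, or a stale duplicate survived
  const idx = trimmed.indexOf(REPORT_HEADING);
  return trimmed.slice(idx + 1).every((l) => !SECTION_BREAK.test(l));
}

// `read` stands in for the Read tool; `rewrite` for another delete-then-append pass.
function verifyWithOneRetry(read: () => string, rewrite: () => void): boolean {
  if (isReportLast(read())) return true;
  rewrite(); // exactly one self-correcting retry
  return isReportLast(read());
}
```

Capping the retry at one keeps a persistently failing write from looping. A second failure surfaces as `false` instead.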

Coverage

5 new static assertions in test/gen-skill-docs.test.ts lock the prompt change against drift across all 4 plan-review SKILL.md files plus the source resolver. A synthetic regression check (revert → 5 fail, restore → 5 pass) confirms the tests are bound to the prompt change.
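Each static assertion amounts to "new wording present, old wording absent" over a file's text. Here is a hypothetical sketch. `OLD_PHRASES` reuses the bullet text quoted in the summary. `NEW_PHRASES` is a stand-in, because the PR does not quote the new prompt wording. The real tests presumably read each SKILL.md and review.ts from disk, and they may match different substrings.

```typescript
// Hypothetical sketch; phrase lists are assumptions, not the real test's substrings.
// Old bullets, as quoted in the PR summary.
const OLD_PHRASES = ["replace it entirely (in place)", "move if mid-file"];
// Stand-in for the new rule's wording (the exact prompt text is not quoted in the PR).
const NEW_PHRASES = ["delete-then-append"];

interface PromptCheck {
  file: string;
  missing: string[]; // required new phrases not found
  stale: string[]; // forbidden old phrases still present
}

function checkPromptText(file: string, text: string): PromptCheck {
  return {
    file,
    missing: NEW_PHRASES.filter((p) => !text.includes(p)),
    stale: OLD_PHRASES.filter((p) => text.includes(p)),
  };
}

function passes(check: PromptCheck): boolean {
  return check.missing.length === 0 && check.stale.length === 0;
}

// Synthetic regression: the fixed text passes, the reverted text fails.
const fixed = "Use a delete-then-append flow, then verify with the Read tool.";
const reverted = "If it exists, replace it entirely (in place). Always last section / move if mid-file.";
console.log(passes(checkPromptText("review.ts", fixed))); // true
console.log(passes(checkPromptText("review.ts", reverted))); // false
```

Checking both directions is what makes the revert/restore cycle meaningful. A test that only looked for the new wording would still pass if a stale contradictory bullet crept back in alongside it.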

Infrastructure

6 generated SKILL.md files refreshed (plan-ceo-review, plan-design-review, plan-devex-review, plan-eng-review, codex, devex-review).
test/skill-e2e-autoplan-auto-mode.test.ts reverts to its original AUQ-blocked-gate-surface shape. A paid run revealed that with --disallowedTools AskUserQuestion, autoplan bails at the Phase 1 premise gate (via the plan-file fallback) before any report-write path runs. Because the PTY harness can't drive autoplan through its review phases without auto-progressing AskUserQuestion prompts, static prompt-text verification carries the regression coverage.

Test Coverage

[+] scripts/resolvers/review.ts (prompt text, no new code paths)
  └── delete-then-append flow              [TESTED] 5 static template assertions

[+] test/gen-skill-docs.test.ts (new tests)
  ├── 4 SKILL.md target tests              [★★★ TESTED] verify new + reject old
  └── 1 source resolver test               [★★★ TESTED] same checks vs review.ts source

[+] test/skill-e2e-autoplan-auto-mode.test.ts (revert to original)
  └── existing AUQ-blocked smoke test      [★★★ TESTED] passed in paid E2E run

COVERAGE: 100% — no new code paths introduced; only prompt text + tests.
SYNTHETIC REGRESSION: stash prompt fix → 5 tests fail; restore → 5 pass.

Tests: 374 → 379 (+5 new static assertions).

Pre-Landing Review

No issues — diff is prompt text + tests only, no logic paths, no SQL, no LLM trust boundary changes, no error handling.

Eval Results

bun run eval:select against the diff finds no prompt-builder pattern matches in the eval gating set. Those patterns are Rails-style paths such as app/services/*_prompt_builder.rb, etc., while gstack's resolver-based prompts use a different gating system. The free bun test suite exits 0 (379 pass, 0 fail in test/gen-skill-docs.test.ts). A targeted EVALS=1 EVALS_TIER=gate bun test test/skill-e2e-autoplan-auto-mode.test.ts run during development finished with 1 pass, 0 fail in 106s.
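Path-based gating explains the empty result: none of the files this PR touches look like a Rails prompt builder. This is a hypothetical sketch of that matching, assuming the gating patterns are globs tested against changed file paths. Only `app/services/*_prompt_builder.rb` comes from the PR. The other patterns behind "etc." are not reproduced, and eval:select's real matcher may work differently.

```typescript
// Hypothetical sketch of path-based eval gating; eval:select's real matcher may differ.

// Convert a simple glob into an anchored RegExp: `**` spans directories, `*` does not.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        re += ".*";
        i++; // consume the second `*`
      } else {
        re += "[^/]*";
      }
    } else if ("\\^$.|?+()[]{}".includes(ch)) {
      re += "\\" + ch; // escape regex metacharacters
    } else {
      re += ch;
    }
  }
  return new RegExp(`^${re}$`);
}

// Return the changed files that hit at least one gating pattern.
function gatedFiles(changed: string[], patterns: string[]): string[] {
  const regexes = patterns.map(globToRegExp);
  return changed.filter((f) => regexes.some((r) => r.test(f)));
}

// The one pattern named in the PR, checked against files this PR touches.
const patterns = ["app/services/*_prompt_builder.rb"];
const changed = [
  "scripts/resolvers/review.ts",
  "test/gen-skill-docs.test.ts",
  "test/skill-e2e-autoplan-auto-mode.test.ts",
];
console.log(gatedFiles(changed, patterns)); // []
```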

Plan Completion

Plan at ~/.claude/plans/system-instruction-you-are-working-pure-swing.md. All items DONE or addressed:

[DONE] Prompt fix in scripts/resolvers/review.ts — delete-then-append flow + verify step
[DONE] Generated SKILL.md regenerations — 6 files refreshed via gen:skill-docs --host all
[CHANGED] E2E regression test approach — pivoted to 5 static template assertions after paid E2E run revealed the harness can't reach the bug code path with --disallowedTools AskUserQuestion
[DONE] Synthetic regression check — verified by stash/restore cycle
[DONE] Free test suite — exit 0
[DROPPED] Optional SDK-harness ExitPlanMode-input check — marked optional in approved plan

TODOS

No matching TODOs in TODOS.md to mark complete (1655 lines; none reference this bug).

Test plan

Free bun test suite passes (exit 0)
bun test test/gen-skill-docs.test.ts — 379/379 pass after fix
Synthetic regression: 5 new tests fail when prompt fix reverted
Targeted gate-tier autoplan E2E: 1/1 pass in 106s
In real plan-mode usage with /autoplan, the report should land at the bottom even when an older copy was mid-file

🤖 Generated with Claude Code


Replaces contradictory "replace it entirely" + "always last section / move if mid-file" bullets in scripts/resolvers/review.ts with a single delete-then-append rule. Adds Read-tool verification step so the agent self-checks before continuing. Affected SKILL.md files (regenerated): plan-ceo-review, plan-design-review, plan-devex-review, plan-eng-review, codex, devex-review.

…plan E2E shape

5 new static tests in test/gen-skill-docs.test.ts (4 plan-review SKILL.md files + 1 source resolver) verify the new prompt language is present and the old contradictory bullets are absent. Synthetic regression check confirmed all 5 fail when the prompt fix is reverted. The autoplan E2E (skill-e2e-autoplan-auto-mode.test.ts) reverts to its original AUQ-blocked-gate-surface shape. The mid-file regression scenario the plan briefly proposed isn't reachable in the current PTY harness because --disallowedTools AskUserQuestion makes autoplan bail at the Phase 1 premise gate before any review-write code path runs. Static prompt-text verification covers the load-bearing change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-06T00:41:20Z

E2E Evals: ✅ PASS

14/14 tests passed | $2.68 total cost | 12 parallel runners

Suite	Result	Status	Cost
e2e-design	2/2	✅	$0.41
e2e-plan	6/6	✅	$1.45
e2e-review	1/1	✅	$0.35
llm-judge	3/3	✅	$0.06
e2e-design	2/2	✅	$0.41
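The header totals can be cross-checked against the rows. Summing all five rows, including the second e2e-design row, reproduces 14/14 and $2.68 exactly. Dropping that row would give 12 tests and $2.27. This suggests the repeated row is counted in the bot's totals and is not a scrape duplicate. The rows below are copied from the table; costs are kept in integer cents to avoid floating-point drift.

```typescript
// Rows copied from the E2E Evals table; costs in integer cents.
interface SuiteRow {
  suite: string;
  passed: number;
  total: number;
  costCents: number;
}

const rows: SuiteRow[] = [
  { suite: "e2e-design", passed: 2, total: 2, costCents: 41 },
  { suite: "e2e-plan", passed: 6, total: 6, costCents: 145 },
  { suite: "e2e-review", passed: 1, total: 1, costCents: 35 },
  { suite: "llm-judge", passed: 3, total: 3, costCents: 6 },
  { suite: "e2e-design", passed: 2, total: 2, costCents: 41 },
];

const passed = rows.reduce((sum, r) => sum + r.passed, 0);
const total = rows.reduce((sum, r) => sum + r.total, 0);
const cents = rows.reduce((sum, r) => sum + r.costCents, 0);

// Prints "14/14 tests passed | $2.68 total cost"
console.log(`${passed}/${total} tests passed | $${(cents / 100).toFixed(2)} total cost`);
```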

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

garrytan and others added 3 commits May 5, 2026 17:29

chore: bump version and changelog (v1.26.4.0)

9ab25d6

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

garrytan merged commit 19e699a into main May 6, 2026
23 of 24 checks passed

garrytan merged 3 commits into main from garrytan/report-at-bottom