Skip to content

v1.31.0.1 fix(parser): emit kind field before dualImpl in parsed Phase objects#1447

Open
anbangr wants to merge 219 commits into
garrytan:mainfrom
anbangr:feat/living-plan-step-visibility
Open

v1.31.0.1 fix(parser): emit kind field before dualImpl in parsed Phase objects#1447
anbangr wants to merge 219 commits into
garrytan:mainfrom
anbangr:feat/living-plan-step-visibility

Conversation

@anbangr
Copy link
Copy Markdown

@anbangr anbangr commented May 12, 2026

Summary

Parser fix: Reorders property emission in build/orchestrator/parser.ts so that kind appears before dualImpl in parsed Phase objects. This is a minor structural cleanup with no user-facing behavior change.

Other changes:

  • Added .llm-tmp/ to .gitignore
  • Bumped VERSION and package.json to 1.31.0.1
  • Added CHANGELOG entry for 1.31.0.1

Test Coverage

All new code paths have test coverage. Tests: 805 run, 802 pass, 3 pre-existing failures unrelated to this change.

Pre-Landing Review

No issues found.

Design Review

No frontend files changed — design review skipped.

Eval Results

No prompt-related files changed — evals skipped.

Plan Completion

No plan file detected.

TODOS

No TODO items completed in this PR.

Test plan

  • Build/orchestrator tests pass (802 pass, 3 pre-existing failures unrelated to this change)

🤖 Generated with Claude Code


View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

anbangr added 30 commits April 22, 2026 19:18
- Adds implement/SKILL.md.tmpl to execute plans in phases
- Updates GSTACK_PLAYBOOK.md to include the new workflow
…s review and ship, add implement reexamine mode
anbangr and others added 30 commits May 11, 2026 13:24
Remove the `--skip-sweep` flag and the unshipped feat/* sweep bullet
from the Startup Gates section and flags table. Aligns with the code
removal in 3e2b8b2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Adds mock configure.cm file to prevent jq from failing in Step M3.5 mock
…11-074503-fabe4c3f-4-e2e-test-touchfile-registration'
1. plan-selection (6 tests): `defaultActiveRunRegistryDir()` hardcoded
   `~/.gstack/build-state/active-runs` and ignored `GSTACK_BUILD_STATE_DIR`,
   causing 11 real active-run records to leak into unit tests and inflate
   candidate counts (turning expected "selected" into "ambiguous"). Fix: honour
   the env var consistently, the same way `state.ts` already does.

2. integration (3 tests): plan review subprocess called `codex` with
   `OPENAI_API_KEY` from the inherited `process.env`, triggering a real ~30s
   API call against the LLM. These tests exercise feature lifecycle, not plan
   review. Fix: add `--no-plan-review` to each CLI invocation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The branch's living-plan gate visibility feature was already subsumed by
origin/main (present as v1.28.0.0-fork). The previous merge at 39750ad2
incorrectly produced a tree identical to 1d79ecd, leaving the branch 5
commits behind origin/main with an empty diff.

This merge:
- Picks up the 5 newer origin/main commits (fe6212f..c1c4907)
- Restores the working tree to a clean state (fixture deletion resolved)
- Makes the branch reviewable again with a non-empty diff surface

Tests: bun run test:build-skill → 1103 pass, 0 fail
Revert CHANGELOG.md to origin/main to undo prose corruption
introduced by a markdown autoformatter:
- env vars like LC_ALL, AWS_* rendered as italic/broken
- regex allowlist ^[a-z0-9_-]+$ semantically flipped to ^[a-z0-9*-]+$
- code-block continuation de-dented out of list context

The branch's feature was already released as v1.28.0.0-fork;
no CHANGELOG edits were needed here.
…estSpec detection

Four improvements identified during code review of 3e2b8b2:

- Move `extractCoverageTarget` from cli.ts to sub-agents.ts (alongside
  parseCoveragePercent); re-export via import in cli.ts. Eliminates the
  circular-import risk when phase-runner.ts calls coverage functions.

- Fix decimal truncation in extractCoverageTarget: `(\d+)` only matched
  integers, silently returning 80 for targets like ≥90.5%. Changed to
  `([\d.]+)` + parseFloat.

- Fix `hasTestSpec` detection in buildGeminiTestSpecPrompt: was
  `phase.body.includes("#### Test Spec")` (fragile string match, false
  negative when body text differs). Now `phase.testSpecCheckboxLine !== -1`
  (parser already computes this — zero extra overhead).

- Wire coverage gate in RUN_TESTS handler: after GREEN tests pass and the
  phase has a test spec (`testSpecCheckboxLine !== -1`), call
  parseCoveragePercent(result.stdout, testCmd) and compare against
  extractCoverageTarget(phase.body). Below target → set coverageResult and
  route to test_fix_running. Unknown framework → log advisory, proceed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
Complete the coverage gate: `injectCoverageFlags(testCmd)` appends the
appropriate flag for the detected framework before the GREEN test run,
so `parseCoveragePercent` reliably finds coverage data in stdout even
when projects don't pre-configure coverage in their test script.

Framework → flag mapping:
  jest     → --coverage --coverageReporters text
  vitest   → --coverage
  bun test → --coverage
  pytest   → --cov --cov-report term-missing
  go test  → -cover
  unknown  → unchanged (advisory log, gate skips)

Injection is idempotent (no-op if flag already present) and only fires
when the phase has a test spec (testSpecCheckboxLine !== -1) — VERIFY_RED
and legacy phases use the bare test command unchanged.

11 unit tests added covering each framework, idempotency, and unknowns.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`phase.kind !== "code" ? "" : ""` always evaluated to "" regardless
of the condition, and was silently filtered by .filter(Boolean).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Restores the parser fix that adds kind: (p as any).kind ?? 'code' to
the phases.push call in finalize(). Also brings in the TDD-pin tests
from 8135900b that verify the default behavior.

This commit sits on top of the origin/main merge (a7a009e) which
restored injectCoverageFlags, buildKindInstructions, extractCoverageTarget,
and the --skip-ship exit-13 path.

Fixes P0, P2 from review report.

Refs: 8135900b, 587b058f
…p (Bug D1)

Two failing tests document the bug:
1. After CRITICAL verdict, state.planReview must be persisted with status
   "critical_exit_pending" — currently cli.ts does not persist anything
   before process.exit(3), so planReview stays undefined on disk.
2. On resume with the sentinel set, the plan-review gate must still fire —
   the current guard (!state.planReview) is false when planReview is truthy,
   so the gate is skipped after the sentinel is introduced.

Two GREEN tests confirm baseline behavior: APPROVE verdict suppresses the
gate; undefined planReview (first run) fires the gate.

Tests MUST fail until Feature 4 implementation lands.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Before this fix, a CRITICAL plan-review verdict caused process.exit(3)
without saving any sentinel to state. On resume, !state.planReview was
true → review ran again → CRITICAL again → infinite loop.

Fix:
1. Save state.planReview = { ...verdict, status: "critical_exit_pending" }
   before releaseLock + process.exit(3) so the sentinel survives on disk.
2. Widen the plan-review gate guard from !state.planReview to
   !state.planReview || state.planReview.status === "critical_exit_pending"
   so the gate re-fires on resume when the sentinel is present.

Tests: two new tests in phase-runner.test.ts cover both the sentinel
persistence and the widened gate; 90/90 passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g D2)

Introduces ExitError (errors.ts) — thrown instead of process.exit(N)
inside try/finally blocks so the finally clause runs cleanup before
the process terminates.

Changes:
- errors.ts: new ExitError class (instanceof Error, numeric code field)
- cli.ts: import ExitError; replace critical_exit process.exit(3) with
  throw new ExitError(3); update main().catch to call process.exit(err.code)
  when err instanceof ExitError
- phase-runner.test.ts: 5 new tests (ExitError shape, propagation through
  finally, default and custom messages); 95/95 passing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ature 6)

applyResult() now populates phaseState.coverageResult when:
- action is RUN_TESTS
- tests are GREEN (status = "tests_green")
- extra.phaseBody is provided
- parseCoveragePercent() returns a non-null value for the stdout

Coverage below target emits an advisory warning but keeps status
"tests_green" — not blocking. The target defaults to 80 when no
"**Coverage target: ≥N%**" line appears in the phase body.

6 new tests in phase-runner.test.ts; 101/101 passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ics + test assertions

- Add errors.ts to MODULE_TEST_OWNERS in coverage-matrix.test.ts
- Fix analytics logActivity to emit "success" for exit code 13 (FINALIZATION_REQUIRED),
  which is a success state (pending ship), not a failure
- Fix integration test assertions: --skip-ship correctly exits 13, not 0, when
  features reach origin_verified (pre-existing test/impl mismatch)
…d [Phase 1.1]

RED phase TDD: 11 tests fail because the parser does not yet stamp kind: "code"
on emitted phases, and existing Phase literal construction sites have no kind
field (undefined fails the VALID_KINDS.includes runtime assertion).

11 tests pass immediately: direct Phase construction with explicit kind values,
and PhaseKind union membership checks (both already exist in types.ts).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add required kind: PhaseKind field to the parser factory init and to
every Phase literal construction site in tests/fixtures. This ensures
backward-compatible default of kind: "code" for all existing phases
while the type system enforces correctness going forward.

- parser.ts: stamp kind: "code" on every emitted Phase
- state.test.ts, cli.test.ts, phase-runner.test.ts,
  feature-review.test.ts, cli-guardrails.test.ts,
  phase-kind.test.ts: add kind: "code" to all helpers and inline literals
…tations

- Fix PHASE_HEADING regex to allow optional [kind] bracket between number and colon
- Add BODY_KIND_PATTERN for <!-- kind: X --> HTML comment fallback
- Add IMPL_LABELS_BY_KIND and REVIEW_LABELS_BY_KIND maps for all 5 PhaseKind values
- Parser now stamps kind from heading bracket (primary), body comment (fallback), or defaults to "code"
- Inline kind-comment detection ensures kind is set before checkbox processing
- Add implCheckboxRe/reviewCheckboxRe for kind-specific checkbox matching
- Add 16 new parser tests covering all bracket annotations, HTML fallback, checkbox recognition

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add IMPL_MARKER_BY_KIND and REVIEW_MARKER_BY_KIND lookup tables
- Update flipPhaseCheckboxes signature to accept optional kind?: PhaseKind
- Derives implMarker/reviewMarker from kind ?? "code" (backward compat)
- Update reconcilePhaseCheckboxes to pass phase.kind
- Update both cli.ts call sites (lines ~3870, ~4282) to pass kind: phase.kind
- Add 9 kind-aware mutator tests covering all 5 kinds + error cases + backward compat

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ware tests

- Keep origin/main's comprehensive PhaseKind parsing tests
- Remove duplicate kind field and (p as any) cast in parser.ts
- Restore kind-aware parser, mutator markers, and CLI prompts from main
Replace stale test-timeout entry (already shipped at merge base) with
an honest description of what this branch ships over main.
…-visibility

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
#	test/gen-skill-docs.test.ts
#	test/helpers/touchfiles.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant