diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3144aa4eb9..bb5f51b111 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,60 @@
# Changelog
+## [1.26.3.0] - 2026-05-04
+
+## **`/review` and `/cso` stop reading the wrong branch's diff when a subagent flips the worktree.**
+
+The bug, observed three times in a single session while reviewing PRs on a
+downstream project: a long-running review skill renders findings against whatever branch
+the local worktree happens to be checked out to at the moment each `git diff`
+runs — not against the branch the user asked to review. Inside Agent SDK
+sessions where nested subagents share the worktree, a stray `git checkout`
+anywhere in the call tree silently re-targets every later diff command. The
+review then reports vulnerabilities, race conditions, or dead code on
+unrelated work, and the human reviewer has no way to know.
+
+The fix is mechanical and contained: a new `{{PR_DIFF_PIN}}` template fragment
+runs at Step 0.5 of each review skill, resolves `BASE_SHA` and `HEAD_SHA` to
+immutable commit identifiers via `gh pr view --json baseRefOid,headRefOid` (in PR
+context) or `git rev-parse origin/` (out of PR context), and forces
+every subsequent `git diff`, `git log`, and `git show` to reference those
+SHAs by value. Commit SHAs are not symbolic refs — they don't move when the
+worktree flips. `/review` and `/cso` both adopt the new fragment in this
+release. A regression test (`test/pr-diff-pin-regression.test.ts`) builds a
+real two-branch fixture, simulates the worktree flip, and asserts that bare
+`git diff main` reports the wrong branch while SHA-pinned `git diff
+"$BASE_SHA" "$HEAD_SHA"` reports the right one — proving both the failure
+mode and the fix are real.
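
A minimal shell sketch of the same failure mode, assuming only `git` is on `PATH`. The branch names `feature-a` and `feature-b` are hypothetical, and this is not the contents of the actual regression test. It pins both endpoints, flips the worktree, and compares the bare and pinned diffs:

```bash
#!/usr/bin/env bash
# Reproduce the worktree-flip bug in a throwaway repo (hypothetical branch names).
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git config user.email demo@example.com
git config user.name demo
git config commit.gpgsign false

echo base > file.txt
git add file.txt
git commit --quiet -m base

# Two divergent feature branches off main.
git checkout --quiet -b feature-a
echo "change from A" >> file.txt
git commit --quiet -am "feature A"
git checkout --quiet main
git checkout --quiet -b feature-b
echo "change from B" >> file.txt
git commit --quiet -am "feature B"

# The user asked to review feature-a: pin both endpoints by value.
git checkout --quiet feature-a
BASE_SHA=$(git rev-parse main)
HEAD_SHA=$(git rev-parse HEAD)
pinned_before=$(git diff "$BASE_SHA" "$HEAD_SHA")
bare_before=$(git diff main)

# A subagent flips the shared worktree and never switches back.
git checkout --quiet feature-b
pinned_after=$(git diff "$BASE_SHA" "$HEAD_SHA")
bare_after=$(git diff main)

if [ "$pinned_before" = "$pinned_after" ]; then echo "pinned diff: unchanged"; else echo "pinned diff: CHANGED"; fi
if [ "$bare_before" = "$bare_after" ]; then echo "bare diff: unchanged"; else echo "bare diff: CHANGED"; fi
```

Run with `bash`, it prints `pinned diff: unchanged` and `bare diff: CHANGED`: the bare form silently switches to feature-b's change, while the SHA-pinned form keeps showing feature-a's.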
+
+This fix does not cover the upstream Claude Code `/security-review` built-in,
+which has the same class of bug (it uses `git diff origin/HEAD...` against
+working-tree HEAD) but lives in `cli.js`, out of gstack's reach. For security
+audits run from inside multi-agent sessions, the gstack `/cso` skill is now the
+safer alternative.
+
+### What you can now do
+
+- **Run `/review` from a session that spawns subagents and trust the diff is the right one.** Step 0.5 prints the pinned `BASE_SHA` and `HEAD_SHA` so you can verify before findings start. If the SHAs ever look wrong, you'll see it before the review reports anything.
+- **Run `/cso --diff` from the same kind of multi-agent session** and get the same guarantee for the secrets-archaeology and OWASP phases.
+- **Catch a regression in code review.** `test/pr-diff-pin-regression.test.ts` runs in the free `bun test` tier (~6s) and will fail loudly if anyone re-introduces a bare `git diff origin/` into either skill template.
+
+### Itemized changes
+
+#### Added
+
+- `{{PR_DIFF_PIN}}` template resolver in `scripts/resolvers/utility.ts`. Generates a Step 0.5 block that resolves `BASE_SHA`/`HEAD_SHA` from PR metadata when available, from `origin/` and local `HEAD` otherwise, fetches both commits so the SHAs are present locally, and prints a descriptive warning when they cannot be resolved. The block deliberately does not abort, so `/cso`'s non-diff scope flags keep working; `/review`, which cannot run without a diff, adds its own hard check and exits (refusing to proceed beats silently rendering against the wrong branch).
+- `test/pr-diff-pin-regression.test.ts` (8 tests, ~6s, free tier). Builds a fresh git repo with two divergent feature branches, exercises the bug end-to-end, and asserts the SHA-pinned form is stable across worktree flips. Includes template smell-tests that fail if `{{PR_DIFF_PIN}}` is removed or if a bare-ref diff command is reintroduced into `review/SKILL.md.tmpl` or `cso/SKILL.md.tmpl`.
+
+#### Changed
+
+- `review/SKILL.md.tmpl` Step 1 and Step 3 use `git diff "$BASE_SHA" "$HEAD_SHA"` instead of `git diff origin/`. The Step 3.4 workspace-aware queue check uses `git show "$HEAD_SHA:VERSION"` and `git show "$BASE_SHA:VERSION"` instead of `git show HEAD:VERSION` and `git show origin/$BASE_BRANCH:VERSION`. The Step 3.5 slop-scan reads against the pinned base. The skill's preamble explicitly names the `shared-checkout-branch-flip-during-review` failure mode the change closes.
+- `cso/SKILL.md.tmpl` adds `{{BASE_BRANCH_DETECT}}` and `{{PR_DIFF_PIN}}` to its preamble (it had neither). The Phase 2 secrets-archaeology `--diff` mode line replaces `git log -p ..HEAD` with `git log -p "$BASE_SHA..$HEAD_SHA"`.
+- `review/checklist.md` and `review/greptile-triage.md` reference the pinned SHAs and explain why `git diff origin/main` alone is unsafe inside multi-agent sessions.
+
+#### For contributors
+
+- Other skills with similar bare-ref diff patterns (`ship`, `codex`, `document-release`) are unchanged in this release. They are typically invoked outside multi-agent reviews, where the worktree-flip risk is lower; a separate PR will sweep them once each skill has been verified individually. The new `{{PR_DIFF_PIN}}` resolver is reusable: adopting it elsewhere takes one line in the `.tmpl` plus running `bun run gen:skill-docs`.
+
## [1.26.2.0] - 2026-05-03
## **`/plan-eng-review` always asks. Never silently writes findings to your plan first.**
diff --git a/VERSION b/VERSION
index 75334e9ded..068ff0d43d 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.26.2.0
+1.26.3.0
diff --git a/cso/SKILL.md b/cso/SKILL.md
index 44850ff755..bbf2da86a0 100644
--- a/cso/SKILL.md
+++ b/cso/SKILL.md
@@ -678,8 +678,182 @@ PLAN MODE EXCEPTION — always allowed (it's the plan file).
+## Step 0: Detect platform and base branch
+
+First, detect the git hosting platform from the remote URL:
+
+```bash
+git remote get-url origin 2>/dev/null
+```
+
+- If the URL contains "github.com" → platform is **GitHub**
+- If the URL contains "gitlab" → platform is **GitLab**
+- Otherwise, check CLI availability:
+ - `gh auth status 2>/dev/null` succeeds → platform is **GitHub** (covers GitHub Enterprise)
+ - `glab auth status 2>/dev/null` succeeds → platform is **GitLab** (covers self-hosted)
+ - Neither → **unknown** (use git-native commands only)
+
+Determine which branch this PR/MR targets, or the repo's default branch if no
+PR/MR exists. Use the result as "the base branch" in all subsequent steps.
+
+**If GitHub:**
+1. `gh pr view --json baseRefName -q .baseRefName`: if it succeeds, use it
+2. `gh repo view --json defaultBranchRef -q .defaultBranchRef.name`: if it succeeds, use it
+
+**If GitLab:**
+1. `glab mr view -F json 2>/dev/null` and extract the `target_branch` field: if it succeeds, use it
+2. `glab repo view -F json 2>/dev/null` and extract the `default_branch` field: if it succeeds, use it
+
+**Git-native fallback (if unknown platform, or CLI commands fail):**
+1. `git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||'`
+2. If that fails: `git rev-parse --verify origin/main 2>/dev/null` → use `main`
+3. If that fails: `git rev-parse --verify origin/master 2>/dev/null` → use `master`
+
+If all fail, fall back to `main`.
+
+Print the detected base branch name. In every subsequent `git diff`, `git log`,
+`git fetch`, `git merge`, and PR/MR creation command, substitute the detected
+branch name wherever the instructions say "the base branch" or ``.
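
A minimal sketch of the git-native fallback chain as a shell function, for illustration only; `detect_base_branch` is a hypothetical name, not a helper gstack ships. The demo builds a local bare remote whose default branch is `trunk` (also hypothetical), so step 1 resolves it; it then deletes `origin/HEAD` to exercise the final `main` fallback:

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: git-native steps 1-3 above, then the `main` default.
detect_base_branch() {
  local ref
  if ref=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null); then
    echo "${ref#refs/remotes/origin/}"
  elif git rev-parse --verify --quiet origin/main >/dev/null; then
    echo main
  elif git rev-parse --verify --quiet origin/master >/dev/null; then
    echo master
  else
    echo main
  fi
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init --quiet --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/trunk
git init --quiet "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/trunk
git -C "$work/seed" -c user.email=demo@example.com -c user.name=demo \
  -c commit.gpgsign=false commit --quiet --allow-empty -m init
git -C "$work/seed" push --quiet "$work/remote.git" trunk
git clone --quiet "$work/remote.git" "$work/clone"
cd "$work/clone"

detected=$(detect_base_branch)      # clone created origin/HEAD -> origin/trunk
git remote set-head origin --delete
fallback=$(detect_base_branch)      # no origin/HEAD, origin/main, or origin/master left
echo "detected=$detected fallback=$fallback"
```

This prints `detected=trunk fallback=main`.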
+
+---
+
+## Step 0.5: Pin diff context to immutable SHAs (anti-branch-flip)
+
+It is **not safe** for a long-running review skill to read git state through
+symbolic refs like `HEAD`, `origin/`, or `origin/HEAD`. Inside an Agent SDK
+session — and especially across nested subagents that share a worktree — the
+working tree, the symbolic-ref `HEAD`, and even the checked-out branch can
+flip mid-skill (e.g., another tool runs `git checkout` to inspect a file,
+then forgets to switch back). When that happens, every later `git diff`
+command silently re-renders against the new branch, and the review reports
+findings on the wrong code.
+
+The fix is to **resolve diff endpoints to immutable commit SHAs at the very
+start of the skill**, then use those SHAs in every subsequent `git diff`,
+`git log`, and `git show` invocation. SHAs do not move when the working
+tree flips.
+
+Run this **once, before any other diff/log step**:
+
+```bash
+# Resolve the PR (or branch) we're reviewing. Prefer explicit PR context.
+PR_NUMBER=$(gh pr view --json number -q .number 2>/dev/null || echo "")
+
+# REVIEW_DIRTY governs whether uncommitted local changes count as part of the
+# review. Default OFF in PR context (review committed work only); default ON
+# for local /review pre-PR (preserves the pre-fix behavior where dirty edits
+# were included in the diff). Override by exporting REVIEW_DIRTY=1 / 0 before
+# invoking the skill.
+if [ -z "${REVIEW_DIRTY+x}" ]; then
+ if [ -n "$PR_NUMBER" ]; then REVIEW_DIRTY=0; else REVIEW_DIRTY=1; fi
+fi
+
+if [ -n "$PR_NUMBER" ]; then
+ # In-PR review: prefer the PR's *own* recorded base/head SHAs over the
+ # local origin/ tracking ref. baseRefOid and headRefOid are
+ # immutable for the PR's current state — they are the SHAs GitHub renders
+ # against, regardless of local fetch staleness.
+ PR_META=$(gh pr view "$PR_NUMBER" --json baseRefName,headRefName,headRefOid,baseRefOid 2>/dev/null)
+ BASE_BRANCH=$(echo "$PR_META" | jq -r '.baseRefName // empty')
+ HEAD_BRANCH=$(echo "$PR_META" | jq -r '.headRefName // empty')
+ HEAD_SHA=$(echo "$PR_META" | jq -r '.headRefOid // empty')
+ BASE_SHA=$(echo "$PR_META" | jq -r '.baseRefOid // empty')
+ # Fetch BOTH SHAs so they are present in the local object store.
+ # Without this, `git diff "$BASE_SHA" "$HEAD_SHA"` errors out.
+ if [ -n "$HEAD_SHA" ]; then
+ git fetch origin "$HEAD_SHA" --quiet 2>/dev/null || \
+ git fetch origin "pull/$PR_NUMBER/head" --quiet 2>/dev/null || \
+ git fetch origin "$HEAD_BRANCH" --quiet 2>/dev/null || true
+ fi
+ if [ -n "$BASE_SHA" ]; then
+ git fetch origin "$BASE_SHA" --quiet 2>/dev/null || \
+ git fetch origin "$BASE_BRANCH" --quiet 2>/dev/null || true
+ fi
+else
+ # No PR context: fall back to local-branch review against the detected base branch.
+ # BASE_BRANCH must already hold the branch name detected in Step 0; pin to its current origin SHA + local HEAD SHA.
+ HEAD_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
+ HEAD_SHA=$(git rev-parse HEAD 2>/dev/null || echo "")
+ if ! git fetch origin "$BASE_BRANCH" --quiet 2>/dev/null; then
+ echo "WARNING: could not fetch origin/$BASE_BRANCH. Pinning to whatever local origin/$BASE_BRANCH points at — may be stale." >&2
+ fi
+ BASE_SHA=$(git rev-parse "origin/$BASE_BRANCH" 2>/dev/null || echo "")
+fi
+
+# Soft-validate the SHAs. If a skill REQUIRES a diff context (`/review`, `/cso --diff`),
+# it should add an explicit `[ -n "$BASE_SHA" ] && [ -n "$HEAD_SHA" ]` assertion before
+# its first diff/log step. Skills that operate without a diff (`/cso --infra`,
+# `/cso --supply-chain`, etc.) can proceed with empty SHAs and simply skip diff-mode
+# substeps. Returning early via `exit 1` here would break those scope-flag modes.
+if [ -z "$BASE_SHA" ] || [ -z "$HEAD_SHA" ]; then
+ echo "WARNING: could not resolve BASE_SHA / HEAD_SHA — diff-dependent steps will be skipped." >&2
+ echo " PR_NUMBER=$PR_NUMBER BASE_BRANCH=$BASE_BRANCH HEAD_BRANCH=$HEAD_BRANCH" >&2
+fi
+
+# If we DID resolve SHAs, verify both actually exist in the local object store.
+# (cat-file probe is a no-op when SHA is empty.)
+for _SHA in "$BASE_SHA" "$HEAD_SHA"; do
+ if [ -n "$_SHA" ] && ! git cat-file -e "$_SHA" 2>/dev/null; then
+ echo "WARNING: SHA $_SHA is not present in the local repo; re-run after 'git fetch origin' if the review covers committed changes." >&2
+ fi
+done
+
+echo "Pinned review context:"
+echo " PR: ${PR_NUMBER:-}"
+echo " Base branch: $BASE_BRANCH @ $BASE_SHA"
+echo " Head branch: $HEAD_BRANCH @ $HEAD_SHA"
+echo " Dirty edits: ${REVIEW_DIRTY:-0} (1 = include uncommitted working-tree changes in diff)"
+```
+
+**For the rest of this skill, use these pinned SHAs** in every diff/log
+command. Concretely:
+
+| Don't (working-tree dependent — bug) | Do (SHA-pinned — correct) |
+|--------------------------------------|------------------------------------------|
+| `git diff origin/` | `git diff "$BASE_SHA" "$HEAD_SHA"` |
+| `git diff origin/...HEAD` | `git diff "$BASE_SHA" "$HEAD_SHA"` |
+| `git diff ..HEAD` | `git diff "$BASE_SHA" "$HEAD_SHA"` |
+| `git log origin/..HEAD` | `git log "$BASE_SHA..$HEAD_SHA"` |
+| `git diff --name-only origin/HEAD...` | `git diff --name-only "$BASE_SHA" "$HEAD_SHA"` |
+| `git show HEAD:VERSION` | `git show "$HEAD_SHA:VERSION"` |
+
+**Avoid `gh pr diff "$PR_NUMBER"`** even in PR-review context: that endpoint
+re-resolves `HEAD` and `BASE` server-side at every call, so a force-push of
+the PR head or a fast-forward of the PR base mid-review will silently change
+its output. Use the SHA-pinned local `git diff "$BASE_SHA" "$HEAD_SHA"`
+instead — it is immutable both against worktree flips AND against PR-state
+drift on the remote.
+
+If you genuinely need the GitHub-rendered diff (e.g., to compare against
+GitHub's UI), request it from the compare endpoint with explicit SHA
+boundaries and the diff media type:
+`gh api -H 'Accept: application/vnd.github.diff' "/repos///compare/$BASE_SHA...$HEAD_SHA"`.
+
+**Do not** use bare `HEAD`, `origin/HEAD`, or `origin/` (without
+`...$HEAD_SHA`) anywhere else in this skill. Even if those refs are correct
+right now, a later subagent may flip the worktree underneath you.
+
+The failure mode this step closes is tracked as
+`shared-checkout-branch-flip-during-review` in `CLAUDE.md`.
+
+---
+
# /cso — Chief Security Officer Audit (v2)
+**Diff context is SHA-pinned — see Step 0.5.** Whenever this skill scans a
+PR/branch diff (any phase running with `--diff`, plus Phase 2 secrets-archaeology
+diff mode), it must use `$BASE_SHA` and `$HEAD_SHA` resolved in Step 0.5, **not**
+bare `HEAD` / `origin/` / `..HEAD`. A subagent flipping the worktree
+mid-audit otherwise causes findings to render against the wrong code (named
+failure mode `shared-checkout-branch-flip-during-review`).
+
+When `/cso` runs WITHOUT `--diff` (e.g., `/cso --infra`, `/cso --supply-chain`,
+`/cso --owasp`), Step 0.5's WARN-on-missing-SHA behavior is acceptable: those
+scope flags don't read PR diffs, so empty `$BASE_SHA` / `$HEAD_SHA` is fine.
+Diff-mode substeps that DO need them (Phase 2 with `--diff`, etc.) must check
+`[ -n "$BASE_SHA" ] && [ -n "$HEAD_SHA" ]` and skip with a clear note when not
+resolvable. Do not silently fall back to bare `HEAD`/`origin/` — that
+re-introduces the bug this Step 0.5 block exists to close.
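
A minimal sketch of that guard, assuming the diff-mode substep is wrapped in a shell function; `phase2_diff_mode` is a hypothetical name, and piping its output through the Phase 2 secret patterns is left out. With either SHA empty it prints a skip note and never falls back to a bare ref:

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical wrapper for the Phase 2 diff-mode history scan.
phase2_diff_mode() {
  if [ -z "${BASE_SHA:-}" ] || [ -z "${HEAD_SHA:-}" ]; then
    echo "NOTE: skipping Phase 2 diff mode: BASE_SHA/HEAD_SHA not resolved in Step 0.5." >&2
    return 0
  fi
  # Pinned SHAs only; never bare HEAD or an origin/ tracking ref.
  git log -p "$BASE_SHA..$HEAD_SHA"
}

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init --quiet
git config user.email demo@example.com
git config user.name demo
git config commit.gpgsign false
git commit --quiet --allow-empty -m base
BASE_SHA=$(git rev-parse HEAD)
echo "API_KEY=changeme" > .env.example
git add .env.example
git commit --quiet -m "add example env"
HEAD_SHA=$(git rev-parse HEAD)

pinned_log=$(phase2_diff_mode)                        # scans only BASE_SHA..HEAD_SHA
skipped_log=$(BASE_SHA="" phase2_diff_mode 2>/dev/null) # empty SHA: skip, no git call
```

The design choice is to return success after the note rather than exit, so the remaining `/cso` phases still run.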
+
You are a **Chief Security Officer** who has led incident response on real breaches and testified before boards about security posture. You think like an attacker but report like a defender. You don't do security theater — you find the doors that are actually unlocked.
The real attack surface isn't your code — it's your dependencies. Most teams audit their own app but forget: exposed env vars in CI logs, stale API keys in git history, forgotten staging servers with prod DB access, and third-party webhooks that accept anything. Start there, not at the code level.
@@ -864,7 +1038,7 @@ done 2>/dev/null
**FP rules:** Placeholders ("your_", "changeme", "TODO") excluded. Test fixtures excluded unless same value in non-test code. Rotated secrets still flagged (they were exposed). `.env.local` in `.gitignore` is expected.
-**Diff mode:** Replace `git log -p --all` with `git log -p ..HEAD`.
+**Diff mode:** Replace `git log -p --all` with `git log -p "$BASE_SHA..$HEAD_SHA"` (SHAs pinned in Step 0.5 — never use bare `..HEAD` or `origin/..HEAD` because a subagent flipping the worktree would silently re-target the diff at the wrong branch).
### Phase 3: Dependency Supply Chain
diff --git a/cso/SKILL.md.tmpl b/cso/SKILL.md.tmpl
index 2f849ee006..a6256eb84d 100644
--- a/cso/SKILL.md.tmpl
+++ b/cso/SKILL.md.tmpl
@@ -35,8 +35,27 @@ triggers:
{{GBRAIN_CONTEXT_LOAD}}
+{{BASE_BRANCH_DETECT}}
+
+{{PR_DIFF_PIN}}
+
# /cso — Chief Security Officer Audit (v2)
+**Diff context is SHA-pinned — see Step 0.5.** Whenever this skill scans a
+PR/branch diff (any phase running with `--diff`, plus Phase 2 secrets-archaeology
+diff mode), it must use `$BASE_SHA` and `$HEAD_SHA` resolved in Step 0.5, **not**
+bare `HEAD` / `origin/` / `..HEAD`. A subagent flipping the worktree
+mid-audit otherwise causes findings to render against the wrong code (named
+failure mode `shared-checkout-branch-flip-during-review`).
+
+When `/cso` runs WITHOUT `--diff` (e.g., `/cso --infra`, `/cso --supply-chain`,
+`/cso --owasp`), Step 0.5's WARN-on-missing-SHA behavior is acceptable: those
+scope flags don't read PR diffs, so empty `$BASE_SHA` / `$HEAD_SHA` is fine.
+Diff-mode substeps that DO need them (Phase 2 with `--diff`, etc.) must check
+`[ -n "$BASE_SHA" ] && [ -n "$HEAD_SHA" ]` and skip with a clear note when not
+resolvable. Do not silently fall back to bare `HEAD`/`origin/` — that
+re-introduces the bug this Step 0.5 block exists to close.
+
You are a **Chief Security Officer** who has led incident response on real breaches and testified before boards about security posture. You think like an attacker but report like a defender. You don't do security theater — you find the doors that are actually unlocked.
The real attack surface isn't your code — it's your dependencies. Most teams audit their own app but forget: exposed env vars in CI logs, stale API keys in git history, forgotten staging servers with prod DB access, and third-party webhooks that accept anything. Start there, not at the code level.
@@ -185,7 +204,7 @@ done 2>/dev/null
**FP rules:** Placeholders ("your_", "changeme", "TODO") excluded. Test fixtures excluded unless same value in non-test code. Rotated secrets still flagged (they were exposed). `.env.local` in `.gitignore` is expected.
-**Diff mode:** Replace `git log -p --all` with `git log -p ..HEAD`.
+**Diff mode:** Replace `git log -p --all` with `git log -p "$BASE_SHA..$HEAD_SHA"` (SHAs pinned in Step 0.5 — never use bare `..HEAD` or `origin/..HEAD` because a subagent flipping the worktree would silently re-target the diff at the wrong branch).
### Phase 3: Dependency Supply Chain
diff --git a/package.json b/package.json
index 3fa1d2164f..380239b5b8 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
{
"name": "gstack",
- "version": "1.26.2.0",
+ "version": "1.26.3.0",
"description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
"license": "MIT",
"type": "module",
diff --git a/review/SKILL.md b/review/SKILL.md
index 112e3c53d9..c786a013e4 100644
--- a/review/SKILL.md
+++ b/review/SKILL.md
@@ -732,17 +732,147 @@ branch name wherever the instructions say "the base branch" or ``.
---
+## Step 0.5: Pin diff context to immutable SHAs (anti-branch-flip)
+
+It is **not safe** for a long-running review skill to read git state through
+symbolic refs like `HEAD`, `origin/`, or `origin/HEAD`. Inside an Agent SDK
+session — and especially across nested subagents that share a worktree — the
+working tree, the symbolic-ref `HEAD`, and even the checked-out branch can
+flip mid-skill (e.g., another tool runs `git checkout` to inspect a file,
+then forgets to switch back). When that happens, every later `git diff`
+command silently re-renders against the new branch, and the review reports
+findings on the wrong code.
+
+The fix is to **resolve diff endpoints to immutable commit SHAs at the very
+start of the skill**, then use those SHAs in every subsequent `git diff`,
+`git log`, and `git show` invocation. SHAs do not move when the working
+tree flips.
+
+Run this **once, before any other diff/log step**:
+
+```bash
+# Resolve the PR (or branch) we're reviewing. Prefer explicit PR context.
+PR_NUMBER=$(gh pr view --json number -q .number 2>/dev/null || echo "")
+
+# REVIEW_DIRTY governs whether uncommitted local changes count as part of the
+# review. Default OFF in PR context (review committed work only); default ON
+# for local /review pre-PR (preserves the pre-fix behavior where dirty edits
+# were included in the diff). Override by exporting REVIEW_DIRTY=1 / 0 before
+# invoking the skill.
+if [ -z "${REVIEW_DIRTY+x}" ]; then
+ if [ -n "$PR_NUMBER" ]; then REVIEW_DIRTY=0; else REVIEW_DIRTY=1; fi
+fi
+
+if [ -n "$PR_NUMBER" ]; then
+ # In-PR review: prefer the PR's *own* recorded base/head SHAs over the
+ # local origin/ tracking ref. baseRefOid and headRefOid are
+ # immutable for the PR's current state — they are the SHAs GitHub renders
+ # against, regardless of local fetch staleness.
+ PR_META=$(gh pr view "$PR_NUMBER" --json baseRefName,headRefName,headRefOid,baseRefOid 2>/dev/null)
+ BASE_BRANCH=$(echo "$PR_META" | jq -r '.baseRefName // empty')
+ HEAD_BRANCH=$(echo "$PR_META" | jq -r '.headRefName // empty')
+ HEAD_SHA=$(echo "$PR_META" | jq -r '.headRefOid // empty')
+ BASE_SHA=$(echo "$PR_META" | jq -r '.baseRefOid // empty')
+ # Fetch BOTH SHAs so they are present in the local object store.
+ # Without this, `git diff "$BASE_SHA" "$HEAD_SHA"` errors out.
+ if [ -n "$HEAD_SHA" ]; then
+ git fetch origin "$HEAD_SHA" --quiet 2>/dev/null || \
+ git fetch origin "pull/$PR_NUMBER/head" --quiet 2>/dev/null || \
+ git fetch origin "$HEAD_BRANCH" --quiet 2>/dev/null || true
+ fi
+ if [ -n "$BASE_SHA" ]; then
+ git fetch origin "$BASE_SHA" --quiet 2>/dev/null || \
+ git fetch origin "$BASE_BRANCH" --quiet 2>/dev/null || true
+ fi
+else
+ # No PR context: fall back to local-branch review against the detected base branch.
+ # BASE_BRANCH must already hold the branch name detected in Step 0; pin to its current origin SHA + local HEAD SHA.
+ HEAD_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
+ HEAD_SHA=$(git rev-parse HEAD 2>/dev/null || echo "")
+ if ! git fetch origin "$BASE_BRANCH" --quiet 2>/dev/null; then
+ echo "WARNING: could not fetch origin/$BASE_BRANCH. Pinning to whatever local origin/$BASE_BRANCH points at — may be stale." >&2
+ fi
+ BASE_SHA=$(git rev-parse "origin/$BASE_BRANCH" 2>/dev/null || echo "")
+fi
+
+# Soft-validate the SHAs. If a skill REQUIRES a diff context (`/review`, `/cso --diff`),
+# it should add an explicit `[ -n "$BASE_SHA" ] && [ -n "$HEAD_SHA" ]` assertion before
+# its first diff/log step. Skills that operate without a diff (`/cso --infra`,
+# `/cso --supply-chain`, etc.) can proceed with empty SHAs and simply skip diff-mode
+# substeps. Returning early via `exit 1` here would break those scope-flag modes.
+if [ -z "$BASE_SHA" ] || [ -z "$HEAD_SHA" ]; then
+ echo "WARNING: could not resolve BASE_SHA / HEAD_SHA — diff-dependent steps will be skipped." >&2
+ echo " PR_NUMBER=$PR_NUMBER BASE_BRANCH=$BASE_BRANCH HEAD_BRANCH=$HEAD_BRANCH" >&2
+fi
+
+# If we DID resolve SHAs, verify both actually exist in the local object store.
+# (cat-file probe is a no-op when SHA is empty.)
+for _SHA in "$BASE_SHA" "$HEAD_SHA"; do
+ if [ -n "$_SHA" ] && ! git cat-file -e "$_SHA" 2>/dev/null; then
+ echo "WARNING: SHA $_SHA is not present in the local repo; re-run after 'git fetch origin' if the review covers committed changes." >&2
+ fi
+done
+
+echo "Pinned review context:"
+echo " PR: ${PR_NUMBER:-}"
+echo " Base branch: $BASE_BRANCH @ $BASE_SHA"
+echo " Head branch: $HEAD_BRANCH @ $HEAD_SHA"
+echo " Dirty edits: ${REVIEW_DIRTY:-0} (1 = include uncommitted working-tree changes in diff)"
+```
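
The `REVIEW_DIRTY` defaulting relies on `${REVIEW_DIRTY+x}`, which expands to `x` whenever the variable is set, even to an empty string, so only a truly unset `REVIEW_DIRTY` picks up the PR-dependent default. A minimal sketch isolating that rule in a hypothetical `default_review_dirty` function (not part of the skill; `123` is a made-up PR number):

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: same rule as Step 0.5, result on stdout.
# $1 is the PR number, or "" when there is no PR.
default_review_dirty() {
  if [ -z "${REVIEW_DIRTY+x}" ]; then
    if [ -n "$1" ]; then echo 0; else echo 1; fi
  else
    echo "$REVIEW_DIRTY"
  fi
}

unset REVIEW_DIRTY
in_pr=$(default_review_dirty 123)                    # unset, PR present -> 0
local_review=$(default_review_dirty "")              # unset, no PR      -> 1
override=$(REVIEW_DIRTY=1 default_review_dirty 123)  # explicit value wins
empty=$(REVIEW_DIRTY= default_review_dirty "")       # set but empty: no default applied
echo "in_pr=$in_pr local=$local_review override=$override empty='$empty'"
```

The last case is worth knowing: `export REVIEW_DIRTY=` skips the default, and the summary line then reports `Dirty edits: 0` because it prints `${REVIEW_DIRTY:-0}`.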
+
+**For the rest of this skill, use these pinned SHAs** in every diff/log
+command. Concretely:
+
+| Don't (working-tree dependent — bug) | Do (SHA-pinned — correct) |
+|--------------------------------------|------------------------------------------|
+| `git diff origin/` | `git diff "$BASE_SHA" "$HEAD_SHA"` |
+| `git diff origin/...HEAD` | `git diff "$BASE_SHA" "$HEAD_SHA"` |
+| `git diff ..HEAD` | `git diff "$BASE_SHA" "$HEAD_SHA"` |
+| `git log origin/..HEAD` | `git log "$BASE_SHA..$HEAD_SHA"` |
+| `git diff --name-only origin/HEAD...` | `git diff --name-only "$BASE_SHA" "$HEAD_SHA"` |
+| `git show HEAD:VERSION` | `git show "$HEAD_SHA:VERSION"` |
+
+**Avoid `gh pr diff "$PR_NUMBER"`** even in PR-review context: that endpoint
+re-resolves `HEAD` and `BASE` server-side at every call, so a force-push of
+the PR head or a fast-forward of the PR base mid-review will silently change
+its output. Use the SHA-pinned local `git diff "$BASE_SHA" "$HEAD_SHA"`
+instead — it is immutable both against worktree flips AND against PR-state
+drift on the remote.
+
+If you genuinely need the GitHub-rendered diff (e.g., to compare against
+GitHub's UI), request it from the compare endpoint with explicit SHA
+boundaries and the diff media type:
+`gh api -H 'Accept: application/vnd.github.diff' "/repos///compare/$BASE_SHA...$HEAD_SHA"`.
+
+**Do not** use bare `HEAD`, `origin/HEAD`, or `origin/` (without
+`...$HEAD_SHA`) anywhere else in this skill. Even if those refs are correct
+right now, a later subagent may flip the worktree underneath you.
+
+The failure mode this step closes is tracked as
+`shared-checkout-branch-flip-during-review` in `CLAUDE.md`.
+
+---
+
# Pre-Landing PR Review
You are running the `/review` workflow. Analyze the current branch's diff against the base branch for structural issues that tests don't catch.
+**Diff context is SHA-pinned — see Step 0.5.** Every `git diff`, `git log`, and `git show` command in this skill must use `$BASE_SHA` and `$HEAD_SHA` (resolved in Step 0.5), not bare `HEAD` / `origin/` / `origin/HEAD`. This is the named failure mode `shared-checkout-branch-flip-during-review`: a subagent flipping the worktree mid-review otherwise causes findings to render against the wrong branch.
+
+**Hard requirement:** `/review` cannot proceed without a resolved diff. Step 0.5 only prints a WARNING (not an error) when SHAs cannot be resolved, so that `/cso`'s non-diff scope-flag modes still work; `/review` must therefore check explicitly:
+
+```bash
+if [ -z "$BASE_SHA" ] || [ -z "$HEAD_SHA" ]; then
+ echo "ERROR: /review requires a resolvable BASE_SHA and HEAD_SHA. See Step 0.5 output." >&2
+ exit 1
+fi
+```
+
---
## Step 1: Check branch
-1. Run `git branch --show-current` to get the current branch.
-2. If on the base branch, output: **"Nothing to review — you're on the base branch or have no changes against it."** and stop.
-3. Run `git fetch origin --quiet && git diff origin/ --stat` to check if there's a diff. If no diff, output the same message and stop.
+1. The current local branch (informational only): `git branch --show-current`. The review itself uses `$HEAD_SHA` from Step 0.5, so this is **not** a correctness signal.
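+2. If `$BASE_SHA` and `$HEAD_SHA` are equal, or `git diff --stat "$BASE_SHA" "$HEAD_SHA"` is empty, output: **"Nothing to review — head and base point at the same commit."** and stop. Exception: when `REVIEW_DIRTY=1`, uncommitted edits are part of the review, so only stop if `git diff --stat "$HEAD_SHA"` is empty as well.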
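
A minimal sketch of this check as a hypothetical `nothing_to_review` function, assuming `BASE_SHA`, `HEAD_SHA`, and `REVIEW_DIRTY` are already set by Step 0.5. It also treats uncommitted edits as reviewable when `REVIEW_DIRTY=1`, because in that mode they are part of the diff Step 3 reads. The demo repo has no committed changes and one uncommitted edit, so it prints `REVIEW_DIRTY=0: stop, REVIEW_DIRTY=1: review`:

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: succeeds (exit 0) when there is nothing to review.
nothing_to_review() {
  if [ "$BASE_SHA" != "$HEAD_SHA" ] && [ -n "$(git diff --stat "$BASE_SHA" "$HEAD_SHA")" ]; then
    return 1    # committed changes exist
  fi
  if [ "${REVIEW_DIRTY:-0}" = 1 ] && [ -n "$(git diff --stat "$HEAD_SHA")" ]; then
    return 1    # only uncommitted edits, but dirty mode includes them
  fi
  return 0
}

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init --quiet
git config user.email demo@example.com
git config user.name demo
git config commit.gpgsign false
echo one > a.txt
git add a.txt
git commit --quiet -m base
BASE_SHA=$(git rev-parse HEAD)
HEAD_SHA=$BASE_SHA
echo two >> a.txt    # uncommitted edit only

REVIEW_DIRTY=0
if nothing_to_review; then clean_mode=stop; else clean_mode=review; fi
REVIEW_DIRTY=1
if nothing_to_review; then dirty_mode=stop; else dirty_mode=review; fi
echo "REVIEW_DIRTY=0: $clean_mode, REVIEW_DIRTY=1: $dirty_mode"
```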
---
@@ -751,10 +881,10 @@ You are running the `/review` workflow. Analyze the current branch's diff agains
Before reviewing code quality, check: **did they build what was requested — nothing more, nothing less?**
1. Read `TODOS.md` (if it exists). Read PR description (`gh pr view --json body --jq .body 2>/dev/null || true`).
- Read commit messages (`git log origin/..HEAD --oneline`).
+ Read commit messages (`git log "$BASE_SHA..$HEAD_SHA" --oneline`).
**If no PR exists:** rely on commit messages and TODOS.md for stated intent — this is the common case since /review runs before /ship creates the PR.
2. Identify the **stated intent** — what was this branch supposed to accomplish?
-3. Run `git diff origin/...HEAD --stat` and compare the files changed against the stated intent.
+3. Run `git diff --stat "$BASE_SHA" "$HEAD_SHA"` and compare the files changed against the stated intent.
4. Evaluate with skepticism (incorporating plan completion results if available from an earlier step or adjacent section):
@@ -839,7 +969,7 @@ For each item, note:
### Cross-Reference Against Diff
-Run `git diff origin/...HEAD` and `git log origin/..HEAD --oneline` to understand what was implemented.
+Run `git diff "$BASE_SHA" "$HEAD_SHA"` and `git log "$BASE_SHA..$HEAD_SHA" --oneline` to understand what was implemented.
For each extracted plan item, check the diff and classify:
@@ -880,7 +1010,7 @@ COMPLETION: 4/7 DONE, 1 PARTIAL, 1 NOT DONE, 1 CHANGED
When no plan file is detected, use these secondary intent sources:
-1. **Commit messages:** Run `git log origin/..HEAD --oneline`. Use judgment to extract real intent:
+1. **Commit messages:** Run `git log "$BASE_SHA..$HEAD_SHA" --oneline`. Use judgment to extract real intent:
- Commits with actionable verbs ("add", "implement", "fix", "create", "remove", "update") are intent signals
- Skip noise: "WIP", "tmp", "squash", "merge", "chore", "typo", "fixup"
- Extract the intent behind the commit, not the literal message
@@ -893,7 +1023,7 @@ When no plan file is detected, use these secondary intent sources:
For each PARTIAL or NOT DONE item, investigate WHY:
-1. Check `git log origin/..HEAD --oneline` for commits that suggest the work was started, attempted, or reverted
+1. Check `git log "$BASE_SHA..$HEAD_SHA" --oneline` for commits that suggest the work was started, attempted, or reverted
2. Read the relevant code to understand what was built instead
3. Determine the likely reason from this list:
- **Scope cut** — evidence of intentional removal (revert commit, removed TODO)
@@ -974,22 +1104,24 @@ Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, cl
## Step 3: Get the diff
-Fetch the latest base branch to avoid false positives from stale local state:
+The base branch was already fetched in Step 0.5; do not refetch it here. Refetching is unnecessary: `$BASE_SHA` is a value, not a ref, so it stays put even if `origin/` has advanced since Step 0.5, and the review keeps targeting the same commits.
-```bash
-git fetch origin --quiet
-```
+Run **one** of the following, depending on the value of `$REVIEW_DIRTY` resolved in Step 0.5:
+
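+- **`REVIEW_DIRTY=0`** (default in PR context): `git diff "$BASE_SHA" "$HEAD_SHA"`, the committed diff between the pinned base and head SHAs. Immune to working-tree flips. It matches GitHub's PR view as long as `$BASE_SHA` is an ancestor of `$HEAD_SHA`; GitHub diffs from the merge base, so if the base branch has advanced since the head branched off, `git diff "$BASE_SHA...$HEAD_SHA"` (three dots, still SHA-pinned) is the equivalent.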
+- **`REVIEW_DIRTY=1`** (default for local pre-PR review): `git diff "$BASE_SHA" "$HEAD_SHA"` PLUS `git diff "$HEAD_SHA"`. The first is the committed diff; the second is the uncommitted working-tree changes on top of `$HEAD_SHA`. Concatenate the two. Do not use single-argument `git diff "$BASE_SHA"` for the first half: it compares against the working tree, so it already contains the uncommitted edits and would count them twice. The uncommitted half is intrinsically not SHA-pinnable (the working tree is the working tree); your only protection there is to read it once and cache the output rather than re-running `git diff "$HEAD_SHA"` after each phase.
-Run `git diff origin/` to get the full diff. This includes both committed and uncommitted changes against the latest base branch.
+If you're not sure which mode you're in, the Step 0.5 output prints `Dirty edits: $REVIEW_DIRTY`.
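
A minimal sketch of what each `git diff` form covers, in a throwaway repo with one committed and one uncommitted change (file and branch names are hypothetical). `git diff "$BASE_SHA" "$HEAD_SHA"` is committed-only, `git diff "$HEAD_SHA"` is uncommitted-only, and single-argument `git diff "$BASE_SHA"` compares against the working tree and so covers both. Each half is read once into a variable that later phases would reuse:

```bash
#!/usr/bin/env bash
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init --quiet
git config user.email demo@example.com
git config user.name demo
git config commit.gpgsign false
printf 'one\n' > a.txt
git add a.txt
git commit --quiet -m base
BASE_SHA=$(git rev-parse HEAD)
git checkout --quiet -b feature
printf 'two\n' >> a.txt
git commit --quiet -am "committed change"
HEAD_SHA=$(git rev-parse HEAD)
printf 'three\n' >> a.txt              # uncommitted edit

# Read each half once and cache it; later phases reuse REVIEW_DIFF.
committed=$(git diff "$BASE_SHA" "$HEAD_SHA")
uncommitted=$(git diff "$HEAD_SHA")
REVIEW_DIFF=$(printf '%s\n%s\n' "$committed" "$uncommitted")

# Single-argument form paired with the uncommitted half, for comparison.
naive=$(printf '%s\n%s\n' "$(git diff "$BASE_SHA")" "$uncommitted")

count() { printf '%s\n' "$1" | grep -cx "$2" || true; }
echo "cached: +three x$(count "$REVIEW_DIFF" '+three')  naive: +three x$(count "$naive" '+three')"
```

This prints `cached: +three x1  naive: +three x2`: the uncommitted line appears once in the cached diff and twice in the single-argument pairing.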
## Step 3.4: Workspace-aware queue status (advisory)
Check whether this PR's claimed VERSION still points at a free slot in the queue. Advisory only — never blocks review; just informs the reviewer about landing-order risk.
```bash
-BRANCH_VERSION=$(git show HEAD:VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "")
-BASE_BRANCH=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)
-BASE_VERSION=$(git show origin/$BASE_BRANCH:VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "")
+# Use the SHAs pinned in Step 0.5 — bare HEAD / origin/ would drift if a
+# subagent flips the worktree between Step 0.5 and here.
+BRANCH_VERSION=$(git show "$HEAD_SHA:VERSION" 2>/dev/null | tr -d '\r\n[:space:]' || echo "")
+BASE_VERSION=$(git show "$BASE_SHA:VERSION" 2>/dev/null | tr -d '\r\n[:space:]' || echo "")
QUEUE_JSON=$(bun run bin/gstack-next-version \
--base "$BASE_BRANCH" \
--bump patch \
@@ -1007,10 +1139,11 @@ OFFLINE=$(echo "$QUEUE_JSON" | jq -r '.offline // false')
## Step 3.5: Slop scan (advisory)
Run a slop scan on changed files to catch AI code quality issues (empty catches,
-redundant `return await`, overcomplicated abstractions):
+redundant `return await`, overcomplicated abstractions). Use the pinned base SHA
+so the slop diff doesn't drift if the worktree flips:
```bash
-bun run slop:diff origin/ 2>/dev/null || true
+bun run slop:diff "$BASE_SHA" 2>/dev/null || true
```
If findings are reported, include them in the review output as an informational
@@ -1116,8 +1249,8 @@ STACK=""
[ -f go.mod ] && STACK="${STACK}go "
[ -f Cargo.toml ] && STACK="${STACK}rust "
echo "STACK: ${STACK:-unknown}"
-DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
-DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
+DIFF_INS=$(git diff --stat "$BASE_SHA" "$HEAD_SHA" | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
+DIFF_DEL=$(git diff --stat "$BASE_SHA" "$HEAD_SHA" | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_LINES=$((DIFF_INS + DIFF_DEL))
echo "DIFF_LINES: $DIFF_LINES"
# Detect test framework for specialist test stub generation
@@ -1191,7 +1324,7 @@ If learnings are found, include them: "Past learnings for this domain: {learning
4. Instructions:
"You are a specialist code reviewer. Read the checklist below, then run
-`git diff origin/` to get the full diff. Apply the checklist against the diff.
+`git diff "$BASE_SHA" "$HEAD_SHA"` to get the full diff. Apply the checklist against the diff.
For each finding, output a JSON object on its own line:
{\"severity\":\"CRITICAL|INFORMATIONAL\",\"confidence\":N,\"path\":\"file\",\"line\":N,\"category\":\"category\",\"summary\":\"description\",\"fix\":\"recommended fix\",\"fingerprint\":\"path:line:category\",\"specialist\":\"name\"}
@@ -1294,7 +1427,7 @@ The Red Team subagent receives:
Prompt: "You are a red team reviewer. The code has already been reviewed by N specialists
who found the following issues: {merged findings summary}. Your job is to find what they
-MISSED. Read the checklist, run `git diff origin/`, and look for gaps.
+MISSED. Read the checklist, run `git diff "$BASE_SHA" "$HEAD_SHA"`, and look for gaps.
Output findings as JSON objects (same schema as the specialists). Focus on cross-cutting
concerns, integration boundary issues, and failure modes that specialist checklists
don't cover."
@@ -1466,8 +1599,8 @@ Every diff gets adversarial review from both Claude and Codex. LOC is not a prox
**Detect diff size and tool availability:**
```bash
-DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
-DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
+DIFF_INS=$(git diff --stat "$BASE_SHA" "$HEAD_SHA" | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
+DIFF_DEL=$(git diff --stat "$BASE_SHA" "$HEAD_SHA" | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_TOTAL=$((DIFF_INS + DIFF_DEL))
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
# Legacy opt-out — only gates Codex passes, Claude always runs
@@ -1487,7 +1620,7 @@ If `OLD_CFG` is `disabled`: skip Codex passes only. Claude adversarial subagent
Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
Subagent prompt:
-"Read the diff for this branch with `git diff origin/`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment). After listing findings, end your output with ONE line in the canonical format `Recommendation: because ` — examples: `Recommendation: Fix the unbounded retry at queue.ts:78 because it'll DoS the worker pool under sustained 429s` or `Recommendation: Ship as-is because the strongest finding is a theoretical race that requires conditions we can't trigger in production`. The reason must point to a specific finding (or no-fix rationale). Generic reasons like 'because it's safer' do not qualify."
+"Read the diff for this branch with `git diff "$BASE_SHA" "$HEAD_SHA"`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment). After listing findings, end your output with ONE line in the canonical format `Recommendation: because ` — examples: `Recommendation: Fix the unbounded retry at queue.ts:78 because it'll DoS the worker pool under sustained 429s` or `Recommendation: Ship as-is because the strongest finding is a theoretical race that requires conditions we can't trigger in production`. The reason must point to a specific finding (or no-fix rationale). Generic reasons like 'because it's safer' do not qualify."
Present findings under an `ADVERSARIAL REVIEW (Claude subagent):` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
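The `DIFF_INS` / `DIFF_DEL` pipeline above reads only the summary line that `git diff --stat` prints last (for example `2 files changed, 11 insertions(+), 1 deletion(-)`) and falls back to 0 when a count is absent. A rough TypeScript equivalent, using a hypothetical `parseStatSummary` helper, makes those rules explicit:

```typescript
// Hypothetical helper mirroring the `tail -1 | grep -oE '[0-9]+ insertion'`
// pipeline: only the final summary line of `git diff --stat` is inspected.
interface DiffSize {
  insertions: number;
  deletions: number;
  total: number;
}

function parseStatSummary(statOutput: string): DiffSize {
  const lines = statOutput.trimEnd().split("\n");
  const last = lines[lines.length - 1] ?? "";
  // "insertion" also matches "insertions"; likewise "deletion(s)".
  // A missing count (git omits zero counts) becomes 0, like `|| echo "0"`.
  const count = (word: string): number => {
    const m = last.match(new RegExp(`(\\d+) ${word}`));
    return m ? Number(m[1]) : 0;
  };
  const insertions = count("insertion");
  const deletions = count("deletion");
  return { insertions, deletions, total: insertions + deletions };
}
```

An empty diff produces no `--stat` output at all, which this sketch (like the bash) reports as a total of 0.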
@@ -1502,7 +1635,7 @@ If Codex is available AND `OLD_CFG` is NOT `disabled`:
```bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
-codex exec "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. End your output with ONE line in the canonical format `Recommendation: because `. Generic reasons like 'because it's safer' do not qualify; the reason must point to a specific finding or no-fix rationale." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_ADV"
+codex exec "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch. Run \`git diff $BASE_SHA $HEAD_SHA\` to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. End your output with ONE line in the canonical format `Recommendation: because `. Generic reasons like 'because it's safer' do not qualify; the reason must point to a specific finding or no-fix rationale." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_ADV"
```
Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. After the command completes, read stderr:
diff --git a/review/SKILL.md.tmpl b/review/SKILL.md.tmpl
index fada691125..9935d0e79b 100644
--- a/review/SKILL.md.tmpl
+++ b/review/SKILL.md.tmpl
@@ -28,17 +28,29 @@ triggers:
{{BASE_BRANCH_DETECT}}
+{{PR_DIFF_PIN}}
+
# Pre-Landing PR Review
You are running the `/review` workflow. Analyze the current branch's diff against the base branch for structural issues that tests don't catch.
+**Diff context is SHA-pinned — see Step 0.5.** Every `git diff`, `git log`, and `git show` command in this skill must use `$BASE_SHA` and `$HEAD_SHA` (resolved in Step 0.5), not bare `HEAD` / `origin/` / `origin/HEAD`. This closes the named failure mode `shared-checkout-branch-flip-during-review`: without pinning, a subagent that flips the worktree mid-review causes findings to render against the wrong branch.
+
+**Hard requirement:** `/review` cannot proceed without a resolved diff. Step 0.5 prints a WARNING (not an error) when SHAs cannot be resolved, so /cso's no-PR scope-flag modes still work; /review must check explicitly:
+
+```bash
+if [ -z "$BASE_SHA" ] || [ -z "$HEAD_SHA" ]; then
+ echo "ERROR: /review requires a resolvable BASE_SHA and HEAD_SHA. See Step 0.5 output." >&2
+ exit 1
+fi
+```
+
---
## Step 1: Check branch
-1. Run `git branch --show-current` to get the current branch.
-2. If on the base branch, output: **"Nothing to review — you're on the base branch or have no changes against it."** and stop.
-3. Run `git fetch origin --quiet && git diff origin/ --stat` to check if there's a diff. If no diff, output the same message and stop.
+1. The current local branch (informational only): `git branch --show-current`. The review itself uses `$HEAD_SHA` from Step 0.5, so this is **not** a correctness signal.
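+2. If `$BASE_SHA` and `$HEAD_SHA` are equal, or `git diff --stat "$BASE_SHA" "$HEAD_SHA"` is empty, there is no committed work to review. With `REVIEW_DIRTY=0`, output **"Nothing to review — head and base point at the same commit."** and stop. With `REVIEW_DIRTY=1`, stop with the same message only if `git diff --stat "$HEAD_SHA"` (uncommitted changes) is also empty.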
---
@@ -66,22 +78,24 @@ Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, cl
## Step 3: Get the diff
-Fetch the latest base branch to avoid false positives from stale local state:
+The base branch was already fetched in Step 0.5; do not refetch it here. Every diff below reads the pinned `$BASE_SHA`, which a refetch cannot change, so refetching would only let `origin/` drift away from the commit this review is actually reading.
-```bash
-git fetch origin --quiet
-```
+Use the mode below that matches the value of `$REVIEW_DIRTY` resolved in Step 0.5:
+
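+- **`REVIEW_DIRTY=0`** (default in PR context): `git diff "$BASE_SHA" "$HEAD_SHA"`, the committed diff between the pinned base and head SHAs. Immune to working-tree flips. It matches GitHub's PR view only when `$BASE_SHA` is an ancestor of `$HEAD_SHA`: GitHub diffs a PR against the merge base (`git diff "$BASE_SHA...$HEAD_SHA"`), so if the base branch has moved on since the branch point, this two-SHA form also shows the newer base commits as reversed changes.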
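+- **`REVIEW_DIRTY=1`** (default for local pre-PR review): `git diff "$BASE_SHA" "$HEAD_SHA"` PLUS `git diff "$HEAD_SHA"`. The first is the committed diff; the second is the uncommitted working-tree changes (staged and unstaged; untracked files are not included) on top of `$HEAD_SHA`. Concatenate the two. Do not use the single-commit form `git diff "$BASE_SHA"` for the first half: it compares the working tree against the base, so it already contains the uncommitted changes and they would appear twice. The uncommitted half is intrinsically not SHA-pinnable (the working tree is the working tree); your only protection there is to read it once and cache the output rather than re-running `git diff "$HEAD_SHA"` after each phase.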
-Run `git diff origin/` to get the full diff. This includes both committed and uncommitted changes against the latest base branch.
+If you're not sure which mode you're in, the Step 0.5 output prints `Dirty edits: $REVIEW_DIRTY`.
## Step 3.4: Workspace-aware queue status (advisory)
Check whether this PR's claimed VERSION still points at a free slot in the queue. Advisory only — never blocks review; just informs the reviewer about landing-order risk.
```bash
-BRANCH_VERSION=$(git show HEAD:VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "")
-BASE_BRANCH=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)
-BASE_VERSION=$(git show origin/$BASE_BRANCH:VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "")
+# Use the SHAs pinned in Step 0.5 — bare HEAD / origin/ would drift if a
+# subagent flips the worktree between Step 0.5 and here.
+BRANCH_VERSION=$(git show "$HEAD_SHA:VERSION" 2>/dev/null | tr -d '\r\n[:space:]' || echo "")
+BASE_VERSION=$(git show "$BASE_SHA:VERSION" 2>/dev/null | tr -d '\r\n[:space:]' || echo "")
QUEUE_JSON=$(bun run bin/gstack-next-version \
--base "$BASE_BRANCH" \
--bump patch \
@@ -99,10 +113,11 @@ OFFLINE=$(echo "$QUEUE_JSON" | jq -r '.offline // false')
## Step 3.5: Slop scan (advisory)
Run a slop scan on changed files to catch AI code quality issues (empty catches,
-redundant `return await`, overcomplicated abstractions):
+redundant `return await`, overcomplicated abstractions). Use the pinned base SHA
+so the slop diff doesn't drift if the worktree flips:
```bash
-bun run slop:diff origin/ 2>/dev/null || true
+bun run slop:diff "$BASE_SHA" 2>/dev/null || true
```
If findings are reported, include them in the review output as an informational
diff --git a/review/checklist.md b/review/checklist.md
index 16aa111bb0..ace80f3418 100644
--- a/review/checklist.md
+++ b/review/checklist.md
@@ -2,7 +2,9 @@
## Instructions
-Review the `git diff origin/main` output for the issues listed below. Be specific — cite `file:line` and suggest fixes. Skip anything that's fine. Only flag real problems.
+Review the `git diff "$BASE_SHA" "$HEAD_SHA"` output (SHAs are pinned in Step 0.5 of the review skill — see SKILL.md) for the issues listed below. Be specific — cite `file:line` and suggest fixes. Skip anything that's fine. Only flag real problems.
+
+**Why pinned SHAs:** Bare `git diff origin/main` re-renders against the working tree, which a nested subagent can flip mid-review (named failure mode `shared-checkout-branch-flip-during-review`). The pinned SHAs are immutable across worktree flips.
**Two-pass review:**
- **Pass 1 (CRITICAL):** Run SQL & Data Safety, Race Conditions, LLM Output Trust Boundary, Shell Injection, and Enum Completeness first. Highest severity.
diff --git a/review/greptile-triage.md b/review/greptile-triage.md
index 3cb6e8d597..b8428034ac 100644
--- a/review/greptile-triage.md
+++ b/review/greptile-triage.md
@@ -64,7 +64,7 @@ For each non-suppressed comment:
1. **Line-level comments:** Read the file at the indicated `path:line` and surrounding context (±10 lines)
2. **Top-level comments:** Read the full comment body
-3. Cross-reference the comment against the full diff (`git diff origin/main`) and the review checklist
+3. Cross-reference the comment against the review checklist and the full diff, `git diff "$BASE_SHA" "$HEAD_SHA"` (SHAs pinned in the review skill's Step 0.5; a bare `git diff origin/main` would drift if a subagent flips the worktree mid-review)
4. Classify:
- **VALID & ACTIONABLE** — a real bug, race condition, security issue, or correctness problem that exists in the current code
- **VALID BUT ALREADY FIXED** — a real issue that was addressed in a subsequent commit on the branch. Identify the fixing commit SHA.
diff --git a/scripts/resolvers/index.ts b/scripts/resolvers/index.ts
index a3553d9d52..6764b1d3ee 100644
--- a/scripts/resolvers/index.ts
+++ b/scripts/resolvers/index.ts
@@ -12,7 +12,7 @@ import { generateCommandReference, generateSnapshotFlags, generateBrowseSetup }
import { generateDesignMethodology, generateDesignHardRules, generateDesignOutsideVoices, generateDesignReviewLite, generateDesignSketch, generateDesignSetup, generateDesignMockup, generateDesignShotgunLoop, generateTasteProfile, generateUXPrinciples } from './design';
import { generateTestBootstrap, generateTestCoverageAuditPlan, generateTestCoverageAuditShip, generateTestCoverageAuditReview } from './testing';
import { generateReviewDashboard, generatePlanFileReviewReport, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec, generateScopeDrift, generateCrossReviewDedup } from './review';
-import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generateDeployBootstrap, generateQAMethodology, generateCoAuthorTrailer, generateChangelogWorkflow } from './utility';
+import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generatePrDiffPin, generateDeployBootstrap, generateQAMethodology, generateCoAuthorTrailer, generateChangelogWorkflow } from './utility';
import { generateLearningsSearch, generateLearningsLog } from './learnings';
import { generateConfidenceCalibration } from './confidence';
import { generateInvokeSkill } from './composition';
@@ -31,6 +31,7 @@ export const RESOLVERS: Record = {
PREAMBLE: generatePreamble,
BROWSE_SETUP: generateBrowseSetup,
BASE_BRANCH_DETECT: generateBaseBranchDetect,
+ PR_DIFF_PIN: generatePrDiffPin,
QA_METHODOLOGY: generateQAMethodology,
DESIGN_METHODOLOGY: generateDesignMethodology,
DESIGN_HARD_RULES: generateDesignHardRules,
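The `PR_DIFF_PIN` entry above is what lets a template write `{{PR_DIFF_PIN}}` and get the Step 0.5 text inlined. The sketch below shows one plausible way such a resolver table is applied; `expandTemplate` and the one-field `TemplateContext` are hypothetical stand-ins, and gstack's actual generator may differ (for example in how it treats unknown placeholders).

```typescript
// Hypothetical sketch: expand {{NAME}} placeholders using a resolver table
// keyed by placeholder name, in the spirit of RESOLVERS.
interface TemplateContext {
  skillName: string; // assumed minimal context; the real type has more fields
}
type Resolver = (ctx: TemplateContext) => string;

function expandTemplate(
  tmpl: string,
  resolvers: Record<string, Resolver>,
  ctx: TemplateContext,
): string {
  return tmpl.replace(/\{\{([A-Z0-9_]+)\}\}/g, (_match: string, name: string) => {
    const resolve = resolvers[name];
    // Fail loudly instead of shipping an unexpanded placeholder.
    if (!resolve) throw new Error(`unknown placeholder: {{${name}}}`);
    return resolve(ctx);
  });
}
```

Passing the context into each resolver is what allows a single fragment to render differently per skill, the same way the review resolvers branch on `ctx.skillName`.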
diff --git a/scripts/resolvers/review-army.ts b/scripts/resolvers/review-army.ts
index 516ce3c8d4..9356125704 100644
--- a/scripts/resolvers/review-army.ts
+++ b/scripts/resolvers/review-army.ts
@@ -13,9 +13,14 @@ import type { TemplateContext } from './types';
function generateSpecialistSelection(ctx: TemplateContext): string {
const isShip = ctx.skillName === 'ship';
+ const isReview = ctx.skillName === 'review';
const stepSel = isShip ? '9.1' : '4.5';
const stepMerge = isShip ? '9.2' : '4.6';
const nextStep = isShip ? 'the Fix-First flow (item 4)' : 'Step 5';
+ // /review pins SHAs in Step 0.5 (PR_DIFF_PIN); /ship doesn't.
+ const diffStat = isReview
+ ? `git diff --stat "$BASE_SHA" "$HEAD_SHA"`
+ : `git diff origin/ --stat`;
return `## Step ${stepSel}: Review Army — Specialist Dispatch
### Detect stack and scope
@@ -30,8 +35,8 @@ STACK=""
[ -f go.mod ] && STACK="\${STACK}go "
[ -f Cargo.toml ] && STACK="\${STACK}rust "
echo "STACK: \${STACK:-unknown}"
-DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
-DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
+DIFF_INS=$(${diffStat} | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
+DIFF_DEL=$(${diffStat} | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_LINES=$((DIFF_INS + DIFF_DEL))
echo "DIFF_LINES: $DIFF_LINES"
# Detect test framework for specialist test stub generation
@@ -82,6 +87,7 @@ Note which specialists were selected, gated, and skipped. Print the selection:
}
function generateSpecialistDispatch(ctx: TemplateContext): string {
+ const isReview = ctx.skillName === 'review';
return `### Dispatch specialists in parallel
For each selected specialist, launch an independent subagent via the Agent tool.
@@ -105,7 +111,7 @@ If learnings are found, include them: "Past learnings for this domain: {learning
4. Instructions:
"You are a specialist code reviewer. Read the checklist below, then run
-\`git diff origin/\` to get the full diff. Apply the checklist against the diff.
+${isReview ? '`git diff "$BASE_SHA" "$HEAD_SHA"`' : '`git diff origin/`'} to get the full diff. Apply the checklist against the diff.
For each finding, output a JSON object on its own line:
{\\"severity\\":\\"CRITICAL|INFORMATIONAL\\",\\"confidence\\":N,\\"path\\":\\"file\\",\\"line\\":N,\\"category\\":\\"category\\",\\"summary\\":\\"description\\",\\"fix\\":\\"recommended fix\\",\\"fingerprint\\":\\"path:line:category\\",\\"specialist\\":\\"name\\"}
@@ -202,8 +208,12 @@ Remember these stats — you will need them for the review-log entry in Step 5.8
function generateRedTeam(ctx: TemplateContext): string {
const isShip = ctx.skillName === 'ship';
+ const isReview = ctx.skillName === 'review';
const stepMerge = isShip ? '9.2' : '4.6';
const fixFirstRef = isShip ? 'the Fix-First flow (item 4)' : 'Step 5 Fix-First';
+ const diffCmd = isReview
+ ? '`git diff "$BASE_SHA" "$HEAD_SHA"`'
+ : '`git diff origin/`';
return `### Red Team dispatch (conditional)
**Activation:** Only if DIFF_LINES > 200 OR any specialist produced a CRITICAL finding.
@@ -217,7 +227,7 @@ The Red Team subagent receives:
Prompt: "You are a red team reviewer. The code has already been reviewed by N specialists
who found the following issues: {merged findings summary}. Your job is to find what they
-MISSED. Read the checklist, run \`git diff origin/\`, and look for gaps.
+MISSED. Read the checklist, run ${diffCmd}, and look for gaps.
Output findings as JSON objects (same schema as the specialists). Focus on cross-cutting
concerns, integration boundary issues, and failure modes that specialist checklists
don't cover."
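The Red Team gate above combines two signals: diff size (`DIFF_LINES > 200`) and whether any specialist reported a CRITICAL finding in the one-JSON-object-per-line format. A minimal TypeScript sketch of that gate follows, assuming the specialist output has already been collected into one string; `parseFindings` and `shouldDispatchRedTeam` are illustrative names, not gstack functions.

```typescript
// Finding shape taken from the specialist JSON schema in the prompt above.
interface Finding {
  severity: "CRITICAL" | "INFORMATIONAL";
  confidence: number;
  path: string;
  line: number;
  category: string;
  summary: string;
  fix: string;
  fingerprint: string;
  specialist: string;
}

// Specialists print one JSON object per line; skip prose and malformed lines.
function parseFindings(output: string): Finding[] {
  const findings: Finding[] = [];
  for (const raw of output.split("\n")) {
    const line = raw.trim();
    if (!line.startsWith("{")) continue;
    try {
      findings.push(JSON.parse(line) as Finding);
    } catch {
      // Not a finding line; ignore it.
    }
  }
  return findings;
}

// Activation rule: DIFF_LINES > 200 OR any CRITICAL finding.
function shouldDispatchRedTeam(diffLines: number, findings: Finding[]): boolean {
  return diffLines > 200 || findings.some((f) => f.severity === "CRITICAL");
}
```

Note the strict `>`: a diff of exactly 200 changed lines with no CRITICAL findings does not trigger the Red Team.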
diff --git a/scripts/resolvers/review.ts b/scripts/resolvers/review.ts
index 53c7b08dab..dbce11caa1 100644
--- a/scripts/resolvers/review.ts
+++ b/scripts/resolvers/review.ts
@@ -368,17 +368,27 @@ If A: revise the premise and note the revision. If B: proceed (and note that the
export function generateScopeDrift(ctx: TemplateContext): string {
const isShip = ctx.skillName === 'ship';
+ const isReview = ctx.skillName === 'review';
const stepNum = isShip ? '8.2' : '1.5';
+ // /review pins BASE_SHA/HEAD_SHA in Step 0.5 (PR_DIFF_PIN). /ship doesn't.
+ // Use SHA-pinned commands inside /review; bare refs elsewhere for back-compat.
+ const logRangeCmd = isReview
+ ? '`git log "$BASE_SHA..$HEAD_SHA" --oneline`'
+ : '`git log origin/..HEAD --oneline`';
+ const diffStatCmd = isReview
+ ? '`git diff --stat "$BASE_SHA" "$HEAD_SHA"`'
+ : '`git diff origin/...HEAD --stat`';
+
return `## Step ${stepNum}: Scope Drift Detection
Before reviewing code quality, check: **did they build what was requested — nothing more, nothing less?**
1. Read \`TODOS.md\` (if it exists). Read PR description (\`gh pr view --json body --jq .body 2>/dev/null || true\`).
- Read commit messages (\`git log origin/..HEAD --oneline\`).
+ Read commit messages (${logRangeCmd}).
**If no PR exists:** rely on commit messages and TODOS.md for stated intent — this is the common case since /review runs before /ship creates the PR.
2. Identify the **stated intent** — what was this branch supposed to accomplish?
-3. Run \`git diff origin/...HEAD --stat\` and compare the files changed against the stated intent.
+3. Run ${diffStatCmd} and compare the files changed against the stated intent.
4. Evaluate with skepticism (incorporating plan completion results if available from an earlier step or adjacent section):
@@ -413,8 +423,20 @@ export function generateAdversarialStep(ctx: TemplateContext): string {
if (ctx.host === 'codex') return '';
const isShip = ctx.skillName === 'ship';
+ const isReview = ctx.skillName === 'review';
const stepNum = isShip ? '11' : '5.7';
+ // /review pins SHAs (PR_DIFF_PIN). /ship doesn't (yet).
+ const diffStat = isReview
+ ? `git diff --stat "$BASE_SHA" "$HEAD_SHA"`
+ : `git diff origin/ --stat`;
+ const subagentDiff = isReview
+ ? `git diff "$BASE_SHA" "$HEAD_SHA"`
+ : `git diff origin/`;
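+ // Interpolate the SHAs into the prompt text: codex runs commands in its own
+ // process, which does not inherit unexported shell variables like BASE_SHA.
+ const codexDiffPhrase = isReview
+ ? `Run \\\`git diff $BASE_SHA $HEAD_SHA\\\` to see the diff`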
+ : `Run git diff origin/ to see the diff`;
+
return `## Step ${stepNum}: Adversarial review (always-on)
Every diff gets adversarial review from both Claude and Codex. LOC is not a proxy for risk — a 5-line auth change can be critical.
@@ -422,8 +444,8 @@ Every diff gets adversarial review from both Claude and Codex. LOC is not a prox
**Detect diff size and tool availability:**
\`\`\`bash
-DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
-DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
+DIFF_INS=$(${diffStat} | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
+DIFF_DEL=$(${diffStat} | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_TOTAL=$((DIFF_INS + DIFF_DEL))
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
# Legacy opt-out — only gates Codex passes, Claude always runs
@@ -443,7 +465,7 @@ If \`OLD_CFG\` is \`disabled\`: skip Codex passes only. Claude adversarial subag
Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
Subagent prompt:
-"Read the diff for this branch with \`git diff origin/\`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment). After listing findings, end your output with ONE line in the canonical format \`Recommendation: because \` — examples: \`Recommendation: Fix the unbounded retry at queue.ts:78 because it'll DoS the worker pool under sustained 429s\` or \`Recommendation: Ship as-is because the strongest finding is a theoretical race that requires conditions we can't trigger in production\`. The reason must point to a specific finding (or no-fix rationale). Generic reasons like 'because it's safer' do not qualify."
+"Read the diff for this branch with \`${subagentDiff}\`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment). After listing findings, end your output with ONE line in the canonical format \`Recommendation: because \` — examples: \`Recommendation: Fix the unbounded retry at queue.ts:78 because it'll DoS the worker pool under sustained 429s\` or \`Recommendation: Ship as-is because the strongest finding is a theoretical race that requires conditions we can't trigger in production\`. The reason must point to a specific finding (or no-fix rationale). Generic reasons like 'because it's safer' do not qualify."
Present findings under an \`ADVERSARIAL REVIEW (Claude subagent):\` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
@@ -458,7 +480,7 @@ If Codex is available AND \`OLD_CFG\` is NOT \`disabled\`:
\`\`\`bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
-codex exec "${CODEX_BOUNDARY}Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. End your output with ONE line in the canonical format \`Recommendation: because \`. Generic reasons like 'because it's safer' do not qualify; the reason must point to a specific finding or no-fix rationale." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_ADV"
+codex exec "${CODEX_BOUNDARY}Review the changes on this branch against the base branch. ${codexDiffPhrase}. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. End your output with ONE line in the canonical format \`Recommendation: because \`. Generic reasons like 'because it's safer' do not qualify; the reason must point to a specific finding or no-fix rationale." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_ADV"
\`\`\`
Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. After the command completes, read stderr:
@@ -723,6 +745,17 @@ type PlanCompletionMode = 'ship' | 'review';
function generatePlanCompletionAuditInner(mode: PlanCompletionMode): string {
const sections: string[] = [];
+ // /review pins SHAs in Step 0.5 (PR_DIFF_PIN). /ship doesn't (yet).
+ // Switch git command phrasing accordingly so the wrong-branch failure
+ // mode (`shared-checkout-branch-flip-during-review`) is closed for /review.
+ const isReview = mode === 'review';
+ const diffRangeCmd = isReview
+ ? '`git diff "$BASE_SHA" "$HEAD_SHA"`'
+ : '`git diff origin/...HEAD`';
+ const logRangeCmd = isReview
+ ? '`git log "$BASE_SHA..$HEAD_SHA" --oneline`'
+ : '`git log origin/..HEAD --oneline`';
+
// ── Plan file discovery (shared) ──
sections.push(generatePlanFileDiscovery());
@@ -758,7 +791,7 @@ For each item, note:
sections.push(`
### Cross-Reference Against Diff
-Run \`git diff origin/...HEAD\` and \`git log origin/..HEAD --oneline\` to understand what was implemented.
+Run ${diffRangeCmd} and ${logRangeCmd} to understand what was implemented.
For each extracted plan item, check the diff and classify:
@@ -828,7 +861,7 @@ After producing the completion checklist:
When no plan file is detected, use these secondary intent sources:
-1. **Commit messages:** Run \`git log origin/..HEAD --oneline\`. Use judgment to extract real intent:
+1. **Commit messages:** Run ${logRangeCmd}. Use judgment to extract real intent:
- Commits with actionable verbs ("add", "implement", "fix", "create", "remove", "update") are intent signals
- Skip noise: "WIP", "tmp", "squash", "merge", "chore", "typo", "fixup"
- Extract the intent behind the commit, not the literal message
@@ -841,7 +874,7 @@ When no plan file is detected, use these secondary intent sources:
For each PARTIAL or NOT DONE item, investigate WHY:
-1. Check \`git log origin/..HEAD --oneline\` for commits that suggest the work was started, attempted, or reverted
+1. Check ${logRangeCmd} for commits that suggest the work was started, attempted, or reverted
2. Read the relevant code to understand what was built instead
3. Determine the likely reason from this list:
- **Scope cut** — evidence of intentional removal (revert commit, removed TODO)
diff --git a/scripts/resolvers/utility.ts b/scripts/resolvers/utility.ts
index 3d2e368a29..0298329b8b 100644
--- a/scripts/resolvers/utility.ts
+++ b/scripts/resolvers/utility.ts
@@ -49,6 +49,128 @@ branch name wherever the instructions say "the base branch" or \`\`.
---`;
}
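In PR context, Step 0.5 (generated by `generatePrDiffPin` below) reads four fields from `gh pr view --json` and uses jq's `// empty` so a missing field becomes an empty string; `/review` then refuses to continue without both SHAs. A hedged TypeScript sketch of that extraction and the hard requirement; `parsePrMeta` and `requirePinned` are hypothetical helpers, not gstack APIs.

```typescript
// Sketch of Step 0.5's PR-context extraction. Input is the JSON printed by
// `gh pr view <n> --json baseRefName,headRefName,headRefOid,baseRefOid`.
interface PinnedDiff {
  baseBranch: string;
  headBranch: string;
  baseSha: string;
  headSha: string;
}

function parsePrMeta(json: string): PinnedDiff {
  let meta: Record<string, unknown> = {};
  try {
    const parsed: unknown = JSON.parse(json);
    if (parsed && typeof parsed === "object") meta = parsed as Record<string, unknown>;
  } catch {
    // Empty or invalid output (e.g. `gh` failed): treat every field as missing.
  }
  // Like jq's `.field // empty`: a missing or null field becomes "".
  const field = (k: string): string =>
    typeof meta[k] === "string" ? (meta[k] as string) : "";
  return {
    baseBranch: field("baseRefName"),
    headBranch: field("headRefName"),
    baseSha: field("baseRefOid"),
    headSha: field("headRefOid"),
  };
}

// Mirrors /review's hard requirement: both SHAs must be resolved.
function requirePinned(p: PinnedDiff): void {
  if (p.baseSha === "" || p.headSha === "") {
    throw new Error("/review requires a resolvable BASE_SHA and HEAD_SHA. See Step 0.5 output.");
  }
}
```

Keeping the soft parse separate from the hard check matches the split in the skill text: Step 0.5 only warns, and each consuming skill decides whether an unresolved SHA is fatal.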
+export function generatePrDiffPin(_ctx: TemplateContext): string {
+ return `## Step 0.5: Pin diff context to immutable SHAs (anti-branch-flip)
+
+It is **not safe** for a long-running review skill to read git state through
+symbolic refs like \`HEAD\`, \`origin/\`, or \`origin/HEAD\`. Inside an Agent SDK
+session — and especially across nested subagents that share a worktree — the
+working tree, the symbolic-ref \`HEAD\`, and even the checked-out branch can
+flip mid-skill (e.g., another tool runs \`git checkout\` to inspect a file,
+then forgets to switch back). When that happens, every later \`git diff\`
+command silently re-renders against the new branch, and the review reports
+findings on the wrong code.
+
+The fix is to **resolve diff endpoints to immutable commit SHAs at the very
+start of the skill**, then use those SHAs in every subsequent \`git diff\`,
+\`git log\`, and \`git show\` invocation. SHAs do not move when the working
+tree flips.
+
+Run this **once, before any other diff/log step**:
+
+\`\`\`bash
+# Resolve the PR (or branch) we're reviewing. Prefer explicit PR context.
+PR_NUMBER=$(gh pr view --json number -q .number 2>/dev/null || echo "")
+
+# REVIEW_DIRTY governs whether uncommitted local changes count as part of the
+# review. Default OFF in PR context (review committed work only); default ON
+# for local /review pre-PR (preserves the pre-fix behavior where dirty edits
+# were included in the diff). Override by exporting REVIEW_DIRTY=1 / 0 before
+# invoking the skill.
+if [ -z "\${REVIEW_DIRTY+x}" ]; then
+ if [ -n "$PR_NUMBER" ]; then REVIEW_DIRTY=0; else REVIEW_DIRTY=1; fi
+fi
+
+if [ -n "$PR_NUMBER" ]; then
+ # In-PR review: prefer the PR's *own* recorded base/head SHAs over the
+ # local origin/ tracking ref. baseRefOid and headRefOid are
+ # immutable for the PR's current state — they are the SHAs GitHub renders
+ # against, regardless of local fetch staleness.
+ PR_META=$(gh pr view "$PR_NUMBER" --json baseRefName,headRefName,headRefOid,baseRefOid 2>/dev/null)
+ BASE_BRANCH=$(echo "$PR_META" | jq -r '.baseRefName // empty')
+ HEAD_BRANCH=$(echo "$PR_META" | jq -r '.headRefName // empty')
+ HEAD_SHA=$(echo "$PR_META" | jq -r '.headRefOid // empty')
+ BASE_SHA=$(echo "$PR_META" | jq -r '.baseRefOid // empty')
+ # Fetch BOTH SHAs so they are present in the local object store.
+ # Without this, \`git diff "$BASE_SHA" "$HEAD_SHA"\` errors out.
+ if [ -n "$HEAD_SHA" ]; then
+ git fetch origin "$HEAD_SHA" --quiet 2>/dev/null || \\
+ git fetch origin "pull/$PR_NUMBER/head" --quiet 2>/dev/null || \\
+ git fetch origin "$HEAD_BRANCH" --quiet 2>/dev/null || true
+ fi
+ if [ -n "$BASE_SHA" ]; then
+ git fetch origin "$BASE_SHA" --quiet 2>/dev/null || \\
+ git fetch origin "$BASE_BRANCH" --quiet 2>/dev/null || true
+ fi
+else
+ # No PR context: fall back to local-branch review against detected base branch.
+ # Reuse the base branch detected in Step 0 ($BASE_BRANCH); pin to its current origin SHA and the local HEAD SHA.
+ HEAD_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
+ HEAD_SHA=$(git rev-parse HEAD 2>/dev/null || echo "")
+ if ! git fetch origin "$BASE_BRANCH" --quiet 2>/dev/null; then
+ echo "WARNING: could not fetch origin/$BASE_BRANCH. Pinning to whatever local origin/$BASE_BRANCH points at — may be stale." >&2
+ fi
+ BASE_SHA=$(git rev-parse "origin/$BASE_BRANCH" 2>/dev/null || echo "")
+fi
+
+# Soft-validate the SHAs. If a skill REQUIRES a diff context (\`/review\`, \`/cso --diff\`),
+# it should add an explicit \`[ -n "$BASE_SHA" ] && [ -n "$HEAD_SHA" ]\` assertion before
+# its first diff/log step. Skills that operate without a diff (\`/cso --infra\`,
+# \`/cso --supply-chain\`, etc.) can proceed with empty SHAs and simply skip diff-mode
+# substeps. Returning early via \`exit 1\` here would break those scope-flag modes.
+if [ -z "$BASE_SHA" ] || [ -z "$HEAD_SHA" ]; then
+ echo "WARNING: could not resolve BASE_SHA / HEAD_SHA — diff-dependent steps will be skipped." >&2
+ echo " PR_NUMBER=$PR_NUMBER BASE_BRANCH=$BASE_BRANCH HEAD_BRANCH=$HEAD_BRANCH" >&2
+fi
+
+# If we DID resolve SHAs, verify both actually exist in the local object store.
+# (The cat-file probe is skipped for an empty SHA.)
+for _SHA in "$BASE_SHA" "$HEAD_SHA"; do
+ if [ -n "$_SHA" ] && ! git cat-file -e "$_SHA" 2>/dev/null; then
+ echo "WARNING: SHA $_SHA is not present in the local repo — re-run after \`git fetch origin\` if review covers committed changes." >&2
+ fi
+done
+
+echo "Pinned review context:"
+echo " PR: \${PR_NUMBER:-}"
+echo " Base branch: $BASE_BRANCH @ $BASE_SHA"
+echo " Head branch: $HEAD_BRANCH @ $HEAD_SHA"
+echo " Dirty edits: \${REVIEW_DIRTY:-0} (1 = include uncommitted working-tree changes in diff)"
+\`\`\`
+
+**For the rest of this skill, use these pinned SHAs** in every diff/log
+command. Concretely:
+
+| Don't (working-tree dependent — bug) | Do (SHA-pinned — correct) |
+|--------------------------------------|------------------------------------------|
+| \`git diff origin/\` | \`git diff "$BASE_SHA" "$HEAD_SHA"\` |
+| \`git diff origin/...HEAD\` | \`git diff "$BASE_SHA...$HEAD_SHA"\` |
+| \`git diff ..HEAD\` | \`git diff "$BASE_SHA" "$HEAD_SHA"\` |
+| \`git log origin/..HEAD\` | \`git log "$BASE_SHA..$HEAD_SHA"\` |
+| \`git diff --name-only origin/HEAD...\` | \`git diff --name-only "$BASE_SHA...$HEAD_SHA"\` |
+| \`git show HEAD:VERSION\` | \`git show "$HEAD_SHA:VERSION"\` |
+
+**Avoid \`gh pr diff "$PR_NUMBER"\`** even in PR-review context: that endpoint
+re-resolves \`HEAD\` and \`BASE\` server-side at every call, so a force-push of
+the PR head or a fast-forward of the PR base mid-review will silently change
+its output. Use the SHA-pinned local \`git diff "$BASE_SHA" "$HEAD_SHA"\`
+instead — it is immutable both against worktree flips AND against PR-state
+drift on the remote.
+
+If you genuinely need the GitHub-rendered diff (e.g., to compare against
+GitHub's UI), request it from the compare API with an explicit SHA boundary:
+\`gh api -H "Accept: application/vnd.github.diff" "/repos///compare/$BASE_SHA...$HEAD_SHA"\`.
+
+**Do not** use bare \`HEAD\`, \`origin/HEAD\`, or \`origin/\` (without
+\`...$HEAD_SHA\`) anywhere else in this skill. Even if those refs are correct
+right now, a later subagent may flip the worktree underneath you.
+
+This failure mode is named \`shared-checkout-branch-flip-during-review\` in
+\`CLAUDE.md\` failure-mode tracking.
+
+---`;
+}
+
export function generateDeployBootstrap(_ctx: TemplateContext): string {
return `\`\`\`bash
# Check for persisted deploy config in CLAUDE.md
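The PR-vs-local branching in the Step 0.5 bash block above can be modeled as a pure function, which makes its defaults easy to unit-test without `git` or `gh`. The TypeScript below is a hypothetical sketch, not part of gstack: `PinInputs`, `resolvePinnedContext`, and every input field name are illustrative, and each shell call is replaced by a pre-computed string input.

```typescript
// Hypothetical port of Step 0.5's decision logic (not part of gstack).
// Shell calls (gh, git) are replaced by pre-computed string inputs.

interface PinInputs {
  prNumber: string;           // "" when `gh pr view` finds no PR
  prMetaJson: string;         // raw `gh pr view --json ...` output ("" on failure)
  reviewDirtyEnv?: string;    // REVIEW_DIRTY if exported, otherwise undefined
  localHeadBranch: string;    // `git rev-parse --abbrev-ref HEAD`
  localHeadSha: string;       // `git rev-parse HEAD`
  detectedBaseBranch: string; // base branch detected in Step 0
  originBaseSha: string;      // `git rev-parse "origin/$BASE_BRANCH"`
}

interface PinnedContext {
  baseBranch: string;
  headBranch: string;
  baseSha: string;
  headSha: string;
  reviewDirty: string;
  warnings: string[];
}

// Approximates `jq -r '.key // empty'` for scalar fields:
// a missing, null, or false value becomes "".
function field(meta: Record<string, unknown>, key: string): string {
  const v = meta[key];
  return v === undefined || v === null || v === false ? "" : String(v);
}

function parseMeta(json: string): Record<string, unknown> {
  try {
    const parsed: unknown = JSON.parse(json);
    if (parsed !== null && typeof parsed === "object") {
      return parsed as Record<string, unknown>;
    }
  } catch {
    // Empty or invalid output: jq would print nothing, so fall through.
  }
  return {};
}

function resolvePinnedContext(inp: PinInputs): PinnedContext {
  const inPr = inp.prNumber !== "";
  // Only an unset variable gets the default, like `${REVIEW_DIRTY+x}`.
  const reviewDirty = inp.reviewDirtyEnv ?? (inPr ? "0" : "1");

  let ctx: Omit<PinnedContext, "reviewDirty" | "warnings">;
  if (inPr) {
    // PR context: trust GitHub's recorded base/head SHAs.
    const meta = parseMeta(inp.prMetaJson);
    ctx = {
      baseBranch: field(meta, "baseRefName"),
      headBranch: field(meta, "headRefName"),
      baseSha: field(meta, "baseRefOid"),
      headSha: field(meta, "headRefOid"),
    };
  } else {
    // Local review: origin tip of the base branch vs. local HEAD.
    ctx = {
      baseBranch: inp.detectedBaseBranch,
      headBranch: inp.localHeadBranch,
      baseSha: inp.originBaseSha,
      headSha: inp.localHeadSha,
    };
  }

  // Soft validation: warn instead of throwing so non-diff modes still run.
  const warnings: string[] = [];
  if (ctx.baseSha === "" || ctx.headSha === "") {
    warnings.push("could not resolve BASE_SHA / HEAD_SHA");
  }
  return { ...ctx, reviewDirty, warnings };
}
```

The `??` operator only substitutes for `undefined` (or `null`), which matches the shell's `${REVIEW_DIRTY+x}` test: an exported but empty `REVIEW_DIRTY` is kept as-is rather than replaced by the default.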
diff --git a/test/pr-diff-pin-regression.test.ts b/test/pr-diff-pin-regression.test.ts
new file mode 100644
index 0000000000..37aa5c9ccd
--- /dev/null
+++ b/test/pr-diff-pin-regression.test.ts
@@ -0,0 +1,348 @@
+/**
+ * Regression test for the `shared-checkout-branch-flip-during-review` failure mode.
+ *
+ * Empirical context (2026-05-04, claude-teams-bot project):
+ * Three back-to-back PR reviews observed `/security-review` (and the gstack
+ * /review and /cso skills) rendering against the WRONG branch's diff.
+ * Root cause: review skills used `git diff origin/` (or `origin/HEAD...`,
+ * or `..HEAD`), all of which depend on the local working tree's HEAD.
+ * When a nested subagent ran `git checkout` (e.g., to inspect a sibling branch's
+ * file) and forgot to switch back, every subsequent `git diff` silently
+ * re-rendered against the new branch, and the review reported findings on
+ * unrelated code.
+ *
+ * The fix (review/SKILL.md.tmpl Step 0.5, cso/SKILL.md.tmpl Step 0.5):
+ * Pin BASE_SHA and HEAD_SHA at the start of the skill via `git rev-parse` /
+ * `gh pr view`, then use those SHAs in every subsequent diff/log/show
+ * command. SHAs are immutable across worktree flips.
+ *
+ * This test reproduces the failure mode end-to-end in a real git repo:
+ * 1. Build a repo with two divergent feature branches A and B
+ * 2. Check out branch B (the "PR branch we're reviewing")
+ * 3. Pin BASE_SHA and HEAD_SHA via the same logic Step 0.5 uses
+ * 4. Flip the worktree to branch A (simulating a subagent's stray checkout)
+ * 5. Verify three things:
+ * a) Bare `git diff main` returns A's diff (the bug — wrong branch)
+ * b) `git diff "$BASE_SHA" "$HEAD_SHA"` returns B's diff (the fix — correct)
+ * c) The two diffs are NOT equal (proves the failure mode is real, not
+ * a degenerate case)
+ *
+ * If this test starts failing, it means either (a) someone re-introduced the
+ * bare-ref pattern in a review skill, or (b) git's behavior around symbolic
+ * vs. SHA refs changed. Both are worth investigating before merging.
+ *
+ * Free tier. ~500ms runtime (mostly git subprocess overhead).
+ */
+
+import { describe, test, expect, afterAll } from 'bun:test';
+import { mkdtempSync, writeFileSync, rmSync, readFileSync, existsSync } from 'fs';
+import { join } from 'path';
+import { tmpdir } from 'os';
+import { spawnSync } from 'child_process';
+
+const ROOT = join(import.meta.dir, '..');
+const dirs: string[] = [];
+
+interface RepoFixture {
+ dir: string;
+ baseSha: string; // main branch tip
+ branchASha: string;
+ branchBSha: string;
+}
+
+/**
+ * Build a tiny fixture repo:
+ * main: README.md
+ * feature-A (off main): adds a.txt
+ * feature-B (off main): adds b.txt
+ * Both branches diverge from the same base. The worktree is left checked
+ * out on feature-B (the "PR branch we're reviewing").
+ */
+function buildFixture(): RepoFixture {
+ const dir = mkdtempSync(join(tmpdir(), 'pr-diff-pin-'));
+ dirs.push(dir);
+
+ const run = (cmd: string, args: string[]) => {
+ const r = spawnSync(cmd, args, { cwd: dir, stdio: 'pipe', timeout: 10000 });
+ if (r.status !== 0 && cmd === 'git') {
+ // Surface git failures so the test fails with a useful message instead
+ // of cryptic empty SHAs downstream.
+ const stderr = r.stderr?.toString() ?? '';
+ throw new Error(`git ${args.join(' ')} failed (exit ${r.status}): ${stderr}`);
+ }
+ return r;
+ };
+ const capture = (cmd: string, args: string[]): string => {
+ const r = spawnSync(cmd, args, { cwd: dir, stdio: 'pipe', timeout: 10000 });
+ return r.stdout.toString().trim();
+ };
+
+ run('git', ['init', '-b', 'main']);
+ run('git', ['config', 'user.email', 'test@test.com']);
+ run('git', ['config', 'user.name', 'Test']);
+ run('git', ['config', 'commit.gpgsign', 'false']);
+ run('git', ['config', 'core.autocrlf', 'false']);
+
+ // Base commit on main.
+ writeFileSync(join(dir, 'README.md'), 'base\n');
+ run('git', ['add', '.']);
+ run('git', ['commit', '-m', 'initial']);
+ const baseSha = capture('git', ['rev-parse', 'HEAD']);
+
+ // feature-A off main.
+ run('git', ['checkout', '-b', 'feature-A']);
+ writeFileSync(join(dir, 'a.txt'), 'A change\n');
+ run('git', ['add', '.']);
+ run('git', ['commit', '-m', 'feature A']);
+ const branchASha = capture('git', ['rev-parse', 'HEAD']);
+
+ // feature-B off main (back to main, then branch).
+ run('git', ['checkout', 'main']);
+ run('git', ['checkout', '-b', 'feature-B']);
+ writeFileSync(join(dir, 'b.txt'), 'B change\n');
+ run('git', ['add', '.']);
+ run('git', ['commit', '-m', 'feature B']);
+ const branchBSha = capture('git', ['rev-parse', 'HEAD']);
+
+ // Leave the worktree on feature-B — this is "the branch the user
+ // intended to review" before any subagent stomp.
+ return { dir, baseSha, branchASha, branchBSha };
+}
+
+function gitDiffOutput(dir: string, ...args: string[]): string {
+ const r = spawnSync('git', ['diff', ...args], {
+ cwd: dir, stdio: 'pipe', timeout: 10000,
+ });
+ return r.stdout.toString();
+}
+
+afterAll(() => {
+ for (const d of dirs) {
+ try { rmSync(d, { recursive: true, force: true }); } catch { /* best effort */ }
+ }
+});
+
+describe('pr-diff-pin regression — shared-checkout-branch-flip-during-review', () => {
+ test('the working-tree-flip failure mode is real (sanity)', () => {
+ // Establishes that the bug ISN'T already fixed by some unrelated change in git.
+ // If this assertion ever stops holding, the rest of this test file is moot
+ // and the named failure mode no longer exists.
+ const { dir } = buildFixture();
+
+ // We're on feature-B. `git diff main` shows B's changes.
+ const diffOnB = gitDiffOutput(dir, 'main');
+ expect(diffOnB).toContain('b.txt');
+ expect(diffOnB).not.toContain('a.txt');
+
+ // Subagent flips us to feature-A.
+ spawnSync('git', ['checkout', 'feature-A'], { cwd: dir, stdio: 'pipe' });
+
+ // Same `git diff main` invocation — now silently re-renders against A.
+ const diffAfterFlip = gitDiffOutput(dir, 'main');
+ expect(diffAfterFlip).toContain('a.txt');
+ expect(diffAfterFlip).not.toContain('b.txt');
+
+ // Same command, two different answers. That IS the bug.
+ expect(diffOnB).not.toEqual(diffAfterFlip);
+ });
+
+ test('SHA-pinning produces stable diff across worktree flips', () => {
+ const { dir, baseSha, branchBSha } = buildFixture();
+
+ // Pin SHAs while we're on feature-B (this is what review/SKILL.md.tmpl
+ // Step 0.5 does via `git rev-parse origin/` or `gh pr view --json
+ // headRefOid`; both resolve symbolic refs to immutable commit SHAs).
+ const pinnedDiffOnB = gitDiffOutput(dir, baseSha, branchBSha);
+
+ // Subagent stomps to feature-A.
+ spawnSync('git', ['checkout', 'feature-A'], { cwd: dir, stdio: 'pipe' });
+ expect(spawnSync('git', ['rev-parse', '--abbrev-ref', 'HEAD'], {
+ cwd: dir, stdio: 'pipe',
+ }).stdout.toString().trim()).toBe('feature-A');
+
+ // Re-run the SHA-pinned diff. Should be byte-identical.
+ const pinnedDiffAfterFlip = gitDiffOutput(dir, baseSha, branchBSha);
+ expect(pinnedDiffAfterFlip).toEqual(pinnedDiffOnB);
+ expect(pinnedDiffAfterFlip).toContain('b.txt');
+ expect(pinnedDiffAfterFlip).not.toContain('a.txt');
+ });
+
+ test('git log with pinned range is also stable', () => {
+ const { dir, baseSha, branchBSha } = buildFixture();
+
+ const log = (...args: string[]) =>
+ spawnSync('git', ['log', ...args], {
+ cwd: dir, stdio: 'pipe', timeout: 10000,
+ }).stdout.toString();
+
+ const before = log(`${baseSha}..${branchBSha}`, '--oneline');
+ expect(before).toContain('feature B');
+ expect(before).not.toContain('feature A');
+
+ spawnSync('git', ['checkout', 'feature-A'], { cwd: dir, stdio: 'pipe' });
+
+ const after = log(`${baseSha}..${branchBSha}`, '--oneline');
+ expect(after).toEqual(before);
+ });
+
+ test('git show with pinned SHA is also stable', () => {
+ const { dir, branchBSha } = buildFixture();
+
+ // Each branch tip adds its own file. `git show :` only
+ // succeeds for a path that exists in that commit.
+ const showB = spawnSync('git', ['show', `${branchBSha}:b.txt`], {
+ cwd: dir, stdio: 'pipe', timeout: 10000,
+ });
+ expect(showB.status).toBe(0);
+ expect(showB.stdout.toString()).toContain('B change');
+
+ spawnSync('git', ['checkout', 'feature-A'], { cwd: dir, stdio: 'pipe' });
+
+ const showBAfterFlip = spawnSync('git', ['show', `${branchBSha}:b.txt`], {
+ cwd: dir, stdio: 'pipe', timeout: 10000,
+ });
+ expect(showBAfterFlip.status).toBe(0);
+ expect(showBAfterFlip.stdout.toString()).toEqual(showB.stdout.toString());
+ });
+
+ // ─── Template smell-tests ─────────────────────────────────────────────────
+ //
+ // Catch regressions where someone re-introduces a bare-ref pattern into
+ // the /review or /cso skill templates. The PR_DIFF_PIN preamble is
+ // load-bearing — if a template starts using `git diff origin/`
+ // again instead of `git diff "$BASE_SHA" "$HEAD_SHA"`, this test fails.
+
+ /**
+ * Pull out only **imperative** uses of `git diff` / `git log` / `git show`
+ * — i.e., commands the agent will actually run. We collect both:
+ * (a) lines inside fenced bash blocks (```bash … ```), and
+ * (b) inline backtick-quoted commands in narrative prose
+ * (e.g. `Run \`git diff origin/\` to get the full diff.`),
+ * which are also imperative — the agent reads narrative prose and runs
+ * the backtick-wrapped command verbatim. Codex's review caught a real
+ * gap here: we'd flagged Step 3's bash blocks but the inline Step-1
+ * directive was previously bare-ref.
+ *
+ * We deliberately exclude markdown table rows (don't/do comparison
+ * tables that document the bad patterns) and explicit "**Don't**" /
+ * "**Do**:" labels.
+ */
+ function imperativeBashCommands(content: string): string[] {
+ const lines = content.split('\n');
+ const commands: string[] = [];
+ let inBash = false;
+ for (let i = 0; i < lines.length; i++) {
+ const line = lines[i];
+ const trimmed = line.trim();
+ if (/^```(\w+)?$/.test(trimmed)) {
+ if (inBash) {
+ inBash = false;
+ } else if (trimmed === '```bash' || trimmed === '```sh') {
+ inBash = true;
+ }
+ continue;
+ }
+ if (inBash) {
+ if (trimmed.startsWith('#')) continue; // bash comment
+ if (trimmed === '') continue;
+ commands.push(line);
+ continue;
+ }
+ // Outside a fenced block — pull out inline backtick-quoted commands
+ // that look imperative (start with git/gh/bun, not narrative quotes
+ // about a pattern).
+ const inlineMatches = line.match(/`([^`]+)`/g) ?? [];
+ for (const m of inlineMatches) {
+ const cmd = m.slice(1, -1).trim();
+ // Skip variable refs like `$BASE_SHA`, type names, etc.
+ if (!/^(git|gh|bun)\s/.test(cmd)) continue;
+ // Skip "table-row-like" lines (markdown table cells).
+ if (trimmed.startsWith('|')) continue;
+ // Skip lines that explicitly label the bad pattern (Don't / wrong / bug).
+ if (/\*\*Don'?t\*\*|\bbad pattern\b|\bworking-tree dependent — bug\b/i.test(line)) continue;
+ // Skip lines that QUOTE a bad pattern alongside its replacement — these are
+ // inline "don't X, do Y" sentences, not imperatives.
+ if (
+ /`(git diff origin\/|\.\.HEAD|origin\/HEAD\.\.\.)/.test(line) &&
+ /\$BASE_SHA|\$HEAD_SHA/.test(line) &&
+ // and the BAD pattern is what we're currently looking at
+ /^(git diff origin\/|git log .*\.\.HEAD|git diff --name-only origin\/HEAD\.\.\.|git diff \.\.HEAD|git diff origin\/\.\.\.HEAD)/.test(cmd)
+ ) {
+ continue;
+ }
+ commands.push(`(inline) ${cmd}`);
+ }
+ }
+ return commands;
+ }
+
+ test('review/SKILL.md.tmpl uses pinned SHAs, not bare refs', () => {
+ const tmpl = readFileSync(join(ROOT, 'review', 'SKILL.md.tmpl'), 'utf-8');
+ const bashCommands = imperativeBashCommands(tmpl).join('\n');
+
+ // Must include the resolver invocation.
+ expect(tmpl).toContain('{{PR_DIFF_PIN}}');
+
+ // Imperative bash commands should reference the pinned SHAs.
+ expect(bashCommands).toContain('$BASE_SHA');
+ expect(bashCommands).toContain('$HEAD_SHA');
+
+ // Imperative commands must NOT use the working-tree-dependent forms.
+ // Tables and Don't-labelled lines may still mention them (to explain why we
+ // don't use them); fenced bash lines and inline git/gh/bun commands are checked.
+ for (const bad of [
+ /\bgit\s+diff\s+origin\/(?!\.\.\.\$HEAD_SHA|\s*--name-only\s+"\$BASE_SHA")/,
+ /\bgit\s+log\s+[^"`\n]*\.\.HEAD\b/,
+ /\bgit\s+diff\s+--name-only\s+origin\/HEAD\.\.\./,
+ ]) {
+ expect(bashCommands).not.toMatch(bad);
+ }
+ });
+
+ test('cso/SKILL.md.tmpl uses pinned SHAs in diff mode', () => {
+ const tmpl = readFileSync(join(ROOT, 'cso', 'SKILL.md.tmpl'), 'utf-8');
+
+ expect(tmpl).toContain('{{PR_DIFF_PIN}}');
+
+ // The --diff mode line is an inline backtick-quoted command in narrative
+ // prose, not a fenced bash block — assert it directly on the raw template.
+ expect(tmpl).toContain('"$BASE_SHA..$HEAD_SHA"');
+
+ // And it must mention the named failure mode somewhere.
+ expect(tmpl).toContain('shared-checkout-branch-flip-during-review');
+ });
+
+ test('the PR_DIFF_PIN resolver is registered (sanity)', () => {
+ // If someone removes the resolver, gen-skill-docs would silently emit the
+ // literal `{{PR_DIFF_PIN}}` placeholder into SKILL.md, breaking the skill.
+ const indexTs = readFileSync(
+ join(ROOT, 'scripts', 'resolvers', 'index.ts'),
+ 'utf-8',
+ );
+ expect(indexTs).toContain('PR_DIFF_PIN: generatePrDiffPin');
+ expect(indexTs).toContain('generatePrDiffPin');
+
+ // And the function exists in utility.ts.
+ const utilityTs = readFileSync(
+ join(ROOT, 'scripts', 'resolvers', 'utility.ts'),
+ 'utf-8',
+ );
+ expect(utilityTs).toContain('export function generatePrDiffPin');
+ });
+
+ test('generated SKILL.md files contain the Step 0.5 block', () => {
+ // Catches the case where someone edits .tmpl but forgets to run
+ // `bun run gen:skill-docs` before committing. The CI freshness check
+ // (gen-skill-docs --dry-run) catches this too, but this test makes the
+ // dependency explicit for the regression-test reader.
+ for (const skill of ['review', 'cso']) {
+ const mdPath = join(ROOT, skill, 'SKILL.md');
+ if (!existsSync(mdPath)) continue;
+ const md = readFileSync(mdPath, 'utf-8');
+ expect(md).toContain(
+ 'Pin diff context to immutable SHAs',
+ );
+ expect(md).toContain('shared-checkout-branch-flip-during-review');
+ }
+ });
+});
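The regression test above hands pinned SHAs to `git` through hand-built argv (`gitDiffOutput(dir, baseSha, branchBSha)` and the `` `${baseSha}..${branchBSha}` `` log range). If more call sites appear, a small helper could build those argv arrays in one place and refuse symbolic refs outright. The sketch below is hypothetical (`pinnedGitArgs` and `assertFullSha` are illustrative names, not gstack APIs); it mirrors the two-endpoint `git diff`, the `git log` range, and the `git show` forms used in Step 0.5, and it assumes full-length hex object names (40 characters for SHA-1, 64 for SHA-256 repositories).

```typescript
// Hypothetical helper (not part of gstack): builds argv arrays for SHA-pinned
// git commands, rejecting anything that is not a full commit SHA.

const FULL_SHA = /^(?:[0-9a-f]{40}|[0-9a-f]{64})$/; // SHA-1 or SHA-256 hex names

function assertFullSha(name: string, value: string): void {
  if (!FULL_SHA.test(value)) {
    throw new Error(`${name} is not a full commit SHA: "${value}"`);
  }
}

function pinnedGitArgs(baseSha: string, headSha: string) {
  assertFullSha("BASE_SHA", baseSha);
  assertFullSha("HEAD_SHA", headSha);
  return {
    // git diff "$BASE_SHA" "$HEAD_SHA"
    diff: ["diff", baseSha, headSha],
    // git diff --name-only "$BASE_SHA" "$HEAD_SHA"
    nameOnly: ["diff", "--name-only", baseSha, headSha],
    // git log "$BASE_SHA..$HEAD_SHA"
    log: ["log", `${baseSha}..${headSha}`],
    // git show "$HEAD_SHA:<path>"
    show: (path: string) => ["show", `${headSha}:${path}`],
  };
}
```

For example, `spawnSync('git', pinnedGitArgs(baseSha, branchBSha).diff, { cwd: dir })` would produce the same pinned diff as the existing `gitDiffOutput` call. Failing fast on empty, symbolic, or abbreviated SHAs is a deliberate tightening in this sketch: Step 0.5 only warns, because its scope-flag modes must keep running without a diff, whereas a test is better served by an immediate error.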