v1.33.1.0 fix(learnings): token-OR query + task-shaped retrieval in 3 long skills#1442
Merged
Conversation
Split the query on whitespace into tokens; a learning matches if ANY token appears as a substring in ANY of key/insight/files. Previously the whole query was a single substring, so multi-word queries like "debug investigation" only matched learnings whose insight contained that exact contiguous phrase, which is usually nothing. Whitespace-only query falls through to no-query (matches today's no-flag behavior). Single-word queries behave exactly as before. Adds test/gstack-learnings-search.test.ts: 3 assertions covering multi-token, single-token, and no-query backwards compat. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…uard
The {{LEARNINGS_SEARCH}} macro now accepts a query=KEYWORD argument that
gets interpolated as --query "<keyword>" into the generated bash. Empty
value falls through to no-query (principle of least surprise: a stray
{{LEARNINGS_SEARCH:query=}} placeholder gets today's behavior, not a
build failure). Pattern reuses the parameterized-macro parsing from
composition.ts. The 13 templates that don't pass a query stay
byte-identical in their generated SKILL.md output.
Shell-injection guard: the query value is whitelisted to
^[A-Za-z0-9 _-]+$ at gen-skill-docs time. Any \$(), backticks,
semicolons, or quotes throw a loud build error instead of emitting
executable bash. Static template queries are safe by inspection;
this defends against future contributors writing dangerous values.
Adds 5 assertions to test/gen-skill-docs.test.ts covering no-args,
claude+query=foo bar on both cross-project and project-scoped branches,
codex host variant, empty value semantics, and shell-injection payloads
(\$(whoami), backticks, ;, &, ", \\, \$x) throwing build errors.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…/qa /ship
The three long skills now pull learnings keyed to their theme at the
top, then re-pull at phase boundaries as work shifts to new sub-tasks.
Top-of-skill queries (5-6 token unions, token-OR matched):
- investigate: "debug investigation root cause hypothesis bug fix"
- qa: "qa testing bug regression flake fixture"
- ship: "release ship version changelog merge pr"
Mid-flow refresh blocks (concrete keyword recipe + worked examples):
- investigate: between Phase 1 (hypothesis) and Phase 2 (analysis),
keyed to the hypothesis noun. Examples: auth-cookie, session-expiry.
- qa: between Phase 7 (triage) and Phase 8 (fix loop), keyed to the
buggy component name. Examples: checkout-button, signup-form.
- ship: just before Step 12 (VERSION bump), keyed to the headline
feature. Examples: learnings-search, pacing, worktree-ship.
Keyword recipe enforces alphanumeric+hyphen only (no quotes, slashes,
dots, colons) so dynamic queries cannot inject shell metacharacters.
The other 13 short-lived skills keep the bare {{LEARNINGS_SEARCH}} form.
Backwards-compat verified via diff: their generated SKILL.md output is
byte-identical to before this change.
Golden ship fixtures regenerated to match the new ship/SKILL.md output.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Follow-up to 513c966 — the codex and factory host outputs needed regeneration too, missed in the initial commit because gen:skill-docs was only run for the claude host. Now matches gen:skill-docs --host all. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
E2E Evals: ✅ PASS13/13 tests passed | $2.13 total cost | 12 parallel runners
12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite |
Willardgmoore
added a commit
to Willardgmoore/gstack
that referenced
this pull request
May 12, 2026
Brings in: - v1.33.2.0: setup guard against Conductor worktree pollution (garrytan#1446) - v1.33.1.0: learnings token-OR query + task-shaped retrieval (garrytan#1442) - v1.33.0.0: /sync-gbrain memory stage batch-import refactor (garrytan#1432) VERSION stays at 1.34.0.0 (no collision — queue-aware check confirms). CHANGELOG: our [1.34.0.0] entry at top, three new main entries below. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
For the last 30+ versions, every gstack skill loaded learnings the same way —
gstack-learnings-search --limit 10at the top of the skill, generic top-10, no query, no refresh. Short skills are fine; long skills (/investigate,/qa,/ship) drift away from whatever was loaded at minute zero. This PR fixes that for the three long skills, plus a root-cause fix to the binary's--querysemantics that was silently broken.Headline changes:
bin/gstack-learnings-search—--querynow uses token-OR matching. Multi-word queries split on whitespace; a learning matches if ANY token appears in ANY of key/insight/files. The old whole-string substring behavior was a silent footgun —--query "debug investigation"returned nothing on real-world stores.{{LEARNINGS_SEARCH:query=KEYWORD}}macro — parameterized resolver with shell-injection guard. Query value must match^[A-Za-z0-9 _-]+$at gen-skill-docs time, so$(), backticks, semicolons, quotes throw a build error./investigate,/qa,/ship— top-of-skill pull keyed to their theme (e.g. ship →release ship version changelog merge pr). Each gains a mid-flow refresh at the phase boundary (/investigatePhase 1→2,/qatriage→fix-loop,/shippre-VERSION) with a concrete keyword recipe + worked examples that enforce alphanumeric-only keywords to sidestep shell quoting.Other 13 skills are byte-identical in their generated SKILL.md output — backwards-compat verified via diff.
Numbers (real measurements on this project's learnings.jsonl, 35 entries)
"debug investigation root cause""qa testing bug regression""release ship version changelog""skill resolver"Without the binary fix, the rest of this PR would have shipped silent — the bash would run, the binary would exit 0 with no output, and the skill would render the same empty section it always rendered.
Pre-Landing Review
Eng Review (PLAN) ran on 2026-05-11: CLEAR, 2 issues found (D1 superseded by D4 root-cause binary fix, D2 covered by 5 test assertions), 0 critical gaps, 0 unresolved.
Codex Outside Voice (plan-time) caught the load-bearing substring-vs-token-OR semantics flaw that invalidated the original D1 reasoning. Adopted as D4 (root-cause binary fix).
Codex Adversarial (diff-time, this PR): caught a shell-injection vector in the resolver —
queryArgwas being interpolated into bash without escaping. Fixed by addingQUERY_SAFE_RE = /^[A-Za-z0-9 _-]+$/validation at gen-skill-docs time. New test assertion covers$(whoami), backticks,;,&,",\,$xpayloads. Recommendation upgraded from "block" to "ship".Test Coverage
test/gstack-learnings-search.test.ts(NEW): 3 assertions on token-OR semantics — multi-token, single-token, no-query backwards compat.test/gen-skill-docs.test.ts: 5 NEW assertions on the resolver — no-args, claude+query=foo bar (both bash branches), codex host variant, empty value falls through, shell-injection payloads throw.bun testsuite passes.Plan Completion
All 8 plan items DONE (verified against
/Users/garrytan/.claude/plans/system-instruction-you-are-working-immutable-sunrise.md):Scope Drift
Clean. Plan intent: token-shaped learnings retrieval for long skills + binary fix. Delivered: exactly that, plus one defense-in-depth shell-injection guard added during ship-time Codex review.
For contributors
Contributed by @Fergtic (chronicle-write-up) who flagged the load-once + no-refresh pattern and the spend-per-success data point that motivated this work.
Test plan
bun test test/gstack-learnings-search.test.ts— 3 passbun test test/gen-skill-docs.test.ts -t "LEARNINGS_SEARCH"— 14 pass (5 LEARNINGS_SEARCH + 9 related)bun test test/host-config.test.ts -t "golden"— 3 passbin/gstack-learnings-search --query "skill resolver" --limit 5returns 12 token-OR matchesgit diffafterbun run gen:skill-docs --host all)🤖 Generated with Claude Code
Need help on this PR? Tag
@codesmithwith what you need.