Commit 3fa1eab

feat: land structured-output migration and phase validation
1 parent b7ccb8f commit 3fa1eab

46 files changed

Lines changed: 2865 additions & 458 deletions

CHANGELOG.md

Lines changed: 48 additions & 1 deletion
@@ -7,6 +7,49 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [3.0.4] — 2026-03-11
+
+### Added
+- `run-bug-hunter.cjs phase` command for schema-validated Skeptic, Referee, and Fixer phase execution with retry support
+- runner tests for invalid Skeptic, Referee, and Fixer artifacts plus Markdown companion rendering
+
+### Changed
+- preflight now checks all shipped structured-output schemas, not just findings
+- structured-output migration now enforces orchestrated outbound validation beyond the local/manual path
+
+## [3.0.3] — 2026-03-11
+
+### Added
+- `scripts/render-report.cjs` Markdown renderer for final report and coverage summaries from canonical JSON artifacts
+- `scripts/tests/render-report.test.cjs` coverage for report and coverage rendering
+- `coverage.json` / `coverage.md` output path in `run-bug-hunter.cjs`
+
+### Changed
+- Hunter, Skeptic, Referee, and Fixer prompts now describe JSON-first canonical artifacts
+- loop, fix-loop, local-sequential, and major mode docs now point at `*.json` phase artifacts and `coverage.json`
+- README, SKILL docs, evals, and the subagent wrapper now describe rendered Markdown as a companion to canonical JSON
+- local/manual mode docs now validate findings, skeptic, and referee artifacts with `schema-validate.cjs`
+
+## [3.0.2] — 2026-03-11
+
+### Added
+- `schemas/*.schema.json` versioned contracts for recon, findings, skeptic, referee, coverage, fix-report, plus shared definitions and example findings fixtures
+- `scripts/schema-runtime.cjs` lightweight schema runtime and `scripts/schema-validate.cjs` CLI for local artifact checks
+
+### Changed
+- `payload-guard.cjs` now emits real schema refs instead of placeholder format/version objects
+- `bug-hunter-state.cjs` now rejects malformed findings and stores canonical `confidenceScore`, `category`, `evidence`, `runtimeTrigger`, and `crossReferences`
+- `run-bug-hunter.cjs` now treats missing or invalid `findings.json` as a retriable chunk failure and checks schema assets during preflight
+- script tests now cover schema validation, malformed findings rejection, and retry-after-schema-failure
+
+## [3.0.1] — 2026-03-11
+
+### Changed
+- Loop and fix-loop completion now require full queued source-file coverage, not just CRITICAL/HIGH coverage
+- Autonomous runs now continue through remaining MEDIUM and LOW files after prioritized chunks finish unless the user interrupts
+- Loop iteration guidance now scales `maxIterations` from queue size so large audits do not stop early
+- Large-codebase mode now treats LOW domains as part of the default autonomous queue instead of optional skipped work
+
 ## [3.0.0] — 2026-03-10

 ### Added
@@ -136,7 +179,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Coverage enforcement — partial audits produce explicit warnings
 - Large codebase strategy with domain-first tiered scanning

-[Unreleased]: https://github.com/codexstar69/bug-hunter/compare/v3.0.0...HEAD
+[Unreleased]: https://github.com/codexstar69/bug-hunter/compare/v3.0.4...HEAD
+[3.0.4]: https://github.com/codexstar69/bug-hunter/compare/v3.0.3...v3.0.4
+[3.0.3]: https://github.com/codexstar69/bug-hunter/compare/v3.0.2...v3.0.3
+[3.0.2]: https://github.com/codexstar69/bug-hunter/compare/v3.0.1...v3.0.2
+[3.0.1]: https://github.com/codexstar69/bug-hunter/compare/v3.0.0...v3.0.1
 [3.0.0]: https://github.com/codexstar69/bug-hunter/compare/v2.4.1...v3.0.0
 [2.4.1]: https://github.com/codexstar69/bug-hunter/compare/v2.4.0...v2.4.1
 [2.4.0]: https://github.com/codexstar69/bug-hunter/compare/v2.3.0...v2.4.0
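
Reviewer note on the 3.0.2 entry: the changelog says a missing or invalid `findings.json` is treated as a retriable chunk failure. The sketch below shows one way that validate-then-retry loop could look. It is not the code in `run-bug-hunter.cjs`; `validateFindings`, `runChunkWithRetry`, the array-shaped artifact, and the default of three attempts are all assumptions made for illustration. Only the field names `category` and `confidenceScore` come from the changelog.

```javascript
// Hypothetical sketch; none of these names are taken from run-bug-hunter.cjs.

// Stand-in for a schema check. Assumes the artifact is a JSON array whose
// entries carry a string `category` and a numeric `confidenceScore`.
function validateFindings(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (err) {
    return { ok: false, error: `invalid JSON: ${err.message}` };
  }
  if (!Array.isArray(data)) return { ok: false, error: "expected an array" };
  for (const [i, f] of data.entries()) {
    if (typeof f?.category !== "string" || typeof f?.confidenceScore !== "number") {
      return { ok: false, error: `finding ${i} is malformed` };
    }
  }
  return { ok: true, findings: data };
}

// `produceArtifact(attempt)` returns the artifact text, or null when the
// file is missing. Missing and invalid artifacts are both retried.
function runChunkWithRetry(produceArtifact, maxAttempts = 3) {
  let lastError = "not run";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const text = produceArtifact(attempt);
    if (text == null) {
      lastError = "artifact missing";
      continue;
    }
    const result = validateFindings(text);
    if (result.ok) {
      return { status: "done", attempts: attempt, findings: result.findings };
    }
    lastError = result.error;
  }
  return { status: "failed", attempts: maxAttempts, error: lastError };
}
```

Keeping the producer as a callback separates the retry policy from file I/O, so the policy can be exercised in tests without touching disk.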

README.md

Lines changed: 10 additions & 6 deletions
@@ -280,7 +280,7 @@ Bug Hunter automatically selects the optimal scanning strategy based on your cod
 | **120–180 files** | Scaled | State-driven chunks with resume capability |
 | **180+ files** | Large-codebase | Domain-scoped pipelines + boundary audits (loop mode, on by default) |

-Loop mode is **on by default** — the pipeline runs iteratively until every critical and high-risk file has been audited, with persistent state enabling stop-and-resume workflows. Use `--no-loop` for a single-pass scan.
+Loop mode is **on by default** — the pipeline runs iteratively until every queued scannable source file has been audited and, in fix mode, every discovered fixable bug has been processed. The agent should keep descending through CRITICAL → HIGH → MEDIUM → LOW automatically unless the user interrupts. Use `--no-loop` for a single-pass scan.

 ---

@@ -523,12 +523,16 @@ Every run creates a `.bug-hunter/` directory (add to `.gitignore`) containing:
 |------|-----------|----------|
 | `report.md` | Always | Human-readable report: confirmed bugs, dismissed findings, coverage stats |
 | `findings.json` | Always | Machine-readable JSON for CI/CD and dashboards |
+| `skeptic.json` | When findings exist | Canonical Skeptic challenge artifact |
+| `referee.json` | When findings exist | Canonical Referee verdict artifact |
+| `coverage.json` | Loop/autonomous runs | Canonical coverage and loop state |
 | `triage.json` | Always | File classification, risk map, strategy selection, token estimates |
 | `recon.md` | Always | Tech stack analysis, attack surface mapping, scan order |
-| `findings.md` | Always | Raw Hunter findings before Skeptic review |
-| `skeptic.md` | Always | Skeptic challenge decisions with evidence |
-| `referee.md` | Always | Referee final verdicts with enrichment |
-| `fix-report.md` | Fix mode | Per-bug fix status, verification results, git diff summary |
+| `findings.md` | Optional | Markdown companion rendered from `findings.json` |
+| `skeptic.md` | Optional | Markdown companion rendered from `skeptic.json` |
+| `referee.md` | Optional | Markdown companion rendered from `referee.json` |
+| `coverage.md` | Loop/autonomous runs | Markdown companion rendered from `coverage.json` |
+| `fix-report.md` | Fix mode | Markdown companion for fix results |
 | `fix-report.json` | Fix mode | Machine-readable fix results for CI/CD gating and dashboards |
 | `worktree-*/` | Worktree fix mode | Temporary isolated worktrees for Fixer subagents (auto-cleaned) |
 | `threat-model.md` | `--threat-model` | STRIDE threat model with trust boundaries and data flows |
@@ -560,7 +564,7 @@ The pipeline adapts to whatever it finds. Triage classifies files by extension a
 | `--fix` | Find and auto-fix bugs (default behavior) |
 | `--approve` | Interactive mode — ask before each fix |
 | `--autonomous` | Full auto-fix with zero intervention |
-| `--loop` | Iterative mode — runs until 100% critical file coverage **(on by default)** |
+| `--loop` | Iterative mode — runs until 100% queued source-file coverage **(on by default)** |
 | `--no-loop` | Disable loop mode — single-pass scan only |
 | `--deps` | Include dependency CVE scanning with reachability analysis |
 | `--threat-model` | Generate or use STRIDE threat model for targeted security analysis |
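
Reviewer note on the README change: the updated loop-mode paragraph says the agent keeps descending through CRITICAL → HIGH → MEDIUM → LOW until every queued file is audited. A minimal sketch of that ordering follows. The queue entry shape (`path`, `tier`, `status`) and the helpers `nextFile` and `drainQueue` are hypothetical, and the real state lives in files this diff does not show.

```javascript
// Hypothetical sketch of tier-ordered queue draining; not the project's code.
const TIER_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

// Return the highest-priority entry that is not DONE, or null when the
// queue is exhausted. Entries within a tier keep their queue order.
function nextFile(queue) {
  const pending = queue.filter((f) => f.status !== "DONE");
  pending.sort((a, b) => TIER_ORDER.indexOf(a.tier) - TIER_ORDER.indexOf(b.tier));
  return pending[0] ?? null;
}

// Scan files until nothing is pending; returns the order they were scanned in.
function drainQueue(queue, scan) {
  const order = [];
  for (let file = nextFile(queue); file !== null; file = nextFile(queue)) {
    scan(file);
    file.status = "DONE"; // mutates the shared queue entry
    order.push(file.path);
  }
  return order;
}
```

Under these assumptions the loop only stops when the queue is empty. A user interrupt or hard blocker, which the docs also allow as stop conditions, would need an extra exit check inside the loop.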

SKILL.md

Lines changed: 26 additions & 15 deletions
@@ -50,7 +50,7 @@ For large scans: process chunks sequentially with persistent state to avoid comp
 /bug-hunter --autonomous src/ # Alias for no-intervention auto-fix run
 /bug-hunter --fix -b feature-xyz # Find + fix on branch diff
 /bug-hunter --fix --approve src/ # Find + fix, but ask before each fix
-/bug-hunter src/ # Loops by default: audit until 100% coverage
+/bug-hunter src/ # Loops by default: audit + fix until all queued source files are covered
 /bug-hunter --no-loop src/ # Single-pass only, no iterating
 /bug-hunter --no-loop --scan-only src/ # Single-pass scan, no fixes, no loop
 /bug-hunter --deps src/ # Include dependency CVE scan
@@ -130,7 +130,7 @@ If triage was not run (e.g., Recon was called directly without the orchestrator)

 **File partitioning rules (Extended/Scaled modes):**
 - **Service-aware partitioning (preferred)**: If Recon detected multiple service boundaries (monorepo), partition by service.
-- **Risk-tier partitioning (fallback)**: process CRITICAL then HIGH then MEDIUM.
+- **Risk-tier partitioning (fallback)**: process CRITICAL then HIGH then MEDIUM then LOW.
 - Keep chunk size small (recommended 20-40 files) to avoid context compaction issues.
 - Persist chunk progress in `.bug-hunter/state.json` so restarts do not re-scan done chunks.
 - Test files (CONTEXT-ONLY) are included only when needed for intent.
@@ -296,7 +296,7 @@ Token estimate: ~[N] tokens for full pipeline
 ```
 ⚠️ This codebase has [N] source files (FILE_BUDGET: [B]).
 Single-pass mode will only cover a subset. Remove `--no-loop` to enable iterative coverage.
-Proceeding with partial scan — CRITICAL and HIGH domains only.
+Proceeding with partial scan — highest-priority queued files only.
 ```

 **Triage replaces Recon's FILE_BUDGET computation.** Recon still runs for tech stack identification and pattern-based analysis, but it no longer needs to count files or compute the context budget — triage already did that, for free.
@@ -362,8 +362,8 @@ read({ path: "$SKILL_DIR/prompts/hunter.md" })
 # - Apply the security checklist sweep
 # - Write each finding in BUG-N format

-# 3. Write your findings to disk:
-write({ path: ".bug-hunter/findings.md", content: "<your findings>" })
+# 3. Write your canonical findings artifact to disk:
+write({ path: ".bug-hunter/findings.json", content: "<your findings json>" })
 ```

 #### Example B: subagent dispatch
@@ -383,16 +383,16 @@ read({ path: "$SKILL_DIR/templates/subagent-wrapper.md" })
 # - {RISK_MAP} = <risk map from .bug-hunter/recon.md>
 # - {TECH_STACK} = <framework, auth, DB from Recon>
 # - {PHASE_SPECIFIC_CONTEXT} = <doc-lookup instructions from doc-lookup.md>
-# - {OUTPUT_FILE_PATH} = ".bug-hunter/findings.md"
+# - {OUTPUT_FILE_PATH} = ".bug-hunter/findings.json"
 # - {SKILL_DIR} = <absolute path>
 # 4. Dispatch:
 subagent({
   agent: "hunter-agent",
   task: "<the filled template>",
-  output: ".bug-hunter/findings.md"
+  output: ".bug-hunter/findings.json"
 })
 # 5. Read the output:
-read({ path: ".bug-hunter/findings.md" })
+read({ path: ".bug-hunter/findings.json" })
 ```

 When launching subagents, always pass `SKILL_DIR` explicitly in the task context so prompt commands like `node "$SKILL_DIR/scripts/doc-lookup.cjs"` resolve correctly. The `context7-api.cjs` script is kept as a fallback if `doc-lookup.cjs` fails.
@@ -491,30 +491,36 @@ In a collapsed `<details>` section (for transparency).
 - Skeptic accuracy: X/Y correct challenges (Z%)

 ### 7. Coverage assessment
-- If ALL CRITICAL/HIGH files scanned: "Full coverage achieved."
+- If ALL queued scannable source files scanned: "Full queued coverage achieved."
 - If any missed: list them with note about `--loop` mode.

 ### 7b. Coverage enforcement (mandatory)

-If the coverage assessment shows ANY CRITICAL or HIGH files were not scanned, the pipeline is NOT complete:
+If the coverage assessment shows ANY queued scannable source files were not scanned, the pipeline is NOT complete:

-1. If `LOOP_MODE=true` (default): the ralph-loop will automatically continue to the next iteration covering missed files. Call `ralph_done` to proceed to the next iteration. Do NOT output `<promise>COMPLETE</promise>` until all CRITICAL/HIGH files show DONE.
+1. If `LOOP_MODE=true` (default): the ralph-loop will automatically continue to the next iteration covering missed files. Call `ralph_done` to proceed to the next iteration. Do NOT output `<promise>COMPLETE</promise>` until all queued scannable source files show DONE.

 2. If `LOOP_MODE=false` (`--no-loop` was specified) AND missed files exist:
 - If total files ≤ FILE_BUDGET × 3: Output the report with a WARNING:
 ```
-⚠️ PARTIAL COVERAGE: [N] CRITICAL/HIGH files were not scanned.
+⚠️ PARTIAL COVERAGE: [N] queued source files were not scanned.
 Run `/bug-hunter [path]` for complete coverage (loop is on by default).
 Unscanned files: [list them]
 ```
 - If total files > FILE_BUDGET × 3: The report MUST include:
 ```
 🚨 LARGE CODEBASE: [N] source files (FILE_BUDGET: [B]).
-Single-pass audit covered [X]% of CRITICAL/HIGH files.
+Single-pass audit covered [X]% of queued source files.
 Use `/bug-hunter [path]` for full coverage (loop is on by default).
 ```

-3. Do NOT claim "audit complete" or "full coverage achieved" unless ALL CRITICAL and HIGH files have status DONE. A partial audit is still valuable — report what you found honestly.
+3. Do NOT claim "audit complete" or "full coverage achieved" unless ALL queued scannable source files have status DONE. A partial audit is still valuable — report what you found honestly.
+
+4. Autonomous runs must keep descending through the remaining priority queue after the current prioritized chunk is done:
+- Finish current CRITICAL/HIGH work first.
+- Immediately continue with remaining MEDIUM files.
+- Then continue with remaining LOW files.
+- Only stop when the queue is exhausted, the user interrupts, or a hard blocker prevents safe progress.

 If zero bugs were confirmed, say so clearly — a clean report is a good result.

@@ -577,7 +583,12 @@ Rules for JSON output:
 - `dependencies` array: populated only if `--deps` was used and `.bug-hunter/dep-findings.json` exists.
 - This JSON enables CI/CD gating, dashboard ingestion, and downstream patch generation.

-Also write the final markdown report to `.bug-hunter/report.md` as the canonical human-readable output (in addition to displaying it to the user).
+Also write the final markdown report to `.bug-hunter/report.md` as the
+canonical human-readable output. Generate it from the JSON artifacts with:
+
+```bash
+node "$SKILL_DIR/scripts/render-report.cjs" report ".bug-hunter/findings.json" ".bug-hunter/referee.json" > ".bug-hunter/report.md"
+```

 ---
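
Reviewer note on section 7b of SKILL.md: the coverage-enforcement rules reduce to a small decision function. The sketch below encodes them as read from this diff. `coverageVerdict`, its input shape, and the action labels are hypothetical names, and the sketch assumes "total files" means the number of queued source files.

```javascript
// Hypothetical sketch of the 7b decision rules; not the project's code.
// `queued` is assumed to be [{ path, status }], with status "DONE" once scanned.
function coverageVerdict({ queued, fileBudget, loopMode }) {
  const missed = queued.filter((f) => f.status !== "DONE").map((f) => f.path);
  if (missed.length === 0) {
    return { complete: true, action: "report-full-coverage", missed };
  }
  if (loopMode) {
    // Rule 1: keep iterating; never claim completion while files are missed.
    return { complete: false, action: "continue-loop", missed };
  }
  // Rule 2: --no-loop with missed files. Pick the warning by codebase size.
  const total = queued.length;
  const coveredPct = Math.round((100 * (total - missed.length)) / total);
  const action =
    total <= fileBudget * 3 ? "warn-partial-coverage" : "warn-large-codebase";
  return { complete: false, action, missed, coveredPct };
}
```

A caller would map `warn-partial-coverage` to the PARTIAL COVERAGE message and `warn-large-codebase` to the LARGE CODEBASE message quoted in the diff. In both cases `complete` stays false, which matches rule 3.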
