feat: add experiment loop for metric-driven auto-research iterations
Adds experiment-loop.cjs with init/run/log/check-continue/status/stop
CLI commands, experiment.schema.json, comprehensive test suite (61 tests),
and updated loop.md with coverage-driven scan orchestration.

CHANGELOG.md (26 additions, 1 deletion)

@@ -5,6 +5,30 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [3.0.9] - 2026-03-13

### Added

- `scripts/experiment-loop.cjs` — autonomous experiment loop engine inspired by pi-autoresearch. Provides metric-driven iteration with baseline + delta tracking, append-only JSONL persistence, segmented sessions, and full state reconstruction from log alone.
- `check-continue` command — single gateway that checks all loop conditions (stop file, iteration cap, consecutive crash breaker, resume cooldown) before each iteration
- Hard iteration cap (default: 10, configurable via `--max-iterations`) prevents runaway loops
- Consecutive crash breaker (3 in a row) auto-stops to prevent token waste
- Stop-file cancellation (`experiment-loop.cjs stop` or `touch .bug-hunter/experiment.stop`) for easy user interruption
- Auto-resume with 5-minute cooldown for graceful recovery after agent context limits
- Secondary metric consistency enforcement — locks metric names after first result in a segment
- 40 new tests covering all experiment-loop commands, guardrails, and edge cases (including negative metrics, zero/negative max-iterations, --duration-ms)
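The four `check-continue` conditions listed above can be combined into one gate, as in the minimal sketch below. It is not the code in `scripts/experiment-loop.cjs`: the function name `checkContinue`, the shape of the `state` object (including the `resuming` flag) and the `reason` strings are assumptions for illustration. Only the defaults (10 iterations, 3 consecutive crashes, 5-minute resume cooldown) come from this changelog.

```javascript
// Hypothetical sketch of a check-continue gateway; not the code in
// scripts/experiment-loop.cjs. The `state` shape and reason strings are assumed.
const DEFAULTS = {
  maxIterations: 10, // hard iteration cap
  maxConsecutiveCrashes: 3, // crash breaker threshold
  resumeCooldownMs: 5 * 60 * 1000, // auto-resume cooldown
};

// state.stopFileExists: whether the stop file is present
// state.statuses:       result statuses in the current segment, oldest first
// state.resuming:       true when this call is an auto-resume after a context limit
// state.lastResumeAt:   epoch ms of the previous auto-resume, or null
function checkContinue(state, now, opts = {}) {
  const cfg = { ...DEFAULTS, ...opts };
  if (state.stopFileExists) {
    return { continue: false, reason: "stop-file" };
  }
  if (state.statuses.length >= cfg.maxIterations) {
    return { continue: false, reason: "iteration-cap" };
  }
  // Count the run of "crash" statuses at the end of the segment.
  let trailingCrashes = 0;
  for (let i = state.statuses.length - 1; i >= 0 && state.statuses[i] === "crash"; i--) {
    trailingCrashes++;
  }
  if (trailingCrashes >= cfg.maxConsecutiveCrashes) {
    return { continue: false, reason: "consecutive-crashes" };
  }
  // The cooldown only gates auto-resumes, not ordinary iterations.
  if (state.resuming && state.lastResumeAt !== null &&
      now - state.lastResumeAt < cfg.resumeCooldownMs) {
    return { continue: false, reason: "resume-cooldown" };
  }
  return { continue: true, reason: "ok" };
}
```

Checking the stop file first means a user stop always takes priority over every automatic condition in this sketch.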

### Changed

- **Experiment tracking is now active by default** when `LOOP_MODE=true` — no `--experiment` flag needed
- `SKILL.md` now auto-initializes `experiment-loop.cjs` during loop setup (init + check-continue wiring)
- `modes/loop.md` updated with full experiment tracking integration, per-iteration workflow, and documentation of all stop mechanisms (user-initiated vs automatic)
- `scripts/schema-runtime.cjs` registers the new `experiment` schema
- `schemas/experiment.schema.json` cleaned: removed unused `command` and `passed` fields, added `maxIterations` field
- `scripts/experiment-loop.cjs` `log` command now accepts `--duration-ms` flag to persist actual iteration duration (was hardcoded to 0)
- `llms.txt` and `llms-full.txt` updated with experiment loop capabilities
- Test suite expanded from 61 to **101 tests** (0 failures)

## [3.0.8] - 2026-03-13

### Highlights

@@ -239,7 +263,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0

- Coverage enforcement - partial audits produce explicit warnings
- Large codebase strategy with domain-first tiered scanning

SKILL.md (19 additions, 1 deletion)

@@ -471,10 +471,28 @@ Read the corresponding mode file using `STRATEGY` from the triage JSON:

**Backend override for local-sequential:** If `AGENT_BACKEND = "local-sequential"`, read `SKILL_DIR/modes/local-sequential.md` instead of the size-based mode file. The local-sequential mode handles all sizes internally with its own chunking logic.

-If LOOP_MODE=true, also read:
+If LOOP_MODE=true, also read (loop.md includes experiment tracking with iteration caps, stop-file safety, and auto-resume):
- `SKILL_DIR/modes/fix-loop.md` when FIX_MODE=true
- `SKILL_DIR/modes/loop.md` otherwise

**CRITICAL — experiment tracking initialization:** When `LOOP_MODE=true`, initialize experiment tracking BEFORE the first pipeline iteration by running:
If `continue` is false, stop the loop immediately. After each iteration, log the result with `log`. This is active by default — no `--experiment` flag needed.
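The check, run and log steps can be wired into a driver loop along the lines of this minimal sketch. The three step functions are injected so the control flow can be exercised without the real CLI; in practice `checkContinue` would shell out to `node scripts/experiment-loop.cjs check-continue ...` and read its `continue` field, but the exact output format of that command is an assumption here, as are the names `runLoop`, `runIteration` and `logResult`.

```javascript
// Hypothetical driver for the check-continue -> run -> log pattern.
// checkContinue, runIteration and logResult are injected stand-ins for the
// real CLI calls; their names and return shapes are assumptions.
function runLoop({ checkContinue, runIteration, logResult }) {
  const history = [];
  for (;;) {
    // 1. Single gateway before every iteration.
    const gate = checkContinue();
    if (!gate.continue) {
      return { stoppedBecause: gate.reason, history };
    }
    // 2. Run one pipeline iteration.
    let result;
    try {
      result = runIteration(history.length + 1);
    } catch (err) {
      // Record a thrown iteration as a crash instead of aborting, so the
      // consecutive-crash breaker in check-continue can see it.
      result = { status: "crash", error: String(err) };
    }
    // 3. Log the result after every iteration, crashes included.
    logResult(result);
    history.push(result);
  }
}
```

Logging crashes instead of rethrowing keeps the gateway as the only place that decides when repeated failures should end the loop.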

**CRITICAL — ralph-loop integration:** When `LOOP_MODE=true`, you MUST call the `ralph_start` tool before running the first pipeline iteration. The loop mode files (`loop.md` / `fix-loop.md`) contain the exact `ralph_start` call to make, including the `taskContent` and `maxIterations` parameters. Without calling `ralph_start`, the loop will NOT iterate — it will run once and stop. After each iteration, call `ralph_done` to continue, or output `<promise>COMPLETE</promise>` when done.

modes/loop.md (177 additions, 1 deletion)

@@ -120,6 +120,182 @@ Each iteration after the first:

- Max iterations should scale with the queue size so autonomous runs do not stop early
- Each iteration only scans NEW files — no re-scanning already-DONE files
-- User can stop anytime with ESC or `/ralph-stop`
+- User can stop anytime with ESC, `/ralph-stop`, or `experiment-loop.cjs stop`
- Canonical state is in `.bug-hunter/coverage.json`; `coverage.md` is derived
  and fully resumable from that JSON

---

## Experiment Tracking (autoresearch integration)
When `LOOP_MODE=true`, each loop iteration is automatically tracked as an **experiment** using the append-only JSONL experiment log. This is active by default — no extra flags needed. It provides metric-driven optimization with baseline comparison, auto-resume, and user-interruptible stop files.
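Baseline comparison boils down to a signed percentage, as in the minimal sketch below. The helper names are hypothetical and this is not necessarily the formula `experiment-loop.cjs` uses. Dividing by `|baseline|` (so negative metrics keep a meaningful sign) and returning `null` for a zero baseline are assumptions; a zero baseline is a realistic case for `bugs_confirmed`.

```javascript
// Hypothetical sketch: percent delta of a result against the segment
// baseline, honouring the metric direction ("higher" or "lower" is better).
function deltaFromBaseline(baseline, value, direction) {
  // A zero baseline has no meaningful percentage change.
  if (baseline === 0) return null;
  // Dividing by |baseline| keeps the sign correct for negative metrics.
  const raw = ((value - baseline) / Math.abs(baseline)) * 100;
  // Normalise so that a positive delta always means "better".
  return direction === "lower" ? -raw : raw;
}

// True when `value` beats the best result seen so far in the segment.
function isNewBest(bestSoFar, value, direction) {
  if (bestSoFar === null) return true;
  return direction === "lower" ? value < bestSoFar : value > bestSoFar;
}
```

For example, `deltaFromBaseline(4, 6, "higher")` gives `50`: two more confirmed bugs than a baseline of four is a 50% improvement.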

### Setup (first iteration only)

Before the first pipeline iteration, initialize the experiment session:

```bash
node scripts/experiment-loop.cjs init \
  .bug-hunter/experiment.jsonl \
  "bug-hunt-$(date +%Y%m%d)" \
  bugs_confirmed \
  higher \
  count \
  --max-iterations 10
```
The `--max-iterations` flag sets the **hard iteration cap** for the session (default: 10). The loop will automatically stop when this cap is reached — no runaway loops. Each subsequent `init` call starts a **new segment** with its own baseline and counter reset.
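Because the log is append-only, segments can be rebuilt by replaying it line by line, as in the minimal sketch below. The record shapes (`{"type":"config",...}` written by `init`, `{"type":"result","value":...,"status":...}` written by `log`) are assumptions for illustration, not the documented JSONL format, and `reconstructSegments` is a hypothetical name.

```javascript
// Hypothetical sketch of rebuilding segment state from the JSONL log alone.
// Assumed record shapes: {"type":"config",...} from init and
// {"type":"result","value":<number>,"status":<string>} from log.
function reconstructSegments(jsonlText) {
  const segments = [];
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue; // tolerate a trailing newline or blank lines
    const record = JSON.parse(line);
    if (record.type === "config") {
      // Every init starts a fresh segment: new baseline, counter reset.
      segments.push({ config: record, results: [], baseline: null });
    } else if (record.type === "result") {
      const current = segments[segments.length - 1];
      if (!current) throw new Error("result record before any config record");
      // The first usable (non-crash, numeric) result becomes the baseline.
      if (current.baseline === null && record.status !== "crash" &&
          typeof record.value === "number") {
        current.baseline = record.value;
      }
      current.results.push(record);
    }
  }
  return segments;
}
```

The iteration counter for the cap is then simply `segments[segments.length - 1].results.length`, which is why a new `init` resets it.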

### Per-iteration workflow

Each iteration follows the **check-continue → run → log** pattern:

1. **Check continue** — the single gateway before every iteration:
   ```bash
   node scripts/experiment-loop.cjs check-continue \
     .bug-hunter/experiment.jsonl \
     --stop-file .bug-hunter/experiment.stop
   ```

   This checks ALL conditions in one call and returns a clear yes/no:
- Auto-commits on `keep` status (configurable via `--auto-commit false`)
- Computes delta from baseline (% improvement)
- Returns whether this is the new best result

4. **Check status** — see cumulative progress:
   ```bash
   node scripts/experiment-loop.cjs status .bug-hunter/experiment.jsonl
   ```

### Stopping the loop

#### User-initiated stop (easy, immediate)

The user can cancel the loop at any time. Any of the following methods stops it; they differ only in when the stop takes effect:

| Method | How | When it takes effect |
|--------|-----|---------------------|
| **ESC key** | Press ESC in the terminal | Immediate — kills current iteration |
| **Ctrl+C** | Terminal interrupt | Immediate |
| **`/ralph-stop`** | Type in the CLI | End of current iteration |
| **Stop file** | `node scripts/experiment-loop.cjs stop` | Before next iteration |
| **Touch file** | `touch .bug-hunter/experiment.stop` | Before next iteration |

The `check-continue` and `run` commands both check the stop file, so the loop will halt gracefully at the next natural checkpoint.
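The stop-file mechanism only needs three file operations, as in the minimal sketch below. The function names are hypothetical and the real `stop` / `clear-stop` commands may write different contents; only the path `.bug-hunter/experiment.stop` comes from this document. Writing a timestamp is an assumption, since the file's presence alone is what matters (which is why `touch` works too).

```javascript
// Hypothetical sketch of stop-file semantics; the real CLI may differ.
const fs = require("fs");
const path = require("path");

// Like `stop` or `touch`: create the file so the next checkpoint halts.
function requestStop(stopFile) {
  fs.mkdirSync(path.dirname(stopFile), { recursive: true });
  fs.writeFileSync(stopFile, new Date().toISOString() + "\n");
}

// Like `clear-stop`: remove the file; `force: true` ignores a missing file.
function clearStop(stopFile) {
  fs.rmSync(stopFile, { force: true });
}

// What a checkpoint (check-continue or run) would test before doing work.
function stopRequested(stopFile) {
  return fs.existsSync(stopFile);
}
```

Since presence is the whole signal, a file left over from an earlier run blocks the next session until it is cleared.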
221
+
222
+
> **Interaction with ralph-loop:** ESC and Ctrl+C kill the process immediately (ralph-loop handles cleanup). The stop file is a softer mechanism — it lets the current operation finish, then halts before the next iteration. Both work independently. If a stale stop file is left behind from a previous run, `check-continue` will detect it and refuse to proceed — so always clean up.
223
+
224
+
To resume after a user stop:
225
+
226
+
```bash
227
+
node scripts/experiment-loop.cjs clear-stop
228
+
```
229
+
230
+
#### Automatic stop (system-initiated)
231
+
232
+
The system will automatically stop the loop when ANY of these conditions are met — no user action required:
233
+
234
+
| Condition | Default | Why |
235
+
|-----------|---------|-----|
236
+
|**Iteration cap reached**| 10 iterations | Prevents runaway loops. Configurable via `--max-iterations`. |
237
+
|**3 consecutive crashes**| 3 in a row | Something is broken — don't waste tokens. Fix and re-init. |
> **Note:** `can-resume` and `record-resume` are low-level primitives. In normal operation, always use `check-continue` as the primary gateway — it already includes the resume cooldown check along with all other conditions. Use `can-resume` only for diagnostic purposes.

This is distinct from a user-initiated stop: the agent auto-resumes after context limits, but respects user stop files.
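The auto-resume cooldown can be modelled with two small functions, as in the minimal sketch below. The names echo the `can-resume` and `record-resume` primitives, but the signatures, the in-memory `state` object and the `waitMs` field are assumptions; only the 5-minute cooldown comes from the changelog.

```javascript
// Hypothetical sketch of the can-resume / record-resume primitives.
// Keeping the timestamp in a plain object (not the JSONL log) is assumed.
const RESUME_COOLDOWN_MS = 5 * 60 * 1000;

function canResume(state, now, stopFileExists) {
  // A user stop file always wins over auto-resume.
  if (stopFileExists) return { ok: false, reason: "user-stop" };
  if (state.lastResumeAt !== null && now - state.lastResumeAt < RESUME_COOLDOWN_MS) {
    // Report how long the agent should wait before trying again.
    const waitMs = RESUME_COOLDOWN_MS - (now - state.lastResumeAt);
    return { ok: false, reason: "cooldown", waitMs };
  }
  return { ok: true };
}

// Returns a new state recording when the latest resume happened.
function recordResume(state, now) {
  return { ...state, lastResumeAt: now };
}
```

The cooldown spaces out resumes so an agent that keeps hitting its context limit cannot restart in a tight loop, while a user stop file still halts it immediately.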

### Metrics tracked

| Metric | Type | Description |
|--------|------|-------------|
| `bugs_confirmed` | Primary | Number of bugs surviving the full adversarial pipeline |
| `false_positives` | Secondary | Findings killed by Skeptic + Referee |
| `files_scanned` | Secondary | Files processed this iteration |
| `fix_success_rate` | Secondary | % of fixes that passed verification |

Secondary metrics are **locked after the first result** in a segment. All subsequent results must provide the same set of secondary metrics (or use `--force true` to change them). This prevents inconsistent tracking.
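The locking rule amounts to comparing sorted metric names, as in the minimal sketch below. `checkSecondaryMetrics`, its return shape and the error text are hypothetical; only the behaviour (lock after the first result, allow a change with a force flag) comes from the paragraph above.

```javascript
// Hypothetical sketch of secondary-metric locking within a segment.
// `locked` is null until the first result, then a sorted array of names.
function checkSecondaryMetrics(locked, provided, force = false) {
  const names = Object.keys(provided).sort();
  // First result in the segment, or an explicit override: (re)lock the names.
  if (locked === null || force) {
    return { ok: true, locked: names };
  }
  const same = names.length === locked.length &&
    names.every((name, i) => name === locked[i]);
  if (!same) {
    return {
      ok: false,
      locked,
      error: `secondary metrics must be [${locked.join(", ")}], got [${names.join(", ")}]`,
    };
  }
  return { ok: true, locked };
}
```

Sorting the names makes the check independent of the order in which metrics are passed, so only the set of names matters.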

### JSONL file format

The experiment log at `.bug-hunter/experiment.jsonl` is append-only. Each line is one of:

package.json (1 addition, 1 deletion)

@@ -1,6 +1,6 @@
{
  "name": "@codexstar/bug-hunter",
-  "version": "3.0.8",
+  "version": "3.0.9",
  "description": "Adversarial AI bug hunter — multi-agent pipeline finds security vulnerabilities, logic errors, and runtime bugs, then fixes them autonomously. Works with Claude Code, Cursor, Codex CLI, Copilot, Kiro, and more.",