feat(overseer): replay harness v0 + CI gate + one-boss invariant stub (Step 2.75) #55
Open
heavygee wants to merge 1 commit into
Step 2.75 of the Overseer build sequence (prioritization §6, ADR-001).

- Captured-event-stream loader: parses a synthetic snapshot (sessions, events, event_links, baseline inbox items, dispatch envelopes, worker messages) and replays it into a sandbox :memory: Store. It never touches the production DB.
- Run-once promotion + prioritization entry point invokable against a snapshot (runPromotionPass).
- 12 golden scenarios from the §6 table: routine-progress surfaces nothing, same-session collapse with merged source_event_ids, idempotent re-emission, blocked_by fan-in root-cause traversal, approval escalation, stale-item aging + EEMUA-191 KPIs (alarm-flood / stale / priority-distribution), completed+PR review item, completed-noise falls out, CI/worker contradiction surfaced-not-resolved, operator noise demotion. Plus the +1: hub-inferred stale silence is captured-only (locks fix/overseer-inbox-stale-noise), worker self-reported stalled promotes.
- One-boss invariant (ADR-001): scans dispatched events, asserts the worker message is operator-attributed with no Overseer metadata or attribution boilerplate. Passes vacuously now (no dispatches); clean + leak fixtures prove the assertion shape activates at Step 4.
- Dedicated CI gate (.github/workflows/overseer-replay.yml) path-filtered to Overseer logic / inbox scoring / event taxonomy / worker-emission contract; runs the harness on every matching PR.
- Fixtures are synthetic (contracts §7), never real transcripts.

Co-authored-by: Cursor <cursoragent@cursor.com>
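To make the one-boss invariant concrete, here is a minimal sketch of what such a check could look like. It is not the code in oneBossInvariant.ts: the event shape, the `author` field, the metadata key prefix, and the boilerplate patterns are all assumptions chosen for illustration. It does show the two properties described above: an empty dispatch list passes vacuously, and the check targets attribution intent rather than banning the word "overseer".

```typescript
// Hypothetical shapes; the real event and message types in the repo may differ.
interface DispatchedEvent {
  id: string;
  type: string; // only events with type "dispatched" are checked
  workerMessage: {
    author: string; // assumed attribution field
    text: string;
    metadata: Record<string, unknown>;
  };
}

interface Violation {
  eventId: string;
  reason: string;
}

// Assumed leak markers, for illustration only.
const OVERSEER_KEY_PREFIX = "overseer";
const BOILERPLATE_PATTERNS: RegExp[] = [
  /^\s*(sent|dispatched|generated) by the overseer\b/i,
  /\bon behalf of the operator\b/i,
];

function checkOneBoss(events: DispatchedEvent[]): Violation[] {
  const violations: Violation[] = [];
  for (const ev of events) {
    if (ev.type !== "dispatched") continue;
    const msg = ev.workerMessage;

    // The worker must see the operator as the only boss.
    if (msg.author !== "operator") {
      violations.push({ eventId: ev.id, reason: `author is "${msg.author}"` });
    }
    // No Overseer bookkeeping may ride along in the metadata.
    for (const key of Object.keys(msg.metadata)) {
      if (key.toLowerCase().startsWith(OVERSEER_KEY_PREFIX)) {
        violations.push({ eventId: ev.id, reason: `metadata key "${key}"` });
      }
    }
    // Intent-based: match generated attribution phrases, not the bare word.
    if (BOILERPLATE_PATTERNS.some((re) => re.test(msg.text))) {
      violations.push({ eventId: ev.id, reason: "attribution boilerplate" });
    }
  }
  return violations; // an empty input yields no violations (vacuous pass)
}
```

A message such as "Please rename the overseer module." passes this sketch, while one that opens with "Dispatched by the Overseer" or carries an `overseerRunId` metadata key is reported.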
Summary
Step 2.75 of the Overseer build sequence (prioritization §6 replay harness, ADR-001 one-boss invariant). Stacks on the Overseer substrate (events #22 + inbox #23 + fix/overseer-inbox-stale-noise); the PR base is the substrate branch, so this diff is only the harness.

- Captured-event-stream loader (hub/src/overseer/replayHarness.ts): parses a synthetic snapshot (sessions, events, event_links, baseline inbox items, dispatch envelopes, worker messages) and replays it into a sandbox :memory: Store. Never touches the production DB.
- Run-once promotion + prioritization entry point (runPromotionPass) invokable against a snapshot.
- 12 golden scenarios from the §6 table: routine-progress surfaces nothing; same-session collapse with merged source_event_ids; idempotent re-emission; blocked_by fan-in root-cause traversal (surface root, not symptoms); approval escalation to top tier; stale-item aging + EEMUA-191 KPIs (alarm-flood / stale-count / priority-distribution); completed+PR review item; completed-noise falls out but stays queryable; CI/worker contradiction surfaced-not-resolved; operator noise demotion recorded as training label. Plus the +1: hub-inferred stale silence is captured-only (locks fix/overseer-inbox-stale-noise), while a worker self-reported stalled promotes.
- One-boss invariant (hub/src/overseer/oneBossInvariant.ts, ADR-001): scans dispatched events and asserts the worker message is operator-attributed, with no Overseer metadata keys and no generated attribution boilerplate (intent-based, does not ban the word "overseer"). Passes vacuously now (no dispatches exist); the one-boss-clean + one-boss-leak fixtures prove the assertion shape catches a real leak and will auto-activate when Step 4 dispatch lands.
- Dedicated CI gate (.github/workflows/overseer-replay.yml): path-filtered to Overseer logic / inbox scoring / event taxonomy / worker-emission contract; runs the harness on every matching PR.
- Fixtures (test/fixtures/overseer-replay/): synthetic, never real transcripts (contracts §7).

Test plan

- cd hub && bun test src/overseer: 19 pass (12 golden + 3 one-boss + sandbox + 3 loader-validation)
- cd shared && bun test src/overseerEvents.test.ts src/overseerInbox.test.ts: 14 pass
- cd hub && bun test src/store/inboxItems.test.ts src/sync/overseerEventRecorder*.test.ts: 36 pass
- The harness surface (hub/src/overseer/*, fixtures, workflow) typechecks clean

Notes for orchestrator / soup-manager

- Failures in bun typecheck / bun run test come from another peer's in-flight work: hub/src/notifications/modelErrorCopy.ts missing, ModelErrorNotification unexported, and Session-type drift (serviceTier/seq required) breaking several *.test.ts fixtures + web/src/hooks/useSSE.ts. None of these are touched here. The dedicated overseer-replay gate runs the Overseer test surface (runtime), so it stays green independent of that drift.
- Commit hooks were skipped (HAPI_SKIP_COMMIT_HOOKS=1) because the branch stacks on garden-bearing substrate; the garden→upstream/main rebase for a clean upstream PR is the soup-manager's job per the handoff brief.
- overseer-replay-harness-worktrees/0621-a0a5 (branch hapi-0621-a0a5), pushed to origin as feat/overseer-replay-harness. The sibling empty worktree worktrees/overseer-replay-harness can be dropped.

Made with Cursor
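For reviewers unfamiliar with the blocked_by fan-in scenario ("surface root, not symptoms"), the idea can be sketched as a graph walk. This is an illustration under assumptions, not the harness code: the `blockedBy` map shape and both function names are hypothetical.

```typescript
// Hypothetical shape: item id -> ids of the items it is blocked by.
type BlockedBy = Map<string, string[]>;

// Follow blocked_by links from one item until reaching items that are not
// blocked by anything; those are its root causes.
function findRootCauses(start: string, blockedBy: BlockedBy): string[] {
  const roots = new Set<string>();
  const seen = new Set<string>(); // guards against cycles in the link graph
  const stack: string[] = [start];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (seen.has(id)) continue;
    seen.add(id);
    const blockers = blockedBy.get(id) ?? [];
    if (blockers.length === 0) {
      roots.add(id); // nothing blocks this item, so it is a root
    } else {
      stack.push(...blockers);
    }
  }
  return Array.from(roots).sort();
}

// Fan-in: many blocked symptoms collapse to the distinct roots behind them.
function surfaceRoots(symptoms: string[], blockedBy: BlockedBy): string[] {
  const all = new Set<string>();
  for (const s of symptoms) {
    for (const r of findRootCauses(s, blockedBy)) all.add(r);
  }
  return Array.from(all).sort();
}
```

With links a→c, b→c, and c→d, `surfaceRoots(["a", "b"], links)` returns only `["d"]`, so the inbox would show one item for the root instead of two for the symptoms. A pure cycle has no unblocked item and yields no roots in this sketch.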