feat(overseer): replay harness v0 + CI gate + one-boss invariant stub (Step 2.75) #55

Open: heavygee wants to merge 1 commit into fix/overseer-inbox-stale-noise from feat/overseer-replay-harness

@heavygee

Owner

Summary

Step 2.75 of the Overseer build sequence (prioritization §6 replay harness, ADR-001 one-boss invariant). Stacks on the Overseer substrate (events #22 + inbox #23 + fix/overseer-inbox-stale-noise); PR base is the substrate branch so this diff is only the harness.

  • Captured-event-stream loader (hub/src/overseer/replayHarness.ts): parses a synthetic snapshot (sessions, events, event_links, baseline inbox items, dispatch envelopes, worker messages) and replays it into a sandbox :memory: Store. Never touches the production DB.
  • Run-once promotion+prioritization entry point (runPromotionPass) invokable against a snapshot.
  • 12 golden scenarios from the §6 table (≥10 required): routine-progress surfaces nothing; same-session collapse with merged source_event_ids; idempotent re-emission; blocked_by fan-in root-cause traversal (surface root, not symptoms); approval escalation to top tier; stale-item aging + EEMUA-191 KPIs (alarm-flood / stale-count / priority-distribution); completed+PR review item; completed-noise falls out but stays queryable; CI/worker contradiction surfaced-not-resolved; operator noise demotion recorded as training label. Plus the +1: hub-inferred stale silence is captured-only (locks fix/overseer-inbox-stale-noise), while a worker self-reported stalled promotes.
  • One-boss invariant (hub/src/overseer/oneBossInvariant.ts, ADR-001): scans dispatched events and asserts the worker message is operator-attributed — no Overseer metadata keys, no generated attribution boilerplate (intent-based, does not ban the word "overseer"). Passes vacuously now (no dispatches exist); one-boss-clean + one-boss-leak fixtures prove the assertion shape catches a real leak and will auto-activate when Step 4 dispatch lands.
  • CI gate (.github/workflows/overseer-replay.yml): path-filtered to Overseer logic / inbox scoring / event taxonomy / worker-emission contract; runs the harness on every matching PR.
  • Fixtures (test/fixtures/overseer-replay/): synthetic, never real transcripts (contracts §7).
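
For reviewers who want a feel for the one-boss check described above, here is a minimal sketch of an intent-based scan. It is not the code in hub/src/overseer/oneBossInvariant.ts: the `DispatchedEvent` shape, the `overseer` metadata-key prefix, and the attribution patterns are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real types in hub/src/overseer may differ.
interface DispatchedEvent {
  id: string;
  workerMessage: {
    text: string;
    metadata: Record<string, unknown>;
  };
}

interface Violation {
  eventId: string;
  reason: string;
}

// Assumed convention: Overseer-owned metadata keys share this prefix.
const OVERSEER_KEY_PREFIX = "overseer";

// Assumed boilerplate shapes. These match attribution phrasing only,
// so a message that merely mentions the word "overseer" still passes.
const ATTRIBUTION_PATTERNS: RegExp[] = [
  /^\s*\[(?:sent|generated|dispatched) by (?:the )?overseer\]/i,
  /\bthis message was (?:generated|sent) by (?:the )?overseer\b/i,
];

// Returns one violation per leaked metadata key or boilerplate match.
// An empty event list yields no violations, so the check passes vacuously.
function checkOneBoss(events: DispatchedEvent[]): Violation[] {
  const violations: Violation[] = [];
  for (const event of events) {
    for (const key of Object.keys(event.workerMessage.metadata)) {
      if (key.toLowerCase().startsWith(OVERSEER_KEY_PREFIX)) {
        violations.push({ eventId: event.id, reason: `metadata key "${key}"` });
      }
    }
    if (ATTRIBUTION_PATTERNS.some((p) => p.test(event.workerMessage.text))) {
      violations.push({ eventId: event.id, reason: "attribution boilerplate" });
    }
  }
  return violations;
}
```

Under these assumptions a clean fixture is any dispatch whose message carries no Overseer-prefixed keys and no attribution phrasing, while a leak fixture trips one or both checks.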

Test plan

  • cd hub && bun test src/overseer — 19 pass (12 golden + 3 one-boss + sandbox + 3 loader-validation)
  • cd shared && bun test src/overseerEvents.test.ts src/overseerInbox.test.ts — 14 pass
  • cd hub && bun test src/store/inboxItems.test.ts src/sync/overseerEventRecorder*.test.ts — 36 pass
  • New surface (hub/src/overseer/*, fixtures, workflow) typechecks clean
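
As a rough illustration of what one golden scenario exercises, the blocked_by fan-in case (surface the root, not the symptoms) can be sketched as a graph walk. This is an assumption-laden sketch, not the harness code: the `BlockedByLink` shape and the `findRootCauses` helper are hypothetical names, and the real traversal over event_links may differ.

```typescript
// Hypothetical link shape: `from` is blocked by `to`.
interface BlockedByLink {
  from: string;
  to: string;
}

// Follows blocked_by edges from `start` and returns the items that
// are not themselves blocked by anything (the root causes), sorted.
// A pure cycle has no such item, so it returns an empty list.
function findRootCauses(start: string, links: BlockedByLink[]): string[] {
  // Index the blockers of each item for quick lookup.
  const blockers = new Map<string, string[]>();
  for (const { from, to } of links) {
    const list = blockers.get(from) ?? [];
    list.push(to);
    blockers.set(from, list);
  }

  const roots = new Set<string>();
  const seen = new Set<string>();
  const stack: string[] = [start];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (seen.has(node)) continue; // guards against cycles and repeat visits
    seen.add(node);
    const next = blockers.get(node) ?? [];
    if (next.length === 0) {
      roots.add(node); // nothing blocks this item, so it is a root cause
    } else {
      stack.push(...next);
    }
  }
  return Array.from(roots).sort();
}
```

With several symptom items that all trace back to a single blocker, every symptom resolves to the same root, so a prioritization pass built on this idea would surface one item instead of the whole fan-in.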

Notes for orchestrator / soup-manager

  • Pre-existing base-stack breakage (NOT this PR): the substrate already fails repo-wide bun typecheck/bun run test from another peer's in-flight work — hub/src/notifications/modelErrorCopy.ts missing, ModelErrorNotification unexported, and Session-type drift (serviceTier/seq required) breaking several *.test.ts fixtures + web/src/hooks/useSSE.ts. None are touched here. The dedicated overseer-replay gate runs the Overseer test surface (runtime) so it stays green independent of that drift.
  • Pushed to the operator fork with the documented garden-guard override (HAPI_SKIP_COMMIT_HOOKS=1) because the branch stacks on garden-bearing substrate; the garden→upstream/main rebase for a clean upstream PR is the soup-manager's job per the handoff brief.
  • Worktree note: this landed from the nested worktree overseer-replay-harness-worktrees/0621-a0a5 (branch hapi-0621-a0a5), pushed to origin as feat/overseer-replay-harness. The sibling empty worktree worktrees/overseer-replay-harness can be dropped.

Made with Cursor

Step 2.75 of the Overseer build sequence (prioritization §6, ADR-001).

- Captured-event-stream loader: parses a synthetic snapshot (sessions,
  events, event_links, baseline inbox items, dispatch envelopes, worker
  messages) and replays it into a sandbox :memory: Store - never touches
  the production DB.
- Run-once promotion + prioritization entry point invokable against a
  snapshot (runPromotionPass).
- 12 golden scenarios from the §6 table: routine-progress surfaces
  nothing, same-session collapse with merged source_event_ids, idempotent
  re-emission, blocked_by fan-in root-cause traversal, approval
  escalation, stale-item aging + EEMUA-191 KPIs (alarm-flood / stale /
  priority-distribution), completed+PR review item, completed-noise falls
  out, CI/worker contradiction surfaced-not-resolved, operator noise
  demotion. Plus the +1: hub-inferred stale silence is captured-only
  (locks fix/overseer-inbox-stale-noise), worker self-reported stalled
  promotes.
- One-boss invariant (ADR-001): scans dispatched events, asserts the
  worker message is operator-attributed with no Overseer metadata or
  attribution boilerplate. Passes vacuously now (no dispatches); clean +
  leak fixtures prove the assertion shape activates at Step 4.
- Dedicated CI gate (.github/workflows/overseer-replay.yml) path-filtered
  to Overseer logic / inbox scoring / event taxonomy / worker-emission
  contract; runs the harness on every matching PR.
- Fixtures are synthetic (contracts §7), never real transcripts.

Co-authored-by: Cursor <cursoragent@cursor.com>