feat(hub): first-class spoken task-completion notifications with operator-configurable templates and substitutions #42

@heavygee

Motivation

We currently have an out-of-process subscriber (heavygee/agent-notify-tmux:scripts/hapi-voice-subscriber.ts, see docs/hapi-subscriber.md) that consumes hub /api/events SSE, extracts task-completion contracts, and speaks them via the operator's local TTS endpoint. It works (15/15 tests, ~120 sessions seeded, two real announcements caught on first run). It is also strictly a fork-local hack against the public event surface, with no formal home and no documented contract.

This issue tracks the v1 path: promote the same capability to a first-class hub notification mode so any HAPI operator (not just this fork) can opt into spoken task notifications without writing their own SSE consumer.

Scope

A new NotificationChannel implementation in hub/src/notifications/ that speaks task notifications via TTS, plus the supporting plumbing:

  1. A registered VoiceNotificationChannel that implements sendTaskNotification(session, { summary, status }) and sendSessionCompletion(session, reason) from hub/src/notifications/notificationTypes.ts. Wired into the existing NotificationHub alongside Telegram and push.

  2. Operator-configurable spoken phrase built from named substitutions, NOT hard-coded text. The template is a settings-store string with {placeholder} slots that the channel expands at speech time. A missing-field tidy pass scrubs dangling punctuation so {action} being empty does not produce "agent reports , . , Sir.".

  3. First-class substitution variables, validated and surfaced in shared schemas (NOT a soft AGENTS.md convention). Proposed first cut:

    • {session} - session.metadata.name (the human-readable label shown in the web sidebar; what the operator looks up to find the agent)
    • {project} - inferred from session.metadata.path (last path segment / repo root) or operator override
    • {machine} - session.metadata.machineId resolved to the machine's display label
    • {agent} - the agent's self-reported identifier (carried in the task_notification event payload; promote to a typed field)
    • {status} - the spoken form of the contract status (done / blocked / needs review / needs decision / failed / stalled)
    • {summary} - the contract's summary field as authored by the agent
    • {action} - the contract's action field as authored by the agent (NEW; HAPI's current extractTaskNotification ignores it)
    • {model} - the agent's model identifier when known (some operators want to hear which model finished)

  4. Voice endpoint reuse. The hub already has voice plumbing in hub/src/web/routes/voice.ts and @hapi/protocol/voice for ElevenLabs ConvAI / Gemini Live / Qwen Realtime conversational sessions. Server-side speech for announcements is a different shape (one-shot /v1/audio/speech, not WebRTC), but the same backend resolution should drive both: resolveHubVoiceBackend(env) for the default backend, with operator overrides for backend / voice-id / model / response-format. Reusing the same configuration surface keeps the operator's mental model coherent ("my voice setup" is one thing, not two).

  5. Audio is per-user-session, not per-hub. The hub itself runs as a system service with no audio. Two viable shapes:

    • (A) Hub posts a "speak this" event onto a new socket/sse channel and any subscribed user-session daemon plays it. This generalises better (multi-machine; remote operators).
    • (B) Hub-side channel calls the configured TTS endpoint (which is itself a network service) and POSTs the resulting audio to a per-user delivery channel.
    • The current out-of-process subscriber implements a simplified form of (A): it skips the hub channel entirely, and each user-session subscriber makes its own TTS calls. v1 should pick a side and document it.
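
The template expansion and missing-field tidy pass from item 2 could look roughly like the sketch below. This is not the subscriber's actual code: expandTemplate, tidySpokenPhrase, and spokenStatus are hypothetical names, and the tidy rules (keep the first mark in a run of punctuation, drop empty quote pairs) are one assumed policy among several reasonable ones.

```typescript
// Hypothetical helpers; none of these names exist in HAPI today.
type Substitutions = { [name: string]: string | undefined };

// Spoken form of a contract status, e.g. "needs_review" -> "needs review".
function spokenStatus(status: string): string {
  return status.replace(/_/g, ' ');
}

// Expand {placeholder} slots. Unknown or missing fields expand to ''.
function expandTemplate(template: string, values: Substitutions): string {
  return template.replace(/\{(\w+)\}/g, (_match, name: string) => {
    return (values[name] ?? '').trim();
  });
}

// Tidy pass (assumed policy): scrub punctuation left dangling by empty slots.
function tidySpokenPhrase(phrase: string): string {
  return phrase
    .replace(/""/g, '')                            // drop empty quote pairs
    .replace(/\s+([,.;:!?])/g, '$1')               // no whitespace before punctuation
    .replace(/([,.;:!?])(?:\s*[,.;:!?])+/g, '$1')  // keep first mark of a punctuation run
    .replace(/^[\s,.;:!?]+/, '')                   // no leading punctuation
    .replace(/\s{2,}/g, ' ')                       // collapse repeated whitespace
    .trim();
}
```

With the first-cut default template and an empty {action}, this policy yields agent "api" reports done, tests pass. Sir. rather than a phrase with a stray ". ,". Collapsing punctuation runs also flattens an ellipsis in a summary to a single full stop, which seems acceptable for speech but is a choice the real implementation may want to revisit.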

First-class substitutions, not AGENTS.md prose

Today the AGENT_NOTIFY_SUMMARY contract lives only in ~/coding/AGENTS.md as instructions to the agent. HAPI's extractTaskNotification knows about summary and status but ignores action, agent, and version. Promoting these to a typed schema in @hapi/protocol/notifications (or similar) gives:

  • Validation at hub ingest (reject malformed contracts, log gracefully).
  • A discoverable surface for new agent integrations.
  • Stable substitution names the operator's template can rely on.
  • Decoupling from any one fork's prompt convention.

Strawman shape (in shared/src/schemas.ts):

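import { z } from 'zod'

// Strawman typed contract for task notifications.
const TaskNotificationSchema = z.object({
    version: z.literal(1),           // contract version; only 1 is accepted
    status: z.enum(['done','blocked','needs_review','needs_decision','failed','stalled']),
    summary: z.string().min(1),      // required, non-empty
    action: z.string().optional(),   // suggested next step, as authored by the agent
    agent: z.string().optional(),    // agent's self-reported identifier
})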

The hub's existing extractTaskNotification continues to match <task-notification> XML and the system-output subtype:task_notification shapes, but additionally checks plaintext for an AGENT_NOTIFY_SUMMARY JSON line that validates against this schema (operator-fork-friendly path).
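
A minimal sketch of the plaintext path, under stated assumptions: the marker is taken to be the literal AGENT_NOTIFY_SUMMARY followed by an optional ":" or "=" and a JSON object on the same line (the exact line format is not specified here), and extractFromPlaintext is a hypothetical name. To stay dependency-free the sketch validates by hand; in the hub, the strawman schema's safeParse would presumably take the place of validate.

```typescript
// Hand-rolled stand-in for the strawman schema; names are illustrative.
const STATUSES = ['done', 'blocked', 'needs_review', 'needs_decision', 'failed', 'stalled'] as const;
type Status = (typeof STATUSES)[number];

interface TaskNotification {
  version: 1;
  status: Status;
  summary: string;
  action?: string;
  agent?: string;
}

// ASSUMED marker format: AGENT_NOTIFY_SUMMARY[: or =] {json} on a single line.
const MARKER = /^\s*AGENT_NOTIFY_SUMMARY\s*[:=]?\s*(\{.*\})\s*$/;

function isStatus(value: unknown): value is Status {
  return typeof value === 'string' && STATUSES.some((s) => s === value);
}

// Returns a typed contract, or null if any field fails validation.
function validate(raw: unknown): TaskNotification | null {
  if (typeof raw !== 'object' || raw === null) return null;
  const { version, status, summary, action, agent } = raw as Record<string, unknown>;
  if (version !== 1 || !isStatus(status)) return null;
  if (typeof summary !== 'string' || summary.length === 0) return null;
  if (action !== undefined && typeof action !== 'string') return null;
  if (agent !== undefined && typeof agent !== 'string') return null;
  const out: TaskNotification = { version: 1, status, summary };
  if (action !== undefined) out.action = action;
  if (agent !== undefined) out.agent = agent;
  return out;
}

// Scan plaintext line by line; the first line that validates wins.
function extractFromPlaintext(text: string): TaskNotification | null {
  for (const line of text.split(/\r?\n/)) {
    const match = MARKER.exec(line);
    if (!match) continue;
    try {
      const parsed = validate(JSON.parse(match[1] ?? ''));
      if (parsed) return parsed;
    } catch {
      // Malformed JSON: skip this line instead of failing the whole message.
    }
  }
  return null;
}
```

Returning null for malformed contracts keeps the rejection quiet at this layer; where and how the hub logs the rejection is left open.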

Acceptance criteria

  • VoiceNotificationChannel registered alongside Telegram + push in hub/src/notifications/, with a hub setting to enable/disable.
  • Operator-editable spoken template stored in hub settings (default phrase TBD; the agent-notify subscriber's current default is agent "{session}" reports {status}, {summary}. {action}, Sir. and can serve as the first-cut default).
  • All proposed substitutions resolved server-side and tested.
  • Voice backend resolution shares the existing resolveHubVoiceBackend plumbing.
  • Per-session cooldown + serial speech queue (port from agent-notify subscriber - prevents the audio-DDoS that ~25 parallel agents on this box would cause).
  • Hub-down behaviour is graceful (no leaked audio, no crashed channel).
  • Documentation: add an entry under docs/guide/ for "voice notifications" with template syntax + substitution table.
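
The per-session cooldown and serial speech queue in the criteria above can be sketched as follows. This is an illustration, not a port of the agent-notify subscriber: SpeechQueue, its constructor parameters, and the drop-on-cooldown policy are assumptions, and speak stands in for whatever ends up calling the TTS endpoint.

```typescript
// Hypothetical sketch; the real subscriber's queue may differ.
type Speak = (phrase: string) => Promise<void>;

class SpeechQueue {
  private readonly speak: Speak;
  private readonly cooldownMs: number;
  private readonly now: () => number;
  private readonly lastAcceptedAt = new Map<string, number>();
  private tail: Promise<void> = Promise.resolve();

  constructor(speak: Speak, cooldownMs: number, now: () => number = Date.now) {
    this.speak = speak;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Returns false when the session is still cooling down (phrase dropped).
  enqueue(sessionId: string, phrase: string): boolean {
    const t = this.now();
    const last = this.lastAcceptedAt.get(sessionId);
    if (last !== undefined && t - last < this.cooldownMs) return false;
    this.lastAcceptedAt.set(sessionId, t);

    // Chain onto the tail so phrases are spoken strictly one at a time.
    this.tail = this.tail
      .then(() => this.speak(phrase))
      .catch(() => {
        // A failed TTS call must not stall later announcements.
      });
    return true;
  }

  // Resolves once everything queued so far has been spoken or has failed.
  idle(): Promise<void> {
    return this.tail;
  }
}
```

Dropping phrases during the cooldown, rather than deferring them, bounds the backlog when many agents finish at once; whether a dropped announcement should instead be coalesced into the next one is a design question for v1.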

Non-goals

  • Replacing the conversational voice flow in hub/src/web/routes/voice.ts (that is two-way realtime, this is one-shot announcement).
  • Cursor IDE direct sessions (those never reach HAPI; covered by agent-notify-tmux's existing Cursor stop hook).
  • Speech-to-text on the operator side (separate work).

Reference / prior art
