Motivation
We currently have an out-of-process subscriber (heavygee/agent-notify-tmux:scripts/hapi-voice-subscriber.ts, see docs/hapi-subscriber.md) that consumes the hub's /api/events SSE stream, extracts task-completion contracts, and speaks them via the operator's local TTS endpoint. It works (15/15 tests, ~120 sessions seeded, two real announcements caught on first run). It is also strictly a fork-local hack against the public event surface, with no formal home and no documented contract.
This issue tracks the v1 path: promote the same capability to a first-class hub notification mode so any HAPI operator (not just this fork) can opt into spoken task notifications without writing their own SSE consumer.
Scope
A new NotificationChannel implementation in hub/src/notifications/ that speaks task notifications via TTS, plus the supporting plumbing:
- A registered VoiceNotificationChannel that implements sendTaskNotification(session, { summary, status }) and sendSessionCompletion(session, reason) from hub/src/notifications/notificationTypes.ts. Wired into the existing NotificationHub alongside Telegram and push.
- Operator-configurable spoken phrase built from named substitutions, NOT hard-coded text. The template is a settings-store string with {placeholder} slots that the channel expands at speech time. A missing-field tidy pass scrubs dangling punctuation so {action} being empty does not produce "agent reports , . , Sir.".
- First-class substitution variables, validated and surfaced in shared schemas (NOT a soft AGENTS.md convention). Proposed first cut:
  - {session} - session.metadata.name (the human-readable label shown in the web sidebar; what the operator looks up to find the agent)
  - {project} - inferred from session.metadata.path (last path segment / repo root) or operator override
  - {machine} - session.metadata.machineId resolved to the machine's display label
  - {agent} - the agent's self-reported identifier (carried in the task_notification event payload; promote to a typed field)
  - {status} - the spoken form of the contract status (done / blocked / needs review / needs decision / failed / stalled)
  - {summary} - the contract's summary field as authored by the agent
  - {action} - the contract's action field as authored by the agent (NEW; HAPI's current extractTaskNotification ignores it)
  - {model} - the agent's model identifier when known (some operators want to hear which model finished)
- Voice endpoint reuse. The hub already has voice plumbing in hub/src/web/routes/voice.ts and @hapi/protocol/voice for ElevenLabs ConvAI / Gemini Live / Qwen Realtime conversational sessions. Server-side speech for announcements is a different shape (one-shot /v1/audio/speech, not WebRTC), but the same backend resolution should drive both: resolveHubVoiceBackend(env) for the default backend, with operator overrides for backend / voice-id / model / response-format. Reusing the same configuration surface keeps the operator's mental model coherent ("my voice setup" is one thing, not two).
- Audio is per-user-session, not per-hub. The hub itself runs as a system service with no audio. Two viable shapes:
  - (A) Hub posts a "speak this" event onto a new socket/SSE channel and any subscribed user-session daemon plays it. This generalises better (multi-machine; remote operators).
  - (B) Hub-side channel calls the configured TTS endpoint (which is itself a network service) and POSTs the resulting audio to a per-user delivery channel.
  - The current out-of-process subscriber implements (A) the simple way: ignore the hub channel and let each user-session subscriber make its own TTS calls. v1 should pick a side and document it.
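The template expansion and missing-field tidy pass described in the list above could be sketched roughly as follows. This is a minimal illustration, not the subscriber's actual code: expandTemplate, tidy, spokenStatus, and the specific tidy rules are hypothetical names and choices; only the placeholder names and the example template come from this issue.

```typescript
// Hypothetical sketch: variable names follow the proposed first cut;
// the functions and the tidy rules are illustrative assumptions.
type VarName =
  | 'session' | 'project' | 'machine' | 'agent'
  | 'status' | 'summary' | 'action' | 'model';

type TemplateVars = Partial<Record<VarName, string>>;

// Spoken form of a contract status: underscores read badly aloud.
function spokenStatus(status: string): string {
  return status.replace(/_/g, ' ');
}

// Scrub the debris that empty fields leave behind.
function tidy(text: string): string {
  return text
    .replace(/""/g, '')                           // empty quote pair, e.g. from "{session}"
    .replace(/\s+([,.;:!?])/g, '$1')              // no whitespace before punctuation
    .replace(/([,.;:!?])(?:\s*[,.;:!?])+/g, '$1') // collapse punctuation runs, keep the first mark
    .replace(/\s{2,}/g, ' ')                      // collapse repeated whitespace
    .replace(/^[\s,.;:!?]+/, '')                  // no leading punctuation
    .trim();
}

// Expand {name} slots; unknown or missing fields expand to the empty string.
function expandTemplate(template: string, vars: TemplateVars): string {
  const expanded = template.replace(/\{(\w+)\}/g, (_match, name: string) => {
    const value = vars[name as VarName];
    // The typeof guard also rejects inherited members such as {toString}.
    return typeof value === 'string' ? value.trim() : '';
  });
  return tidy(expanded);
}

const template = 'agent "{session}" reports {status}, {summary}. {action}, Sir.';
console.log(expandTemplate(template, {
  session: 'api-refactor',
  status: spokenStatus('needs_review'),
  summary: 'tests pass',
}));
// prints: agent "api-refactor" reports needs review, tests pass. Sir.
```

One trade-off in this sketch: collapsing punctuation runs also flattens an intentional "..." inside a summary to a single ".". A real implementation might tidy only around slots that expanded to nothing.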
First-class substitutions, not AGENTS.md prose
Today the AGENT_NOTIFY_SUMMARY contract lives only in ~/coding/AGENTS.md as instructions to the agent. HAPI's extractTaskNotification knows about summary and status but ignores action, agent, and version. Promoting these to a typed schema in @hapi/protocol/notifications (or similar) gives:
- Validation at hub ingest (reject malformed contracts, log gracefully).
- A discoverable surface for new agent integrations.
- Stable substitution names the operator's template can rely on.
- Decoupling from any one fork's prompt convention.
Strawman shape (in shared/src/schemas.ts):
    import { z } from 'zod'

    // Typed task-notification contract; action and agent stay optional.
    const TaskNotificationSchema = z.object({
      version: z.literal(1),
      status: z.enum(['done', 'blocked', 'needs_review', 'needs_decision', 'failed', 'stalled']),
      summary: z.string().min(1),
      action: z.string().optional(),
      agent: z.string().optional(),
    })
The hub's existing extractTaskNotification continues to match <task-notification> XML and the system-output subtype:task_notification shapes, but additionally checks plaintext for an AGENT_NOTIFY_SUMMARY JSON line that validates against this schema (operator-fork-friendly path).
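As a rough illustration of the plaintext path, the sketch below scans text for an AGENT_NOTIFY_SUMMARY line and validates the JSON payload. Several things here are assumptions rather than facts from this issue: the exact line format (marker, optional ":" or "=", then a JSON object on the same line), the function names, and the choice to return the first valid contract. To stay dependency-free, a hand-written check stands in for the zod schema; a real implementation would more likely call TaskNotificationSchema.safeParse.

```typescript
type TaskStatus =
  | 'done' | 'blocked' | 'needs_review' | 'needs_decision' | 'failed' | 'stalled';

interface TaskNotification {
  version: 1;
  status: TaskStatus;
  summary: string;
  action?: string;
  agent?: string;
}

const STATUSES: readonly string[] = [
  'done', 'blocked', 'needs_review', 'needs_decision', 'failed', 'stalled',
];

// Hand-rolled stand-in for the strawman schema; unknown keys are dropped.
function parseContract(value: unknown): TaskNotification | null {
  if (typeof value !== 'object' || value === null) return null;
  const { version, status, summary, action, agent } = value as Record<string, unknown>;
  if (version !== 1) return null;
  if (typeof status !== 'string' || STATUSES.indexOf(status) === -1) return null;
  if (typeof summary !== 'string' || summary.length === 0) return null;
  if (action !== undefined && typeof action !== 'string') return null;
  if (agent !== undefined && typeof agent !== 'string') return null;
  const contract: TaskNotification = { version: 1, status: status as TaskStatus, summary };
  if (typeof action === 'string') contract.action = action;
  if (typeof agent === 'string') contract.agent = agent;
  return contract;
}

// ASSUMED line format: AGENT_NOTIFY_SUMMARY[: or =] {json object}
const MARKER = /^\s*AGENT_NOTIFY_SUMMARY\s*[:=]?\s*(\{.*\})\s*$/;

// Returns the first line that carries a valid contract, or null.
function extractFromPlaintext(text: string): TaskNotification | null {
  for (const line of text.split(/\r?\n/)) {
    const match = MARKER.exec(line);
    if (!match) continue;
    try {
      const contract = parseContract(JSON.parse(match[1] ?? ''));
      if (contract) return contract;
    } catch {
      // Malformed JSON: skip this line and keep scanning.
    }
  }
  return null;
}
```

Malformed or non-conforming lines are skipped rather than thrown on, which matches the "reject malformed contracts, log gracefully" goal above; where the logging hook goes is left open.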
Acceptance criteria
- VoiceNotificationChannel registered alongside Telegram + push in hub/src/notifications/, with a hub setting to enable/disable.
- [...] agent "{session}" reports {status}, {summary}. {action}, Sir. and can serve as the first-cut default).
- [...] resolveHubVoiceBackend plumbing.
- [...] docs/guide/ for "voice notifications" with template syntax + substitution table.
Non-goals
- Replacing the conversational voice flow in hub/src/web/routes/voice.ts (that is two-way realtime, this is one-shot announcement).
- Cursor IDE direct sessions (those never reach HAPI; covered by agent-notify-tmux's existing Cursor stop hook).
- Speech-to-text on the operator side (separate work).
Reference / prior art
- hub/src/notifications/eventParsing.ts (extractTaskNotification)
- hub/src/notifications/notificationTypes.ts