bug(tts): SSML parse error 0x80045003 — unable to reproduce, root cause unknown #72

## Status

**Unable to reproduce** as of 2026-05-04. All 4 previously failing scenes now generate successfully. The original error may have been a transient Azure service issue, or it may recur under conditions not covered by the tests below.

## Original symptoms

- 4 of 8 She-Proves Tier A scenes failed deterministically with Azure TTS SSML parsing error `0x80045003`
- Failing: `sp_it_a_0002`, `sp_sv_a_0001`, `sp_sv_a_0002`, `sp_neg_a_0002`
- Succeeding: `sp_it_a_0001`, `sp_neg_a_0001`, `sp_neu_a_0001`, `sp_neu_a_0002`
- Error occurred on specific turns mid-render (earlier turns cached successfully)
- Error message: `ResultReason.Canceled / Connection was closed by the remote host. Error code: 1007. SSML parsing error: 0x80045003`

## Investigation performed (PR #71)

### Hypotheses tested and disproven

1. **Adjacent `<break>` elements** — PR #70's inter-word breaks can land directly next to the breaks that phrase prosody emits for `break_before_ms`. **Disproven**: SSML with adjacent breaks was sent directly to Azure and synthesized successfully.

2. **Prosody values out of Azure's range** — **Disproven**: the calculated worst-case values for AGG_M_30-45_001 at I5 with max state drift are pitch +21%, rate +38%, volume +17%, all within Azure's documented limits (pitch ±50%, rate -50%/+200%).

3. **SSML element count** — **Disproven**: the worst case is ~85 elements for a long turn with phrase prosody, well under Azure's 400-element limit.

4. **Text content** — **Disproven**: the actual Hebrew turns from the failing scene transcript (including I5 turns with niqqud, menace hints, and accumulated state drift) were sent to Azure, and all synthesized successfully.
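
The range and element-count checks from hypotheses 2 and 3 can be scripted so they are easy to rerun against a captured SSML string if the error comes back. The following is a minimal sketch, assuming the limits quoted above are still current. The helper names (`parse_percent`, `count_elements`, `prosody_violations`) and the sample SSML, including the voice name, are illustrative and not taken from the project code.

```python
import xml.etree.ElementTree as ET

# Limits as quoted in this issue. Treat them as assumptions and confirm them
# against the current Azure documentation before relying on them.
LIMITS_PCT = {"pitch": (-50.0, 50.0), "rate": (-50.0, 200.0)}
MAX_SSML_ELEMENTS = 400

# Illustrative SSML only; the voice name is a placeholder.
SAMPLE_SSML = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
    'xml:lang="he-IL"><voice name="he-IL-HilaNeural">'
    '<prosody pitch="+21%" rate="+38%">שלום'
    '<break time="50ms"/><break time="200ms"/>עולם</prosody>'
    "</voice></speak>"
)


def parse_percent(raw: str) -> float:
    """Turn a relative value such as '+21%' into 21.0."""
    return float(raw.rstrip("%"))


def count_elements(ssml: str) -> int:
    """Count every element in the document, including the root."""
    return sum(1 for _ in ET.fromstring(ssml).iter())


def prosody_violations(ssml: str) -> list:
    """Return (attribute, value) pairs that fall outside LIMITS_PCT."""
    violations = []
    for el in ET.fromstring(ssml).iter():
        # Strip the XML namespace, if any, before comparing the tag name.
        if el.tag.rsplit("}", 1)[-1] != "prosody":
            continue
        for attr, (low, high) in LIMITS_PCT.items():
            raw = el.get(attr)
            if raw is None or not raw.endswith("%"):
                continue  # named or absolute values are not checked here
            value = parse_percent(raw)
            if not low <= value <= high:
                violations.append((attr, value))
    return violations


if __name__ == "__main__":
    print("elements:", count_elements(SAMPLE_SSML), "limit:", MAX_SSML_ELEMENTS)
    print("violations:", prosody_violations(SAMPLE_SSML))
```

A local check like this only rules out the documented limits. It cannot show that Azure accepts the payload, which is why the hypotheses above were also tested against the live service.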

### Full pipeline reproduction attempt

Ran `generate` CLI on all 4 failing scenes with current code:
```
sp_sv_a_0001: 14/14 turns rendered ✓
sp_it_a_0002: 14/14 turns rendered ✓
sp_sv_a_0002: 12/12 turns rendered ✓ (truncated output but clip generated)
sp_neg_a_0002: 14/14 turns rendered ✓
```

All produced valid clips. Script cache was hit (same LLM text as before). No SSML errors.

### Defensive hardening applied (PR #71, merged)

While unable to reproduce, the following hardening was applied:
- **Break merging**: when a word-boundary break (50ms) sits next to a longer phrase break, only the phrase break is kept; adjacent semantic breaks are summed into one. This prevents a theoretical adjacent-break rejection.
- **Prosody clamping with warnings**: values exceeding Azure's documented ranges are clamped, with `logging.warning` emitted so upstream config bugs become visible.
- **Text sanitization**: XML 1.0 invalid characters stripped before SSML building (defense-in-depth).
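
The three hardening steps can be sketched roughly as follows. This is an illustration of the ideas, not the code merged in PR #71: the function names, the `(kind, ms)` representation of a break, and the exact merge rule (drop word-boundary breaks when a phrase break is present, otherwise keep a single word break) are assumptions made for the example.

```python
import logging
import re

_log = logging.getLogger(__name__)

# Anything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML10_INVALID = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_text(text: str) -> str:
    """Strip characters that are not legal in an XML 1.0 document."""
    return _XML10_INVALID.sub("", text)


def clamp(name: str, value: float, low: float, high: float) -> float:
    """Clamp value into [low, high], warning when clamping was needed."""
    if low <= value <= high:
        return value
    clamped = max(low, min(high, value))
    _log.warning(
        "%s=%s is outside [%s, %s]; clamped to %s", name, value, low, high, clamped
    )
    return clamped


def merge_adjacent_breaks(run: list) -> int:
    """Collapse a run of adjacent (kind, ms) breaks into one duration.

    Assumed rule: phrase breaks are summed and replace any word-boundary
    breaks in the run; a run of only word breaks collapses to the longest.
    """
    if not run:
        return 0
    phrase = [ms for kind, ms in run if kind == "phrase"]
    if phrase:
        return sum(phrase)
    return max(ms for _, ms in run)
```

Hebrew letters and niqqud fall inside the `\x20-\uD7FF` range, so `sanitize_text` leaves them untouched and removes only control characters and other code points that XML 1.0 forbids.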

## Remaining theories

If the error recurs, investigate:

1. **Azure service degradation** — error 1007 (the WebSocket close code for invalid payload data) may indicate a server-side parsing timeout under high load. Check the Azure status page and retry with exponential backoff.

2. **SDK connection reuse** — the Speech SDK reuses WebSocket connections. A previous synthesis leaving the connection in a partially-consumed state could cause the next request to fail with a parsing error. Test: create a fresh `SpeechSynthesizer` per turn instead of reusing.

3. **Region-specific behavior** — the `eastus` endpoint may behave differently from other regions. The error might be reproducible on a different region or during peak hours.

4. **SSML size threshold** — although individual turns are small (~800 chars), there may be a cumulative per-session limit. The mid-render failure pattern (earlier turns succeed) is consistent with this theory but does not confirm it.

5. **Rate limiting** — Azure may return a parse error when rate-limited rather than a proper 429. The deterministic failure (same turns) could be explained by cached early turns not hitting the API while later turns do.

## Action items if error recurs

- [ ] Capture the exact SSML string that fails (add `_log.debug("SSML: %s", ssml)` before `provider.synthesize()`)
- [ ] Test the captured SSML in isolation (single fresh synthesizer, no prior calls)
- [ ] Test with a fresh `SpeechSynthesizer` per turn (disable connection reuse)
- [ ] Test on a different Azure region
- [ ] Check Azure service health at time of failure
- [ ] Add retry with backoff in `AzureProvider.synthesize()`
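
The first and last items can be combined in one small wrapper that logs the exact SSML before each attempt and retries with exponential backoff. This is a sketch under assumptions: `synthesize_with_retry` and its parameters are hypothetical, and it assumes the provider's `synthesize()` raises an exception when Azure cancels the request, which may not match how `AzureProvider` reports errors.

```python
import logging
import time

_log = logging.getLogger(__name__)


def synthesize_with_retry(synthesize, ssml, attempts=3, base_delay=0.5,
                          sleep=time.sleep):
    """Call synthesize(ssml), retrying with exponential backoff on failure.

    `synthesize` is any callable that takes the SSML string, for example a
    bound provider method. `sleep` is injectable so tests do not have to wait.
    """
    for attempt in range(1, attempts + 1):
        # Capture the exact payload so a failing turn can be replayed in isolation.
        _log.debug("SSML (attempt %d/%d): %s", attempt, attempts, ssml)
        try:
            return synthesize(ssml)
        except Exception as exc:  # narrow this to the provider's error type in real use
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            _log.warning("synthesis failed (%s); retrying in %.1fs", exc, delay)
            sleep(delay)
```

Retrying only helps if the failure is transient (theories 1, 2, and 5). If the same SSML fails on every attempt, the logged payload is the starting point for the isolation test in the second action item.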

## Related

- Original issue: #67
- Fix PR: #71 (merged — defensive hardening)
- PR #70: inter-word `<break>` injection (the suspected trigger, but not confirmed)
- PR #69: gender-aware Hebrew disambiguation (adds niqqud)
