Status
Unable to reproduce as of 2026-05-04. All 4 previously-failing scenes now generate successfully. The original error may have been a transient Azure service issue, or may recur under different conditions.
Original symptoms
4 of 8 She-Proves Tier A scenes failed deterministically with Azure TTS SSML parsing error 0x80045003
Scenes, first group (presumably the 4 that failed): sp_it_a_0002, sp_sv_a_0001, sp_sv_a_0002, sp_neg_a_0002
Scenes, second group (presumably the 4 that passed): sp_it_a_0001, sp_neg_a_0001, sp_neu_a_0001, sp_neu_a_0002
Error reported: ResultReason.Canceled / Connection was closed by the remote host. Error code: 1007. SSML parsing error: 0x80045003
Investigation performed (PR #71)
Hypotheses tested and disproven
Adjacent <break> elements: the inter-word breaks introduced by PR #70 (fix(tts): insert inter-word <break> tags to prevent Hebrew word merging) could end up adjacent to the phrase-prosody break_before_ms breaks. Disproven: SSML with adjacent breaks was sent directly to Azure and synthesized successfully.
Prosody values out of Azure's range: calculated worst-case values for AGG_M_30-45_001 at I5 with max state drift: pitch +21%, rate +38%, volume +17%. All within Azure's documented limits (pitch ±50%, rate -50%/+200%).
SSML element count: counted worst-case: ~85 elements for a long turn with phrase prosody. Well under Azure's 400-element limit.
Text content: tested actual Hebrew turns from the failing scene transcript (including I5 turns with niqqud, menace hints, accumulated state drift) against Azure. All succeeded.
Full pipeline reproduction attempt
Ran the generate CLI on all 4 failing scenes with current code. All produced valid clips. Script cache was hit (same LLM text as before). No SSML errors.
Defensive hardening applied (PR #71, merged)
While unable to reproduce, the following hardening was applied:
Break merging: word-boundary breaks (50ms) are replaced by longer phrase breaks; semantic breaks are summed. Prevents a theoretical adjacent-break rejection.
Prosody clamping with warnings: values exceeding Azure's documented ranges are clamped, with logging.warning emitted so upstream config bugs become visible.
Text sanitization: XML 1.0 invalid characters stripped before SSML building (defense-in-depth).
Remaining theories
If the error recurs, investigate:
Azure service degradation: error 1007 (WebSocket close: invalid payload) may indicate a server-side parsing timeout under high load. Check the Azure status page and retry with exponential backoff.
SDK connection reuse: the Speech SDK reuses WebSocket connections. A previous synthesis leaving the connection in a partially-consumed state could cause the next request to fail with a parsing error. Test: create a fresh SpeechSynthesizer per turn instead of reusing one.
Region-specific behavior: the eastus endpoint may behave differently from other regions. The error might be reproducible on a different region or during peak hours.
SSML size threshold: although individual turns are small (~800 chars), there may be a cumulative session limit. The mid-render failure pattern (earlier turns succeed) is consistent with this theory.
Rate limiting: Azure may return a parse error when rate-limited rather than a proper 429. The deterministic failure (same turns) could be explained by cached early turns not hitting the API while later turns do.
Action items if error recurs
Capture the exact SSML string that fails (add _log.debug("SSML: %s", ssml) before provider.synthesize())
Test the captured SSML in isolation (single fresh synthesizer, no prior calls)
Test with a fresh SpeechSynthesizer per turn (disable connection reuse)
Test on a different Azure region
Check Azure service health at time of failure
Add retry with backoff in AzureProvider.synthesize()
Related
<break> injection (the suspected trigger, but not confirmed)
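For readers who want a concrete picture of the prosody clamping and text sanitization described under the defensive hardening, here is a minimal sketch. It is not the code from PR #71: the names clamp_prosody, sanitize_text and PROSODY_LIMITS are hypothetical, and the limits are only the ones quoted in this issue (pitch ±50%, rate -50%/+200%). Volume is left out because no limit for it is stated here.

```python
import logging
import re

_log = logging.getLogger("ssml_hardening")  # hypothetical logger name

# Limits as quoted in this issue, in percent relative to the default.
# Assumption: values are plain percentages; volume has no stated limit here.
PROSODY_LIMITS = {"pitch": (-50.0, 50.0), "rate": (-50.0, 200.0)}

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clamp_prosody(name: str, percent: float) -> float:
    """Clamp a prosody value to its range and warn when clamping happens."""
    low, high = PROSODY_LIMITS[name]
    clamped = max(low, min(high, percent))
    if clamped != percent:
        # Warn rather than fail so that upstream config bugs become visible.
        _log.warning(
            "prosody %s=%+.1f%% outside [%+.1f%%, %+.1f%%]; clamped to %+.1f%%",
            name, percent, low, high, clamped,
        )
    return clamped


def sanitize_text(text: str) -> str:
    """Strip characters that are not legal in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)
```

The worst-case values computed in the investigation (pitch +21%, rate +38%) pass through unchanged, and Hebrew text with niqqud is untouched because those code points are valid XML 1.0 characters. Only control characters and similar debris are removed.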
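Two of the action items, logging the exact SSML and adding retry with backoff, can be sketched together. This is an illustration under stated assumptions, not the actual AzureProvider.synthesize() implementation: synthesize_with_retry and SynthesisError are hypothetical names, and the synthesis call is passed in as a plain callable so the sketch does not depend on the Speech SDK.

```python
import logging
import time
from typing import Callable

_log = logging.getLogger("tts.retry")  # hypothetical logger name


class SynthesisError(RuntimeError):
    """Hypothetical error raised by the callable when synthesis is canceled."""


def synthesize_with_retry(
    synthesize: Callable[[str], bytes],
    ssml: str,
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call synthesize(ssml), retrying failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Log the exact payload so a failing SSML string can be replayed in isolation.
    _log.debug("SSML: %s", ssml)
    for attempt in range(1, max_attempts + 1):
        try:
            return synthesize(ssml)
        except SynthesisError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay_s * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            _log.warning(
                "synthesis attempt %d/%d failed (%s); retrying in %.1fs",
                attempt, max_attempts, exc, delay,
            )
            sleep(delay)
    raise AssertionError("unreachable")
```

To test the connection-reuse theory at the same time, the callable passed as synthesize could build a new SpeechSynthesizer on every call instead of sharing one across turns. If the failure turns out to be deterministic for a given SSML string, retries will not help, and the logged payload is the more useful output.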