bug(tts): SSML parse error 0x80045003 — unable to reproduce, root cause unknown #72

@shaypal5

Description

Status

Unable to reproduce as of 2026-05-04. All 4 previously-failing scenes now generate successfully. The original error may have been a transient Azure service issue, or may recur under different conditions.

Original symptoms

  • 4 of 8 She-Proves Tier A scenes failed deterministically with Azure TTS SSML parsing error 0x80045003
  • Failing: sp_it_a_0002, sp_sv_a_0001, sp_sv_a_0002, sp_neg_a_0002
  • Succeeding: sp_it_a_0001, sp_neg_a_0001, sp_neu_a_0001, sp_neu_a_0002
  • Error occurred on specific turns mid-render (earlier turns cached successfully)
  • Error message: ResultReason.Canceled / Connection was closed by the remote host. Error code: 1007. SSML parsing error: 0x80045003

Investigation performed (PR #71)

Hypotheses tested and disproven

  1. Adjacent <break> elements: the inter-word breaks added in PR #70 (fix(tts): insert inter-word <break> tags to prevent Hebrew word merging) could sit next to the breaks produced by phrase prosody break_before_ms. Disproven: SSML with adjacent breaks, sent directly to Azure, synthesized successfully.

  2. Prosody values out of Azure's range: the worst-case values calculated for AGG_M_30-45_001 at I5 with max state drift are pitch +21%, rate +38%, volume +17%. Disproven: all are within Azure's documented limits (pitch ±50%, rate -50%/+200%).

  3. SSML element count: a worst-case long turn with phrase prosody has ~85 elements. Disproven: well under Azure's 400-element limit.

  4. Text content: actual Hebrew turns from the failing scene transcript (including I5 turns with niqqud, menace hints, accumulated state drift) were tested against Azure. Disproven: all succeeded.
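
The element-count check in hypothesis 3 can be repeated locally on any captured SSML string. The following is a minimal sketch, not the project's code: the function name, the sample SSML, and the voice name are illustrative assumptions, and the 400-element figure is simply the limit quoted above. Parsing also confirms that the string is well-formed XML before it is sent to Azure.

```python
import xml.etree.ElementTree as ET

# Limit quoted in this issue; treated here as an assumption, not verified.
ELEMENT_LIMIT = 400


def count_ssml_elements(ssml: str) -> int:
    """Count every element in the SSML, including the root <speak>.

    Raises xml.etree.ElementTree.ParseError if the string is not
    well-formed XML, which is a useful local check in its own right.
    """
    root = ET.fromstring(ssml)
    return sum(1 for _ in root.iter())


# Illustrative SSML with two adjacent <break> elements (hypothesis 1).
sample = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
    'xml:lang="he-IL">'
    '<voice name="he-IL-AvriNeural">'
    '<prosody pitch="+21%" rate="+38%">'
    'שלום<break time="50ms"/><break time="200ms"/>עולם'
    "</prosody></voice></speak>"
)

if __name__ == "__main__":
    n = count_ssml_elements(sample)
    print(f"{n} elements, limit {ELEMENT_LIMIT}, ok={n <= ELEMENT_LIMIT}")
```

A local parse only rules out malformed XML and oversized documents; it cannot show that Azure's own parser accepts the payload.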

Full pipeline reproduction attempt

Ran generate CLI on all 4 failing scenes with current code:

sp_sv_a_0001: 14/14 turns rendered ✓
sp_it_a_0002: 14/14 turns rendered ✓
sp_sv_a_0002: 12/12 turns rendered ✓ (truncated output but clip generated)
sp_neg_a_0002: 14/14 turns rendered ✓

All produced valid clips. Script cache was hit (same LLM text as before). No SSML errors.

Defensive hardening applied (PR #71, merged)

While unable to reproduce, the following hardening was applied:

  • Break merging: word-boundary breaks (50ms) are replaced by longer phrase breaks; semantic breaks are summed. Prevents a theoretical adjacent-break rejection.
  • Prosody clamping with warnings: values exceeding Azure's documented ranges are clamped, with logging.warning emitted so upstream config bugs become visible.
  • Text sanitization: XML 1.0 invalid characters stripped before SSML building (defense-in-depth).
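
As an illustration of the last two hardening steps, here is a minimal sketch of prosody clamping with a warning and of XML 1.0 character stripping. It is not the code merged in PR #71: the function names and the LIMITS table are hypothetical, and the ranges are the pitch and rate limits quoted earlier in this issue. The character class follows the Char production of the XML 1.0 specification.

```python
import logging
import re

_log = logging.getLogger(__name__)

# Assumed ranges in percent, taken from the limits quoted in this issue.
LIMITS = {"pitch": (-50.0, 50.0), "rate": (-50.0, 200.0)}

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clamp_prosody(name: str, value_pct: float) -> float:
    """Clamp a prosody percentage to its range and warn if it was changed."""
    lo, hi = LIMITS[name]
    clamped = max(lo, min(hi, value_pct))
    if clamped != value_pct:
        _log.warning(
            "prosody %s=%+g%% outside [%+g%%, %+g%%]; clamped to %+g%%",
            name, value_pct, lo, hi, clamped,
        )
    return clamped


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that XML 1.0 does not allow in a document."""
    return _INVALID_XML_CHARS.sub("", text)
```

Stripping leaves Hebrew letters and niqqud untouched, since they fall inside the allowed #x20-#xD7FF range. Escaping of `&`, `<` and `>` is a separate step that this sketch does not cover.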

Remaining theories

If the error recurs, investigate:

  1. Azure service degradation: error 1007 is the WebSocket close code for invalid payload data, and here it may indicate a server-side parsing timeout under high load. Check the Azure status page and retry with exponential backoff.

  2. SDK connection reuse: the Speech SDK reuses WebSocket connections. A previous synthesis that leaves the connection in a partially-consumed state could cause the next request to fail with a parsing error. Test: create a fresh SpeechSynthesizer per turn instead of reusing one.

  3. Region-specific behavior: the eastus endpoint may behave differently from other regions. The error might be reproducible on a different region or during peak hours.

  4. SSML size threshold: although individual turns are small (~800 chars), there may be a cumulative session limit. The mid-render failure pattern (earlier turns succeed) is consistent with this theory.

  5. Rate limiting: Azure may return a parse error when rate-limited rather than a proper 429. The deterministic failure (same turns) could be explained by cached early turns not hitting the API while later turns do.
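
The test proposed for theory 2 could look roughly like the sketch below. It assumes the `azure-cognitiveservices-speech` package; the function names and the factory argument are hypothetical and do not reflect how AzureProvider is actually structured. Whether a new SpeechSynthesizer always opens a new WebSocket connection is itself an assumption worth confirming in the SDK logs.

```python
from typing import Any, Callable, Iterable, List


def make_azure_synthesizer(key: str, region: str) -> Any:
    """Build a new SpeechSynthesizer.

    The SDK is imported lazily so this module loads even where the
    package is not installed.
    """
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    # audio_config=None keeps the audio in memory instead of playing it.
    return speechsdk.SpeechSynthesizer(speech_config=config, audio_config=None)


def synthesize_turns_isolated(
    ssml_turns: Iterable[str],
    make_synth: Callable[[], Any],
) -> List[Any]:
    """Synthesize each turn with its own synthesizer, never reusing one."""
    results = []
    for ssml in ssml_turns:
        synth = make_synth()  # fresh synthesizer for this turn only
        results.append(synth.speak_ssml_async(ssml).get())
        del synth  # drop the reference before the next turn
    return results
```

A caller would pass something like `lambda: make_azure_synthesizer(key, "eastus")` as the factory and then inspect `result.reason` and `result.cancellation_details` on each returned result. If the failing turns succeed in this mode but fail with a shared synthesizer, connection reuse becomes the leading suspect.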

Action items if error recurs

  • Capture the exact SSML string that fails (add _log.debug("SSML: %s", ssml) before provider.synthesize())
  • Test the captured SSML in isolation (single fresh synthesizer, no prior calls)
  • Test with a fresh SpeechSynthesizer per turn (disable connection reuse)
  • Test on a different Azure region
  • Check Azure service health at time of failure
  • Add retry with backoff in AzureProvider.synthesize()
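
The first and last action items can be combined in one wrapper. The following is a minimal sketch under stated assumptions: SynthesisError and synthesize_with_retry are hypothetical names, and the real AzureProvider.synthesize() may signal failure differently. It logs the exact SSML before every attempt and retries with exponential backoff.

```python
import logging
import time
from typing import Callable

_log = logging.getLogger(__name__)


class SynthesisError(RuntimeError):
    """Hypothetical exception raised when Azure cancels a synthesis request."""


def synthesize_with_retry(
    synthesize: Callable[[str], bytes],
    ssml: str,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call synthesize(ssml), retrying on SynthesisError with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        # Capture the exact payload so a failure can be replayed in isolation.
        _log.debug("SSML (attempt %d/%d): %s", attempt, attempts, ssml)
        try:
            return synthesize(ssml)
        except SynthesisError as exc:
            if attempt == attempts:
                _log.error("giving up after %d attempts; SSML: %s", attempts, ssml)
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ...
            _log.warning("attempt %d failed (%s); retrying in %.1fs",
                         attempt, exc, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

Retrying can hide a deterministic failure, so the logged SSML of every failed attempt should still be kept and replayed against a single fresh synthesizer.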

Metadata

Labels: bug, comp: tts