fix(tts): #72 — phrase prosody volume must be %, not dB (root cause) by shaypal5 · Pull Request #106 · DataHackIL/SynthBanshee

shaypal5 · 2026-05-11T21:47:01Z

Problem

Reliable root cause for the long-standing intermittent #72: Azure SSML parsing error 0x80045003 / Connection was closed by the remote host.

Reproduced during the delivery-003 corpus regen (avdp-synth-corpus, on top of recent PRs #102/#103/#105): 6 of 8 elephant Tier B scenes fail every time on the first uncached turn. Bisected with 9 hand-crafted A/B SSML round-trips against live Azure.

What triggers it

When the M2b phrase-prosody system fires a stress hint, the renderer produces nested <prosody> with the inner volume in dB and the outer in %:

<prosody rate="+9%" pitch="+7%" volume="+5%">     <!-- outer: _volume_to_string emits % -->
  text...
  <prosody rate="+15%" pitch="+1st" volume="+3dB">stress span</prosody>  <!-- _HINT_DEFAULTS["stress"] -->
  text...
</prosody>

Azure rejects volume unit mismatch (dB inner under % outer). Pitch unit mismatch (st under %) is tolerated. Mid-word <break /> is tolerated. Both confirmed against live Azure.
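
The trigger described above can be checked locally before any SSML is sent. The following is a minimal sketch, not code from the PR: `volume_unit` and `find_volume_unit_mismatches` are hypothetical helper names, and the check only mirrors the behaviour observed in the A/B runs (a `dB` volume nested under a `%` volume fails). It is not a model of Azure's actual validation logic.

```python
import re
import xml.etree.ElementTree as ET

# Relative volume strings such as "+5%" or "+3dB"; named values ("loud") are ignored.
_UNIT_RE = re.compile(r"^[+-]?\d+(?:\.\d+)?(%|dB)$")


def volume_unit(value):
    """Return '%' or 'dB' for a relative volume string, else None."""
    if value is None:
        return None
    match = _UNIT_RE.match(value.strip())
    return match.group(1) if match else None


def find_volume_unit_mismatches(ssml_fragment):
    """Return (outer, inner) volume pairs whose units differ across nested <prosody>."""
    root = ET.fromstring(ssml_fragment)
    mismatches = []

    def walk(elem, outer_volume):
        current = outer_volume
        # endswith() also matches namespaced tags such as "{ns}prosody".
        if elem.tag.endswith("prosody"):
            inner_volume = elem.get("volume")
            if inner_volume is not None:
                inner_unit = volume_unit(inner_volume)
                outer_unit = volume_unit(outer_volume)
                if inner_unit and outer_unit and inner_unit != outer_unit:
                    mismatches.append((outer_volume, inner_volume))
                current = inner_volume
        for child in elem:
            walk(child, current)

    walk(root, None)
    return mismatches


FAILING = (
    '<prosody rate="+9%" pitch="+7%" volume="+5%">text '
    '<prosody rate="+15%" pitch="+1st" volume="+3dB">stress span</prosody>'
    " text</prosody>"
)
FIXED = FAILING.replace('volume="+3dB"', 'volume="+3%"')

print(find_volume_unit_mismatches(FAILING))
print(find_volume_unit_mismatches(FIXED))
```

Only the volume attribute is inspected, because the pitch mismatch (`+1st` under `+7%`) was tolerated in the live tests.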

Fix

_HINT_DEFAULTS["stress"]["volume"]: "+3dB" → "+3%". Matches the lossy 1:1 dB→% mapping that _volume_to_string already uses for the outer prosody. No SSML-builder code changes — just the hint default + docstring update.

A/B isolation results (live Azure)

| Test | Outer | Inner | Result |
|---|---|---|---|
| Original (the bug) | `vol="+5%"` | `vol="+3dB"`, mid-word | FAIL |
| No nested prosody | `vol="+5%"` | none | OK |
| Mid-word `<break />` (no nested prosody) | `vol="+5%"` | break only | OK |
| Word-aligned nest, mixed units | `vol="+5%"` | `vol="+3dB"` | FAIL |
| Word-aligned nest, all-% units | `vol="+5%"` | `vol="+3%"` | OK |
| Word-aligned nest, `pitch="+1st"`, no inner volume | `vol="+5%"` | `pitch="+1st"` | OK |
| Word-aligned nest, `pitch="+6%"`, `volume="+3dB"` | `vol="+5%"` | `vol="+3dB"` | FAIL |
| Word-aligned nest, `pitch="+1st"`, `volume="+3%"` | `vol="+5%"` | `vol="+3%"` | OK |

Volume dB inside volume % is the trigger; nothing else.

Files changed

| File | Change |
|---|---|
| `synthbanshee/tts/ssml_types.py` | `_HINT_DEFAULTS["stress"]["volume"]: "+3dB"` → `"+3%"`; `PhraseProsody.volume` docstring updated to pin the `%`-only constraint and reference #72 |
| `tests/unit/test_phrase_prosody.py` | `test_hint_defaults_applied_stress` updated to assert `"+3%"`; new class `TestHintDefaultUnits`: two structural tests pinning the "no `dB` for volume" / "must end in `%`" invariant across `_HINT_DEFAULTS` |

Test plan

pytest tests/unit/ — 1696 passed (1694 prior + 2 new structural tests)
ruff check synthbanshee/ tests/ — clean
Live Azure: TEST H (nested word-aligned, all-% units) returns 307,630 audio bytes; TEST A (original failing SSML) reproduces #72 100% of the time.

Tier-3 ASR sanity (local)

This change will alter audio output for any scene that emits a stress phrase hint — the inner prosody volume drops from a real +3 dB (~+41% linear) to the lossy synthbanshee convention of +3% (matching how the outer prosody has always been emitted). Per CLAUDE.md's ASR sanity policy, this is in-scope; will run qa-report --asr on the delivery-003 corpus once those scenes regenerate and paste the result into the upcoming corpus PR.
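
The size of that audio change can be checked with a short calculation. This sketch assumes the amplitude convention for decibels (ratio = 10^(dB/20)), which is the one consistent with the ~+41% figure above; `db_to_amplitude_ratio` is a hypothetical helper, not a synthbanshee function.

```python
def db_to_amplitude_ratio(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude ratio (20*log10 convention)."""
    return 10 ** (gain_db / 20)


# A real +3 dB boost, expressed as a fractional amplitude increase (about 0.41).
real_db_boost = db_to_amplitude_ratio(3.0) - 1.0

# The lossy 1:1 mapping emits "+3%", i.e. a fractional increase of 0.03.
lossy_percent_boost = 3.0 / 100.0

print(f"+3 dB  -> +{real_db_boost:.1%} amplitude")
print(f"'+3%'  -> +{lossy_percent_boost:.1%} amplitude")
```

Under this convention the stress span's boost shrinks by more than an order of magnitude, which is why the change falls under the ASR sanity policy.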

Unblocks

avdp-synth-corpus delivery-003 (in-flight) — the 6 elephant Tier B scenes that hit this can now regenerate cleanly.

Refs #72.

🤖 Generated with Claude Code

While running the delivery-003 corpus regen, 6 of 8 elephant Tier B scenes reliably hit #72 (`Azure SSML parsing error 0x80045003`). Bisected the failing SSML and isolated the trigger:

    <prosody volume="+5%">                        ← outer, from _volume_to_string
      text
      <prosody volume="+3dB">stress</prosody>     ← inner, from _HINT_DEFAULTS
      text
    </prosody>

Confirmed against Azure with 9 A/B SSML tests:

- nested word-aligned, all-% units → OK
- nested word-aligned, inner pitch="+1st"+vol% → OK
- nested word-aligned, inner pitch=%+vol="+3dB" → FAIL
- nested mid-word, mixed units → FAIL
- mid-word <break /> (no nested prosody) → OK

Pitch unit mismatch (`+1st` inner inside `+N%` outer) is tolerated; volume unit mismatch (`+NdB` inside `+N%`) is the trigger.

Fix: `_HINT_DEFAULTS["stress"]["volume"]` changed from `"+3dB"` to `"+3%"`. This matches the lossy 1:1 dB→% mapping convention that `_volume_to_string` already uses, so the inner and outer prosody elements live in the same unit system.

Two regression tests added:

1. `test_no_hint_default_uses_db_for_volume`: structural check that no entry in `_HINT_DEFAULTS` emits volume in `dB`, since the outer emitter is always `%`.
2. `test_hint_default_volumes_parse_as_percent`: companion check that any volume default must end in `%` and parse as numeric.

Updates the `PhraseProsody.volume` docstring to explain the invariant and reference #72.

Reliable repro from the delivery-003 attempt: any elephant Tier B scene with intensity ≥ 3 (where the LLM emits `stress` hints on aggressive BEN turns) hits this; the failing scene/turn manifest is captured in `/tmp/ssml-diag/intercept_call_01.{xml,status}` during investigation. After this fix, re-running those 6 scenes succeeds.

Test plan:

- `pytest tests/unit/`: 1696 passed (1694 + 2 new)
- `ruff check synthbanshee/ tests/`: clean
- Manual Azure round-trip with TEST H (nested, vol=%) confirms the fix on live Azure.

Refs #72. Unblocks delivery-003 corpus PR (avdp-synth-corpus).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Fixes the root cause of Azure SSML parse failures (#72) by ensuring phrase-level prosody volume defaults use % units (matching the outer <prosody volume="..."> emitter), preventing invalid nested unit combinations.

Changes:

Update _HINT_DEFAULTS["stress"]["volume"] from "+3dB" to "+3%" to avoid Azure nested <prosody> volume unit mismatch.
Clarify PhraseProsody.volume docstring to document the %-only constraint and link the Azure failure mode (#72).
Strengthen unit tests to pin the “no dB volumes in hint defaults” invariant and update existing stress-default assertion.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| `synthbanshee/tts/ssml_types.py` | Switch stress hint default volume to `%` and document the `%`-only nesting constraint to prevent Azure SSML errors. |
| `tests/unit/test_phrase_prosody.py` | Update stress default expectation and add structural tests enforcing `%` volume units in `_HINT_DEFAULTS`. |

github-actions · 2026-05-11T21:49:45Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #106 in repository https://github.com/DataHackIL/SynthBanshee. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: pull request opened
Workflow run: 25699253567 attempt 1
Comment timestamp: 2026-05-11T21:48:57.158120+00:00
PR head commit: fb2e0da6553f1f91087abc549d4ab15981de2196

* feat(delivery-003): 20-clip multi-project toy corpus on synthbanshee main

Replaces delivery 002. First handoff target for the She-Proves and Elephant consumer teams.

## Contents

- **She-Proves Tier A, Azure pair (10 clips)** in `agg_m_30-45_001/`: 2 IT, 2 SV, 3 NEG, 3 NEU (Avri + Hila).
- **She-Proves Tier A, Google Chirp HD pair (2 clips)** in `agg_m_30-45_002/`: 1 IT, 1 SV (sister scenes to sp_*_a_0001, authored as PR DataHackIL/SynthBanshee#105). Provides the voice + backend diversity vehicle for this delivery.
- **Elephant Tier B (8 clips)** in `ben_m_40-55_003/`: 2 each of IT/SV/NEG/NEU with `acoustic_scene` (clinic_office room IR + pi_budget_mic device + HVAC ambient).

Total: 20 clips, ~41.7 min. All pass `synthbanshee validate` and `synthbanshee qa-report` (failure rate 0.0%). Full QA snapshot at [`deliveries/003-multi-project-multi-voice/qa-report.json`](deliveries/003-multi-project-multi-voice/qa-report.json).

## Pipeline corrections delivered

This delivery is the first to surface 4 synthbanshee fixes landed in the past day:

- DataHackIL/SynthBanshee#102: `preprocessing_applied.normalized_dbfs` now records the *measured* post-preprocess peak (was hardcoded `-1.0`). Pair with `generation_metadata.loudness_target_peak_dbfs` to diagnose loudness drift; the schema docstring at `labels/schema.py:175` pins the measured-vs-target split.
- DataHackIL/SynthBanshee#103: `docs/spec.md` pins the `has_violence` derivation rule (`any(e.tier1_category != "NONE")`), adds the §2.5 identifier-casing table, rewrites §5.1 field notes.
- DataHackIL/SynthBanshee#105: adds `sp_sv_a_0003` + `sp_it_a_0003` Google-pair shadow scenes.
- DataHackIL/SynthBanshee#106: root cause for #72: `_HINT_DEFAULTS` was emitting nested `<prosody volume="+NdB">` inside outer `<prosody volume="+N%">`, which Azure rejects with SSML parse error 0x80045003. Required to unblock 6 of 8 elephant Tier B scenes; without the fix, every scene whose LLM script carries a `stress` phrase hint at intensity ≥ 3 failed reliably.

## Doc updates in this PR

- `README.md`: tightened "Clip ID and filename conventions" to point at SynthBanshee `docs/spec.md` §2.5; rewrote the `has_violence` paragraph to the events-based rule; updated the audio-format section to the measured-vs-target split; replaced the v1-limitations block with a pointer to per-delivery notes.
- `CLAUDE.md`: replaced the wrong `has_violence` formula with the events-based rule; expanded the audio-format table to match the spec's measured-vs-target distinction.
- `DELIVERIES.md`: delivery 002 marked `superseded`; new row for 003.
- `deliveries/003-multi-project-multi-voice/`:
  - `metadata.yaml`: structured delivery record.
  - `notes.md`: full per-clip table, voice/backend matrix, closed-vs-open qa-report findings.
  - `qa-report.json`: raw qa-report output (committed for audit).

## QA snapshot

Closed since delivery 002:

| Finding | 002 | 003 |
|---|---|---|
| `agg_no_escalation` | 3 clips | 0 |
| `warn_no_overlap` | 4 clips | 0 (overlap_ratio 100% on I4+) |
| `warn_emotion_downgrade` | 4 clips | 0 |
| `generation_metadata` absent | 0 of 8 had it | 20 of 20 have it |
| `dirty_file_path` null | 7 of 8 | 0 of 20 |
| `normalized_dbfs` hardcoded `-1.0` | 8 of 8 | fixed (#102) |

Still open: `low_voice_diversity_*` (now 2 voices per gender, threshold is ≥3; partial progress 1 → 2); `single_backend` (misleading; see notes for explanation of the hardcoded `tts_engine` labeling bug); `vic_f0_high` on the 2 Google Chirp HD female-voice clips.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

shaypal5 added labels `type: fix` (Bug fix) and `comp: tts` (TTS rendering, SSML, Azure/Google providers) May 11, 2026

Copilot AI review requested due to automatic review settings May 11, 2026 21:47

shaypal5 merged commit d92d61e into main May 11, 2026
5 checks passed

shaypal5 deleted the fix/ssml-phrase-prosody-volume-units branch May 11, 2026 21:47

Copilot started reviewing on behalf of shaypal5 May 11, 2026 21:47

Copilot AI reviewed May 11, 2026

shaypal5 mentioned this pull request May 11, 2026

feat(delivery-003): 20-clip multi-project, multi-voice toy corpus DataHackIL/avdp-synth-corpus#4

Merged

7 tasks
