fix(tts): #72 — phrase prosody volume must be %, not dB (root cause)#106
Merged
Conversation
While running the delivery-003 corpus regen, 6 of 8 elephant Tier B scenes reliably hit #72 (`Azure SSML parsing error 0x80045003`). Bisected the failing SSML and isolated the trigger:

    <prosody volume="+5%">                       ← outer, from _volume_to_string
      text
      <prosody volume="+3dB">stress</prosody>    ← inner, from _HINT_DEFAULTS
      text
    </prosody>

Confirmed against Azure with 9 A/B SSML tests:

- nested word-aligned, all-% units → OK
- nested word-aligned, inner pitch="+1st"+vol% → OK
- nested word-aligned, inner pitch=%+vol="+3dB" → FAIL
- nested mid-word, mixed units → FAIL
- mid-word <break /> (no nested prosody) → OK

Pitch unit mismatch (`+1st` inner inside `+N%` outer) is tolerated; volume unit mismatch (`+NdB` inside `+N%`) is the trigger.

Fix: `_HINT_DEFAULTS["stress"]["volume"]` changed from `"+3dB"` to `"+3%"`. This matches the lossy 1:1 dB→% mapping convention that `_volume_to_string` already uses, so the inner and outer prosody elements live in the same unit system.

Two regression tests added:

1. `test_no_hint_default_uses_db_for_volume`: structural check that no entry in `_HINT_DEFAULTS` emits volume in `dB`, since the outer emitter is always `%`.
2. `test_hint_default_volumes_parse_as_percent`: companion check that any volume default must end in `%` and parse as numeric.

Updates the `PhraseProsody.volume` docstring to explain the invariant and reference #72.

Reliable repro from the delivery-003 attempt: any elephant Tier B scene with intensity ≥ 3 (where the LLM emits `stress` hints on aggressive BEN turns) hits this; the failing scene/turn manifest is captured in `/tmp/ssml-diag/intercept_call_01.{xml,status}` during investigation. After this fix, re-running those 6 scenes succeeds.

Test plan:

- `pytest tests/unit/`: 1696 passed (1694 + 2 new)
- `ruff check synthbanshee/ tests/`: clean
- Manual Azure round-trip with TEST H (nested, vol=%) confirms the fix on live Azure.

Refs #72. Unblocks delivery-003 corpus PR (avdp-synth-corpus).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
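The failure mode described in this commit message (an inner `<prosody>` volume in `dB` nested under an outer one in `%`) can be detected before the SSML is sent. The following is a minimal sketch, not code from synthbanshee: `volume_unit` and `find_volume_unit_mismatches` are hypothetical helper names, and the check only covers signed numeric volumes with a `dB` or `%` suffix.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def volume_unit(value: str) -> Optional[str]:
    """Return "dB" or "%" for a signed numeric volume, else None."""
    match = re.fullmatch(r"[+-]?\d+(?:\.\d+)?(dB|%)", value.strip())
    return match.group(1) if match else None


def find_volume_unit_mismatches(ssml: str) -> List[Tuple[str, str]]:
    """Return (outer_volume, inner_volume) pairs whose units differ."""
    root = ET.fromstring(ssml)
    mismatches: List[Tuple[str, str]] = []

    def walk(node: ET.Element, enclosing: Optional[str]) -> None:
        current = enclosing
        # Strip any XML namespace so "{ns}prosody" also matches.
        tag = node.tag.rsplit("}", 1)[-1]
        if tag == "prosody" and "volume" in node.attrib:
            vol = node.attrib["volume"]
            if enclosing is not None:
                outer_unit = volume_unit(enclosing)
                inner_unit = volume_unit(vol)
                # Only flag when both units are recognised and differ.
                if outer_unit and inner_unit and outer_unit != inner_unit:
                    mismatches.append((enclosing, vol))
            current = vol
        for child in node:
            walk(child, current)

    walk(root, None)
    return mismatches


# The two shapes from the commit message: before and after the fix.
BEFORE = '<prosody volume="+5%">text <prosody volume="+3dB">stress</prosody> text</prosody>'
AFTER = '<prosody volume="+5%">text <prosody volume="+3%">stress</prosody> text</prosody>'

print(find_volume_unit_mismatches(BEFORE))  # [('+5%', '+3dB')]
print(find_volume_unit_mismatches(AFTER))   # []
```

A check like this would only flag the volume mismatch; it deliberately ignores pitch units, since the A/B tests found pitch mismatch to be tolerated.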
Pull request overview
Fixes the root cause of Azure SSML parse failures (#72) by ensuring phrase-level prosody volume defaults use % units (matching the outer <prosody volume="..."> emitter), preventing invalid nested unit combinations.
Changes:
- Update `_HINT_DEFAULTS["stress"]["volume"]` from `"+3dB"` to `"+3%"` to avoid Azure nested `<prosody>` volume unit mismatch.
- Clarify `PhraseProsody.volume` docstring to document the `%`-only constraint and link the Azure failure mode (#72).
- Strengthen unit tests to pin the “no dB volumes in hint defaults” invariant and update existing stress-default assertion.
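The "no dB volumes in hint defaults" invariant can be pictured with a small structural test. This is a sketch under assumptions, not the tests from the PR: the `_HINT_DEFAULTS` dict below is a stand-in in which only the `stress` volume value (`"+3%"`) comes from the PR, and `is_percent_volume` and `offending_volumes` are hypothetical helpers.

```python
# Stand-in for synthbanshee's _HINT_DEFAULTS. Only the "stress" volume value
# is taken from the PR; every other key and value here is hypothetical.
_HINT_DEFAULTS = {
    "stress": {"volume": "+3%", "pitch": "+1st"},
    "soft": {"volume": "-4%"},
    "pause_before": {},
}


def is_percent_volume(value: str) -> bool:
    """True if the value ends in % and the rest parses as a number."""
    if not value.endswith("%"):
        return False
    try:
        float(value[:-1])
    except ValueError:
        return False
    return True


def offending_volumes(defaults: dict) -> list:
    """List (hint, volume) pairs whose volume is not a numeric percentage."""
    return [
        (hint, params["volume"])
        for hint, params in defaults.items()
        if "volume" in params and not is_percent_volume(params["volume"])
    ]


def test_no_hint_default_uses_db_for_volume():
    for hint, params in _HINT_DEFAULTS.items():
        assert "db" not in params.get("volume", "").lower(), hint


def test_hint_default_volumes_parse_as_percent():
    assert offending_volumes(_HINT_DEFAULTS) == []
```

Because the check iterates over the whole mapping rather than a single key, a newly added hint default with a `dB` volume would presumably fail the same test.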
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `synthbanshee/tts/ssml_types.py` | Switch stress hint default volume to % and document the %-only nesting constraint to prevent Azure SSML errors. |
| `tests/unit/test_phrase_prosody.py` | Update stress default expectation and add structural tests enforcing % volume units in `_HINT_DEFAULTS`. |
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #106 in repository https://github.com/DataHackIL/SynthBanshee. Treat this PR as all clear unless new signals appear.
shaypal5 added a commit to DataHackIL/avdp-synth-corpus that referenced this pull request May 12, 2026
* feat(delivery-003): 20-clip multi-project toy corpus on synthbanshee main

Replaces delivery 002. First handoff target for the She-Proves and Elephant consumer teams.

## Contents

- **She-Proves Tier A, Azure pair (10 clips)** in `agg_m_30-45_001/`: 2 IT, 2 SV, 3 NEG, 3 NEU (Avri + Hila).
- **She-Proves Tier A, Google Chirp HD pair (2 clips)** in `agg_m_30-45_002/`: 1 IT, 1 SV (sister scenes to `sp_*_a_0001`, authored as PR DataHackIL/SynthBanshee#105). Provides the voice + backend diversity vehicle for this delivery.
- **Elephant Tier B (8 clips)** in `ben_m_40-55_003/`: 2 each of IT/SV/NEG/NEU with `acoustic_scene` (clinic_office room IR + pi_budget_mic device + HVAC ambient).

Total: 20 clips, ~41.7 min. All pass `synthbanshee validate` and `synthbanshee qa-report` (failure rate 0.0%). Full QA snapshot at [`deliveries/003-multi-project-multi-voice/qa-report.json`](deliveries/003-multi-project-multi-voice/qa-report.json).

## Pipeline corrections delivered

This delivery is the first to surface 4 synthbanshee fixes landed in the past day:

- DataHackIL/SynthBanshee#102: `preprocessing_applied.normalized_dbfs` now records the *measured* post-preprocess peak (was hardcoded `-1.0`). Pair with `generation_metadata.loudness_target_peak_dbfs` to diagnose loudness drift; the schema docstring at `labels/schema.py:175` pins the measured-vs-target split.
- DataHackIL/SynthBanshee#103: `docs/spec.md` pins the `has_violence` derivation rule (`any(e.tier1_category != "NONE")`), adds the §2.5 identifier-casing table, rewrites §5.1 field notes.
- DataHackIL/SynthBanshee#105: adds `sp_sv_a_0003` + `sp_it_a_0003` Google-pair shadow scenes.
- DataHackIL/SynthBanshee#106: root cause for #72: `_HINT_DEFAULTS` was emitting nested `<prosody volume="+NdB">` inside outer `<prosody volume="+N%">`, which Azure rejects with SSML parse error 0x80045003. Required to unblock 6 of 8 elephant Tier B scenes; without the fix, every scene whose LLM script carries a `stress` phrase hint at intensity ≥ 3 failed reliably.

## Doc updates in this PR

- `README.md`: tightened "Clip ID and filename conventions" to point at SynthBanshee `docs/spec.md` §2.5; rewrote the `has_violence` paragraph to the events-based rule; updated the audio-format section to the measured-vs-target split; replaced the v1-limitations block with a pointer to per-delivery notes.
- `CLAUDE.md`: replaced the wrong `has_violence` formula with the events-based rule; expanded the audio-format table to match the spec's measured-vs-target distinction.
- `DELIVERIES.md`: delivery 002 marked `superseded`; new row for 003.
- `deliveries/003-multi-project-multi-voice/`:
  - `metadata.yaml`: structured delivery record.
  - `notes.md`: full per-clip table, voice/backend matrix, closed-vs-open qa-report findings.
  - `qa-report.json`: raw qa-report output (committed for audit).

## QA snapshot

Closed since delivery 002:

| Finding | 002 | 003 |
|---|---|---|
| `agg_no_escalation` | 3 clips | 0 |
| `warn_no_overlap` | 4 clips | 0 (overlap_ratio 100% on I4+) |
| `warn_emotion_downgrade` | 4 clips | 0 |
| `generation_metadata` absent | 0 of 8 had it | 20 of 20 have it |
| `dirty_file_path` null | 7 of 8 | 0 of 20 |
| `normalized_dbfs` hardcoded `-1.0` | 8 of 8 | fixed (#102) |

Still open: `low_voice_diversity_*` (now 2 voices per gender, threshold is ≥3; partial progress 1 → 2); `single_backend` (misleading; see notes for explanation of the hardcoded `tts_engine` labeling bug); `vic_f0_high` on the 2 Google Chirp HD female-voice clips.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
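The events-based `has_violence` rule quoted in the commit above, `any(e.tier1_category != "NONE")`, can be read as the following sketch. The `Event` dataclass and the `"PHYS"` category value are hypothetical; only the `tier1_category` field name and the `"NONE"` sentinel come from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Event:
    # Hypothetical minimal event record; only this field name is from the source.
    tier1_category: str


def has_violence(events: Iterable[Event]) -> bool:
    """A clip has violence if any event's tier-1 category is not "NONE"."""
    return any(e.tier1_category != "NONE" for e in events)


# "PHYS" is a made-up category used purely for illustration.
print(has_violence([Event("NONE"), Event("PHYS")]))  # True
print(has_violence([Event("NONE")]))                 # False
print(has_violence([]))                              # False
```

Under this reading, a clip with no events at all is labeled non-violent, because `any` over an empty iterable is `False`.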
Problem

Reliable root cause for the long-standing intermittent #72: `Azure SSML parsing error 0x80045003` / `Connection was closed by the remote host`.

Reproduced during the delivery-003 corpus regen (`avdp-synth-corpus`, on top of recent PRs #102/#103/#105): 6 of 8 elephant Tier B scenes fail every time on the first uncached turn. Bisected with 9 hand-crafted A/B SSML round-trips against live Azure.

What triggers it
When the M2b phrase-prosody system fires a `stress` hint, the renderer produces nested `<prosody>` with the inner volume in `dB` and the outer in `%`, for example an inner `<prosody volume="+3dB">` inside an outer `<prosody volume="+5%">`.

Azure rejects volume unit mismatch (`dB` inner under `%` outer). Pitch unit mismatch (`st` under `%`) is tolerated. Mid-word `<break />` is tolerated. Both confirmed against live Azure.

Fix
`_HINT_DEFAULTS["stress"]["volume"]`: `"+3dB"` → `"+3%"`. Matches the lossy 1:1 dB→% mapping that `_volume_to_string` already uses for the outer prosody. No SSML-builder code changes, just the hint default + docstring update.

A/B isolation results (live Azure)
Outcomes across the 9 round-trips, by configuration:

- nested word-aligned, all-`%` units (outer `vol="+5%"`, inner `vol="+3%"`) → OK
- nested word-aligned, inner `pitch="+1st"`, `volume="+3%"` → OK
- nested word-aligned, inner `pitch="+6%"`, `volume="+3dB"` → FAIL
- nested mid-word, mixed units (outer `vol="+5%"`, inner `vol="+3dB"`) → FAIL
- mid-word `<break />` (no nested prosody) → OK

Volume `dB` inside volume `%` is the trigger; nothing else.

Files changed
- `synthbanshee/tts/ssml_types.py`: `_HINT_DEFAULTS["stress"]["volume"]` changed from `"+3dB"` to `"+3%"`; `PhraseProsody.volume` docstring updated to pin the `%`-only constraint and reference #72
- `tests/unit/test_phrase_prosody.py`: `test_hint_defaults_applied_stress` updated to assert `"+3%"`; new class `TestHintDefaultUnits` with two structural tests pinning the "no `dB` for volume" / "must end in `%`" invariant across `_HINT_DEFAULTS`

Test plan
- `pytest tests/unit/`: 1696 passed (1694 prior + 2 new structural tests)
- `ruff check synthbanshee/ tests/`: clean

Tier-3 ASR sanity (local)
This change will alter audio output for any scene that emits a `stress` phrase hint: the inner prosody volume drops from a real +3 dB (~+41% linear) to the lossy synthbanshee convention of +3% (matching how the outer prosody has always been emitted). Per `CLAUDE.md`'s ASR sanity policy, this is in-scope; will run `qa-report --asr` on the delivery-003 corpus once those scenes regenerate and paste the result into the upcoming corpus PR.

Unblocks
`avdp-synth-corpus` delivery-003 (in-flight): the 6 elephant Tier B scenes that hit this can now regenerate cleanly.

Refs #72.
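The "+3 dB (~+41% linear)" figure in the Tier-3 ASR sanity note can be checked with a few lines of arithmetic. This sketch assumes dB is an amplitude ratio (gain = 10^(dB/20)) and that a `+N%` volume is a relative amplitude change of N percent; the second assumption is an interpretation for illustration, not something the PR states. `db_to_linear_gain` and `percent_to_db` are hypothetical helper names.

```python
import math


def db_to_linear_gain(db: float) -> float:
    """Amplitude gain factor for a relative change given in dB."""
    return 10 ** (db / 20)


def percent_to_db(percent: float) -> float:
    """dB equivalent of a relative amplitude change given in percent (assumed semantics)."""
    return 20 * math.log10(1 + percent / 100)


gain = db_to_linear_gain(3.0)
print(f"+3 dB -> x{gain:.3f} amplitude (+{(gain - 1) * 100:.0f}%)")  # x1.413, +41%
print(f"+3%   -> {percent_to_db(3.0):+.2f} dB")                      # +0.26 dB
```

Under these assumptions the stress emphasis after the fix is roughly a quarter of a dB instead of 3 dB, which is why the audible difference is worth an ASR sanity pass.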
🤖 Generated with Claude Code