fix(labels,qa): #109 — drop hardcoded tts_engine, derive single_backend from generation_metadata.tts_backend by shaypal5 · Pull Request #112 · DataHackIL/SynthBanshee

shaypal5 · 2026-05-12T06:56:22Z

Problem

ClipMetadata.tts_engine was always written as the literal string
"azure_he_IL" regardless of which TTS provider actually rendered the
clip. This mis-labeled every google-rendered clip and caused
qa-report's run-level single_backend warning to fire as a false
positive on mixed-provider runs.

Fingerprint from avdp-synth-corpus#4 (delivery-003):

$ jq '.tts_engine, .generation_metadata.tts_backend' \
    data/he/agg_m_30-45_002/sp_sv_a_0003_00.json
# → "azure_he_IL"                                          ← WRONG (this clip is Google)
# → {"AGG_M_30-45_002": "google", "VIC_F_25-40_003": "google"}  ← correct, per-speaker

And the matching qa-report.json:

"clips_by_tts_engine": {"azure_he_IL": 20},
"run_warnings": ["single_backend", ...]

single_backend is a false positive — the run genuinely has two
backends.

⚠️ Breaking changes

This PR is intentionally a schema and qa-output API break, scoped to
fix the bug correctly rather than perpetuate the dead-weight field. If
you have downstream consumers in avdp-synth-corpus or elsewhere, grep
them for the patterns below before pulling this in.

Surface	Before	After
`ClipMetadata.tts_engine` (Python attribute)	`str`, always `"azure_he_IL"`	Removed. Access raises `AttributeError`.
Clip JSON `"tts_engine"` field	always present	Not written. Legacy JSON parses fine (Pydantic `extra="ignore"`); the value is silently dropped on read.
`RunSummary.clips_by_tts_engine` (Python attribute)	`dict[str, int]` keyed on `"azure_he_IL"`	Renamed to `clips_by_tts_backend`, keyed on `"azure"` / `"google"` / `"unknown"`.
qa-report JSON key `"clips_by_tts_engine"`	always present	Renamed to `"clips_by_tts_backend"`.
Backend histogram value-space	`"azure_he_IL"`	`"azure"`, `"google"`, `"unknown"`
`sum(clips_by_tts_backend.values())` invariant	`== total_clips`	`>= total_clips` (a mixed-backend clip contributes +1 to each backend)

The rename is deliberate (not scope creep): keeping the old name
clips_by_tts_engine while changing the value-space from "azure_he_IL"
→ "azure" would have been a silent value-shape change. Renaming makes
both the value and key changes visible to downstream consumers in one
place. Anyone consuming the qa-report JSON will get a KeyError rather
than wrong counts under the old key — which is what we want.

Solution

Per the issue's recommended option (a): drop tts_engine as the
primary signal. The structured per-speaker generation_metadata.tts_backend
map already lives in clip JSON and carries richer information (per-speaker,
not per-clip); the flat field was dead weight.

Remove tts_engine from ClipMetadata entirely.
Drop the hardcoded tts_engine="azure_he_IL" line in
LabelGenerator.generate_clip_metadata.
In qa.py, derive backend diversity from
generation_metadata.tts_backend.values() instead of clip.tts_engine.
Rename RunSummary.clips_by_tts_engine → clips_by_tts_backend. A
clip with mixed providers contributes one count to each backend.
Bucket pre-bug(labels,qa): tts_engine hardcoded "azure_he_IL" regardless of actual provider — misleads single_backend QA #109 clips (no generation_metadata) under "unknown"
so they remain visible to backend-diversity metrics. Without this, an
old corpus mixed with new clips would silently lose the old clips
from backend_count.
Update docs/spec.md §5.1 example + add a field note pointing readers
to generation_metadata.tts_backend.
Update AGENTS.md (= CLAUDE.md via symlink) to drop tts_engine
from the ascii-safe validator list.

Test plan

.venv/bin/pytest tests/unit/ -q — 1702/1702 passing
Explicit AttributeError test on both fresh and legacy-parsed models
"unknown" bucket coverage for pre-bug(labels,qa): tts_engine hardcoded "azure_he_IL" regardless of actual provider — misleads single_backend QA #109 clips (two new qa tests)
End-to-end test: full mocked _run_generate_pipeline → run_qa, asserts "azure" lands in clips_by_tts_backend
ruff check synthbanshee/ tests/unit/ — clean
mypy synthbanshee/cli.py synthbanshee/package/qa.py synthbanshee/labels/schema.py synthbanshee/labels/generator.py — clean
Pre-commit hooks (ruff, ruff-format, mypy) — clean at commit time

Tier-3 ASR sanity (local)

Not applicable — this PR touches synthbanshee/labels/,
synthbanshee/package/qa.py, the qa-report rendering in
synthbanshee/cli.py, and docs. None of synthbanshee/tts/,
synthbanshee/script/, synthbanshee/augment/, nor any speaker / scene
/ acoustic / project YAML config is modified. Per CLAUDE.md "ASR sanity
check policy", this PR is exempt from the local qa-report --asr run.

Corpus impact

Existing on-disk corpus clips (delivery-003 included) still have the
wrong tts_engine value baked in. After this PR lands, the next regen
will simply drop the field. For a no-regen retrofit, a small follow-up
PR on avdp-synth-corpus can post-process the 21 affected JSONs (drop
tts_engine; derive any consumer logic from
generation_metadata.tts_backend). Otherwise, the new "unknown"
bucket keeps them visible to qa-report diversity metrics, and
Pydantic's extra="ignore" makes the stale field harmless on read.

Fixes #109.

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR fixes incorrect backend labeling in clip metadata and QA reporting by removing the hardcoded per-clip tts_engine field and deriving backend diversity from the structured per-speaker generation_metadata.tts_backend map instead, eliminating false-positive single_backend warnings on mixed-provider runs.

Changes:

Remove ClipMetadata.tts_engine and stop writing the hardcoded "azure_he_IL" value during label generation.
Update QA aggregation + CLI output to use generation_metadata.tts_backend and rename clips_by_tts_engine → clips_by_tts_backend.
Update docs and unit tests/fixtures to reflect the schema and QA-report changes.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`synthbanshee/labels/schema.py`	Drops `tts_engine` from the schema and updates ASCII-safe validation fields.
`synthbanshee/labels/generator.py`	Stops emitting the hardcoded `tts_engine` value when generating metadata.
`synthbanshee/package/qa.py`	Computes backend diversity/histogram from `generation_metadata.tts_backend` and renames the run-summary field accordingly.
`synthbanshee/cli.py`	Updates qa-report Rich table and JSON output to use `clips_by_tts_backend`.
`docs/spec.md`	Removes `tts_engine` from the example and documents the new source of truth.
`AGENTS.md`	Updates the list of string fields enforced by the ASCII-safe validator.
`tests/unit/test_qa.py`	Updates clip fixtures to always emit `generation_metadata.tts_backend` and adds run-summary backend-diversity tests.
`tests/unit/test_labels.py`	Adds coverage asserting `tts_engine` is absent and legacy JSON still parses.
`tests/unit/test_manifest.py`	Removes `tts_engine` from JSON fixtures.
`tests/unit/test_generation_metadata.py`	Removes `tts_engine` from minimal-clip fixture.
`tests/unit/test_cli.py`	Removes `tts_engine` from minimal-clip fixture.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…nd from generation_metadata.tts_backend ClipMetadata.tts_engine was always written as the literal string "azure_he_IL" regardless of which TTS provider actually rendered the clip. This mis-labeled every google-rendered clip and caused qa-report's run-level `single_backend` warning to fire as a false positive on mixed-provider runs (see delivery-003 in avdp-synth-corpus#4, where 20 mixed-backend clips were tagged `"clips_by_tts_engine": {"azure_he_IL": 20}` and triggered `single_backend`). Per the issue's recommended option (a): drop `tts_engine` as the primary signal. The structured per-speaker `tts_backend` map already lives in `generation_metadata` and carries richer information; the flat field was dead weight. Changes: - synthbanshee/labels/schema.py: removed `tts_engine` field from `ClipMetadata`; removed from the ascii-safe `field_validator`. Old corpus JSON still parses cleanly (Pydantic ignores unknown fields). - synthbanshee/labels/generator.py: dropped the hardcoded `tts_engine="azure_he_IL"` line in `generate_clip_metadata`. - synthbanshee/package/qa.py: backend diversity now derived from `metadata.generation_metadata.tts_backend.values()`. `RunSummary` field renamed `clips_by_tts_engine` → `clips_by_tts_backend` (a clip with mixed providers contributes one count to each backend). - synthbanshee/cli.py: qa-report renderer + JSON output updated for the renamed field. - docs/spec.md §5.1: removed `"tts_engine": "azure_he_IL"` from the example JSON; added a field note pointing readers to `generation_metadata.tts_backend`. Old corpus snapshots still carrying the field should be tolerated and ignored. - AGENTS.md (= CLAUDE.md): removed `tts_engine` from the ascii-safe validator list. - tests/unit/test_qa.py: helper `_write_valid_clip` replaced `tts_engine` kwarg with `tts_backend` (per-speaker map); every clip now emits `generation_metadata` so qa derivation has a source of truth. Added two new tests under TestRunQARunSummary: `single_backend` does NOT fire when two clips use different backends; `single_backend` DOES fire when both clips are azure. Existing `test_run_summary_backend_diversity` updated for the new field name. - tests/unit/test_labels.py: added `test_tts_engine_field_removed` asserting the attribute is absent on freshly-built ClipMetadata and that legacy JSON carrying the old field still parses cleanly. - tests/unit/{test_cli,test_manifest,test_generation_metadata}.py: dropped `tts_engine` from minimal-clip JSON fixtures. Fixes #109. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…, document break Self-review caught five gaps in the first cut. This commit addresses them: 1. Pre-#109 clips (no ``generation_metadata``) now bucket under the ``"unknown"`` backend key in ``RunSummary.clips_by_tts_backend`` instead of silently contributing zero. Previously, a corpus mixing pre- and post-#109 clips would lose the legacy clips from ``backend_count``, making ``single_backend`` flip the wrong way. 2. Updated ``RunSummary.clips_by_tts_backend`` docstring to: - flag the ``sum(values) >= total_clips`` invariant change (mixed- backend clips contribute +1 to each backend, so the pre-#109 sum-equals-total invariant no longer holds); - document the ``"unknown"`` bucket for missing generation_metadata. 3. ``test_tts_engine_field_removed`` now explicitly exercises ``with pytest.raises(AttributeError): m.tts_engine`` for both fresh models and ones parsed from legacy JSON. The previous version only asserted ``not hasattr(...)``, which would silently pass if Pydantic ever shipped ``extra="allow"`` semantics by accident. The new test makes the breaking-change contract explicit. 4. New ``test_pipeline_output_feeds_qa_backend_derivation`` integration test runs the full mocked ``_run_generate_pipeline`` end-to-end and then feeds the resulting clip directory to ``run_qa(run_summary=True)``, asserting ``"azure"`` lands in ``clips_by_tts_backend`` and ``"unknown"`` does NOT. Catches the regression where the generator's ``_backends_map`` shape diverges from qa.py's expectation — both unit-tested components could pass in isolation while the integration silently breaks. 5. New ``test_run_summary_clips_without_generation_metadata_bucket_as_unknown`` and ``test_run_summary_all_legacy_clips_fire_single_backend`` pin the unknown-bucket contract: legacy clips remain visible to diversity metrics, and an all-legacy corpus still correctly fires ``single_backend``. Full unit suite: 1702/1702 passing (up from 1699 — three new tests). ruff + mypy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-12T15:42:26Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #112 in repository https://github.com/DataHackIL/SynthBanshee. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25745193064 attempt 1
Comment timestamp: 2026-05-12T15:41:36.349620+00:00
PR head commit: a14755dbe920e9a5f81686b2516ca6959133e053

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings May 12, 2026 06:56

shaypal5 added type: fix Bug fix comp: labels Label generation, schema, IAA utilities labels May 12, 2026

Copilot started reviewing on behalf of shaypal5 May 12, 2026 06:56 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed May 12, 2026

View reviewed changes

Comment thread synthbanshee/package/qa.py Outdated

This comment has been minimized.

Sign in to view

shaypal5 and others added 2 commits May 12, 2026 18:29

Copilot AI review requested due to automatic review settings May 12, 2026 15:39

shaypal5 force-pushed the fix/tts-engine-use-generation-metadata branch from 8d3148d to a14755d Compare May 12, 2026 15:39

Copilot started reviewing on behalf of shaypal5 May 12, 2026 15:40 View session

Copilot AI reviewed May 12, 2026

View reviewed changes

shaypal5 merged commit 1ea48f3 into main May 12, 2026
10 checks passed

shaypal5 deleted the fix/tts-engine-use-generation-metadata branch May 12, 2026 15:44

shaypal5 mentioned this pull request May 12, 2026

data(delivery-003): regen post-#107/#108/#109 — single_backend false positive resolved DataHackIL/avdp-synth-corpus#5

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(labels,qa): #109 — drop hardcoded tts_engine, derive single_backend from generation_metadata.tts_backend#112

fix(labels,qa): #109 — drop hardcoded tts_engine, derive single_backend from generation_metadata.tts_backend#112
shaypal5 merged 2 commits into
mainfrom
fix/tts-engine-use-generation-metadata

shaypal5 commented May 12, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented May 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shaypal5 commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

⚠️ Breaking changes

Solution

Test plan

Tier-3 ASR sanity (local)

Corpus impact

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented May 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shaypal5 commented May 12, 2026 •

edited

Loading