fix(labels,qa): #109 — drop hardcoded tts_engine, derive single_backend from generation_metadata.tts_backend#112
Merged
Conversation
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR fixes incorrect backend labeling in clip metadata and QA reporting by removing the hardcoded per-clip tts_engine field and deriving backend diversity from the structured per-speaker generation_metadata.tts_backend map instead, eliminating false-positive single_backend warnings on mixed-provider runs.
Changes:
- Remove
ClipMetadata.tts_engineand stop writing the hardcoded"azure_he_IL"value during label generation. - Update QA aggregation + CLI output to use
generation_metadata.tts_backendand renameclips_by_tts_engine→clips_by_tts_backend. - Update docs and unit tests/fixtures to reflect the schema and QA-report changes.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
synthbanshee/labels/schema.py |
Drops tts_engine from the schema and updates ASCII-safe validation fields. |
synthbanshee/labels/generator.py |
Stops emitting the hardcoded tts_engine value when generating metadata. |
synthbanshee/package/qa.py |
Computes backend diversity/histogram from generation_metadata.tts_backend and renames the run-summary field accordingly. |
synthbanshee/cli.py |
Updates qa-report Rich table and JSON output to use clips_by_tts_backend. |
docs/spec.md |
Removes tts_engine from the example and documents the new source of truth. |
AGENTS.md |
Updates the list of string fields enforced by the ASCII-safe validator. |
tests/unit/test_qa.py |
Updates clip fixtures to always emit generation_metadata.tts_backend and adds run-summary backend-diversity tests. |
tests/unit/test_labels.py |
Adds coverage asserting tts_engine is absent and legacy JSON still parses. |
tests/unit/test_manifest.py |
Removes tts_engine from JSON fixtures. |
tests/unit/test_generation_metadata.py |
Removes tts_engine from minimal-clip fixture. |
tests/unit/test_cli.py |
Removes tts_engine from minimal-clip fixture. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…nd from generation_metadata.tts_backend
ClipMetadata.tts_engine was always written as the literal string
"azure_he_IL" regardless of which TTS provider actually rendered the
clip. This mis-labeled every google-rendered clip and caused
qa-report's run-level `single_backend` warning to fire as a false
positive on mixed-provider runs (see delivery-003 in
avdp-synth-corpus#4, where 20 mixed-backend clips were tagged
`"clips_by_tts_engine": {"azure_he_IL": 20}` and triggered
`single_backend`).
Per the issue's recommended option (a): drop `tts_engine` as the
primary signal. The structured per-speaker `tts_backend` map already
lives in `generation_metadata` and carries richer information; the
flat field was dead weight.
Changes:
- synthbanshee/labels/schema.py: removed `tts_engine` field from
`ClipMetadata`; removed from the ascii-safe `field_validator`. Old
corpus JSON still parses cleanly (Pydantic ignores unknown fields).
- synthbanshee/labels/generator.py: dropped the hardcoded
`tts_engine="azure_he_IL"` line in `generate_clip_metadata`.
- synthbanshee/package/qa.py: backend diversity now derived from
`metadata.generation_metadata.tts_backend.values()`. `RunSummary`
field renamed `clips_by_tts_engine` → `clips_by_tts_backend` (a
clip with mixed providers contributes one count to each backend).
- synthbanshee/cli.py: qa-report renderer + JSON output updated for
the renamed field.
- docs/spec.md §5.1: removed `"tts_engine": "azure_he_IL"` from the
example JSON; added a field note pointing readers to
`generation_metadata.tts_backend`. Old corpus snapshots still
carrying the field should be tolerated and ignored.
- AGENTS.md (= CLAUDE.md): removed `tts_engine` from the ascii-safe
validator list.
- tests/unit/test_qa.py: helper `_write_valid_clip` replaced
`tts_engine` kwarg with `tts_backend` (per-speaker map); every
clip now emits `generation_metadata` so qa derivation has a source
of truth. Added two new tests under TestRunQARunSummary:
`single_backend` does NOT fire when two clips use different
backends; `single_backend` DOES fire when both clips are azure.
Existing `test_run_summary_backend_diversity` updated for the new
field name.
- tests/unit/test_labels.py: added `test_tts_engine_field_removed`
asserting the attribute is absent on freshly-built ClipMetadata
and that legacy JSON carrying the old field still parses cleanly.
- tests/unit/{test_cli,test_manifest,test_generation_metadata}.py:
dropped `tts_engine` from minimal-clip JSON fixtures.
Fixes #109.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…, document break Self-review caught five gaps in the first cut. This commit addresses them: 1. Pre-#109 clips (no ``generation_metadata``) now bucket under the ``"unknown"`` backend key in ``RunSummary.clips_by_tts_backend`` instead of silently contributing zero. Previously, a corpus mixing pre- and post-#109 clips would lose the legacy clips from ``backend_count``, making ``single_backend`` flip the wrong way. 2. Updated ``RunSummary.clips_by_tts_backend`` docstring to: - flag the ``sum(values) >= total_clips`` invariant change (mixed- backend clips contribute +1 to each backend, so the pre-#109 sum-equals-total invariant no longer holds); - document the ``"unknown"`` bucket for missing generation_metadata. 3. ``test_tts_engine_field_removed`` now explicitly exercises ``with pytest.raises(AttributeError): m.tts_engine`` for both fresh models and ones parsed from legacy JSON. The previous version only asserted ``not hasattr(...)``, which would silently pass if Pydantic ever shipped ``extra="allow"`` semantics by accident. The new test makes the breaking-change contract explicit. 4. New ``test_pipeline_output_feeds_qa_backend_derivation`` integration test runs the full mocked ``_run_generate_pipeline`` end-to-end and then feeds the resulting clip directory to ``run_qa(run_summary=True)``, asserting ``"azure"`` lands in ``clips_by_tts_backend`` and ``"unknown"`` does NOT. Catches the regression where the generator's ``_backends_map`` shape diverges from qa.py's expectation — both unit-tested components could pass in isolation while the integration silently breaks. 5. New ``test_run_summary_clips_without_generation_metadata_bucket_as_unknown`` and ``test_run_summary_all_legacy_clips_fire_single_backend`` pin the unknown-bucket contract: legacy clips remain visible to diversity metrics, and an all-legacy corpus still correctly fires ``single_backend``. Full unit suite: 1702/1702 passing (up from 1699 — three new tests). ruff + mypy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
8d3148d to
a14755d
Compare
|
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #112 in repository https://github.com/DataHackIL/SynthBanshee. Treat this PR as all clear unless new signals appear.Run metadata: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
ClipMetadata.tts_enginewas always written as the literal string"azure_he_IL"regardless of which TTS provider actually rendered theclip. This mis-labeled every google-rendered clip and caused
qa-report's run-levelsingle_backendwarning to fire as a falsepositive on mixed-provider runs.
Fingerprint from
avdp-synth-corpus#4(delivery-003):And the matching
qa-report.json:single_backendis a false positive — the run genuinely has twobackends.
This PR is intentionally a schema and qa-output API break, scoped to
fix the bug correctly rather than perpetuate the dead-weight field. If
you have downstream consumers in
avdp-synth-corpusor elsewhere, grepthem for the patterns below before pulling this in.
ClipMetadata.tts_engine(Python attribute)str, always"azure_he_IL"AttributeError."tts_engine"fieldextra="ignore"); the value is silently dropped on read.RunSummary.clips_by_tts_engine(Python attribute)dict[str, int]keyed on"azure_he_IL"clips_by_tts_backend, keyed on"azure"/"google"/"unknown"."clips_by_tts_engine""clips_by_tts_backend"."azure_he_IL""azure","google","unknown"sum(clips_by_tts_backend.values())invariant== total_clips>= total_clips(a mixed-backend clip contributes +1 to each backend)The rename is deliberate (not scope creep): keeping the old name
clips_by_tts_enginewhile changing the value-space from"azure_he_IL"→
"azure"would have been a silent value-shape change. Renaming makesboth the value and key changes visible to downstream consumers in one
place. Anyone consuming the qa-report JSON will get a
KeyErrorratherthan wrong counts under the old key — which is what we want.
Solution
Per the issue's recommended option (a): drop
tts_engineas theprimary signal. The structured per-speaker
generation_metadata.tts_backendmap already lives in clip JSON and carries richer information (per-speaker,
not per-clip); the flat field was dead weight.
tts_enginefromClipMetadataentirely.tts_engine="azure_he_IL"line inLabelGenerator.generate_clip_metadata.qa.py, derive backend diversity fromgeneration_metadata.tts_backend.values()instead ofclip.tts_engine.RunSummary.clips_by_tts_engine→clips_by_tts_backend. Aclip with mixed providers contributes one count to each backend.
generation_metadata) under"unknown"so they remain visible to backend-diversity metrics. Without this, an
old corpus mixed with new clips would silently lose the old clips
from
backend_count.docs/spec.md§5.1 example + add a field note pointing readersto
generation_metadata.tts_backend.AGENTS.md(=CLAUDE.mdvia symlink) to droptts_enginefrom the ascii-safe validator list.
Test plan
.venv/bin/pytest tests/unit/ -q— 1702/1702 passingAttributeErrortest on both fresh and legacy-parsed models"unknown"bucket coverage for pre-bug(labels,qa): tts_engine hardcoded "azure_he_IL" regardless of actual provider — misleads single_backend QA #109 clips (two new qa tests)_run_generate_pipeline→run_qa, asserts"azure"lands inclips_by_tts_backendruff check synthbanshee/ tests/unit/— cleanmypy synthbanshee/cli.py synthbanshee/package/qa.py synthbanshee/labels/schema.py synthbanshee/labels/generator.py— cleanTier-3 ASR sanity (local)
Not applicable — this PR touches
synthbanshee/labels/,synthbanshee/package/qa.py, the qa-report rendering insynthbanshee/cli.py, and docs. None ofsynthbanshee/tts/,synthbanshee/script/,synthbanshee/augment/, nor any speaker / scene/ acoustic / project YAML config is modified. Per CLAUDE.md "ASR sanity
check policy", this PR is exempt from the local
qa-report --asrrun.Corpus impact
Existing on-disk corpus clips (delivery-003 included) still have the
wrong
tts_enginevalue baked in. After this PR lands, the next regenwill simply drop the field. For a no-regen retrofit, a small follow-up
PR on
avdp-synth-corpuscan post-process the 21 affected JSONs (droptts_engine; derive any consumer logic fromgeneration_metadata.tts_backend). Otherwise, the new"unknown"bucket keeps them visible to qa-report diversity metrics, and
Pydantic's
extra="ignore"makes the stale field harmless on read.Fixes #109.
🤖 Generated with Claude Code