Skip to content

fix(labels,qa): #109 — drop hardcoded tts_engine, derive single_backend from generation_metadata.tts_backend#112

Merged
shaypal5 merged 2 commits into
mainfrom
fix/tts-engine-use-generation-metadata
May 12, 2026
Merged

fix(labels,qa): #109 — drop hardcoded tts_engine, derive single_backend from generation_metadata.tts_backend#112
shaypal5 merged 2 commits into
mainfrom
fix/tts-engine-use-generation-metadata

Conversation

@shaypal5

@shaypal5 shaypal5 commented May 12, 2026

Copy link
Copy Markdown
Member

Problem

ClipMetadata.tts_engine was always written as the literal string
"azure_he_IL" regardless of which TTS provider actually rendered the
clip. This mis-labeled every google-rendered clip and caused
qa-report's run-level single_backend warning to fire as a false
positive on mixed-provider runs.

Fingerprint from avdp-synth-corpus#4 (delivery-003):

$ jq '.tts_engine, .generation_metadata.tts_backend' \
    data/he/agg_m_30-45_002/sp_sv_a_0003_00.json
# → "azure_he_IL"                                          ← WRONG (this clip is Google)
# → {"AGG_M_30-45_002": "google", "VIC_F_25-40_003": "google"}  ← correct, per-speaker

And the matching qa-report.json:

"clips_by_tts_engine": {"azure_he_IL": 20},
"run_warnings": ["single_backend", ...]

single_backend is a false positive — the run genuinely has two
backends.

⚠️ Breaking changes

This PR is intentionally a schema and qa-output API break, scoped to
fix the bug correctly rather than perpetuate the dead-weight field. If
you have downstream consumers in avdp-synth-corpus or elsewhere, grep
them for the patterns below before pulling this in.

Surface Before After
ClipMetadata.tts_engine (Python attribute) str, always "azure_he_IL" Removed. Access raises AttributeError.
Clip JSON "tts_engine" field always present Not written. Legacy JSON parses fine (Pydantic extra="ignore"); the value is silently dropped on read.
RunSummary.clips_by_tts_engine (Python attribute) dict[str, int] keyed on "azure_he_IL" Renamed to clips_by_tts_backend, keyed on "azure" / "google" / "unknown".
qa-report JSON key "clips_by_tts_engine" always present Renamed to "clips_by_tts_backend".
Backend histogram value-space "azure_he_IL" "azure", "google", "unknown"
sum(clips_by_tts_backend.values()) invariant == total_clips >= total_clips (a mixed-backend clip contributes +1 to each backend)

The rename is deliberate (not scope creep): keeping the old name
clips_by_tts_engine while changing the value-space from "azure_he_IL"
"azure" would have been a silent value-shape change. Renaming makes
both the value and key changes visible to downstream consumers in one
place. Anyone consuming the qa-report JSON will get a KeyError rather
than wrong counts under the old key — which is what we want.

Solution

Per the issue's recommended option (a): drop tts_engine as the
primary signal. The structured per-speaker generation_metadata.tts_backend
map already lives in clip JSON and carries richer information (per-speaker,
not per-clip); the flat field was dead weight.

  • Remove tts_engine from ClipMetadata entirely.
  • Drop the hardcoded tts_engine="azure_he_IL" line in
    LabelGenerator.generate_clip_metadata.
  • In qa.py, derive backend diversity from
    generation_metadata.tts_backend.values() instead of clip.tts_engine.
  • Rename RunSummary.clips_by_tts_engineclips_by_tts_backend. A
    clip with mixed providers contributes one count to each backend.
  • Bucket pre-bug(labels,qa): tts_engine hardcoded "azure_he_IL" regardless of actual provider — misleads single_backend QA #109 clips (no generation_metadata) under "unknown"
    so they remain visible to backend-diversity metrics. Without this, an
    old corpus mixed with new clips would silently lose the old clips
    from backend_count.
  • Update docs/spec.md §5.1 example + add a field note pointing readers
    to generation_metadata.tts_backend.
  • Update AGENTS.md (= CLAUDE.md via symlink) to drop tts_engine
    from the ascii-safe validator list.

Test plan

  • .venv/bin/pytest tests/unit/ -q1702/1702 passing
  • Explicit AttributeError test on both fresh and legacy-parsed models
  • "unknown" bucket coverage for pre-bug(labels,qa): tts_engine hardcoded "azure_he_IL" regardless of actual provider — misleads single_backend QA #109 clips (two new qa tests)
  • End-to-end test: full mocked _run_generate_pipelinerun_qa, asserts "azure" lands in clips_by_tts_backend
  • ruff check synthbanshee/ tests/unit/ — clean
  • mypy synthbanshee/cli.py synthbanshee/package/qa.py synthbanshee/labels/schema.py synthbanshee/labels/generator.py — clean
  • Pre-commit hooks (ruff, ruff-format, mypy) — clean at commit time

Tier-3 ASR sanity (local)

Not applicable — this PR touches synthbanshee/labels/,
synthbanshee/package/qa.py, the qa-report rendering in
synthbanshee/cli.py, and docs. None of synthbanshee/tts/,
synthbanshee/script/, synthbanshee/augment/, nor any speaker / scene
/ acoustic / project YAML config is modified. Per CLAUDE.md "ASR sanity
check policy", this PR is exempt from the local qa-report --asr run.

Corpus impact

Existing on-disk corpus clips (delivery-003 included) still have the
wrong tts_engine value baked in. After this PR lands, the next regen
will simply drop the field. For a no-regen retrofit, a small follow-up
PR on avdp-synth-corpus can post-process the 21 affected JSONs (drop
tts_engine; derive any consumer logic from
generation_metadata.tts_backend). Otherwise, the new "unknown"
bucket keeps them visible to qa-report diversity metrics, and
Pydantic's extra="ignore" makes the stale field harmless on read.

Fixes #109.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 12, 2026 06:56
@shaypal5 shaypal5 added type: fix Bug fix comp: labels Label generation, schema, IAA utilities labels May 12, 2026
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes incorrect backend labeling in clip metadata and QA reporting by removing the hardcoded per-clip tts_engine field and deriving backend diversity from the structured per-speaker generation_metadata.tts_backend map instead, eliminating false-positive single_backend warnings on mixed-provider runs.

Changes:

  • Remove ClipMetadata.tts_engine and stop writing the hardcoded "azure_he_IL" value during label generation.
  • Update QA aggregation + CLI output to use generation_metadata.tts_backend and rename clips_by_tts_engineclips_by_tts_backend.
  • Update docs and unit tests/fixtures to reflect the schema and QA-report changes.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
synthbanshee/labels/schema.py Drops tts_engine from the schema and updates ASCII-safe validation fields.
synthbanshee/labels/generator.py Stops emitting the hardcoded tts_engine value when generating metadata.
synthbanshee/package/qa.py Computes backend diversity/histogram from generation_metadata.tts_backend and renames the run-summary field accordingly.
synthbanshee/cli.py Updates qa-report Rich table and JSON output to use clips_by_tts_backend.
docs/spec.md Removes tts_engine from the example and documents the new source of truth.
AGENTS.md Updates the list of string fields enforced by the ASCII-safe validator.
tests/unit/test_qa.py Updates clip fixtures to always emit generation_metadata.tts_backend and adds run-summary backend-diversity tests.
tests/unit/test_labels.py Adds coverage asserting tts_engine is absent and legacy JSON still parses.
tests/unit/test_manifest.py Removes tts_engine from JSON fixtures.
tests/unit/test_generation_metadata.py Removes tts_engine from minimal-clip fixture.
tests/unit/test_cli.py Removes tts_engine from minimal-clip fixture.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread synthbanshee/package/qa.py Outdated
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

shaypal5 and others added 2 commits May 12, 2026 18:29
…nd from generation_metadata.tts_backend

ClipMetadata.tts_engine was always written as the literal string
"azure_he_IL" regardless of which TTS provider actually rendered the
clip. This mis-labeled every google-rendered clip and caused
qa-report's run-level `single_backend` warning to fire as a false
positive on mixed-provider runs (see delivery-003 in
avdp-synth-corpus#4, where 20 mixed-backend clips were tagged
`"clips_by_tts_engine": {"azure_he_IL": 20}` and triggered
`single_backend`).

Per the issue's recommended option (a): drop `tts_engine` as the
primary signal. The structured per-speaker `tts_backend` map already
lives in `generation_metadata` and carries richer information; the
flat field was dead weight.

Changes:
- synthbanshee/labels/schema.py: removed `tts_engine` field from
  `ClipMetadata`; removed from the ascii-safe `field_validator`. Old
  corpus JSON still parses cleanly (Pydantic ignores unknown fields).
- synthbanshee/labels/generator.py: dropped the hardcoded
  `tts_engine="azure_he_IL"` line in `generate_clip_metadata`.
- synthbanshee/package/qa.py: backend diversity now derived from
  `metadata.generation_metadata.tts_backend.values()`. `RunSummary`
  field renamed `clips_by_tts_engine` → `clips_by_tts_backend` (a
  clip with mixed providers contributes one count to each backend).
- synthbanshee/cli.py: qa-report renderer + JSON output updated for
  the renamed field.
- docs/spec.md §5.1: removed `"tts_engine": "azure_he_IL"` from the
  example JSON; added a field note pointing readers to
  `generation_metadata.tts_backend`. Old corpus snapshots still
  carrying the field should be tolerated and ignored.
- AGENTS.md (= CLAUDE.md): removed `tts_engine` from the ascii-safe
  validator list.
- tests/unit/test_qa.py: helper `_write_valid_clip` replaced
  `tts_engine` kwarg with `tts_backend` (per-speaker map); every
  clip now emits `generation_metadata` so qa derivation has a source
  of truth. Added two new tests under TestRunQARunSummary:
  `single_backend` does NOT fire when two clips use different
  backends; `single_backend` DOES fire when both clips are azure.
  Existing `test_run_summary_backend_diversity` updated for the new
  field name.
- tests/unit/test_labels.py: added `test_tts_engine_field_removed`
  asserting the attribute is absent on freshly-built ClipMetadata
  and that legacy JSON carrying the old field still parses cleanly.
- tests/unit/{test_cli,test_manifest,test_generation_metadata}.py:
  dropped `tts_engine` from minimal-clip JSON fixtures.

Fixes #109.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…, document break

Self-review caught five gaps in the first cut. This commit addresses them:

1. Pre-#109 clips (no ``generation_metadata``) now bucket under the
   ``"unknown"`` backend key in ``RunSummary.clips_by_tts_backend``
   instead of silently contributing zero. Previously, a corpus mixing
   pre- and post-#109 clips would lose the legacy clips from
   ``backend_count``, making ``single_backend`` flip the wrong way.

2. Updated ``RunSummary.clips_by_tts_backend`` docstring to:
   - flag the ``sum(values) >= total_clips`` invariant change (mixed-
     backend clips contribute +1 to each backend, so the pre-#109
     sum-equals-total invariant no longer holds);
   - document the ``"unknown"`` bucket for missing generation_metadata.

3. ``test_tts_engine_field_removed`` now explicitly exercises
   ``with pytest.raises(AttributeError): m.tts_engine`` for both fresh
   models and ones parsed from legacy JSON. The previous version only
   asserted ``not hasattr(...)``, which would silently pass if
   Pydantic ever shipped ``extra="allow"`` semantics by accident. The
   new test makes the breaking-change contract explicit.

4. New ``test_pipeline_output_feeds_qa_backend_derivation`` integration
   test runs the full mocked ``_run_generate_pipeline`` end-to-end and
   then feeds the resulting clip directory to ``run_qa(run_summary=True)``,
   asserting ``"azure"`` lands in ``clips_by_tts_backend`` and
   ``"unknown"`` does NOT. Catches the regression where the generator's
   ``_backends_map`` shape diverges from qa.py's expectation — both
   unit-tested components could pass in isolation while the
   integration silently breaks.

5. New ``test_run_summary_clips_without_generation_metadata_bucket_as_unknown``
   and ``test_run_summary_all_legacy_clips_fire_single_backend`` pin
   the unknown-bucket contract: legacy clips remain visible to
   diversity metrics, and an all-legacy corpus still correctly fires
   ``single_backend``.

Full unit suite: 1702/1702 passing (up from 1699 — three new tests).
ruff + mypy clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 12, 2026 15:39
@shaypal5 shaypal5 force-pushed the fix/tts-engine-use-generation-metadata branch from 8d3148d to a14755d Compare May 12, 2026 15:39
@github-actions

Copy link
Copy Markdown

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #112 in repository https://github.com/DataHackIL/SynthBanshee. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25745193064 attempt 1
Comment timestamp: 2026-05-12T15:41:36.349620+00:00
PR head commit: a14755dbe920e9a5f81686b2516ca6959133e053

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.

@shaypal5 shaypal5 merged commit 1ea48f3 into main May 12, 2026
10 checks passed
@shaypal5 shaypal5 deleted the fix/tts-engine-use-generation-metadata branch May 12, 2026 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp: labels Label generation, schema, IAA utilities type: fix Bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(labels,qa): tts_engine hardcoded "azure_he_IL" regardless of actual provider — misleads single_backend QA

2 participants