From 6df055a174600d6e9c4047684b29e69c5d6aacbf Mon Sep 17 00:00:00 2001
From: Robert Genito
Date: Tue, 16 Jun 2026 21:24:21 -0600
Subject: [PATCH] feat(helpers): add sync_audio.py for precise multicam audio sync via GCC-PHAT

Multicam projects need frame-accurate audio alignment. Eyeballing offsets
from Scribe's (claps) audio_event timestamps drifts 200-500ms, and plain
cross-correlation between mismatched mics gives a broad, ambiguous peak.

GCC-PHAT whitens the cross-power spectrum before correlating, so the peak
stays sharp regardless of mic frequency response or room reverb. It is
sub-frame accurate even between a studio mic and an on-camera mic.

The helper takes a reference video, one or more targets, and rough sync
timestamps for at least one shared transient (a clap). It cross-correlates
10s+ windows around each event, combines the lag with the window-start
delta into a precise source-time offset, averages across events per
target, and flags disagreement over 50ms as likely clap-pattern aliasing
or clock drift.

Output goes to sync_offsets.json, the helper is documented in SKILL.md,
and scipy is declared as a direct dependency.

More about me: https://geni.to/about

Signed-off-by: Robert Genito
---
 SKILL.md              |   1 +
 helpers/sync_audio.py | 295 ++++++++++++++++++++++++++++++++++++++++++
 pyproject.toml        |   1 +
 3 files changed, 297 insertions(+)
 create mode 100644 helpers/sync_audio.py

diff --git a/SKILL.md b/SKILL.md
index 63eb84c..eed9710 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -73,6 +73,7 @@ Helpers (`helpers/transcribe.py`, `helpers/render.py`, etc.) live alongside this
 - **`transcribe_batch.py `** — 4-worker parallel transcription. Use for multi-take.
 - **`pack_transcripts.py --edit-dir `** — `transcripts/*.json` → `takes_packed.md` (phrase-level, break on silence ≥ 0.5s).
 - **`timeline_view.py