PositiveMatician/Echoes: A 7-stage Google Colab pipeline that clones voices from raw call recordings into fully on-device Android TTS models. No cloud, no subscriptions, forever.

███████╗ ██████╗██╗  ██╗ ██████╗ ███████╗███████╗
██╔════╝██╔════╝██║  ██║██╔═══██╗██╔════╝██╔════╝
█████╗  ██║     ███████║██║   ██║█████╗  ███████╗
██╔══╝  ██║     ██╔══██║██║   ██║██╔══╝  ╚════██║
███████╗╚██████╗██║  ██║╚██████╔╝███████╗███████║
╚══════╝ ╚═════╝╚═╝  ╚═╝ ╚═════╝ ╚══════╝╚══════╝

Leave your voice behind — for the people who love you

Raw call recordings → on-device Android TTS · No cloud · No subscription · Forever

Why this exists

I had a thought one day: if I were to die right now, my loved ones would never hear my voice again.

Not a voicemail. Not a shaky video clip. Nothing intentional — just silence.

So I set out to fix that. I wanted to clone my voice from something I already had — a call recording, thirty seconds of me talking — and turn it into a model my family could carry on their phone forever. No internet. No company that might shut down. No subscription that lapses. Just my voice, still there, whenever they need it.

I found OmniVoice (zero-shot cloning from a short clip), then Piper (a real TTS model), then Sherpa-ONNX (fully on-device Android inference), then a TTS engine app on F-Droid that ties it all together. The pieces existed — they just weren't connected, and most of the official notebooks were broken.

So I connected them. Fixed the bugs. And built this.

What Echoes is

A 7-notebook Google Colab pipeline that takes raw call recordings and produces a voice model that runs fully on-device on Android — no internet required after export.

All you need is ~30 seconds of someone's voice from any call recording.

Built on Piper VITS + Sherpa-ONNX. Every stage runs on the free Colab T4 GPU.

Pipeline

┌──────────────────────────────────────────────────────────────────────────────────┐
│                                                                                  │
│   [Raw Recordings — any call, any format]                                        │
│         │                                                                        │
│         ▼                                                                        │
│   ┌─────────────┐                                                                │
│   │  00 · Diarize│  pyannote.audio — speaker segmentation, resume-safe           │
│   └──────┬──────┘                                                                │
│          │  per-speaker WAV clips                                                │
│          ▼                                                                       │
│   ┌─────────────┐                                                                │
│   │  01 · Clone  │  OmniVoice zero-shot synthesis · faster-whisper transcription │
│   └──────┬──────┘                                                                │
│          │  LJSpeech dataset (wavs/ + metadata.csv)                             │
│          ▼                                                                       │
│   ┌─────────────┐                                                                │
│   │  02 · Train  │  Piper VITS fine-tune on hi_IN-rohan-medium                  │
│   └──────┬──────┘                                                                │
│          │  last.ckpt + config.json → Google Drive                               │
│          ▼                                                                       │
│   ┌─────────────┐                                                                │
│   │  03 · Check  │  Interactive PyTorch inference — validate before exporting    │
│   └──────┬──────┘                                                                │
│          │                                                                       │
│          ▼                                                                       │
│   ┌─────────────┐                                                                │
│   │  04 · Export │  .ckpt → .onnx  (all upstream bugs fixed — see below)        │
│   └──────┬──────┘                                                                │
│          │  voice-package.tar.gz                                                 │
│          ▼                                                                       │
│   ┌─────────────┐                                                                │
│   │  05 · Verify │  ONNX Runtime inference — standard + streaming models         │
│   └──────┬──────┘                                                                │
│          │                                                                       │
│          ▼                                                                       │
│   ┌─────────────┐                                                                │
│   │  06 · Sherpa │  Metadata injection · tokens.txt · sherpa_model.tar.gz       │
│   └──────┬──────┘                                                                │
│          │                                                                       │
│          ▼                                                                       │
│   [Android TTS — fully on-device, no internet, forever]                         │
│                                                                                  │
└──────────────────────────────────────────────────────────────────────────────────┘
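
The handoff between stages 01 and 02 is an LJSpeech-style dataset: a `wavs/` folder plus a `metadata.csv`. Below is a minimal sketch of writing that file, assuming the single-speaker `<clip id>|<text>` layout and assuming transcripts arrive as a plain dict. The function name `write_ljspeech_metadata` and the clip ids are hypothetical, not taken from the notebooks.

```python
from pathlib import Path


def write_ljspeech_metadata(dataset_dir, transcripts):
    """Write metadata.csv with one `<clip id>|<text>` line per clip.

    `transcripts` maps a clip id (the WAV file stem) to its transcript.
    Only clips whose WAV file exists under wavs/ are listed.
    """
    dataset_dir = Path(dataset_dir)
    wav_dir = dataset_dir / "wavs"
    lines = []
    for clip_id, text in sorted(transcripts.items()):
        if not (wav_dir / f"{clip_id}.wav").is_file():
            continue  # skip transcripts with no matching audio
        # The pipe is the field separator, so drop stray pipes
        # and collapse runs of whitespace inside the transcript.
        clean = " ".join(text.replace("|", " ").split())
        lines.append(f"{clip_id}|{clean}")
    metadata = dataset_dir / "metadata.csv"
    metadata.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return metadata
```

Skipping transcripts without a matching WAV keeps the trainer from failing on a missing file partway through preprocessing.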

Notebooks

#	Notebook	What it does	Key libraries
0	Diarize & Clip	Speaker diarization on raw recordings. Segments audio per speaker. Resume-safe progress tracking.	`pyannote.audio 3.1`, `ffmpeg`
1	Voice Clone → Dataset	Zero-shot synthesis of 100+ sentences in the cloned voice. Auto-transcription. LJSpeech output format.	`OmniVoice`, `faster-whisper large-v3`
2	Dataset → Piper CKPT	Fine-tunes `hi_IN-rohan-medium` Piper base model. TensorBoard logging. Checkpoint export to Drive.	`Piper VITS`, `PyTorch Lightning`, `piper-phonemize-fix`
3	Check the CKPT	Interactive inference on the raw PyTorch checkpoint. Validates voice quality before ONNX export.	`Piper VITS`, `espeak-ng`, `ipywidgets`
4	Export to ONNX	Exports `.ckpt` → `.onnx`. Rewrites broken upstream export scripts with all critical fixes applied.	`torch.onnx` (opset 15), `onnxscript`
5	Check ONNX	Validates the exported model. Full encoder→decoder pipeline wired for streaming models.	`onnxruntime`, `piper-phonemize-fix`
6	Piper → Sherpa-ONNX	Injects Sherpa-ONNX metadata into `.onnx`. Generates `tokens.txt`. Packages for Android deployment.	`sherpa-onnx`, `onnx metadata API`
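
Stage 6 can be pictured roughly as follows. This is a sketch, not the notebook's code: it assumes the Piper config carries `phoneme_id_map`, `espeak.voice`, `num_speakers`, and `audio.sample_rate`, and the metadata key names are recalled from the Sherpa-ONNX Piper conversion guide, so treat them as assumptions to verify. The function names are hypothetical.

```python
import json
from pathlib import Path


def write_tokens(config_path, tokens_path):
    """Write one `<symbol> <id>` line per phoneme_id_map entry; return the config."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    with open(tokens_path, "w", encoding="utf-8") as f:
        for symbol, ids in config["phoneme_id_map"].items():
            f.write(f"{symbol} {ids[0]}\n")
    return config


def sherpa_metadata(config):
    """Build the key/value pairs to store in the ONNX file (assumed key names)."""
    return {
        "model_type": "vits",
        "comment": "piper",
        "language": config.get("language", {}).get("code", ""),
        "voice": config["espeak"]["voice"],
        "has_espeak": 1,
        "n_speakers": config["num_speakers"],
        "sample_rate": config["audio"]["sample_rate"],
    }


def inject_metadata(onnx_path, metadata):
    """Append the metadata to the model file in place. Needs the `onnx` package."""
    import onnx  # imported lazily so the helpers above work without it

    model = onnx.load(onnx_path)
    for key, value in metadata.items():
        entry = model.metadata_props.add()
        entry.key = key
        entry.value = str(value)  # ONNX metadata values are strings
    onnx.save(model, onnx_path)
```

A typical call order would be `config = write_tokens(...)`, then `inject_metadata(model_path, sherpa_metadata(config))`, before packaging the model and `tokens.txt` together.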

Bugs diagnosed & fixed

All of these were broken in the official Piper notebooks. Each one was diagnosed independently:

#	Bug	Root cause	Fix
1	`torch.onnx.export` crash	Newer PyTorch 2.x releases default `torch.onnx.export` to `dynamo=True`, which breaks `dynamic_axes` and `None` sid inputs	Explicitly pass `dynamo=False` to force the legacy TorchScript exporter
2	CPU/CUDA device mismatch	Model loads to CUDA by default; dummy inputs are created on CPU — tracing fails	`model_g = model_g.cpu()` before export
3	Opset version wrong for Sherpa	Official notebook used `opset_version=15` everywhere; Sherpa-ONNX requires opset 11	`opset_version=11` on the Sherpa export path
4	PyTorch 2.6 checkpoint loading	`torch.load` default changed to `weights_only=True`, breaking Lightning's complex checkpoint objects	`weights_only=False, strict=False` in `VitsModel.load_from_checkpoint`
5	Colab dependency conflict	`piper-phonemize` has broken Colab deps	Replaced with `piper-phonemize-fix` throughout
6	Streaming inference not wired	Inference script detected encoder/decoder pairs but threw "not yet supported"	Full encoder → decoder inference pipeline implemented
7	Config file not detected	ONNX inference only matched `config.json`, missing Piper's `*.onnx.json` naming	Added `*.onnx.json` glob to `detect_onnx_models`
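
Fixes 1 to 3 can be combined in one export call. The sketch below is illustrative rather than the notebook's actual code: the input and output names and the dynamic axes follow Piper's export script as I recall it, `export_generator` and `num_symbols` are hypothetical, and it assumes `model_g.forward` already accepts `(text, lengths, scales, sid)`. The `dynamo` keyword only exists in PyTorch versions that ship the newer exporter.

```python
def legacy_export_kwargs(for_sherpa):
    """Keyword arguments for torch.onnx.export reflecting fixes 1 and 3."""
    return {
        # Fix 3: opset 11 on the Sherpa-ONNX path, 15 otherwise.
        "opset_version": 11 if for_sherpa else 15,
        # Fix 1: force the legacy TorchScript exporter.
        "dynamo": False,
        "input_names": ["input", "input_lengths", "scales", "sid"],
        "output_names": ["output"],
        "dynamic_axes": {
            "input": {0: "batch_size", 1: "phonemes"},
            "input_lengths": {0: "batch_size"},
            "output": {0: "batch_size", 1: "time"},
        },
    }


def export_generator(model_g, output_path, for_sherpa=False, num_symbols=256):
    """Trace the generator on CPU and write an ONNX file."""
    import torch  # imported lazily so the kwargs helper works without torch

    # Fix 2: keep the model and the dummy inputs on the same device.
    model_g = model_g.cpu().eval()
    with torch.no_grad():
        sequences = torch.randint(0, num_symbols, (1, 50), dtype=torch.long)
        lengths = torch.LongTensor([sequences.size(1)])
        scales = torch.FloatTensor([0.667, 1.0, 0.8])
        # sid stays None for a single-speaker voice.
        dummy_input = (sequences, lengths, scales, None)
        torch.onnx.export(
            model_g,
            dummy_input,
            str(output_path),
            **legacy_export_kwargs(for_sherpa),
        )
```

Keeping the keyword arguments in one helper means the standard and Sherpa export paths differ only in the opset.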

Google Drive setup

My Drive/
└── Voicecloning/
    ├── raw_calls/                 ← put your MP3 / WAV recordings here
    ├── clipped_audio/             ← Stage 0 output: per-speaker clips
    └── training/
        ├── colab/piper/           ← checkpoints and config.json
        └── piper-voice-packages/  ← exported .tar.gz voice packages
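
The layout above can be created in one step before the first run. This is a small sketch assuming Drive is already mounted. The helper name `ensure_layout` is hypothetical, and `/content/drive/MyDrive` is the usual Colab mount path rather than something the notebooks are confirmed to use.

```python
from pathlib import Path

# Sub-folders of Voicecloning/, taken from the tree above.
SUBDIRS = (
    "raw_calls",
    "clipped_audio",
    "training/colab/piper",
    "training/piper-voice-packages",
)


def ensure_layout(drive_root):
    """Create the Voicecloning/ tree under drive_root if it is missing."""
    root = Path(drive_root) / "Voicecloning"
    created = {}
    for sub in SUBDIRS:
        path = root / sub
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created[sub] = path
    return created
```

In Colab this would typically be called as `ensure_layout("/content/drive/MyDrive")` right after mounting Drive.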

Prerequisites

HuggingFace token — required for pyannote/speaker-diarization-3.1 (Stage 0). Accept the model licence on HuggingFace, then paste your token into the notebook's Colab form field. No .env file needed.

Colab GPU runtime — Stages 1 and 2 need a T4 or better. The free tier works; training runs 6–12 hours. Stages 3–6 can run on CPU.

Google Drive (~2 GB free) — checkpoints, datasets, and exports save to Drive so they survive session restarts.
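
Saving to Drive is also what makes a stage resume-safe. One way to sketch the idea, assuming a small JSON progress file kept next to the outputs (the file format and the function names are hypothetical, not the notebook's actual scheme):

```python
import json
import os
from pathlib import Path


def load_done(progress_file):
    """Return the set of recordings already processed."""
    path = Path(progress_file)
    if not path.is_file():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def mark_done(progress_file, name):
    """Record one finished recording, writing via a temp file and a rename."""
    done = load_done(progress_file)
    done.add(name)
    tmp = Path(str(progress_file) + ".tmp")
    tmp.write_text(json.dumps(sorted(done)), encoding="utf-8")
    # Swap the new file in so an interrupted write cannot leave
    # a half-written progress file behind.
    os.replace(tmp, progress_file)
    return done


def pending(recordings, progress_file):
    """Return the recordings that still need processing, in input order."""
    done = load_done(progress_file)
    return [r for r in recordings if r not in done]
```

After a session restart, looping over `pending(...)` and calling `mark_done(...)` after each file picks up where the previous session stopped.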

Android Deployment & Testing

The final sherpa_model.tar.gz can be used with any Sherpa-ONNX-compatible Android application.

Recommended App for Testing:

TTS Engine (F-Droid) — A lightweight, open-source TTS engine that supports Sherpa-ONNX.

Verified Devices:

Redmi 9A
Redmi K20 Pro
Poco F1

Tech stack

Layer	Tools
Diarization	pyannote.audio 3.1, ffmpeg
Voice synthesis	OmniVoice (k2-fsa), faster-whisper large-v3
TTS architecture	VITS via Piper (rmcpantoja fork)
Training	PyTorch Lightning, piper-phonemize-fix, espeak-ng
Export	torch.onnx TorchScript path, opset 11/15, onnxscript
On-device inference	Sherpa-ONNX, onnxruntime
Platform	Google Colab (free T4 GPU)

Credits

rmcpantoja/piper — Piper training & inference notebooks (base, heavily modified)
rhasspy/piper — Piper TTS core
k2-fsa/OmniVoice — zero-shot voice synthesis
k2-fsa/sherpa-onnx — on-device ONNX inference

Amit Basuri · github.com/PositiveMatician · New Delhi

Built because voices shouldn't disappear. All upstream bug fixes (PyTorch 2.6 compat, ONNX export, Sherpa-ONNX conversion) diagnosed and implemented independently.
