Add Pocket TTS as a lightweight CPU-only TTS engine by Ahmed-Ezzat20 · Pull Request #7 · bakrianoo/mazinger

Ahmed-Ezzat20 · 2026-03-30T22:43:09Z

Summary

- Adds Pocket TTS (Kyutai, 100M params, MIT license) as a third TTS engine alongside Qwen3-TTS and Chatterbox
- CPU-only: no GPU required, ~6x real-time on modern CPUs, no CUDA/torch version conflicts
- Supports zero-shot voice cloning from reference audio (requires gated model access)
- Includes 8 predefined voices: alba, marius, javert, jean, fantine, cosette, eponine, azelma
- Falls back gracefully to a predefined voice when the cloning model is unavailable
- English only (warns on a non-English language instead of crashing)
- New optional dependency: pip install "mazinger[tts-pocket]"
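
The voice-selection and language behavior described in the summary can be sketched roughly as follows. This is an illustrative sketch, not the code in `tts.py`: `resolve_voice`, `check_language`, `CloningUnavailable`, the `clone_fn` callback, and the choice of `alba` as the fallback voice are all assumptions made for the example.

```python
import warnings

# The eight predefined voices listed in the summary.
PREDEFINED_VOICES = (
    "alba", "marius", "javert", "jean",
    "fantine", "cosette", "eponine", "azelma",
)


class CloningUnavailable(RuntimeError):
    """Hypothetical error: the gated voice-cloning model could not be loaded."""


def resolve_voice(ref_audio, pocket_voice, clone_fn, default_voice="alba"):
    """Pick a voice source; returns ("predefined", name) or ("cloned", state).

    An explicit predefined voice overrides the reference audio. If cloning
    fails, fall back to a predefined voice (the default is an assumption).
    """
    if pocket_voice is not None:
        if pocket_voice not in PREDEFINED_VOICES:
            raise ValueError(f"unknown Pocket voice: {pocket_voice!r}")
        return ("predefined", pocket_voice)
    if ref_audio is not None:
        try:
            return ("cloned", clone_fn(ref_audio))
        except CloningUnavailable:
            warnings.warn(
                f"voice cloning unavailable, falling back to {default_voice!r}"
            )
    return ("predefined", default_voice)


def check_language(language):
    """Warn, rather than raise, when the requested language is not English."""
    normalized = language.strip().lower()
    is_english = normalized in ("en", "english") or normalized.startswith("en-")
    if not is_english:
        warnings.warn(f"Pocket TTS is English only; got {language!r}")
    return is_english
```

Passing the cloning step in as a callback keeps the sketch independent of the actual Pocket TTS API, which is not shown in this PR description.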

Changes

| File | Change |
| --- | --- |
| `tts.py` | New `_PocketTTSWrapper`, `_load_pocket_model`, `_create_pocket_voice_state`, `_synthesize_pocket`; updated `TTSEngine`, `load_model`, `create_voice_prompt` |
| `cli/_groups.py` | Added `"pocket"` to `--tts-engine` choices, new `--pocket-voice` arg |
| `cli/_dub.py` | Pass `pocket_voice` to pipeline |
| `pipeline.py` | `pocket_voice` param on `dub()`, override ref audio when set |
| `pyproject.toml` | New `tts-pocket` optional extra (`pocket-tts>=1.0`) |
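
For orientation, the CLI change in `cli/_groups.py` might look something like the `argparse` sketch below. The parser name, the default engine, the help strings, and the decision to restrict `--pocket-voice` to a fixed `choices` list are assumptions; only the flag names and engine names come from this PR.

```python
import argparse

# Predefined voice names taken from the PR summary.
POCKET_VOICES = [
    "alba", "marius", "javert", "jean",
    "fantine", "cosette", "eponine", "azelma",
]


def build_parser():
    """Build a minimal stand-in parser with the two TTS-related flags."""
    parser = argparse.ArgumentParser(prog="mazinger-sketch")
    parser.add_argument(
        "--tts-engine",
        choices=["qwen", "chatterbox", "mlx", "pocket"],
        default="qwen",  # assumed default, not confirmed by the PR
        help="TTS backend: qwen, chatterbox, mlx, or pocket",
    )
    parser.add_argument(
        "--pocket-voice",
        choices=POCKET_VOICES,
        default=None,
        help="Predefined Pocket TTS voice (overrides the voice sample)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--tts-engine", "pocket", "--pocket-voice", "alba"]
    )
    print(args.tts_engine, args.pocket_voice)
```

Note that `argparse` exposes `--pocket-voice` as `args.pocket_voice`, which matches the `pocket_voice` parameter name passed through to the pipeline.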

Usage

# With predefined voice (no special access needed)
mazinger dub video.mp4 --tts-engine pocket --pocket-voice alba

# With voice cloning (requires HF gated access)
mazinger dub video.mp4 --tts-engine pocket --voice-sample ./reference.wav

# CPU-only, no GPU needed
mazinger dub video.mp4 --tts-engine pocket --device cpu --pocket-voice marius

Test plan

Verified model loads and predefined voice synthesis works (alba voice, 4.64s output)
Verified graceful fallback when voice cloning model is unavailable
Verified CLI parser accepts --tts-engine pocket and --pocket-voice across all subcommands
Verified pipeline.py passes pocket_voice through to create_voice_prompt()
Test full dub pipeline end-to-end with Pocket TTS
Test with voice cloning model (requires HF gated access)

Adds Pocket TTS (Kyutai, 100M params) as a new TTS backend alongside Qwen3-TTS, Chatterbox, and MLX. Runs entirely on CPU with no GPU required. Supports zero-shot voice cloning (requires gated model access) and 8 predefined voices. Auto-trims long voice references (>30s) to prevent hallucination. Falls back to predefined voice when cloning unavailable. Currently English only. Usage: --tts-engine pocket [--pocket-voice alba]
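
The auto-trim step could be as simple as the sketch below. The constant name `_POCKET_MAX_REF_SECONDS` and the 30-second limit come from this PR; the function name `trim_reference` and the choice to keep the first 30 seconds (rather than, say, a middle segment) are assumptions.

```python
# Limit named in the PR; references longer than this are trimmed.
_POCKET_MAX_REF_SECONDS = 30.0


def trim_reference(samples, sample_rate, max_seconds=_POCKET_MAX_REF_SECONDS):
    """Return at most `max_seconds` of audio from the start of `samples`.

    `samples` is any sliceable sequence of mono samples (a list or a NumPy
    array). Clips at or under the limit are returned unchanged.
    """
    max_samples = int(max_seconds * sample_rate)
    if len(samples) <= max_samples:
        return samples
    return samples[:max_samples]
```

For example, a 40-second clip at 24 kHz (960,000 samples) would be cut to 720,000 samples, while a 10-second clip would pass through untouched.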

Code enhancements:
- Guard empty/whitespace text in _PocketTTSWrapper.synthesize() to return silence instead of crashing on Pocket's tokenizer.
- Hoist _POCKET_MAX_REF_SECONDS to module level for visibility.
- Mention 'pocket' in the --tts-engine help string (choices were already updated but the help text listed only qwen/chatterbox/mlx).

Documentation:
- README.md: add Pocket TTS to the pipeline feature list, install extras, and a dedicated "Dub without a GPU" Quick Start section.
- docs/installation.md: add tts-pocket to the voice-synthesis install list, compatibility matrix, and task-requirements table.
- docs/cli-reference.md: update both --tts-engine tables (dub and speak subcommands), add --pocket-voice flag, and a Pocket TTS example in the speak examples section.
- docs/quick-start.md: add a "Use Pocket TTS" section with predefined-voice and voice-cloning examples.
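
The empty-text guard can be illustrated with a small sketch. It is not the body of `_PocketTTSWrapper.synthesize()`: the function name, the `synth_fn` callback, the 24 kHz sample rate, and the 100 ms silence length are all assumptions chosen for the example.

```python
def synthesize_or_silence(text, synth_fn, sample_rate=24000, silence_ms=100):
    """Return (samples, sample_rate), short-circuiting on blank input.

    Empty or whitespace-only text never reaches `synth_fn` (a stand-in for
    the real synthesis call); a short buffer of zeros is returned instead.
    """
    if not text or not text.strip():
        n_silent = sample_rate * silence_ms // 1000
        return [0.0] * n_silent, sample_rate
    return synth_fn(text), sample_rate
```

Returning silence rather than raising keeps a multi-segment dub running when one segment happens to have no speakable text.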

Ahmed-Ezzat20 force-pushed the feat/pocket-tts-engine branch from daceb7f to ccfbce5 Compare April 9, 2026 00:17

Ahmed-Ezzat20 wants to merge 2 commits into bakrianoo:master from Ahmed-Ezzat20:feat/pocket-tts-engine
