Add Pocket TTS as a lightweight CPU-only TTS engine #7

Open
Ahmed-Ezzat20 wants to merge 2 commits into bakrianoo:master from Ahmed-Ezzat20:feat/pocket-tts-engine
Conversation

@Ahmed-Ezzat20

Contributor

Summary

  • Adds Pocket TTS (Kyutai, 100M params, MIT license) as a new TTS engine alongside Qwen3-TTS, Chatterbox, and MLX
  • CPU-only: no GPU required, ~6x real-time on modern CPUs, no CUDA/torch version conflicts
  • Supports zero-shot voice cloning from reference audio (requires gated model access)
  • Includes 8 predefined voices: alba, marius, javert, jean, fantine, cosette, eponine, azelma
  • Gracefully falls back to predefined voice when cloning model is unavailable
  • English only (warns on non-English language, doesn't crash)
  • New optional dependency: pip install "mazinger[tts-pocket]"
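
The voice-selection behaviour described in the bullets above can be sketched roughly as follows. This is an illustration only, not the code in tts.py: the function names `resolve_voice` and `is_english`, the `cloning_available` flag, and the choice of `alba` as the fallback voice are all assumptions made for the example.

```python
import logging
from typing import Optional, Tuple

logger = logging.getLogger(__name__)

# The eight predefined voices listed in the PR summary.
PREDEFINED_VOICES = (
    "alba", "marius", "javert", "jean",
    "fantine", "cosette", "eponine", "azelma",
)
# Assumption: the PR does not say which predefined voice is the fallback.
DEFAULT_VOICE = "alba"


def resolve_voice(
    pocket_voice: Optional[str],
    ref_audio: Optional[str],
    cloning_available: bool,
) -> Tuple[str, str]:
    """Return ("predefined", name) or ("clone", path) (hypothetical helper)."""
    # An explicit predefined voice overrides any reference audio.
    if pocket_voice is not None:
        if pocket_voice not in PREDEFINED_VOICES:
            raise ValueError(f"unknown Pocket voice: {pocket_voice!r}")
        return ("predefined", pocket_voice)
    if ref_audio is not None:
        if cloning_available:
            return ("clone", ref_audio)
        # Graceful fallback: warn and continue with a predefined voice.
        logger.warning(
            "Voice cloning model unavailable; falling back to %r", DEFAULT_VOICE
        )
    return ("predefined", DEFAULT_VOICE)


def is_english(language: str) -> bool:
    """Warn, rather than raise, when the language is not English."""
    english = language.strip().lower().split("-")[0] in ("en", "english")
    if not english:
        logger.warning("Pocket TTS is English only; got language %r", language)
    return english
```

With these definitions, `resolve_voice(None, "reference.wav", False)` falls back to a predefined voice instead of raising.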

Changes

| File | Change |
| --- | --- |
| tts.py | New _PocketTTSWrapper, _load_pocket_model, _create_pocket_voice_state, _synthesize_pocket; updated TTSEngine, load_model, create_voice_prompt |
| cli/_groups.py | Added "pocket" to --tts-engine choices, new --pocket-voice arg |
| cli/_dub.py | Pass pocket_voice to pipeline |
| pipeline.py | pocket_voice param on dub(), override ref audio when set |
| pyproject.toml | New tts-pocket optional extra (pocket-tts>=1.0) |
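
As a rough picture of the CLI change, the two options could be declared with `argparse` along these lines. This is a sketch, not the contents of cli/_groups.py: the `build_parser` function and the program name are hypothetical, the existing choice names are inferred from the help text mentioned in this PR (qwen/chatterbox/mlx), and the real defaults and help strings are not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the shared option group in cli/_groups.py."""
    parser = argparse.ArgumentParser(prog="mazinger-sketch")
    parser.add_argument(
        "--tts-engine",
        # "pocket" is the new choice; the other names are assumed.
        choices=["qwen", "chatterbox", "mlx", "pocket"],
        default=None,  # assumption: the real default is not stated in the PR
        help="TTS engine to use: qwen, chatterbox, mlx, or pocket",
    )
    parser.add_argument(
        "--pocket-voice",
        default=None,
        help="Predefined Pocket TTS voice, e.g. alba or marius",
    )
    return parser


args = build_parser().parse_args(
    ["--tts-engine", "pocket", "--pocket-voice", "alba"]
)
print(args.tts_engine, args.pocket_voice)
```

Leaving `--pocket-voice` free-form here is a simplification; the actual flag may validate against the list of predefined voices.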

Usage

# With predefined voice (no special access needed)
mazinger dub video.mp4 --tts-engine pocket --pocket-voice alba

# With voice cloning (requires HF gated access)
mazinger dub video.mp4 --tts-engine pocket --voice-sample ./reference.wav

# CPU-only, no GPU needed
mazinger dub video.mp4 --tts-engine pocket --device cpu --pocket-voice marius

Test plan

  • Verified model loads and predefined voice synthesis works (alba voice, 4.64s output)
  • Verified graceful fallback when voice cloning model is unavailable
  • Verified CLI parser accepts --tts-engine pocket and --pocket-voice across all subcommands
  • Verified pipeline.py passes pocket_voice through to create_voice_prompt()
  • Test full dub pipeline end-to-end with Pocket TTS
  • Test with voice cloning model (requires HF gated access)

Adds Pocket TTS (Kyutai, 100M params) as a new TTS backend alongside
Qwen3-TTS, Chatterbox, and MLX. Runs entirely on CPU with no GPU required.

Supports zero-shot voice cloning (requires gated model access) and
8 predefined voices. Auto-trims long voice references (>30s) to prevent
hallucination. Falls back to predefined voice when cloning unavailable.
Currently English only.

Usage: --tts-engine pocket [--pocket-voice alba]
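
The auto-trim of long voice references mentioned in this commit message might look something like the sketch below. The constant name `_POCKET_MAX_REF_SECONDS` comes from the follow-up commit in this PR and the 30-second value from the ">30s" note; the `trim_reference` function, its signature, and the choice to keep the first 30 seconds are assumptions for illustration.

```python
from typing import List, Sequence

# Name taken from the PR; value taken from the ">30s" note in the commit message.
_POCKET_MAX_REF_SECONDS = 30.0


def trim_reference(
    samples: Sequence[float],
    sample_rate: int,
    max_seconds: float = _POCKET_MAX_REF_SECONDS,
) -> List[float]:
    """Keep at most max_seconds of audio from the start (hypothetical helper)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    max_samples = int(max_seconds * sample_rate)
    # Slicing is a no-op when the reference is already short enough.
    return list(samples[:max_samples])
```

A 45-second reference would be cut to its first 30 seconds, while a 10-second reference would pass through unchanged.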
Ahmed-Ezzat20 force-pushed the feat/pocket-tts-engine branch from daceb7f to ccfbce5 on April 9, 2026 00:17
Code enhancements:
- Guard empty/whitespace text in _PocketTTSWrapper.synthesize() to
  return silence instead of crashing on Pocket's tokenizer.
- Hoist _POCKET_MAX_REF_SECONDS to module level for visibility.
- Mention 'pocket' in the --tts-engine help string (choices were
  already updated but the help text listed only qwen/chatterbox/mlx).
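
The empty-text guard can be pictured as a small wrapper like the one below. It is only a sketch of the idea: the function name `synthesize_or_silence`, the 24 kHz sample rate, and the 100 ms silence length are assumptions, and the real `_PocketTTSWrapper.synthesize()` may return a different audio type.

```python
from typing import Callable, List

SAMPLE_RATE = 24_000  # assumption: the real output rate is not stated in the PR
SILENCE_MS = 100      # assumption: length of the silent placeholder


def synthesize_or_silence(
    text: str,
    synthesize: Callable[[str], List[float]],
) -> List[float]:
    """Return silence for empty or whitespace-only text (hypothetical helper)."""
    if not text or not text.strip():
        # Skip the model call entirely so the tokenizer never sees empty input.
        return [0.0] * (SAMPLE_RATE * SILENCE_MS // 1000)
    return synthesize(text)
```

Returning a short silent buffer keeps downstream audio concatenation working without special-casing empty segments.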

Documentation:
- README.md: add Pocket TTS to the pipeline feature list, install
  extras, and a dedicated "Dub without a GPU" Quick Start section.
- docs/installation.md: add tts-pocket to the voice-synthesis install
  list, compatibility matrix, and task-requirements table.
- docs/cli-reference.md: update both --tts-engine tables (dub and
  speak subcommands), add --pocket-voice flag, and a Pocket TTS
  example in the speak examples section.
- docs/quick-start.md: add a "Use Pocket TTS" section with predefined-
  voice and voice-cloning examples.