FlowTTS

Next-Generation Low-Latency Conversational Speech Synthesis

English | 简体中文

Repository: https://github.com/Tencent-RTC/FlowTTS


FlowTTS is a next-generation low-latency speech synthesis system with voice cloning and human-like expression. It renders filler words, emotions, and paralinguistic details naturally, so that AI in dialogue scenarios sounds like a real person.

Online Demo

Features

  • Ultra-Low Latency: Streaming SSE API with Keep-Alive connection
  • Voice Cloning: Create a custom voice by submitting audio samples
  • Human-like Expression: Natural filler words, emotions, and paralinguistic details
  • Multi-language Support: Chinese/English/Japanese/Korean/Cantonese/Malay/Arabic/Indonesian/Thai/Vietnamese

Models

| Model | Use Case | Features |
| --- | --- | --- |
| flow_02_turbo | Conversational (Latest) | Ultra-low latency, high quality, supports Chinese/English/Japanese/Korean/Cantonese/Malay/Arabic/Indonesian/Thai/Vietnamese |
| flow_01_turbo | Conversational | Ultra-low latency, high quality, supports Chinese/English/Japanese/Korean/Cantonese/Malay/Arabic/Indonesian/Thai/Vietnamese |

Recommended: Pass an empty string "" for the Model field to automatically use the latest model without specifying a version.

Voice List

Quick Start

1. Enable Service

FlowTTS is built on the TRTC AI Conversation solution. You need to enable one of the following:

  • AI Recognition Package (Lite/Premium)
  • TRTC Monthly Plus Plan

See TRTC Activation & Billing

2. Install Dependencies

Python

cd examples/python
pip install -r requirements.txt

Note: Install the latest version of the Tencent Cloud SDK (>=3.0.1200) for full TTS feature support.

Node.js

cd examples/nodejs
npm install

Requires Node.js >= 18.

3. Configure Environment Variables

cp .env.example .env

Edit .env with your Tencent Cloud credentials:

TENCENTCLOUD_SECRET_ID=your_secret_id_here
TENCENTCLOUD_SECRET_KEY=your_secret_key_here
TENCENTCLOUD_SDK_APP_ID=1400000000

# Optional: custom API endpoint (defaults to trtc.ai.tencentcloudapi.com)
TENCENTCLOUD_ENDPOINT=trtc.ai.tencentcloudapi.com

Get credentials from Tencent Cloud Console
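The variables above can be read and checked at startup before any request is made. The following is a minimal sketch, not the loader used by the repository examples: `FlowTTSSettings` and `load_settings` are hypothetical names, and the sketch assumes the variables are already exported into the process environment (it does not parse the `.env` file itself).

```python
import os
from dataclasses import dataclass

# Default taken from the comment in the .env template above.
DEFAULT_ENDPOINT = "trtc.ai.tencentcloudapi.com"

REQUIRED_VARS = (
    "TENCENTCLOUD_SECRET_ID",
    "TENCENTCLOUD_SECRET_KEY",
    "TENCENTCLOUD_SDK_APP_ID",
)


@dataclass(frozen=True)
class FlowTTSSettings:  # hypothetical container, not part of the SDK
    secret_id: str
    secret_key: str
    sdk_app_id: int
    endpoint: str


def load_settings(env=None) -> FlowTTSSettings:
    """Read the FlowTTS variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return FlowTTSSettings(
        secret_id=env["TENCENTCLOUD_SECRET_ID"],
        secret_key=env["TENCENTCLOUD_SECRET_KEY"],
        sdk_app_id=int(env["TENCENTCLOUD_SDK_APP_ID"]),
        # The endpoint is optional; fall back to the documented default.
        endpoint=env.get("TENCENTCLOUD_ENDPOINT") or DEFAULT_ENDPOINT,
    )
```

Failing fast on a missing credential gives a clearer error than a rejected API call later on.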

4. Run Examples

Python

# Streaming TTS
python examples/python/example_streaming.py

# Non-streaming TTS
python examples/python/example_non_streaming.py

# Voice cloning
python examples/python/example_voice_clone.py

# WebSocket bidirectional streaming
python examples/python/example_ws_bidirection.py

Node.js

cd examples/nodejs

# Streaming TTS
node example_streaming.js

# Non-streaming TTS
node example_non_streaming.js

# Voice cloning
node example_voice_clone.js

# WebSocket bidirectional streaming
node example_ws_bidirection.js

Voice Clone Example

# 1. Prepare audio sample (16kHz mono WAV, 10-180 seconds)
cp your_voice.wav test_data/clone_sample.wav

# 2. Clone voice and get voice_id
python examples/python/example_voice_clone.py

# 3. Use the returned voice_id in example_streaming.py for TTS
# Update VOICE_CONFIG["VoiceId"] with the cloned voice_id
python examples/python/example_streaming.py
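Before step 2, it can help to confirm locally that the sample meets the stated requirements (16 kHz, mono, WAV, 10-180 seconds). The sketch below uses only the standard-library `wave` module. `check_clone_sample` is a hypothetical helper, not part of the repository, and it assumes the duration limits are inclusive.

```python
import wave


def check_clone_sample(path: str) -> float:
    """Return the sample duration in seconds, or raise ValueError if unsuitable."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if channels != 1:
        raise ValueError(f"expected mono audio, got {channels} channels")
    if rate != 16000:
        raise ValueError(f"expected a 16000 Hz sample rate, got {rate} Hz")
    # Assumption: both ends of the 10-180 second range are allowed.
    if not 10 <= duration <= 180:
        raise ValueError(f"expected 10-180 seconds of audio, got {duration:.1f} s")
    return duration
```

For example, `check_clone_sample("test_data/clone_sample.wav")` could be called before running `example_voice_clone.py`.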

Configuration

Voice Parameters

| Parameter | Range | Description |
| --- | --- | --- |
| Speed | 0.5 ~ 2.0 | Speech speed |
| Volume | 0.01 ~ 10 | Volume level (must be > 0) |
| Pitch | -12 ~ 12 | Pitch adjustment |
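The ranges above can be enforced on the client side before a request is sent. This is a small sketch under the assumption that both ends of each range are inclusive; `VOICE_PARAM_RANGES` and `validate_voice_params` are hypothetical names, not SDK identifiers.

```python
# Ranges copied from the Voice Parameters table; inclusive bounds are an assumption.
VOICE_PARAM_RANGES = {
    "Speed": (0.5, 2.0),
    "Volume": (0.01, 10.0),
    "Pitch": (-12.0, 12.0),
}


def validate_voice_params(params: dict) -> dict:
    """Raise ValueError if Speed, Volume, or Pitch falls outside its range."""
    for name, (low, high) in VOICE_PARAM_RANGES.items():
        if name not in params:
            continue  # omitted parameters are left to the service defaults
        value = params[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low} ~ {high}")
    return params
```

Unknown keys such as `VoiceId` pass through untouched, so the same dictionary can be validated and then sent.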

Audio Format

| API Type | Formats | Sample Rates |
| --- | --- | --- |
| Streaming (SSE) | pcm | 16000, 24000 |
| Non-streaming | pcm, wav, mp3 | 16000, 24000 |

Default format: pcm, default sample rate: 24000
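The format table and defaults above can be captured in a small lookup that rejects unsupported combinations early. The names `SUPPORTED_AUDIO` and `resolve_audio_settings`, and the `"streaming"` / `"non_streaming"` keys, are illustrative assumptions rather than SDK identifiers.

```python
# Supported combinations, copied from the Audio Format table.
SUPPORTED_AUDIO = {
    "streaming": {"formats": {"pcm"}, "sample_rates": {16000, 24000}},
    "non_streaming": {"formats": {"pcm", "wav", "mp3"}, "sample_rates": {16000, 24000}},
}
DEFAULT_FORMAT = "pcm"
DEFAULT_SAMPLE_RATE = 24000


def resolve_audio_settings(api_type, audio_format=None, sample_rate=None):
    """Apply the documented defaults and return a validated (format, sample_rate) pair."""
    if api_type not in SUPPORTED_AUDIO:
        raise ValueError(f"unknown API type: {api_type!r}")
    fmt = audio_format or DEFAULT_FORMAT
    rate = sample_rate or DEFAULT_SAMPLE_RATE
    supported = SUPPORTED_AUDIO[api_type]
    if fmt not in supported["formats"]:
        raise ValueError(f"{api_type} does not support format {fmt!r}")
    if rate not in supported["sample_rates"]:
        raise ValueError(f"{api_type} does not support sample rate {rate}")
    return fmt, rate
```

For instance, requesting `mp3` from the streaming API is rejected locally, while the same request against the non-streaming API is accepted.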

API Endpoint

Different APIs use different endpoints:

| API | Endpoint |
| --- | --- |
| Streaming SSE (TextToSpeechSSE) | trtc.ai.tencentcloudapi.com |
| Non-streaming (TextToSpeech) | trtc.tencentcloudapi.com |
| Voice Clone (VoiceClone) | trtc.tencentcloudapi.com |
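Because the endpoint depends on the API being called, one way to avoid mixing them up is to look the endpoint up by action name. This sketch mirrors the table above; `ENDPOINTS` and `endpoint_for` are hypothetical helper names.

```python
# Action name -> endpoint, copied from the API Endpoint table.
ENDPOINTS = {
    "TextToSpeechSSE": "trtc.ai.tencentcloudapi.com",
    "TextToSpeech": "trtc.tencentcloudapi.com",
    "VoiceClone": "trtc.tencentcloudapi.com",
}


def endpoint_for(action: str) -> str:
    """Return the endpoint for an API action, or raise ValueError if unknown."""
    try:
        return ENDPOINTS[action]
    except KeyError:
        raise ValueError(f"no endpoint known for action {action!r}") from None
```

The returned value would typically be assigned to the HTTP profile's endpoint before the client is created.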

Keep-Alive Connection

The SDK supports HTTP Keep-Alive to reuse TCP connections and reduce latency:

Python

from tencentcloud.common.profile.http_profile import HttpProfile

http_profile = HttpProfile()
http_profile.keepAlive = True        # Enable Keep-Alive
http_profile.pre_conn_pool_size = 3  # Connection pool size

| Parameter | Description |
| --- | --- |
| keepAlive | Reuses TCP connections, avoiding repeated handshakes and reducing latency for subsequent requests |
| pre_conn_pool_size | Size of the pre-established connection pool; connections are ready before the first request |

With Keep-Alive enabled, each consecutive request saves approximately 50-100 ms of connection establishment time.
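To put the 50-100 ms figure in context, the saving over a whole conversation can be estimated with simple arithmetic. The sketch assumes the first request still pays the connection cost and every later request reuses the connection; `keepalive_savings_ms` is a hypothetical helper, and actual savings depend on network conditions.

```python
def keepalive_savings_ms(num_requests: int, per_request_ms=(50, 100)):
    """Estimate the (low, high) total connection setup time saved, in milliseconds."""
    # Assumption: only requests after the first one reuse an open connection.
    reused = max(num_requests - 1, 0)
    low, high = per_request_ms
    return reused * low, reused * high


# A 20-turn conversation reuses the connection 19 times.
print(keepalive_savings_ms(20))  # (950, 1900)
```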

Node.js

The Node.js HTTP agent supports connection reuse by default, so no additional configuration is needed.

API Documentation

In-repo Docs

TRTC AI Conversation Integration

In the TRTC AI Conversation settings, add the TTS configuration as TTSConfig:

{
  "TTSType": "flow",
  "VoiceId": "your_voice_id",
  "Model": "",
  "Speed": 1.0,
  "Volume": 1.0,
  "Pitch": 0,
  "Language": "zh"
}
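When the configuration is generated by code rather than written by hand, the same structure can be built and serialized as follows. This is a sketch only: `build_tts_config` is a hypothetical helper, the defaults are copied from the example above, and whether your integration expects a JSON string or a nested object is an assumption to verify against the TRTC AI Conversation documentation.

```python
import json


def build_tts_config(voice_id, language="zh", model="", speed=1.0, volume=1.0, pitch=0):
    """Build a TTSConfig dictionary with the same fields as the example above."""
    return {
        "TTSType": "flow",
        "VoiceId": voice_id,
        "Model": model,  # an empty string selects the latest model
        "Speed": speed,
        "Volume": volume,
        "Pitch": pitch,
        "Language": language,
    }


# Serialize for transport; ensure_ascii=False keeps non-ASCII voice IDs readable.
config_json = json.dumps(build_tts_config("your_voice_id"), ensure_ascii=False)
```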

Language Codes (ISO 639-1; Cantonese uses the ISO 639-3 code yue)

| Language | Code |
| --- | --- |
| Chinese | zh |
| English | en |
| Japanese | ja |
| Korean | ko |
| Cantonese | yue |
| Malay | ms |
| Arabic | ar |
| Indonesian | id |
| Thai | th |
| Vietnamese | vi |
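The codes listed above can be kept in a lookup table so that the `Language` field is always set to a supported value. `LANGUAGE_CODES` and `language_code` are hypothetical names used for illustration.

```python
# Language name -> code, copied from the Language Codes table.
LANGUAGE_CODES = {
    "Chinese": "zh",
    "English": "en",
    "Japanese": "ja",
    "Korean": "ko",
    "Cantonese": "yue",
    "Malay": "ms",
    "Arabic": "ar",
    "Indonesian": "id",
    "Thai": "th",
    "Vietnamese": "vi",
}


def language_code(name: str) -> str:
    """Map a language name (case-insensitive) to its code, or raise ValueError."""
    key = name.strip().title()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language: {name!r}")
    return LANGUAGE_CODES[key]
```

For example, `language_code("cantonese")` returns `"yue"`, which can then be used as the `Language` value in TTSConfig.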

License

MIT License - see LICENSE for details.