Qwen3-TTS Documentation
A powerful Python SDK for text-to-speech generation with voice cloning, voice design, and ultra-low-latency streaming, with support for 10 major languages.
Quick Start
Get up and running with Qwen3-TTS in minutes
Explore by Feature
Discover what you can build with Qwen3-TTS
Custom Voice
Voice Design
Voice Cloning
Streaming
Batch Processing
Fine-tuning
Key Features
What makes Qwen3-TTS powerful
10 Languages
Support for Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian with multilingual and cross-lingual capabilities.
Ultra-Low Latency
Achieve 97 ms end-to-end synthesis latency with streaming generation, well suited to real-time interactive applications.
High Quality
Powered by Qwen3-TTS-Tokenizer-12Hz for efficient acoustic compression and high-fidelity speech reconstruction.
Instruction Control
Control voice characteristics, emotion, and prosody using natural language instructions for expressive speech output.
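A streaming latency figure like the one under Ultra-Low Latency is often read as the time until the first audio chunk arrives, although this page does not state how the 97 ms figure was measured. The sketch below shows one way to time any chunk-yielding stream in Python. It does not use the Qwen3-TTS API: `fake_stream` is a hypothetical stand-in that yields dummy bytes, and `measure_stream` is an illustrative helper name, so swap in the real streaming call from the SDK documentation when measuring.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_stream(n_chunks: int = 5, delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS call (not the Qwen3-TTS API).

    Sleeps briefly before each chunk to imitate synthesis time, then yields
    a block of silent dummy audio bytes.
    """
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield b"\x00" * 320


def measure_stream(stream: Iterable[bytes]) -> Tuple[float, float, int]:
    """Consume a stream and return (first_chunk_s, total_s, chunk_count).

    Timing starts when iteration begins. Generators are lazy, so for a
    generator-based stream no work happens before this function is called.
    """
    start = time.perf_counter()
    first_chunk_s = None
    chunk_count = 0
    for _ in stream:
        if first_chunk_s is None:
            # Latency until the first chunk is available to the caller.
            first_chunk_s = time.perf_counter() - start
        chunk_count += 1
    total_s = time.perf_counter() - start
    if first_chunk_s is None:
        raise ValueError("stream produced no chunks")
    return first_chunk_s, total_s, chunk_count


if __name__ == "__main__":
    first, total, count = measure_stream(fake_stream())
    print(f"first chunk: {first * 1000:.1f} ms, total: {total * 1000:.1f} ms, chunks: {count}")
```

Measured values depend on hardware, model size, and input length, so numbers from a harness like this are only comparable when those conditions are held fixed.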
Resources
Learn more about Qwen3-TTS
GitHub Repository
View source code, report issues, and contribute to the project
Research Paper
Read the technical paper on arXiv for architecture details and benchmarks
Code Examples
Explore complete examples for common use cases and workflows
Benchmarks
Compare performance metrics across different models and datasets