Qwen3-TTS Documentation

A Python SDK for text-to-speech generation with voice cloning, voice design, and low-latency streaming across 10 major languages.

Quick Start

Get up and running with Qwen3-TTS in minutes

Install the package

Install Qwen3-TTS using pip in a fresh Python environment:

pip install -U qwen-tts

We recommend using Python 3.12 with a clean conda or virtual environment to avoid dependency conflicts.
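Before installing, it can help to confirm which interpreter is active and whether it is isolated. The following is a minimal sketch using only the standard library; the helper name describe_environment is illustrative and not part of qwen-tts, and the conda check is a common heuristic rather than an official test:

```python
import os
import sys


def describe_environment() -> dict:
    """Report the Python version and whether an isolated environment is active."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the interpreter it was created from.
    in_venv = sys.prefix != sys.base_prefix
    # Heuristic: conda environments keep a "conda-meta" directory in their prefix.
    in_conda = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "in_venv": in_venv,
        "in_conda": in_conda,
        "isolated": in_venv or in_conda,
    }


info = describe_environment()
print(info)
```

If "isolated" is False, the install would go into a system-wide interpreter, which is the situation the recommendation above is meant to avoid.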

Load a model

Import and initialize a Qwen3-TTS model:

import torch
from qwen_tts import Qwen3TTSModel

# Load the 1.7B CustomVoice checkpoint onto the first CUDA GPU in bfloat16.
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
)

Generate speech

Generate natural-sounding speech from text:

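import soundfile as sf

# Synthesize one utterance with the preset "Ryan" voice.
wavs, sr = model.generate_custom_voice(
    text="Hello, welcome to Qwen3-TTS!",
    language="English",
    speaker="Ryan",
)

# wavs[0] is the generated waveform and sr is its sample rate.
sf.write("output.wav", wavs[0], sr)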

The model weights are downloaded automatically from Hugging Face on first use. For offline environments, see the Installation guide.
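One way to prepare for offline use is to prefer a local checkpoint directory when one exists and fall back to the Hub ID otherwise. This is a sketch only: resolve_model_source and the directory argument are hypothetical, and it assumes from_pretrained accepts a local directory in place of a Hub ID, as is conventional for Hugging Face style loaders. The Installation guide remains the authority on the supported offline workflow.

```python
from pathlib import Path

DEFAULT_MODEL_ID = "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"


def resolve_model_source(local_dir=None, model_id: str = DEFAULT_MODEL_ID) -> str:
    """Return local_dir if it is an existing directory, else the Hub model ID."""
    if local_dir is not None and Path(local_dir).is_dir():
        # A pre-downloaded checkpoint is available, so no network is needed.
        return str(Path(local_dir))
    # No usable local copy: fall back to the Hub ID, which triggers a download.
    return model_id
```

The returned string would then be passed as the first argument to Qwen3TTSModel.from_pretrained, under the assumption stated above.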

Explore by Feature

Discover what you can build with Qwen3-TTS

Custom Voice

Generate speech with 9 premium preset voices covering multiple languages and dialects

Voice Design

Create custom voices from natural language descriptions with instruction-based control

Voice Cloning

Clone a voice from a reference audio sample as short as 3 seconds

Streaming

Stream speech with end-to-end synthesis latency as low as 97 ms for real-time interaction

Batch Processing

Process multiple text inputs efficiently with batched inference
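As a rough illustration of preparing inputs for batched inference, the sketch below splits a list of texts into fixed-size batches. The helper chunked is hypothetical and not part of qwen-tts. How a batch is passed to the model is not shown on this page, so the sketch stops at building the batches; consult the API Reference for the accepted argument shapes.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of items, each with at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = ["First sentence.", "Second sentence.", "Third sentence."]
batches = list(chunked(texts, batch_size=2))
print(batches)
```

Keeping the batch size bounded is a simple way to limit peak memory use per inference call.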

Fine-tuning

Customize models for your specific use case with fine-tuning

Key Features

What makes Qwen3-TTS powerful

10 Languages

Support for Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian with multilingual and cross-lingual capabilities.
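A small validation step can catch unsupported or misspelled language names before a generation call. This sketch assumes the language argument takes the English language names listed above, as "English" does in the Quick Start; the other spellings and the helper normalize_language are assumptions, not part of qwen-tts.

```python
SUPPORTED_LANGUAGES = (
    "Chinese", "English", "Japanese", "Korean", "German",
    "French", "Russian", "Portuguese", "Spanish", "Italian",
)


def normalize_language(name: str) -> str:
    """Return the canonical spelling of a supported language or raise ValueError."""
    # Match case-insensitively and ignore surrounding whitespace.
    lookup = {lang.lower(): lang for lang in SUPPORTED_LANGUAGES}
    key = name.strip().lower()
    if key not in lookup:
        options = ", ".join(SUPPORTED_LANGUAGES)
        raise ValueError(f"Unsupported language {name!r}; expected one of: {options}")
    return lookup[key]
```

For example, normalize_language(" english ") returns "English", which matches the value used in the Quick Start.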

Ultra-Low Latency

Streaming generation reaches end-to-end synthesis latency as low as 97 ms, which suits real-time interactive applications.
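To check latency in your own setup, you can time how long a stream takes to produce its first audio chunk. The sketch below measures this for any iterable of chunks. The streaming API of qwen-tts is not shown on this page, so fake_stream is a hypothetical stand-in that simulates a synthesizer with short sleeps; replace it with the real stream once you know its interface.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, int]:
    """Consume a stream; return (seconds until the first chunk, chunk count)."""
    start = time.perf_counter()
    first_latency = None
    count = 0
    for _ in chunks:
        if first_latency is None:
            # Record the elapsed time when the first chunk arrives.
            first_latency = time.perf_counter() - start
        count += 1
    if first_latency is None:
        raise ValueError("stream produced no chunks")
    return first_latency, count


def fake_stream(n_chunks: int = 3, delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming synthesizer; not a qwen-tts API."""
    for _ in range(n_chunks):
        time.sleep(delay_s)  # simulate per-chunk synthesis time
        yield b"\x00" * 320  # placeholder audio bytes


latency, n = time_to_first_chunk(fake_stream())
print(f"first chunk after {latency * 1000:.1f} ms, {n} chunks total")
```

Time to first chunk is the figure that matters for perceived responsiveness, since playback can begin before the full utterance has been synthesized.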

High Quality

Instruction Control

Control voice characteristics, emotion, and prosody using natural language instructions for expressive speech output.

Resources

Learn more about Qwen3-TTS

GitHub Repository

View source code, report issues, and contribute to the project

Research Paper

Read the technical paper on arXiv for architecture details and benchmarks

Code Examples

Explore complete examples for common use cases and workflows

Benchmarks

Compare performance metrics across different models and datasets

Ready to get started?

Install Qwen3-TTS and generate high-quality speech in minutes
