
Environment Setup

The easiest way to use Qwen3-TTS is to install the qwen-tts Python package from PyPI. We recommend using a fresh, isolated environment to avoid dependency conflicts with existing packages.
Recommended: Python 3.12 for optimal compatibility and performance.
Supported versions: Python 3.9, 3.10, 3.11, 3.12, 3.13 (see pyproject.toml:13-17)
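Since only Python 3.9 through 3.13 are supported, a quick stdlib check can catch an unsupported interpreter before installation. This is a minimal sketch, not part of the qwen-tts package:

```python
import sys

# Qwen3-TTS supports Python 3.9-3.13 (3.12 recommended).
major, minor = sys.version_info[:2]
supported = (3, 9) <= (major, minor) <= (3, 13)
print("supported" if supported else f"unsupported: Python {major}.{minor}")
```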

Create a Clean Environment

Create a new conda environment with Python 3.12:
conda create -n qwen3-tts python=3.12 -y
conda activate qwen3-tts
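With the environment active, install the package from PyPI. The package name qwen-tts is taken from this page; -U upgrades any previously installed copy:

```shell
pip install -U qwen-tts
```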

Installation Methods

We strongly recommend installing FlashAttention 2 to reduce GPU memory usage and improve inference speed.

1. Standard Installation

For most systems:
pip install -U flash-attn --no-build-isolation

2. For Limited RAM Systems

If your machine has less than 96GB of RAM and many CPU cores:
MAX_JOBS=4 pip install -U flash-attn --no-build-isolation
This limits the number of parallel compilation jobs to prevent out-of-memory errors during installation.
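A reasonable MAX_JOBS value depends on how much RAM is free per compile job. The per-job budget below is a rough rule of thumb, not official FlashAttention guidance; adjust both assumed values for your machine:

```python
import os

ram_gb = 32      # assumed available RAM in GB; adjust for your machine
gb_per_job = 8   # assumed RAM footprint per compile job (rule of thumb)

# Never exceed the CPU core count, and always allow at least one job.
max_jobs = max(1, min(os.cpu_count() or 1, ram_gb // gb_per_job))
print(f"MAX_JOBS={max_jobs} pip install -U flash-attn --no-build-isolation")
```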
Hardware Requirements:
  • FlashAttention 2 requires compatible GPUs (see FlashAttention repository)
  • Can only be used when models are loaded in torch.float16 or torch.bfloat16
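Because FlashAttention 2 is optional, it can help to detect at runtime whether the flash_attn package is importable and fall back to PyTorch's built-in SDPA kernels otherwise. A minimal sketch using only the standard library:

```python
import importlib.util

# Select FlashAttention 2 only if the flash_attn package is installed.
has_flash_attn = importlib.util.find_spec("flash_attn") is not None
attn_impl = "flash_attention_2" if has_flash_attn else "sdpa"
print(f"Using attention implementation: {attn_impl}")
```

The resulting string could then be passed as attn_implementation=attn_impl at model load time; that argument name follows the Hugging Face Transformers convention, and whether Qwen3TTSModel.from_pretrained accepts it is an assumption to verify against the package documentation. Remember that FlashAttention 2 also requires loading in torch.float16 or torch.bfloat16.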

Model Download

During model loading, weights are automatically downloaded based on the model name. However, if your runtime environment doesn’t support automatic downloads, you can manually download models to a local directory.
Recommended for users in Mainland China:
# Install ModelScope CLI
pip install -U modelscope

# Download tokenizer
modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz \
  --local_dir ./Qwen3-TTS-Tokenizer-12Hz

# Download TTS models
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
  --local_dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice

modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign \
  --local_dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign

modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base \
  --local_dir ./Qwen3-TTS-12Hz-1.7B-Base

modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice \
  --local_dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice

modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-Base \
  --local_dir ./Qwen3-TTS-12Hz-0.6B-Base
When using locally downloaded models, pass the local directory path instead of the model name when loading:
import torch
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    "./Qwen3-TTS-12Hz-1.7B-CustomVoice",  # Local path instead of model name
    device_map="cuda:0",
    dtype=torch.bfloat16,
)
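The local-versus-remote choice above can be made automatic: prefer a local snapshot when the directory exists, otherwise fall back to the hub name so automatic download can kick in. A small stdlib-only sketch, using the model identifiers from this page:

```python
from pathlib import Path

model_id = "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"
local_dir = Path("./Qwen3-TTS-12Hz-1.7B-CustomVoice")

# Use the manually downloaded snapshot if present, else the remote name.
source = str(local_dir) if local_dir.is_dir() else model_id
print(f"Loading weights from: {source}")
```

The resulting source string would then be passed as the first argument to Qwen3TTSModel.from_pretrained.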

System Requirements

Software

  • Python 3.9 or later (3.12 recommended)
  • PyTorch with CUDA support
  • 96GB+ RAM recommended when compiling FlashAttention 2 (or cap parallel jobs with MAX_JOBS)

Hardware

  • NVIDIA GPU with CUDA support
  • Compatible with FlashAttention 2 (Ampere or newer recommended)
  • Sufficient VRAM for model size (varies by model)

Verify Installation

Test your installation by importing the package:
import torch
from qwen_tts import Qwen3TTSModel

print("Qwen3-TTS installed successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

Next Steps

Quickstart Guide

Generate your first speech with complete working examples
