Environment Setup
The easiest way to use Qwen3-TTS is to install the qwen-tts Python package from PyPI. We recommend using a fresh, isolated environment to avoid dependency conflicts with existing packages.
Recommended: Python 3.12 for optimal compatibility and performance. Supported versions: Python 3.9, 3.10, 3.11, 3.12, 3.13 (see pyproject.toml:13-17).
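The supported range above can be checked programmatically before installing anything; a minimal sketch (`supported` is a hypothetical helper, the version bounds come from pyproject.toml):

```python
import sys

def supported(version_info):
    # Qwen3-TTS declares support for Python 3.9 through 3.13 (pyproject.toml).
    return (3, 9) <= tuple(version_info[:2]) <= (3, 13)

print("Python", sys.version.split()[0], "supported:", supported(sys.version_info))
```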
Create a Clean Environment
Create a new conda environment with Python 3.12:
conda create -n qwen3-tts python=3.12 -y
conda activate qwen3-tts
Installation Methods
From PyPI (Recommended)
Install the latest stable release from PyPI:
pip install -U qwen-tts
This will automatically install all required dependencies:
transformers==4.57.3
accelerate==1.12.0
gradio, librosa, torchaudio, soundfile, sox
onnxruntime, einops
Package information: qwen-tts v0.1.1 (see pyproject.toml:6-7)
From Source
Install from source if you want to modify the code or use the latest development version:
git clone https://github.com/QwenLM/Qwen3-TTS.git
cd Qwen3-TTS
pip install -e .
The -e flag installs the package in editable mode, so local changes to the source take effect without reinstalling.
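After installing by either method, you can sanity-check that the pinned dependencies resolved. This sketch uses only the standard library; `installed_version` is a hypothetical helper, not part of qwen-tts:

```python
from importlib import metadata

def installed_version(package):
    # Return the installed version string, or None if the package is absent.
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for pkg in ("qwen-tts", "transformers", "accelerate"):
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```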
FlashAttention 2 (Recommended)
We strongly recommend installing FlashAttention 2 to reduce GPU memory usage and improve inference speed.
Standard Installation
For most systems:
pip install -U flash-attn --no-build-isolation
For Limited RAM Systems
If your machine has less than 96GB of RAM and many CPU cores:
MAX_JOBS=4 pip install -U flash-attn --no-build-isolation
This limits the number of parallel compilation jobs to prevent out-of-memory errors during installation.
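The same cap can be derived from available memory; a rough heuristic sketch (the 4GB-per-job figure is an assumption, not a documented number):

```python
import os

def max_build_jobs(ram_gb, per_job_gb=4):
    # Each flash-attn compile job can use several GB of RAM; cap the number
    # of parallel jobs so the total stays within available memory.
    return max(1, min(os.cpu_count() or 1, ram_gb // per_job_gb))

print(f"MAX_JOBS={max_build_jobs(32)}")
```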
Hardware Requirements:
FlashAttention 2 requires compatible GPUs (see the FlashAttention repository)
Can only be used when models are loaded in torch.float16 or torch.bfloat16
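Because FlashAttention 2 is optional, loading code can detect it and fall back; a minimal sketch (whether `Qwen3TTSModel.from_pretrained` accepts the transformers-style `attn_implementation` keyword is an assumption here, not confirmed by this guide):

```python
import importlib.util

def pick_attn_implementation():
    # Use FlashAttention 2 if the flash_attn package is importable,
    # otherwise fall back to PyTorch's built-in SDPA kernels.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"

print("attention backend:", pick_attn_implementation())
```

The result could then be passed as `attn_implementation=pick_attn_implementation()` alongside `dtype=torch.bfloat16`, since FlashAttention 2 requires float16 or bfloat16 weights.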
Model Download
During model loading, weights are automatically downloaded based on the model name. However, if your runtime environment doesn’t support automatic downloads, you can manually download models to a local directory.
ModelScope (China)
Recommended for users in Mainland China:
# Install ModelScope CLI
pip install -U modelscope
# Download tokenizer
modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz \
--local_dir ./Qwen3-TTS-Tokenizer-12Hz
# Download TTS models
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
--local_dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign \
--local_dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base \
--local_dir ./Qwen3-TTS-12Hz-1.7B-Base
modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice \
--local_dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice
modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-Base \
--local_dir ./Qwen3-TTS-12Hz-0.6B-Base
Hugging Face
Download from Hugging Face:
# Install Hugging Face CLI
pip install -U "huggingface_hub[cli]"
# Download tokenizer
huggingface-cli download Qwen/Qwen3-TTS-Tokenizer-12Hz \
--local-dir ./Qwen3-TTS-Tokenizer-12Hz
# Download TTS models
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
--local-dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign \
--local-dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base \
--local-dir ./Qwen3-TTS-12Hz-1.7B-Base
huggingface-cli download Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice \
--local-dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice
huggingface-cli download Qwen/Qwen3-TTS-12Hz-0.6B-Base \
--local-dir ./Qwen3-TTS-12Hz-0.6B-Base
When using locally downloaded models, pass the local directory path instead of the model name when loading:
model = Qwen3TTSModel.from_pretrained(
    "./Qwen3-TTS-12Hz-1.7B-CustomVoice",  # Local path
    device_map="cuda:0",
    dtype=torch.bfloat16,
)
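A hypothetical helper can prefer a local copy when one exists and otherwise fall back to the hub name (so automatic download still works); the directory layout assumes the `--local_dir` paths used in the download commands above:

```python
from pathlib import Path

def resolve_model(name, local_root="."):
    # Prefer a locally downloaded directory (e.g. ./Qwen3-TTS-12Hz-1.7B-Base);
    # otherwise return the hub name so weights download automatically.
    local = Path(local_root) / name.split("/")[-1]
    return str(local) if local.is_dir() else name

print(resolve_model("Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"))
```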
System Requirements
Software
Python 3.9 or later (3.12 recommended)
PyTorch with CUDA support
96GB+ RAM recommended for FlashAttention 2 compilation (or set MAX_JOBS to limit parallel build jobs)
Hardware
NVIDIA GPU with CUDA support
Compatible with FlashAttention 2 (Ampere or newer recommended)
Sufficient VRAM for model size (varies by model)
Verify Installation
Test your installation by importing the package:
import torch
from qwen_tts import Qwen3TTSModel
print("Qwen3-TTS installed successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
Next Steps
Quickstart Guide: Generate your first speech with complete working examples.