
Environment Setup

The easiest way to use Qwen3-TTS is to install the qwen-tts Python package from PyPI. We recommend using a fresh, isolated environment to avoid dependency conflicts with existing packages.
Recommended: Python 3.12 for optimal compatibility and performance.
Supported versions: Python 3.9, 3.10, 3.11, 3.12, 3.13 (see pyproject.toml:13-17)
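Since only Python 3.9 through 3.13 are supported, a quick stdlib check can catch an unsupported interpreter before installation. This is a minimal sketch, not part of the qwen-tts package:

```python
import sys

# Qwen3-TTS supports Python 3.9-3.13 (3.12 recommended).
major, minor = sys.version_info[:2]
supported = (3, 9) <= (major, minor) <= (3, 13)
print("supported" if supported else f"unsupported: Python {major}.{minor}")
```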

Create a Clean Environment

Create a new conda environment with Python 3.12:
conda create -n qwen3-tts python=3.12 -y
conda activate qwen3-tts
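With the environment active, install the package from PyPI. The package name qwen-tts is taken from this page; -U upgrades any previously installed copy:

```shell
pip install -U qwen-tts
```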

Installation Methods

We strongly recommend installing FlashAttention 2 to reduce GPU memory usage and improve inference speed.

1. Standard Installation

For most systems:
pip install -U flash-attn --no-build-isolation

2. For Limited RAM Systems

If your machine has less than 96GB of RAM and many CPU cores:
MAX_JOBS=4 pip install -U flash-attn --no-build-isolation
This limits the number of parallel compilation jobs to prevent out-of-memory errors during installation.
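A reasonable MAX_JOBS value depends on how much RAM is free per compile job. The per-job budget below is a rough rule of thumb, not official FlashAttention guidance; adjust both assumed values for your machine:

```python
import os

ram_gb = 32      # assumed available RAM in GB; adjust for your machine
gb_per_job = 8   # assumed RAM footprint per compile job (rule of thumb)

# Never exceed the CPU core count, and always allow at least one job.
max_jobs = max(1, min(os.cpu_count() or 1, ram_gb // gb_per_job))
print(f"MAX_JOBS={max_jobs} pip install -U flash-attn --no-build-isolation")
```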
Hardware Requirements:
  • FlashAttention 2 requires compatible GPUs (see FlashAttention repository)
  • Can only be used when models are loaded in torch.float16 or torch.bfloat16
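Because FlashAttention 2 is optional, it can help to detect at runtime whether the flash_attn package is importable and fall back to PyTorch's built-in SDPA kernels otherwise. A minimal sketch using only the standard library:

```python
import importlib.util

# Select FlashAttention 2 only if the flash_attn package is installed.
has_flash_attn = importlib.util.find_spec("flash_attn") is not None
attn_impl = "flash_attention_2" if has_flash_attn else "sdpa"
print(f"Using attention implementation: {attn_impl}")
```

The resulting string could then be passed as attn_implementation=attn_impl at model load time; that argument name follows the Hugging Face Transformers convention, and whether Qwen3TTSModel.from_pretrained accepts it is an assumption to verify against the package documentation. Remember that FlashAttention 2 also requires loading in torch.float16 or torch.bfloat16.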

Model Download

During model loading, weights are automatically downloaded based on the model name. However, if your runtime environment doesn’t support automatic downloads, you can manually download models to a local directory.
Recommended for users in Mainland China:
# Install ModelScope CLI
pip install -U modelscope

# Download tokenizer
modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz \
  --local_dir ./Qwen3-TTS-Tokenizer-12Hz

# Download TTS models
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
  --local_dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice

modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign \
  --local_dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign

modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base \
  --local_dir ./Qwen3-TTS-12Hz-1.7B-Base

modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice \
  --local_dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice

modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-Base \
  --local_dir ./Qwen3-TTS-12Hz-0.6B-Base
When using locally downloaded models, pass the local directory path instead of the model name when loading:
import torch
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    "./Qwen3-TTS-12Hz-1.7B-CustomVoice",  # Local path instead of model name
    device_map="cuda:0",
    dtype=torch.bfloat16,
)
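The local-versus-remote choice above can be made automatic: prefer a local snapshot when the directory exists, otherwise fall back to the hub name so automatic download can kick in. A small stdlib-only sketch, using the model identifiers from this page:

```python
from pathlib import Path

model_id = "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"
local_dir = Path("./Qwen3-TTS-12Hz-1.7B-CustomVoice")

# Use the manually downloaded snapshot if present, else the remote name.
source = str(local_dir) if local_dir.is_dir() else model_id
print(f"Loading weights from: {source}")
```

The resulting source string would then be passed as the first argument to Qwen3TTSModel.from_pretrained.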

System Requirements

Software

  • Python 3.9 or later (3.12 recommended)
  • PyTorch with CUDA support
  • 96GB+ RAM recommended when compiling FlashAttention 2 (or cap parallel jobs with MAX_JOBS)

Hardware

  • NVIDIA GPU with CUDA support
  • Compatible with FlashAttention 2 (Ampere or newer recommended)
  • Sufficient VRAM for model size (varies by model)

Verify Installation

Test your installation by importing the package:
import torch
from qwen_tts import Qwen3TTSModel

print("Qwen3-TTS installed successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

Next Steps

Quickstart Guide

Generate your first speech with complete working examples
