Installation
Convert PyTorch weights to JAX format before running inference.
Convert weights
Quick start
Image-to-video generation
LTX-Video supports conditioning on input images for video animation.
Configure conditioning
Add conditioning parameters to ltx_video.yml:
- conditioning_media_paths: List of image paths to condition on
- conditioning_start_frames: Frame indices for each conditioning image
- conditioning_strengths: Influence strength (0.0-1.0) for each image
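Because each of the three parameters is described as per-image, they presumably need one entry per conditioning image. The sketch below is a sanity check you could run before launching a job. It assumes the lists must have equal length and that start frames must fall inside the clip; `check_conditioning` is a hypothetical helper, not part of the pipeline.

```python
def check_conditioning(media_paths, start_frames, strengths, num_frames):
    """Hypothetical helper: validate per-image conditioning lists."""
    # Assumption: one start frame and one strength per conditioning image.
    if not (len(media_paths) == len(start_frames) == len(strengths)):
        raise ValueError("conditioning lists must have one entry per image")
    for frame in start_frames:
        # Assumption: start frames index into the generated clip.
        if not 0 <= frame < num_frames:
            raise ValueError(f"start frame {frame} is outside 0..{num_frames - 1}")
    for strength in strengths:
        # The documented range for strengths is 0.0-1.0.
        if not 0.0 <= strength <= 1.0:
            raise ValueError(f"strength {strength} is outside 0.0-1.0")
    return list(zip(media_paths, start_frames, strengths))


items = check_conditioning(
    ["first.png", "last.png"], [0, 88], [1.0, 0.8], num_frames=97
)
```

Each returned tuple pairs an image path with its start frame and strength, which makes mismatched lists fail early rather than partway through a run.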
Run I2V inference
Parameters
| Parameter | Description | Default |
|---|---|---|
| prompt | Text description of video content | Required |
| height | Video height in pixels | 512 |
| width | Video width in pixels | 768 |
| num_frames | Number of frames to generate | 97 |
| num_inference_steps | Denoising steps | 40 |
| frame_rate | Output video FPS | 25 |
| seed | Random seed for reproducibility | 0 |
| conditioning_media_paths | List of conditioning image paths | None |
| conditioning_start_frames | Frame indices for conditioning | [0] |
| conditioning_strengths | Conditioning influence strengths | [1.0] |
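For reference in scripts, the defaults above can be gathered into a single structure. This is an illustrative sketch only: the `LTXDefaults` class is hypothetical and does not exist in the pipeline. It also shows that the default clip length works out to num_frames divided by frame_rate.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LTXDefaults:
    """Hypothetical container mirroring the documented parameter defaults."""

    prompt: str  # required, no default
    height: int = 512
    width: int = 768
    num_frames: int = 97
    num_inference_steps: int = 40
    frame_rate: int = 25
    seed: int = 0
    conditioning_media_paths: Optional[List[str]] = None
    conditioning_start_frames: List[int] = field(default_factory=lambda: [0])
    conditioning_strengths: List[float] = field(default_factory=lambda: [1.0])

    def duration_seconds(self) -> float:
        # Clip length in seconds: frames divided by frames per second.
        return self.num_frames / self.frame_rate


cfg = LTXDefaults(prompt="a sailboat at sunset")
duration = cfg.duration_seconds()
```

With the defaults, 97 frames at 25 FPS give a clip of just under four seconds.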
Prompt enhancement
LTX-Video includes automatic prompt enhancement for short prompts.
Configure enhancement
Set the corresponding option to 0 to disable enhancement:
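One plausible reading of this behavior, sketched here as an assumption rather than the pipeline's actual logic: enhancement applies when the prompt has fewer words than a threshold, and a threshold of 0 switches it off. The function and parameter names below are hypothetical.

```python
def should_enhance(prompt: str, words_threshold: int) -> bool:
    """Hypothetical rule: enhance only prompts shorter than the threshold."""
    # Assumption: a threshold of 0 (or less) disables enhancement entirely.
    if words_threshold <= 0:
        return False
    # Assumption: prompt length is measured in whitespace-separated words.
    return len(prompt.split()) < words_threshold


short_prompt_enhanced = should_enhance("a cat on a sofa", words_threshold=50)
disabled = should_enhance("a cat on a sofa", words_threshold=0)
```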
Resolution and padding
LTX-Video automatically pads input dimensions to multiples of 32 for optimal processing.
Automatic padding
The pipeline calculates padded dimensions (generate_ltx_video.py:178-181):
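The rounding itself is simple integer arithmetic. The following is a minimal sketch of rounding a dimension up to the next multiple of 32, not a copy of the referenced lines; `pad_to_multiple` is a hypothetical name.

```python
def pad_to_multiple(size: int, multiple: int = 32) -> int:
    """Round size up to the nearest multiple (unchanged if already aligned)."""
    return ((size + multiple - 1) // multiple) * multiple


height, width = 500, 770
padded_height = pad_to_multiple(height)  # 512
padded_width = pad_to_multiple(width)    # 800
```

Dimensions that are already multiples of 32, such as the defaults 512 and 768, come back unchanged, so no padding is added for them.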
Multi-scale pipeline
LTX-Video supports multi-scale generation for higher quality outputs.
Enable multi-scale
Output format
Videos are saved to the outputs/YYYY-MM-DD/ directory:
- Videos: video_output_{i}_{prompt}_{H}x{W}x{F}_{index}.mp4
- Images (single frame): image_output_{i}_{prompt}_{H}x{W}x{F}_{index}.png
- Format: H.264 MP4 for videos, PNG for images
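To locate results from a script, the naming pattern can be reproduced with a small helper. This sketch makes two assumptions: the prompt portion is already a short, filesystem-safe string (how the pipeline shortens prompts is not covered here), and a frame count of 1 selects the image form. `output_path` is a hypothetical helper.

```python
from datetime import date
from pathlib import Path


def output_path(i, prompt, height, width, num_frames, index, day=None):
    """Hypothetical helper: build a path following the documented pattern."""
    day = day or date.today()
    # Assumption: single-frame outputs are saved as PNG images.
    is_image = num_frames == 1
    kind = "image" if is_image else "video"
    ext = "png" if is_image else "mp4"
    name = f"{kind}_output_{i}_{prompt}_{height}x{width}x{num_frames}_{index}.{ext}"
    # date.isoformat() yields YYYY-MM-DD, matching the directory name.
    return Path("outputs") / day.isoformat() / name


video = output_path(0, "a-sailboat", 512, 768, 97, 0, day=date(2025, 1, 31))
image = output_path(0, "a-sailboat", 512, 768, 1, 0, day=date(2025, 1, 31))
```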
Implementation details
The LTX pipeline (src/maxdiffusion/generate_ltx_video.py) implements:
Conditioning preparation
Prepare conditioning items from input images (generate_ltx_video.py:99-120):
Image preprocessing
Input images are preprocessed with cropping, resizing, and CRF compression (generate_ltx_video.py:50-96):
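The exact cropping policy lives in the referenced lines. As an illustration only, and assuming a centered crop to the target aspect ratio before resizing, the crop box can be computed as below. `center_crop_box` is a hypothetical helper, and the CRF compression step is not shown.

```python
def center_crop_box(src_w, src_h, target_w, target_h):
    """Return (left, top, right, bottom) of the largest centered box
    that has the target aspect ratio. Assumes a centered crop."""
    target_ratio = target_w / target_h
    if src_w / src_h > target_ratio:
        # Source is wider than the target: trim columns on both sides.
        new_w = round(src_h * target_ratio)
        left = (src_w - new_w) // 2
        return (left, 0, left + new_w, src_h)
    # Source is taller than (or matches) the target: trim rows.
    new_h = round(src_w / target_ratio)
    top = (src_h - new_h) // 2
    return (0, top, src_w, top + new_h)


# A 1920x1080 image cropped for the default 768x512 output.
box = center_crop_box(1920, 1080, 768, 512)
```

The resulting box has the same 3:2 aspect ratio as the target, so the subsequent resize does not distort the image.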
Video generation
The pipeline handles inference with optional conditioning (generate_ltx_video.py:208-220):
Post-processing
Remove padding and save output (generate_ltx_video.py:222-261):
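Removing padding amounts to cropping each frame back to its requested size. The sketch below works on a frame stored as a list of rows and takes the padding amounts as explicit arguments, since how the pipeline distributes padding across the edges is not stated here. `unpad` is a hypothetical helper.

```python
def unpad(frame, pad_top, pad_bottom, pad_left, pad_right):
    """Hypothetical helper: crop padding from a frame given as rows of pixels."""
    height = len(frame)
    width = len(frame[0])
    # Use explicit end indices: a slice like rows[top:-0] would be empty.
    rows = frame[pad_top:height - pad_bottom]
    return [row[pad_left:width - pad_right] for row in rows]


# A 4x6 frame whose values encode (row * 10 + column).
frame = [[r * 10 + c for c in range(6)] for r in range(4)]
cropped = unpad(frame, pad_top=1, pad_bottom=1, pad_left=2, pad_right=1)
```

Computing the end index explicitly keeps the helper correct when a padding amount is zero, which is the case whenever the requested size is already a multiple of 32.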
Performance tips
- Use appropriate resolutions: Stick to multiples of 32 to avoid unnecessary padding
- Adjust frame count: Fewer frames = faster generation
- Enable prompt enhancement: For short prompts, enhancement improves quality
- Conditioning strength: Start with 1.0 and reduce if conditioning is too strong
Next steps
- Wan video generation: Alternative video generation with Wan models
- Flux inference: High-quality image generation
- Configuration: Full configuration reference
- Training overview: Fine-tune models on custom data