Quick Start Guide
This guide will help you run your first segmentation with SAM 3 in just a few minutes. We’ll cover both image and video segmentation with text prompts, then show how to add geometric prompts.
Before starting, make sure you have installed SAM 3 and authenticated with Hugging Face to access the model checkpoints.
Image Segmentation
Let’s start with a simple image segmentation example using a text prompt.
Enable GPU Optimizations
Enable TensorFloat-32 (TF32) and automatic mixed precision (bfloat16) for faster inference.
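TF32 runs float32 matrix multiplications on Tensor Cores with a reduced 10-bit mantissa. On NVIDIA Ampere and newer GPUs this gives a large speedup, usually with a negligible effect on inference accuracy.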
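A minimal sketch of these settings using standard PyTorch switches (torch.backends.cuda.matmul.allow_tf32, torch.backends.cudnn.allow_tf32, and torch.autocast with torch.bfloat16). The helper name enable_gpu_optimizations is hypothetical, and the CPU fallback is an addition for portability, not something SAM 3 requires.

```python
from contextlib import nullcontext


def enable_gpu_optimizations():
    """Enable TF32 and return a bfloat16 autocast context for inference.

    Hypothetical helper. Without CUDA it returns a no-op context so the
    same calling code still runs on CPU.
    """
    import torch  # lazy import: this module stays importable without torch

    if not torch.cuda.is_available():
        return nullcontext()
    # Allow TF32 for float32 matmuls and cuDNN convolutions (Ampere and newer).
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    # Eligible ops inside this context run in bfloat16.
    return torch.autocast(device_type="cuda", dtype=torch.bfloat16)
```

Typical use is to wrap inference in both contexts: with enable_gpu_optimizations(), torch.inference_mode(): followed by the SAM 3 calls.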
Load the Model
Load the SAM 3 image model and create a processor. The model is downloaded automatically from Hugging Face on first use, which may take a few minutes depending on your internet connection.
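A minimal loading sketch, assuming the module layout shown in the SAM 3 repository README (build_sam3_image_model in sam3.model_builder and Sam3Processor in sam3.model.sam3_image_processor). Check these import paths against your installed version; the wrapper name load_image_processor is hypothetical.

```python
def load_image_processor():
    """Build the SAM 3 image model and wrap it in a processor.

    The first call downloads the checkpoint from Hugging Face, so it can
    take a few minutes.
    """
    # Assumed import paths, following the SAM 3 README.
    from sam3.model_builder import build_sam3_image_model
    from sam3.model.sam3_image_processor import Sam3Processor

    model = build_sam3_image_model()
    return Sam3Processor(model)
```

Build the processor once and reuse it for every image; rebuilding it per image repeats the expensive model construction.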
Complete Image Example
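An end-to-end sketch of image segmentation with a text prompt, assuming a CUDA GPU and the API shown in the SAM 3 README: processor.set_image(image) returns an inference state, and processor.set_text_prompt(state=..., prompt=...) returns a dict with "masks", "boxes", and "scores". The function names segment_image and summarize_scores are hypothetical.

```python
def summarize_scores(scores):
    """Return (number of detections, best score or None)."""
    values = [float(s) for s in scores]  # works for lists and 1-D tensors
    return len(values), (max(values) if values else None)


def segment_image(image_path, prompt):
    """Segment every instance matching `prompt` in the image at `image_path`."""
    import torch
    from PIL import Image
    # Assumed import paths, following the SAM 3 README.
    from sam3.model_builder import build_sam3_image_model
    from sam3.model.sam3_image_processor import Sam3Processor

    # TF32 for float32 matmuls and convolutions on Ampere and newer GPUs.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

    processor = Sam3Processor(build_sam3_image_model())
    image = Image.open(image_path).convert("RGB")

    # No gradients needed; eligible ops run in bfloat16.
    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        state = processor.set_image(image)
        output = processor.set_text_prompt(state=state, prompt=prompt)

    return output["masks"], output["boxes"], output["scores"]
```

For example, segment_image("street.jpg", "a person in a red jacket") returns the masks, boxes, and scores (the path is a placeholder), and summarize_scores(scores) reports how many instances were found and the best score.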
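The complete image workflow combines the steps above: enable the GPU optimizations, load the model and processor, then run a text prompt on an image.
Video Segmentation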
SAM 3 also supports video segmentation with temporal tracking.
Complete Video Example
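A sketch of the video workflow, assuming the request-based predictor from the SAM 3 README: a predictor built with build_sam3_video_predictor() and driven by predictor.handle_request(request=...) with the "start_session" and "add_prompt" request types. The helper names are hypothetical. The returned outputs are whatever the add_prompt call returns; propagating masks across the rest of the video is covered in the Video Tracking guide.

```python
def start_session_request(video_path):
    """Request that opens a session on an MP4 file or a JPEG-frame directory."""
    return dict(type="start_session", resource_path=str(video_path))


def add_prompt_request(session_id, frame_index, text):
    """Request that attaches a text prompt to one frame of a session."""
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    return dict(
        type="add_prompt",
        session_id=session_id,
        frame_index=frame_index,
        text=text,
    )


def segment_video(predictor, video_path, prompt, frame_index=0):
    """Open a session, prompt one frame, and return (session_id, outputs)."""
    session = predictor.handle_request(request=start_session_request(video_path))
    session_id = session["session_id"]
    response = predictor.handle_request(
        request=add_prompt_request(session_id, frame_index, prompt)
    )
    return session_id, response["outputs"]
```

Create the predictor once with build_sam3_video_predictor() from sam3.model_builder and pass it in. Each call to segment_video opens its own session, so a single predictor can serve several videos.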
Adding Geometric Prompts
You can also use geometric prompts (boxes, points) in addition to or instead of text.
Box Prompts
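In SAM 3 a box can act as a visual exemplar of the concept to segment. This sketch assumes that processor.add_geometric_prompt(state=..., box=..., label=True) takes one box in normalized (cx, cy, w, h) coordinates and returns a state holding "masks", "boxes", and "scores"; verify the expected box format for your version. The conversion helper and function names are hypothetical.

```python
def xyxy_to_normalized_cxcywh(box, width, height):
    """Convert a pixel (x0, y0, x1, y1) box to normalized (cx, cy, w, h)."""
    x0, y0, x1, y1 = box
    if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
        raise ValueError(f"box {box} is not inside a {width}x{height} image")
    return [
        (x0 + x1) / 2 / width,   # box center x as a fraction of width
        (y0 + y1) / 2 / height,  # box center y as a fraction of height
        (x1 - x0) / width,       # box width as a fraction of width
        (y1 - y0) / height,      # box height as a fraction of height
    ]


def segment_with_box(processor, image, box_xyxy):
    """Segment objects matching the exemplar inside `box_xyxy` (PIL image)."""
    state = processor.set_image(image)
    norm_box = xyxy_to_normalized_cxcywh(box_xyxy, image.width, image.height)
    # Assumption: label=True marks the box as a positive exemplar.
    state = processor.add_geometric_prompt(state=state, box=norm_box, label=True)
    return state["masks"], state["boxes"], state["scores"]
```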
Combining Text and Geometric Prompts
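A sketch of combining a noun phrase with a box, assuming the same processor methods as the SAM 3 README (set_image, set_text_prompt) plus add_geometric_prompt, where label=True is taken to mean a positive box and label=False a negative one. The box is assumed to be already normalized as (cx, cy, w, h); the function names are hypothetical.

```python
def check_normalized_box(box):
    """Validate a (cx, cy, w, h) box whose values are fractions of the image size."""
    values = [float(v) for v in box]
    if len(values) != 4 or not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError(f"expected 4 values in [0, 1], got {box!r}")
    return values


def segment_text_and_box(processor, image, prompt, box_cxcywh, positive=True):
    """Prompt with text first, then refine with a positive or negative box."""
    state = processor.set_image(image)
    state = processor.set_text_prompt(state=state, prompt=prompt)
    # Assumption: label=False marks the box as a negative example.
    state = processor.add_geometric_prompt(
        state=state, box=check_normalized_box(box_cxcywh), label=positive
    )
    return state["masks"], state["boxes"], state["scores"]
```

Validating the box before calling the model catches the common mistake of passing pixel coordinates where fractions are expected.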
Batch Processing
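A minimal sketch that reuses one loaded processor across many images under torch.inference_mode(). It is a plain sequential loop, not true batching (images are not stacked into a single forward pass); the Batched Inference guide covers that. The processor calls assume the SAM 3 README API, and the names select_images and segment_many are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def select_images(paths):
    """Keep paths with a common image extension, sorted for a stable order."""
    return sorted(Path(p) for p in paths if Path(p).suffix.lower() in IMAGE_SUFFIXES)


def segment_many(processor, image_paths, prompt):
    """Run one text prompt over many images with a single loaded processor."""
    import torch
    from PIL import Image

    results = {}
    with torch.inference_mode():
        for path in image_paths:
            image = Image.open(path).convert("RGB")
            state = processor.set_image(image)
            out = processor.set_text_prompt(state=state, prompt=prompt)
            # Keep only the predictions, not the whole inference state.
            results[str(path)] = (out["masks"], out["boxes"], out["scores"])
    return results
```

For a folder of images, select_images(Path("photos").iterdir()) produces the list to pass to segment_many; the folder name is a placeholder.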
Batching lets you process many images efficiently.
Configuration Options
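A sketch of building the processor with a custom confidence threshold. It assumes Sam3Processor accepts a confidence_threshold keyword, matching the setting named in the tips below, and that scores are probabilities in [0, 1]. The helper name and the 0.5 value are illustrative, not library defaults.

```python
def build_processor(confidence_threshold=0.5):
    """Build a SAM 3 image processor with a custom confidence threshold."""
    # Validate first so a bad value fails fast, before the model is loaded.
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0 and 1")
    # Assumed import paths, following the SAM 3 README.
    from sam3.model_builder import build_sam3_image_model
    from sam3.model.sam3_image_processor import Sam3Processor

    model = build_sam3_image_model()
    # Assumption: the processor accepts confidence_threshold as a keyword.
    return Sam3Processor(model, confidence_threshold=confidence_threshold)
```

A lower threshold returns more, less certain detections; a higher one returns fewer, more confident ones.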
Customize the processor for your use case.
Tips for Best Results
Writing effective text prompts
- Be specific: “a person in a red jacket” works better than “person”
- Use descriptive attributes: colors, positions, actions
- For multiple objects: “all dogs in the image” or “dogs”
- No matches: a prompt for something that is not in the image returns an empty result
Optimizing performance
- Use batch processing for multiple images
- Enable TF32 and mixed precision (bfloat16)
- Lower resolution for faster inference (trade-off with quality)
- Use torch.inference_mode() or torch.no_grad() contexts
Handling low confidence results
- Adjust confidence_threshold in the processor settings
- Try more specific text prompts
- Combine text with geometric prompts for better accuracy
- Some objects may genuinely not be present in the image
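If you keep the processor's confidence_threshold low, you can still apply a stricter cutoff afterwards. A sketch with a hypothetical filter_detections helper, assuming masks, boxes, and scores are aligned sequences with one entry per detection.

```python
def filter_detections(masks, boxes, scores, min_score):
    """Keep detections whose score is at least `min_score`.

    Works for Python lists and for tensors indexed along the first
    dimension; the results are returned as lists.
    """
    keep = [i for i, s in enumerate(scores) if float(s) >= min_score]
    return (
        [masks[i] for i in keep],
        [boxes[i] for i in keep],
        [scores[i] for i in keep],
    )
```

Post-filtering can only remove detections: anything already dropped by the processor's own threshold cannot be recovered this way. An empty result may simply mean the object is not in the image.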
Video segmentation tips
- Start prompts on frames where objects are clearly visible
- Use multiple prompts across different frames for better tracking
- Video format can be MP4 or a directory of JPEG frames
- Session management allows processing multiple videos
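Because the video input can be an MP4 file or a directory of JPEG frames, it can help to validate the path before opening a session. A hypothetical, stdlib-only helper; it checks file extensions only, not whether the files actually decode.

```python
from pathlib import Path

JPEG_SUFFIXES = {".jpg", ".jpeg"}


def video_source_kind(path):
    """Return "mp4" or "jpeg_dir" for a usable video source, else raise ValueError."""
    p = Path(path)
    if p.is_file() and p.suffix.lower() == ".mp4":
        return "mp4"
    if p.is_dir():
        # Accept the directory if it holds at least one JPEG frame.
        if any(f.is_file() and f.suffix.lower() in JPEG_SUFFIXES for f in p.iterdir()):
            return "jpeg_dir"
        raise ValueError(f"no JPEG frames found in {p}")
    raise ValueError(f"{p} is neither an .mp4 file nor a directory of JPEG frames")
```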
Next Steps
Now that you’ve run your first segmentation, explore more advanced features:
API Reference
Detailed API documentation for all SAM 3 components
Batched Inference
Process multiple images efficiently
Video Tracking
Deep dive into video segmentation and tracking
Interactive Refinement
Learn to refine segmentations with points and boxes