
Overview

The Hugging Face integration enables CVAT users to leverage state-of-the-art computer vision models from the Hugging Face Hub for automatic annotation. This integration provides access to thousands of pre-trained models for object detection, segmentation, and other computer vision tasks.
The Hugging Face integration is available for CVAT Cloud users and Enterprise self-hosted installations. It is not available for community self-hosted deployments.

What is Hugging Face?

Hugging Face is a widely used platform for sharing machine learning models and datasets. It offers:
  • Hugging Face Hub: Repository of 100,000+ models
  • Transformers Library: State-of-the-art NLP and CV models
  • Inference API: Easy model deployment
  • Model fine-tuning: Tools for training custom models

Prerequisites

  • CVAT Cloud account or Enterprise self-hosted installation
  • Hugging Face account (free or paid)
  • Hugging Face API token (for private models)
  • A CVAT task or project with labels defined

Supported Model Architectures

The integration supports various computer vision model architectures from Hugging Face:

Object Detection Models

  • DETR (DEtection TRansformer): Facebook’s transformer-based detector
  • YOLOv5/YOLOv8: Fast and accurate object detection
  • Faster R-CNN: Region-based convolutional neural networks
  • RetinaNet: Single-stage detector with focal loss

Segmentation Models

  • Mask R-CNN: Instance segmentation
  • Segment Anything (SAM): Universal segmentation model
  • Semantic Segmentation Models: SegFormer, UperNet, etc.

Transformers-Based Models

  • Vision Transformers (ViT): Image classification
  • CLIP: Vision-language models
  • DINOv2: Self-supervised vision features

Adding a Hugging Face Model

Follow these steps to integrate a Hugging Face model into CVAT:

Step 1: Find a Model on Hugging Face Hub

  1. Visit the Hugging Face Model Hub
  2. Filter by task type:
    • Object Detection
    • Image Segmentation
    • Image Classification
  3. Select a model that matches your annotation needs
  4. Note the model ID (e.g., facebook/detr-resnet-50)

Step 2: Get Your API Token

  1. In your Hugging Face account, go to Settings > Access Tokens
  2. Click New token
  3. Give it a name and select permissions
  4. Copy the generated token
Keep your API token secure. Never share it publicly or commit it to version control.
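
One way to keep the token out of source files is to read it from an environment variable at run time. The following is a minimal sketch, not part of CVAT or Hugging Face tooling: the helper names load_hf_token and mask_token are hypothetical, and the variable name HF_TOKEN is an assumption you can change.

```python
import os


def load_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in the given environment variable.

    Raises RuntimeError if the variable is unset or empty, so a missing
    token is caught early instead of surfacing as a failed API call.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token


def mask_token(token: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the token is safe to log."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```

With this approach the token lives in your shell profile or a secrets manager, and only the masked form ever appears in logs.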

Step 3: Add the Model in CVAT

  1. In CVAT, navigate to the Models page
  2. Click Add model
  3. Select Hugging Face as the model source
  4. Enter the following information:
    • Model name: Descriptive name for your reference
    • Model ID: The Hugging Face model identifier (e.g., facebook/detr-resnet-50)
    • API token: Your Hugging Face API token (for private models)
  5. Click Add to save the model

Using Hugging Face Models for Automatic Annotation

Once a model is configured, you can run it on a task to generate annotations automatically.

Running Automatic Annotation

  1. Open your task in CVAT
  2. Click Actions > Automatic annotation
  3. Select your Hugging Face model from the dropdown
  4. Configure settings:
    • Threshold: Minimum confidence score (0.0-1.0) a prediction needs in order to be kept
    • Clean old annotations: Remove existing annotations before adding the new ones
    • Return masks as polygons: Convert predicted masks to polygon shapes
  5. Map model labels to task labels
  6. Click Annotate
The annotation process will:
  1. Send images to Hugging Face Inference API
  2. Process predictions
  3. Create annotations in your task
  4. Show progress in real-time
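
The "process predictions" step can be pictured as filtering raw detections by the configured threshold. The sketch below is illustrative only: it assumes predictions arrive as a list of dictionaries with score, label, and box keys (the shape returned by the Transformers object-detection pipeline), and filter_by_threshold is a hypothetical helper, not a CVAT function. The sample detections are made up.

```python
from typing import Any

Prediction = dict[str, Any]


def filter_by_threshold(predictions: list[Prediction], threshold: float) -> list[Prediction]:
    """Keep only predictions whose confidence score is at least `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [p for p in predictions if p["score"] >= threshold]


# Made-up sample detections, for illustration only.
sample_predictions = [
    {"score": 0.98, "label": "person", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.64, "label": "car", "box": {"xmin": 200, "ymin": 50, "xmax": 380, "ymax": 160}},
    {"score": 0.31, "label": "bicycle", "box": {"xmin": 5, "ymin": 5, "xmax": 40, "ymax": 60}},
]

kept = filter_by_threshold(sample_predictions, threshold=0.7)
print([p["label"] for p in kept])
```

Lowering the threshold to 0.5 in this sample would also keep the car detection.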

Example: Using DETR for Object Detection

Here’s an example workflow using the DETR model:
# Example configuration for DETR model
model_config = {
    "name": "DETR Object Detection",
    "model_id": "facebook/detr-resnet-50",
    "task": "object-detection",
    "threshold": 0.7
}

# Label mapping
label_mapping = {
    "person": "person",
    "car": "vehicle",
    "truck": "vehicle",
    "bicycle": "bike",
    "motorcycle": "bike"
}
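
To make the effect of such a mapping concrete, the sketch below applies it to a few made-up detections. The helper map_labels is hypothetical, and dropping detections whose label has no mapping is an assumption made for this illustration rather than documented CVAT behavior.

```python
label_mapping = {
    "person": "person",
    "car": "vehicle",
    "truck": "vehicle",
    "bicycle": "bike",
    "motorcycle": "bike",
}


def map_labels(detections: list[dict], mapping: dict[str, str]) -> list[dict]:
    """Rename model labels to task labels.

    Detections whose label is missing from `mapping` are skipped
    (an assumption for this sketch).
    """
    mapped = []
    for det in detections:
        task_label = mapping.get(det["label"])
        if task_label is None:
            continue
        # Copy the detection so the input list is left unchanged.
        mapped.append({**det, "label": task_label})
    return mapped


# Made-up detections, for illustration only.
detections = [
    {"label": "truck", "score": 0.91},
    {"label": "dog", "score": 0.88},
    {"label": "motorcycle", "score": 0.77},
]

print([d["label"] for d in map_labels(detections, label_mapping)])
```

Several model labels can map to one task label, as "car" and "truck" both do here.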

Using the CVAT Python SDK

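from cvat_sdk import make_client

# Model-to-task label mapping (same mapping as in the DETR example)
label_mapping = {
    "person": "person",
    "car": "vehicle",
    "truck": "vehicle",
    "bicycle": "bike",
    "motorcycle": "bike",
}

# Connect to CVAT (replace the placeholder credentials with your own)
client = make_client(
    host="https://app.cvat.ai",
    credentials=("username", "password"),
)

# Retrieve the task by its numeric ID
task = client.tasks.retrieve(123)

# Run automatic annotation with the Hugging Face model.
# NOTE: annotate() and its parameters are shown as given in this guide;
# confirm they are available in your installed cvat_sdk version.
task.annotate(
    model_name="detr-resnet-50",
    mapping=label_mapping,
    threshold=0.7,
    clear_existing=False,
)

print(f"Annotation complete for task {task.id}")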

Model Performance Optimization

Consider these factors when selecting a model:
  • Accuracy vs Speed: Models with larger backbones (e.g., ResNet-101 instead of ResNet-50) are generally more accurate but slower
  • Domain Similarity: Choose models trained on similar data
  • Label Coverage: Ensure the model supports your required labels
  • Model Size: Consider API latency for large models
Adjust the confidence threshold based on your needs:
  • High Precision (0.8-0.95): Fewer false positives, may miss objects
  • Balanced (0.5-0.7): Good trade-off between precision and recall
  • High Recall (0.3-0.5): Catch more objects, more false positives
Run test batches with different thresholds to find the optimal value.
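
A test batch can be scored at several thresholds to see the trade-off directly. The sketch below assumes you have reviewed a small batch by hand, so that each predicted box has a score and a correct/incorrect verdict, and that you know how many objects the batch really contains. The function name and the sample numbers are hypothetical.

```python
def precision_recall(scored: list[tuple[float, bool]], total_objects: int, threshold: float) -> tuple[float, float]:
    """Compute precision and recall for predictions kept at `threshold`.

    `scored` holds (score, is_correct) pairs from a hand-reviewed batch.
    Precision is reported as 1.0 when nothing is kept (no false positives).
    """
    kept = [is_correct for score, is_correct in scored if score >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_objects if total_objects else 0.0
    return precision, recall


# Made-up review results: (confidence score, was the prediction correct?)
reviewed = [
    (0.95, True), (0.90, True), (0.75, True), (0.60, False),
    (0.55, True), (0.40, False), (0.35, True),
]
total_objects = 6  # objects actually present in the reviewed batch

for threshold in (0.3, 0.5, 0.7, 0.9):
    p, r = precision_recall(reviewed, total_objects, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

In this sample, raising the threshold from 0.5 to 0.7 removes the remaining false positive but also drops a correct detection.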
For large annotation tasks:
  • Process in batches to avoid timeouts
  • Use lower resolution images if possible
  • Consider using multiple models for different object types
  • Monitor API rate limits
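
Batch processing with a pause between batches might be sketched as follows. Both helpers are hypothetical, and handle_batch stands in for whatever call submits a group of frames or tasks for annotation.

```python
import time
from collections.abc import Callable, Iterator, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_in_batches(
    items: Sequence,
    batch_size: int,
    handle_batch: Callable[[Sequence], object],
    delay_seconds: float = 0.0,
) -> list:
    """Run `handle_batch` on each batch, sleeping between batches."""
    results = []
    for index, batch in enumerate(iter_batches(items, batch_size)):
        if index > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)  # spread requests out over time
        results.append(handle_batch(batch))
    return results
```

For example, process_in_batches(frame_ids, 50, submit_frames, delay_seconds=2.0) would submit 50 frames at a time with a two-second pause between submissions, where frame_ids and submit_frames are your own data and function.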

Advanced Configuration

Custom Model Parameters

Some models support additional parameters:
{
  "model_id": "facebook/detr-resnet-50",
  "parameters": {
    "threshold": 0.7,
    "max_detections": 100,
    "nms_threshold": 0.5
  }
}
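
This guide does not define these parameters, and whether a given model honors them depends on the model. As a rough picture of what such settings commonly control, the sketch below applies a score threshold, greedy non-maximum suppression (NMS) based on intersection over union (IoU), and a cap on the number of detections. All names here are hypothetical and the boxes are made up.

```python
Box = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(
    detections: list[tuple[float, Box]],
    threshold: float = 0.7,
    nms_threshold: float = 0.5,
    max_detections: int = 100,
) -> list[tuple[float, Box]]:
    """Filter by score, suppress overlapping boxes, and cap the result size."""
    candidates = sorted(
        (d for d in detections if d[0] >= threshold),
        key=lambda d: d[0],
        reverse=True,
    )
    kept: list[tuple[float, Box]] = []
    for score, box in candidates:
        if len(kept) >= max_detections:
            break
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(box, kept_box) <= nms_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept
```

A lower nms_threshold suppresses overlapping boxes more aggressively, which helps with duplicate detections but can merge objects that sit close together.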

Using Fine-Tuned Models

You can use your own fine-tuned models from Hugging Face:
  1. Train and upload your model to Hugging Face Hub
  2. Make the model public or use your API token
  3. Add the model to CVAT using your model ID
  4. Configure label mappings for your custom classes

Troubleshooting

Model Not Loading

Issue: Model fails to load in CVAT
Solutions:
  • Verify the model ID is correct
  • Check that your API token has proper permissions
  • Ensure the model supports the required task type
  • Try using a different model version

Slow Inference

Issue: Automatic annotation is taking too long
Solutions:
  • Use a smaller/faster model architecture
  • Reduce image resolution if possible
  • Process fewer images at a time
  • Check Hugging Face API status

Incorrect Predictions

Issue: Model predictions are inaccurate
Solutions:
  • Adjust the confidence threshold
  • Try a model trained on more similar data
  • Consider fine-tuning the model on your data
  • Review and manually correct predictions

API Rate Limits

Hugging Face enforces API rate limits:
  • Free tier: Limited requests per hour
  • PRO tier: Higher limits and faster inference
  • Enterprise: Unlimited with dedicated infrastructure
If you hit rate limits:
  • Wait for the limit to reset
  • Upgrade your Hugging Face plan
  • Use batch processing with delays

Model Recommendations by Use Case

Use Case | Recommended Models | Notes
General Object Detection | facebook/detr-resnet-50 | Good balance of speed and accuracy
High-Accuracy Detection | facebook/detr-resnet-101 | Slower but more accurate
Fast Detection | hustvl/yolos-tiny | Lower accuracy, very fast
Instance Segmentation | facebook/mask2former-swin-base | High-quality masks
Semantic Segmentation | nvidia/segformer-b5-finetuned-ade | Dense pixel-level labeling
Face Detection | Bingsu/RetinaFace | Specialized for faces
