OpenCLIP provides flexible model loading with support for pretrained weights, custom configurations, and multiple storage backends.

Basic Model Loading

create_model()

The core function for creating CLIP models with flexible configuration options.
import open_clip

model = open_clip.create_model(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k',
    device='cuda',
    precision='fp16'
)
model_name (str, required)
  Model architecture name (e.g., 'ViT-B-32', 'RN50') or schema-prefixed path:
    • Built-in: 'ViT-B-32'
    • HuggingFace Hub: 'hf-hub:org/repo'
    • Local directory: 'local-dir:/path/to/model'

pretrained (str)
  Pretrained weights source. Can be:
    • Tag name (e.g., 'openai', 'laion2b_s34b_b79k')
    • Local file path (e.g., '/path/to/weights.pt')
    • Ignored if model_name uses a schema prefix

device (str | torch.device, default: "cpu")
  Device to load the model on ('cpu', 'cuda', etc.)

precision (str, default: "fp32")
  Model precision: 'fp32', 'fp16', 'bf16', 'pure_fp16', 'pure_bf16'

jit (bool, default: False)
  Whether to JIT-compile the model

force_image_size (int | Tuple[int, int])
  Override the default image size for the model

cache_dir (str)
  Directory for caching downloaded weights

Loading Schemas

HuggingFace Hub

Load models directly from HuggingFace Hub using the hf-hub: schema:
model = open_clip.create_model(
    'hf-hub:laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K',
    device='cuda'
)
The function automatically:
  • Downloads open_clip_config.json from the repo
  • Looks for weights files (.safetensors, .bin, .pth)
  • Merges preprocessing configuration
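The weights lookup can be pictured as a priority scan over the repo's file listing. The sketch below illustrates the idea; the exact filename order is an illustrative assumption, not the library's verbatim search list.

```python
# Sketch: resolve a weights file from a repo file listing by priority.
# The candidate order here is an illustrative assumption; consult the
# open_clip source for the exact filenames it probes.
WEIGHT_PRIORITY = [
    "open_clip_model.safetensors",
    "open_clip_pytorch_model.bin",
    "pytorch_model.bin",
    "model.pth",
]

def resolve_weights(repo_files):
    """Return the first recognized weights filename found in repo_files."""
    for candidate in WEIGHT_PRIORITY:
        if candidate in repo_files:
            return candidate
    raise FileNotFoundError("no recognized weights file in repo")

print(resolve_weights(["README.md", "open_clip_config.json", "pytorch_model.bin"]))
# -> pytorch_model.bin
```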

Local Directory

Load from a local directory containing model config and weights:
model = open_clip.create_model(
    'local-dir:/path/to/my/model',
    device='cuda'
)
Local directory must contain:
  • open_clip_config.json with model configuration
  • Weight file (searched in order): open_clip_model.safetensors, pytorch_model.bin, model.pth, etc.
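To make the expected layout concrete, here is a minimal sketch that writes a local-dir-style folder. The config fields shown are an abbreviated assumption for illustration; a real open_clip_config.json carries the full model_cfg and preprocess_cfg produced by the library.

```python
import json
import tempfile
from pathlib import Path

# Sketch: create a directory in the layout that 'local-dir:' expects.
# The config fields are abbreviated for illustration only.
model_dir = Path(tempfile.mkdtemp())

config = {
    "model_cfg": {"embed_dim": 512, "vision_cfg": {"image_size": 224}},
    "preprocess_cfg": {"mean": [0.481, 0.458, 0.408], "std": [0.269, 0.261, 0.276]},
}
(model_dir / "open_clip_config.json").write_text(json.dumps(config, indent=2))

# A weights file would sit alongside the config, e.g.:
# (model_dir / "open_clip_model.safetensors")

print(sorted(p.name for p in model_dir.iterdir()))
# -> ['open_clip_config.json']
```

Once the weights file is in place, the folder can be passed as `open_clip.create_model(f'local-dir:{model_dir}')`.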

Local File Path

Load weights from a specific file:
model = open_clip.create_model(
    'ViT-B-32',
    pretrained='/path/to/checkpoint.pt',
    device='cuda'
)

Advanced Loading Options

Tower-Specific Weights

Load separate weights for image and text towers:
model = open_clip.create_model(
    'ViT-B-32',
    pretrained_image=True,  # Load default ImageNet weights
    pretrained_text=True,   # Load default LM weights
    pretrained_image_path='/path/to/vision.pt',  # Override with custom weights
    pretrained_text_path='/path/to/text.pt'
)
pretrained_image (bool, default: False)
  Load default pretrained weights for the image tower (timm models)

pretrained_text (bool, default: True)
  Load default pretrained weights for the text tower (HuggingFace models)

pretrained_image_path (str)
  Path to custom image tower weights (loaded after the full model)

pretrained_text_path (str)
  Path to custom text tower weights (loaded after the full model)

Custom Model Configuration

Override model architecture parameters:
model = open_clip.create_model(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k',
    force_quick_gelu=True,
    force_patch_dropout=0.5,
    force_image_size=336,
    force_context_length=128
)

create_model_and_transforms()

Convenience function that returns model with preprocessing transforms:
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k',
    device='cuda',
    precision='fp16'
)

# Use transforms
from PIL import Image
image = Image.open('example.jpg')
image_tensor = preprocess_val(image)
Returns a tuple of (model, train_transform, val_transform). The transforms handle:
  • Image resizing and cropping
  • Normalization with correct mean/std
  • Data augmentation (training only)
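The normalization step maps each channel to (x − mean) / std after scaling pixels to [0, 1]. A minimal sketch, using the OpenAI CLIP statistics; note that other pretrained tags may ship different values, which is exactly why the returned transforms should be preferred over hand-rolled ones.

```python
# Channel-wise normalization as applied by the CLIP val transform.
# The mean/std below are the OpenAI CLIP statistics; a given pretrained
# tag may use different values.
OPENAI_MEAN = (0.48145466, 0.4578275, 0.40821073)
OPENAI_STD = (0.26862954, 0.26130258, 0.27577711)

def normalize_pixel(rgb):
    """Normalize one RGB pixel given as 0-255 integers."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, OPENAI_MEAN, OPENAI_STD)
    )

# A mid-gray pixel lands close to zero in every channel:
print([round(v, 2) for v in normalize_pixel((128, 128, 128))])
# -> [0.08, 0.17, 0.34]
```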
Always call model.eval() before inference. Models are in training mode by default, which affects layers like BatchNorm and Dropout.

create_model_from_pretrained()

Strictly requires pretrained weights (raises an error if weights cannot be loaded):
model, preprocess = open_clip.create_model_from_pretrained(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k',
    device='cuda',
    return_transform=True
)
return_transform (bool, default: True)
  Whether to return the preprocessing transform. If False, returns only the model.
This is the recommended function for inference use cases where pretrained weights are essential.

Listing Available Models

import open_clip

# List all model architectures
architectures = open_clip.list_models()
print(architectures)  # ['RN50', 'RN101', 'ViT-B-32', 'ViT-L-14', ...]

# List all pretrained weights
pretrained = open_clip.list_pretrained()
for model_name, tag in pretrained:
    print(f"{model_name}:{tag}")

# List pretrained weights as strings
pretrained_str = open_clip.list_pretrained(as_str=True)
# ['RN50:openai', 'RN50:yfcc15m', 'ViT-B-32:laion2b_s34b_b79k', ...]
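The as_str form makes it easy to see which tags exist for a given architecture. A minimal sketch, assuming the 'MODEL:tag' string format shown above (the sample entries are illustrative; in practice, pass the output of open_clip.list_pretrained(as_str=True), and note that open_clip also ships a similar helper, list_pretrained_tags_by_model, for this purpose):

```python
from collections import defaultdict

def tags_by_model(pretrained_str):
    """Group 'MODEL:tag' strings by architecture."""
    grouped = defaultdict(list)
    for entry in pretrained_str:
        model, _, tag = entry.partition(":")
        grouped[model].append(tag)
    return dict(grouped)

# Illustrative sample; use open_clip.list_pretrained(as_str=True) in practice.
sample = ["RN50:openai", "RN50:yfcc15m", "ViT-B-32:laion2b_s34b_b79k"]
print(tags_by_model(sample)["RN50"])
# -> ['openai', 'yfcc15m']
```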

Weight Loading Options

load_weights (bool, default: True)
  Whether to load the resolved pretrained weights. Set to False for random initialization.

require_pretrained (bool, default: False)
  Raise an error if pretrained weights cannot be loaded.

weights_only (bool, default: True)
  Use weights_only=True for torch.load (safer, prevents arbitrary code execution).
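The weights_only flag corresponds to the same-named argument of torch.load. A minimal sketch of the round trip, assuming a plain state-dict checkpoint:

```python
import tempfile

import torch

# Save a plain state dict, then reload it with weights_only=True.
# weights_only=True restricts unpickling to tensor/container types,
# so a malicious checkpoint cannot execute arbitrary code on load.
state = {"weight": torch.zeros(2, 2), "bias": torch.ones(2)}

with tempfile.NamedTemporaryFile(suffix=".pt", delete=False) as f:
    torch.save(state, f.name)
    loaded = torch.load(f.name, map_location="cpu", weights_only=True)

print(sorted(loaded.keys()))
# -> ['bias', 'weight']
```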

Complete Example

import torch
import open_clip
from PIL import Image

# Load model with transforms
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-L-14',
    pretrained='datacomp_xl_s13b_b90k',
    device='cuda',
    precision='fp16',
    force_image_size=224
)
model.eval()

# Get tokenizer
tokenizer = open_clip.get_tokenizer('ViT-L-14')

# Prepare inputs
image = preprocess(Image.open('cat.jpg')).unsqueeze(0).cuda()
text = tokenizer(["a cat", "a dog"]).cuda()

# Inference
with torch.no_grad(), torch.amp.autocast('cuda'):
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    
    # Normalize features
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    
    # Compute similarity
    similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
    print("Similarity:", similarity)
