## Supported Formats
Ollama can import:

- **Safetensors models** - full models in Safetensors format
- **Safetensors adapters** - fine-tuned LoRA adapters
- **GGUF files** - pre-quantized models and adapters from llama.cpp
## Supported Architectures
Ollama supports importing models based on several architectures:

- **Llama** - Llama 2, Llama 3, Llama 3.1, Llama 3.2, and Llama 4
- **Mistral** - Mistral 1, Mistral 2, Mistral 3, and Mixtral
- **Gemma** - Gemma 1, Gemma 2, and Gemma 3
- **Other** - Phi3, Qwen2, Qwen3, Command R, and more
## Importing Safetensors Models
Import full models from Safetensors weights into Ollama.

### Create a Modelfile

Create a Modelfile pointing to your Safetensors directory. If the Modelfile is in the same directory as the weights, you can use a relative path.

This method works for foundation models and for fine-tuned models that have been fused with a foundation model.
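A minimal Modelfile for this case might look like the following (the path is a placeholder):

```
FROM /path/to/safetensors/directory
```

Then run `ollama create my-model -f Modelfile` (the model name is illustrative) to import the weights.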
## Importing Fine-Tuned Adapters
Import LoRA adapters created with fine-tuning frameworks.

### Create a Modelfile with an adapter

Specify the base model and the adapter path. If your adapter is in the same directory as the Modelfile, you can use a relative path.
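A sketch of such a Modelfile, assuming a llama3.2 base model and a local adapter directory (paths are placeholders):

```
FROM llama3.2
ADAPTER /path/to/safetensors/adapter/directory
```

The base model in `FROM` should be the same model the adapter was fine-tuned from.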
### Supported Fine-Tuning Frameworks

Safetensors adapters produced by common LoRA fine-tuning frameworks (for example, Hugging Face PEFT) can be imported directly.

## Importing GGUF Models
Import pre-quantized GGUF models or adapters from llama.cpp or Hugging Face.

### GGUF Model

Create a Modelfile pointing to the GGUF file.
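For example (the filename is a placeholder):

```
FROM /path/to/file.gguf
```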
### GGUF Adapter

For GGUF adapters, specify both the base model and the adapter. The base model can be:

- an Ollama model (e.g., llama3.2)
- a GGUF file
- a Safetensors model
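A sketch using an Ollama model as the base and a local GGUF adapter (names and paths are placeholders):

```
FROM llama3.2
ADAPTER /path/to/adapter.gguf
```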
When importing GGUF adapters, ensure the base model matches the one used to create the adapter.
## Converting to GGUF
You can convert models to GGUF format using llama.cpp tools:

- **Models**: use `convert_hf_to_gguf.py` from llama.cpp
- **Adapters**: use `convert_lora_to_gguf.py` from llama.cpp
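On the command line, the conversions above might look roughly like this (paths are placeholders; check the scripts' `--help` output in your llama.cpp checkout for the exact options):

```shell
# Full model: Hugging Face / Safetensors directory -> GGUF
python convert_hf_to_gguf.py /path/to/model --outfile model.gguf

# LoRA adapter -> GGUF (the base model is needed to resolve tensor shapes)
python convert_lora_to_gguf.py /path/to/lora --base /path/to/base/model
```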
## Quantizing During Import
Quantize FP16 or FP32 models during import to reduce memory usage and improve performance.

### Supported Quantization Levels

- `q4_K_M` - 4-bit K-means quantization (medium): recommended balance of size and quality
- `q4_K_S` - 4-bit K-means quantization (small): smaller size, lower quality
- `q8_0` - 8-bit quantization: higher quality, larger size
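For example, to quantize an FP16 model to 4-bit `q4_K_M` while importing (the model name is a placeholder):

```shell
ollama create --quantize q4_K_M my-model -f Modelfile
```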
See the Model Quantization page for detailed information about quantization levels and performance trade-offs.
## Sharing Models
After creating your model, share it on ollama.com so others can use it.

### Create an account

Visit ollama.com/signup to create an account. Your username will be part of your model's name (e.g., username/modelname).

### Add your public key

Go to Ollama Keys Settings and add your Ollama public key. The page will show you where to find your public key on your system.
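With your key added, you can copy the model into your namespace and push it (names are placeholders):

```shell
ollama cp my-model username/my-model
ollama push username/my-model
```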
## Examples
### Import a Hugging Face Model
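A hypothetical end-to-end flow, assuming a Safetensors model downloaded with the Hugging Face CLI (the repository and model names are illustrative):

```shell
# Download the Safetensors weights
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir mistral-7b

# Modelfile contains a single line: FROM ./mistral-7b
ollama create my-mistral -f Modelfile
ollama run my-mistral
```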
### Import with Custom Parameters
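A sketch of a Modelfile that sets inference parameters and a system prompt during import (the path and values are illustrative):

```
FROM /path/to/file.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM "You are a concise assistant."
```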
### Import a Fine-Tuned Adapter
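A hypothetical adapter import, assuming a LoRA fine-tuned from llama3.2 and exported to Safetensors (the adapter path and model name are placeholders):

```
FROM llama3.2
ADAPTER ./my-lora-adapter
```

Then build it with `ollama create my-assistant -f Modelfile`.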
Your imported model is ready to use with `ollama run`, the API, or any Ollama client library!