## Supported Formats
Ollama can import:

- **Safetensors models** - full models in Safetensors format
- **Safetensors adapters** - fine-tuned LoRA adapters
- **GGUF files** - pre-quantized models and adapters from llama.cpp
## Supported Architectures
Ollama supports importing models based on several architectures:

- **Llama** - Llama 2, Llama 3, Llama 3.1, Llama 3.2, and Llama 4
- **Mistral** - Mistral 1, Mistral 2, Mistral 3, and Mixtral
- **Gemma** - Gemma 1, Gemma 2, and Gemma 3
- **Other** - Phi3, Qwen2, Qwen3, Command R, and more
## Importing Safetensors Models
Import full models from Safetensors weights into Ollama.

### Create a Modelfile

Create a Modelfile pointing to your Safetensors directory. If the Modelfile is in the same directory as the weights, you can use a relative path.

This method works for foundation models and for fine-tuned models that have been fused with a foundation model.
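A minimal Modelfile for this case might look like the following (the path is a placeholder):

```
FROM /path/to/safetensors/directory
```

Then run `ollama create my-model -f Modelfile` (the model name is illustrative) to import the weights.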
## Importing Fine-Tuned Adapters
Import LoRA adapters created with fine-tuning frameworks.

### Create a Modelfile with an adapter

Specify the base model and the adapter path. If your adapter is in the same directory as the Modelfile, you can use a relative path.
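A sketch of such a Modelfile, assuming a llama3.2 base model and a local adapter directory (paths are placeholders):

```
FROM llama3.2
ADAPTER /path/to/safetensors/adapter/directory
```

The base model in `FROM` should be the same model the adapter was fine-tuned from.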
### Supported Fine-Tuning Frameworks

Safetensors adapters produced by common LoRA fine-tuning frameworks (for example, Hugging Face PEFT) can be imported directly.

## Importing GGUF Models
Import pre-quantized GGUF models or adapters from llama.cpp or Hugging Face.

### GGUF Model

Create a Modelfile pointing to the GGUF file.
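For example (the filename is a placeholder):

```
FROM /path/to/file.gguf
```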
### GGUF Adapter

For GGUF adapters, specify both the base model and the adapter. The base model can be:

- an Ollama model (e.g., llama3.2)
- a GGUF file
- a Safetensors model
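A sketch using an Ollama model as the base and a local GGUF adapter (names and paths are placeholders):

```
FROM llama3.2
ADAPTER /path/to/adapter.gguf
```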
When importing GGUF adapters, ensure the base model matches the one used to create the adapter.
## Converting to GGUF
You can convert models to GGUF format using llama.cpp tools:

- **Models**: use `convert_hf_to_gguf.py` from llama.cpp
- **Adapters**: use `convert_lora_to_gguf.py` from llama.cpp
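On the command line, the conversions above might look roughly like this (paths are placeholders; check the scripts' `--help` output in your llama.cpp checkout for the exact options):

```shell
# Full model: Hugging Face / Safetensors directory -> GGUF
python convert_hf_to_gguf.py /path/to/model --outfile model.gguf

# LoRA adapter -> GGUF (the base model is needed to resolve tensor shapes)
python convert_lora_to_gguf.py /path/to/lora --base /path/to/base/model
```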
## Quantizing During Import
Quantize FP16 or FP32 models during import to reduce memory usage and improve performance.

### Supported Quantization Levels

- `q4_K_M` - 4-bit K-means quantization (medium): recommended balance of size and quality
- `q4_K_S` - 4-bit K-means quantization (small): smaller size, lower quality
- `q8_0` - 8-bit quantization: higher quality, larger size
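For example, to quantize an FP16 model to 4-bit `q4_K_M` while importing (the model name is a placeholder):

```shell
ollama create --quantize q4_K_M my-model -f Modelfile
```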
See the Model Quantization page for detailed information about quantization levels and performance trade-offs.
## Sharing Models
After creating your model, share it on ollama.com so others can use it.

### Create an account

Visit ollama.com/signup to create an account. Your username will be part of your model's name (e.g., username/modelname).

### Add your public key

Go to Ollama Keys Settings and add your Ollama public key. The page will show you where to find your public key on your system.
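With your key added, you can copy the model into your namespace and push it (names are placeholders):

```shell
ollama cp my-model username/my-model
ollama push username/my-model
```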
## Examples
### Import a Hugging Face Model
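A hypothetical end-to-end flow, assuming a Safetensors model downloaded with the Hugging Face CLI (the repository and model names are illustrative):

```shell
# Download the Safetensors weights
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir mistral-7b

# Modelfile contains a single line: FROM ./mistral-7b
ollama create my-mistral -f Modelfile
ollama run my-mistral
```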
### Import with Custom Parameters
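A sketch of a Modelfile that sets inference parameters and a system prompt during import (the path and values are illustrative):

```
FROM /path/to/file.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM "You are a concise assistant."
```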
### Import a Fine-Tuned Adapter
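A hypothetical adapter import, assuming a LoRA fine-tuned from llama3.2 and exported to Safetensors (the adapter path and model name are placeholders):

```
FROM llama3.2
ADAPTER ./my-lora-adapter
```

Then build it with `ollama create my-assistant -f Modelfile`.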
Your imported model is ready to use with `ollama run`, the API, or any Ollama client library!