
Quickstart Guide

Get started with Ollama and run your first large language model locally in just a few minutes.
This guide assumes you have already installed Ollama. If not, see the Installation Guide first.

Your First Model

Step 1: Run Ollama

The simplest way to get started is to run a model directly:
ollama run gemma3
This command will:
  • Download the Gemma 3 model (if not already downloaded)
  • Start the Ollama server (if not already running)
  • Begin an interactive chat session
The first run downloads the model, which may take a few minutes depending on your internet connection.
Step 2: Chat with the Model

Once the model is loaded, you’ll see a prompt:
>>> Send a message (/? for help)
Try asking a question:
>>> Why is the sky blue?
The model will stream its response in real-time.
Type /? to see all available chat commands, or use /bye (or Ctrl+D) to exit.
Step 3: Try Other Models

Explore different models from the library:
# Run Llama 3.2 (Meta's model)
ollama run llama3.2

# Run Mistral (efficient and fast)
ollama run mistral

# Run Phi-3 (Microsoft's compact model)
ollama run phi3
Browse all available models in the Ollama model library.

Using the REST API

Ollama provides a REST API server that runs on http://localhost:11434 by default.

Generate a Response

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?"
}'
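By default, /api/generate streams its reply as newline-delimited JSON objects, each carrying a partial "response" field, with a final object marked "done": true. A minimal sketch of reassembling the streamed text in Python (the helper name join_stream is our own, not part of any Ollama client):

```python
import json

def join_stream(ndjson_lines):
    """Concatenate the partial "response" fields of a streamed reply.

    Each line is one JSON object as emitted by /api/generate;
    the final object has "done": true.
    """
    text = []
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Example with synthetic chunks shaped like the API's streamed output:
chunks = [
    '{"model": "gemma3", "response": "The sky ", "done": false}',
    '{"model": "gemma3", "response": "is blue.", "done": true}',
]
print(join_stream(chunks))  # The sky is blue.
```

Set "stream": false in the request body if you would rather receive one complete JSON object.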

Chat with Context

Maintain conversation history:
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ],
  "stream": false
}'
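The chat endpoint is stateless: to keep context across turns, resend the full messages array, appending the assistant's previous reply before each new user message. A rough sketch of that bookkeeping (next_messages is our own helper name):

```python
def next_messages(history, assistant_reply, user_msg):
    """Extend a chat history with the last assistant reply and a new user turn."""
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": user_msg},
    ]

history = [{"role": "user", "content": "Why is the sky blue?"}]
# After POSTing history to /api/chat, suppose the model answered
# "Rayleigh scattering." -- fold it in before the next question:
history = next_messages(history, "Rayleigh scattering.",
                        "Does the same effect apply on Mars?")
print(len(history))  # 3
```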

Essential Commands

List Downloaded Models

ollama list
Output:
NAME              ID            SIZE      MODIFIED
gemma3:latest     a80c4f17acd5  2.0 GB    2 hours ago
llama3.2:latest   0a8c26691023  4.7 GB    1 day ago
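The NAME column is handy for scripting. For example, to print just the model names, skipping the header row (a sketch using standard awk):

```shell
# Print just the model names, skipping the header row
ollama list | awk 'NR > 1 { print $1 }'

# e.g. remove every downloaded model (destructive!)
# ollama list | awk 'NR > 1 { print $1 }' | xargs -n1 ollama rm
```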

Pull a Model

Download a model without running it:
ollama pull mistral

Remove a Model

Free up disk space:
ollama rm mistral

Check Running Models

See which models are currently loaded in memory:
ollama ps
Output:
NAME              ID            SIZE      PROCESSOR    CONTEXT    UNTIL
gemma3:latest     a80c4f17acd5  2.0 GB    100% GPU     4096       4 minutes from now

Multimodal Models

Some models can process images along with text:
ollama run llava "What's in this image? /path/to/image.jpg"
Or via the API:
curl http://localhost:11434/api/generate -d '{
  "model": "llava",
  "prompt": "What is in this image?",
  "images": ["base64_encoded_image_data"]
}'
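Note that the images field takes base64-encoded image bytes, not a file path. A minimal sketch of building such a request body in Python (the helper names encode_image and vision_payload are our own):

```python
import base64
import json

def encode_image(path):
    """Read an image file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

def vision_payload(prompt, image_path, model="llava"):
    """Assemble a /api/generate request body with one attached image."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "images": [encode_image(image_path)],
    })
```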

Non-Interactive Mode

Run one-off prompts without entering chat mode:
# Single prompt
ollama run gemma3 "Explain quantum computing in simple terms"

# Pipe input
echo "Write a haiku about programming" | ollama run gemma3

# Save output
ollama run gemma3 "Generate a Python function" > output.py

Model Options

Customize model behavior with options:
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Tell me a story",
  "options": {
    "temperature": 0.8,
    "top_p": 0.9,
    "seed": 42
  }
}'
  • temperature (0.0-2.0): Controls randomness (default: 0.8)
  • top_p (0.0-1.0): Nucleus sampling threshold (default: 0.9)
  • seed: Set for reproducible outputs
  • num_predict: Maximum tokens to generate
  • stop: Custom stop sequences
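If you assemble options programmatically, it can help to clamp values to the documented ranges before sending the request. A small sketch (sanitize_options is our own helper, not part of any Ollama client):

```python
def sanitize_options(options):
    """Clamp sampling options to their documented ranges."""
    cleaned = dict(options)
    if "temperature" in cleaned:
        cleaned["temperature"] = min(2.0, max(0.0, cleaned["temperature"]))
    if "top_p" in cleaned:
        cleaned["top_p"] = min(1.0, max(0.0, cleaned["top_p"]))
    return cleaned

print(sanitize_options({"temperature": 3.5, "top_p": 0.9, "seed": 42}))
# {'temperature': 2.0, 'top_p': 0.9, 'seed': 42}
```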

Keep Alive

Control how long models stay in memory:
# Keep loaded for 10 minutes
ollama run gemma3 --keepalive 10m

# Unload immediately after use
ollama run gemma3 --keepalive 0

# Keep loaded indefinitely
ollama run gemma3 --keepalive -1
Default is 5 minutes after last use.

Server Management

Start the Server

The server starts automatically when you run a model, but you can start it manually:
ollama serve
The server runs on http://localhost:11434 by default.

Environment Variables

  • OLLAMA_HOST: Change the bind address (default: 127.0.0.1:11434)
  • OLLAMA_MODELS: Custom model storage location
  • OLLAMA_NUM_PARALLEL: Number of parallel requests (default: 1)
  • OLLAMA_MAX_LOADED_MODELS: Max models in memory (default: 1)
  • OLLAMA_DEBUG: Enable debug logging
Example:
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

Troubleshooting

Slow or interrupted downloads

Model sizes range from 2 GB to more than 70 GB, so use a wired connection for faster downloads. If a download is interrupted, run the same command again to resume it.

Out of memory

Try a smaller or more heavily quantized model:
  • Use gemma3:1b instead of gemma3:12b
  • Use Q4 quantization: llama3.2:3b-q4_0
  • Reduce the context size with the num_ctx option

Connection refused

The server may not be running:
ollama serve
Also check whether another process is already using port 11434.

GPU not detected

On Linux, ensure NVIDIA drivers are installed. On macOS, Metal should work automatically. Check which processor a run used with:
ollama run gemma3 --verbose

What’s Next?

Create Custom Models

Customize models with system prompts and parameters

API Reference

Complete REST API documentation

Import Models

Import models from PyTorch or Safetensors

CLI Reference

Complete command-line documentation
Join the Discord community for help and to share your projects!
