Quickstart Guide
Get started with Ollama and run your first large language model locally in just a few minutes.

This guide assumes you have already installed Ollama. If not, see the Installation Guide first.
Your First Model
Run Ollama
The simplest way to get started is to run a model directly. This command will:
- Download the Gemma 3 model (if not already downloaded)
- Start the Ollama server (if not already running)
- Begin an interactive chat session
On first run, the model will be downloaded. This may take a few minutes depending on your internet connection.
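Using the Gemma 3 tag referenced above, the command is:

```shell
# Download Gemma 3 (if needed), start the server, and open a chat session
ollama run gemma3
```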
Chat with the Model
Once the model is loaded, you’ll see a prompt. Try asking a question, and the model will stream its response in real time.
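A session looks roughly like this (the exact wording of the answer will vary by model and version):

```
>>> Why is the sky blue?
The sky appears blue because of Rayleigh scattering: shorter blue
wavelengths of sunlight scatter more strongly in the atmosphere...
```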
Try Other Models
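The same pattern works for any model in the Ollama library; the tags below are illustrative examples and may change over time:

```shell
ollama run llama3.2      # Meta's Llama 3.2
ollama run mistral       # Mistral 7B
ollama run qwen2.5:7b    # Qwen 2.5, 7B-parameter variant
```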
Using the REST API
Ollama provides a REST API server that runs on http://localhost:11434 by default.
Generate a Response
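A minimal request to the /api/generate endpoint looks like this (responses stream by default; set "stream": false to get a single JSON object):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```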
Chat with Context
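The /api/chat endpoint takes the running conversation as a messages array; a sketch of a follow-up request that resends earlier turns:

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [
    {"role": "user", "content": "Why is the sky blue?"},
    {"role": "assistant", "content": "Because of Rayleigh scattering..."},
    {"role": "user", "content": "Explain that like I am five."}
  ],
  "stream": false
}'
```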
Each request is stateless, so to maintain conversation history, include the previous messages in the messages array.

Essential Commands
List Downloaded Models
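Lists every model on disk along with its size and modification time:

```shell
ollama list
```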
Pull a Model
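For example, to fetch Gemma 3 ahead of time:

```shell
ollama pull gemma3
```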
Download a model without running it.

Remove a Model
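For example, using gemma3 as the model to delete:

```shell
ollama rm gemma3
```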
Free up disk space by removing models you no longer need.

Check Running Models
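The ps subcommand reports loaded models, their memory use, and when they will be unloaded:

```shell
ollama ps
```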
See which models are currently loaded in memory.

Multimodal Models
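For example, with the llava vision model (the image path here is illustrative; substitute your own file):

```shell
ollama run llava "What is in this image? ./photo.jpg"
```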
Some models can process images along with text.

Non-Interactive Mode
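Pass the prompt as an argument; the command prints the response and exits:

```shell
ollama run gemma3 "Summarize the plot of Hamlet in two sentences."
```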
Run one-off prompts without entering chat mode.

Model Options
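Options can be set per request through the API's options object; a sketch using a few of the parameters listed below:

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Write a haiku about autumn.",
  "options": {
    "temperature": 0.7,
    "seed": 42,
    "num_predict": 64
  },
  "stream": false
}'
```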
Customize model behavior with options.

Common Options
- temperature (0.0-2.0): Controls randomness (default: 0.8)
- top_p (0.0-1.0): Nucleus sampling threshold (default: 0.9)
- seed: Set for reproducible outputs
- num_predict: Maximum tokens to generate
- stop: Custom stop sequences
Keep Alive
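The keep_alive parameter on API requests controls how long the model stays loaded after the request (a duration string, or 0 to unload immediately):

```shell
# Keep the model in memory for 10 minutes after this request
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "keep_alive": "10m"
}'
```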
Control how long models stay in memory.

Server Management
Start the Server
The server starts automatically when you run a model, but you can start it manually. It listens on http://localhost:11434 by default.
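To run it in the foreground:

```shell
ollama serve
```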
Environment Variables
Common Environment Variables
- OLLAMA_HOST: Change the bind address (default: 127.0.0.1:11434)
- OLLAMA_MODELS: Custom model storage location
- OLLAMA_NUM_PARALLEL: Number of parallel requests (default: 1)
- OLLAMA_MAX_LOADED_MODELS: Max models in memory (default: 1)
- OLLAMA_DEBUG: Enable debug logging
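Set variables before starting the server. For example, to listen on all interfaces with debug logging enabled:

```shell
OLLAMA_HOST=0.0.0.0:11434 OLLAMA_DEBUG=1 ollama serve
```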
Troubleshooting
Model download is slow
Model sizes range from 2GB to 70GB+. Use a wired connection for faster downloads. You can resume interrupted downloads by running the same command again.
Out of memory errors
Try a smaller quantized model:
- Use gemma3:2b instead of gemma3:8b
- Use Q4 quantization: llama3.2:3b-q4_0
- Reduce context size with the num_ctx option
Connection refused
The server may not be running; start it with ollama serve. Or check if another process is using port 11434.
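For example (lsof is one way to inspect the port; availability varies by platform):

```shell
ollama serve
# In another terminal, see what is bound to the port:
lsof -i :11434
```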
GPU not detected
On Linux, ensure NVIDIA drivers are installed. On macOS, Metal should work automatically. Check with:
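On Linux with NVIDIA hardware, for example:

```shell
nvidia-smi    # confirms the driver can see the GPU
ollama ps     # the processor column shows whether a loaded model is on GPU
```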
What’s Next?
Create Custom Models
Customize models with system prompts and parameters
API Reference
Complete REST API documentation
Import Models
Import models from PyTorch or Safetensors
CLI Reference
Complete command-line documentation