Synopsis

ollama stop MODEL

Description

The stop command unloads a running model from memory immediately, freeing up system resources (RAM/VRAM). This is useful when:
  • You’re done with a model and want to free memory
  • You want to load a different model
  • You need to reduce system resource usage
  • You’re switching tasks and don’t need the model loaded
Normally, models unload automatically after a keep-alive timeout (default 5 minutes). The stop command forces immediate unloading.

Arguments

MODEL
string
required
Name of the model to stop. Must be currently running. Examples:
  • llama3.2
  • mistral:7b-instruct
  • myusername/custom-model

Options

The stop command has no flags or options.

Examples

Stop a Running Model

Stop a model that’s currently loaded:
ollama stop llama3.2
No output on success. The model is immediately unloaded from memory.

Stop with Full Name

Include the tag explicitly:
ollama stop llama3.2:latest

Stop Multiple Models

To stop multiple models, run the command for each:
ollama stop llama3.2
ollama stop mistral:7b
ollama stop codellama

Check Before Stopping

See what’s running before stopping:
ollama ps
ollama stop llama3.2
ollama ps

Behavior

What Happens When You Stop

  1. Abort in-progress requests: Any active inference is cancelled
  2. Flush state: Model state and KV cache are discarded
  3. Unload from memory: Model weights are removed from RAM/VRAM
  4. Free resources: Memory becomes available for other models or applications

Keep-Alive Override

Stopping a model is equivalent to setting --keepalive 0 when running it:
# These have the same effect
ollama run llama3.2 --keepalive 0 "Hello"
ollama stop llama3.2
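The same immediate unload can be requested over the HTTP API by sending a request with keep_alive set to 0 and no prompt. The sketch below assumes the default server address; api_stop and api_stop_payload are hypothetical helper names, not Ollama commands.

```shell
# Hypothetical helpers: the HTTP-API equivalent of `ollama stop`.
# A keep_alive of 0 with no prompt asks the server to unload the
# model immediately. Assumes the default address and requires curl.
api_stop_payload() {
    printf '{"model": "%s", "keep_alive": 0}' "$1"
}

api_stop() {
    api_stop_payload "$1" |
        curl -s http://127.0.0.1:11434/api/generate -d @-
}

# Usage (server must be running):
# api_stop llama3.2
```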

Use Cases

Free Memory

Stop models to free up RAM/VRAM for other applications

Switch Models

Stop one model before loading another when memory is limited

End Session

Clean up after a long-running chat session

Reduce Power

Stop models to reduce GPU power consumption on laptops

Scripting Usage

Use in scripts and automation:
# Stop all running models
for model in $(ollama ps | tail -n +2 | awk '{print $1}'); do
    ollama stop "$model"
done

# Stop model after a task
ollama run llama3.2 "Summarize this: ..."
ollama stop llama3.2

# Stop model with error handling
if ollama stop llama3.2 2>/dev/null; then
    echo "Model stopped successfully"
else
    echo "Model was not running or error occurred"
fi
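The stop-all loop above can be factored into a reusable parsing step. This is a sketch: parse_running_models is a hypothetical helper, and it assumes `ollama ps` prints a header row followed by one model per line with the name in the first column.

```shell
# Hypothetical helper: read `ollama ps` output on stdin and print the
# model name (first column) of each running model, skipping the
# header row and any blank lines.
parse_running_models() {
    awk 'NR > 1 && NF > 0 {print $1}'
}

# Usage (server must be running):
# ollama ps | parse_running_models | while read -r m; do
#     ollama stop "$m"
# done
```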

Memory Recovery

Check memory before and after stopping:
# Check GPU memory
nvidia-smi

# Check what's running
ollama ps

# Stop a model
ollama stop llama3.2

# Verify it stopped
ollama ps

# Check GPU memory again
nvidia-smi
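The before/after check above can be scripted. This sketch assumes an NVIDIA GPU and uses `nvidia-smi --query-gpu=memory.free` for machine-readable output; freed_mib is a hypothetical helper.

```shell
# Hypothetical helper: given free-MiB readings taken before and after
# a stop, print how much GPU memory was freed.
freed_mib() {
    echo $(( $2 - $1 ))
}

# Usage (requires an NVIDIA GPU and a running server):
# before=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits)
# ollama stop llama3.2
# after=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits)
# echo "Freed: $(freed_mib "$before" "$after") MiB"
```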

Automatic Unloading

By default, models unload automatically after the keep-alive timer expires:
# Default behavior (5 minutes)
ollama run llama3.2 "Hello"
# Wait 5 minutes...
# Model unloads automatically

# Keep loaded for 10 minutes
ollama run llama3.2 --keepalive 10m "Hello"

# Keep loaded indefinitely
ollama run llama3.2 --keepalive -1
# Must use 'ollama stop' to unload

# Unload immediately after response
ollama run llama3.2 --keepalive 0 "Hello"

Environment Variables

OLLAMA_HOST
string
default: "http://127.0.0.1:11434"
Ollama server address
OLLAMA_KEEP_ALIVE
duration
default: "5m"
Default keep-alive time (server configuration)
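For example, to raise the server-wide default so models stay loaded longer (a configuration sketch; 30m is an arbitrary value):

```shell
# Config sketch: keep models loaded for 30 minutes by default.
# Must be set in the environment of the `ollama serve` process.
export OLLAMA_KEEP_ALIVE=30m
```

Restart `ollama serve` with this variable set for it to take effect; per-run `--keepalive` flags still override it.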

Exit Codes

  • 0 - Success, model stopped
  • 1 - Error occurred

Troubleshooting

Model Not Found

Error: couldn't find model "llama3.2" to stop
Solution: The model is not currently running. Check what’s running:
ollama ps
If you want to stop a model that exists but isn’t loaded, there’s nothing to do—it’s already not using resources.
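In scripts, you can avoid this error entirely by checking `ollama ps` first. This is a sketch: is_running is a hypothetical helper, and the column layout of `ollama ps` (name in the first column, one header row) is assumed.

```shell
# Hypothetical helper: succeed only if the named model appears in
# `ollama ps` output (read from stdin, header row skipped).
is_running() {
    awk -v m="$1" 'NR > 1 && $1 == m {found = 1} END {exit !found}'
}

# Usage (server must be running):
# ollama ps | is_running llama3.2:latest && ollama stop llama3.2:latest
```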

Server Not Running

Error: could not connect to ollama server
Solution: Start the Ollama server:
ollama serve

Model Name Mismatch

If you created a model with a custom tag, use the full name:
# Wrong
ollama stop mymodel

# Correct
ollama stop mymodel:latest
# Or
ollama stop mymodel:v1

Graceful vs Forced Stop

Ollama performs a graceful stop:
  • In-progress requests are cancelled (not completed)
  • Model state and the KV cache are discarded
  • Resources are released cleanly
  • No data corruption
This is different from killing the server process, which is not recommended.

When to Stop Models

✅ When to Use Stop

  • Switching to a different model and memory is limited
  • Done with a model for the day
  • Need to free resources for other applications
  • Model is set to --keepalive -1 (never unload)
  • Want to reload a model with different settings

❌ When NOT to Use Stop

  • Between prompts in the same session (loses context)
  • When the model will be used again soon (let auto-unload handle it)
  • To interrupt a response (use Ctrl+C instead)

Performance Impact

Stopping and reloading a model has costs:
Model Size    Typical Load Time    GPU Memory
3B params     1-3 seconds          ~2 GB
7B params     3-8 seconds          ~4-5 GB
13B params    8-15 seconds         ~8-9 GB
34B params    20-40 seconds        ~20 GB
70B params    40-90 seconds        ~40 GB
If you’ll use the model again within the keep-alive period, let it stay loaded.
See Also

  • ollama ps - See which models are currently running
  • ollama run - Run a model with custom keep-alive settings
  • ollama serve - Configure default keep-alive time
