## Synopsis
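The command takes a single argument, the name of the model to unload:

```shell
ollama stop MODEL
```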
## Description
The `stop` command unloads a running model from memory immediately, freeing system resources (RAM/VRAM). This is useful when:
- You’re done with a model and want to free memory
- You want to load a different model
- You need to reduce system resource usage
- You’re switching tasks and don’t need the model loaded
Rather than waiting for the keep-alive timer to expire, the `stop` command forces immediate unloading.
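For example, to unload a model right away (the model name is illustrative):

```shell
ollama stop llama3.2
```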
## Arguments
Name of the model to stop. Must be currently running.

Examples:

- `llama3.2`
- `mistral:7b-instruct`
- `myusername/custom-model`
## Options
The `stop` command has no flags or options.
## Examples
### Stop a Running Model

Stop a model that's currently loaded, for example `ollama stop llama3.2`.

### Stop with Full Name

Include the tag explicitly, for example `ollama stop mistral:7b-instruct`.

### Stop Multiple Models

To stop multiple models, run the command once for each model.

### Check Before Stopping

See what's running before stopping with `ollama ps`.

## Behavior
### What Happens When You Stop
- Abort in-progress requests: Any active inference is cancelled
- Flush state: Model state and KV cache are discarded
- Unload from memory: Model weights are removed from RAM/VRAM
- Free resources: Memory becomes available for other models or applications
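The unload can be observed by listing loaded models before and after (the model name is illustrative):

```shell
ollama ps                # the model appears in the list of loaded models
ollama stop llama3.2     # unload it immediately
ollama ps                # the model is no longer listed
```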
### Keep-Alive Override
Stopping a model is equivalent to setting `--keepalive 0` when running it:
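In other words, you can either request an immediate unload up front, or stop the model afterwards (model name illustrative):

```shell
# Run the model and have it unload as soon as the session ends
ollama run llama3.2 --keepalive 0

# Or clean up after the fact
ollama stop llama3.2
```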
## Use Cases
- **Free Memory**: Stop models to free up RAM/VRAM for other applications
- **Switch Models**: Stop one model before loading another when memory is limited
- **End Session**: Clean up after a long-running chat session
- **Reduce Power**: Stop models to reduce GPU power consumption on laptops
## Scripting Usage
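A sketch of stopping every running model from a script; it assumes `ollama ps` prints a header row followed by one model per line, with the model name in the first column:

```shell
# Stop every currently loaded model.
# Assumption: `ollama ps` output is a header line, then the model
# name as the first whitespace-separated column of each row.
ollama ps | awk 'NR > 1 {print $1}' | while read -r model; do
  ollama stop "$model"
done
```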
Use `stop` in scripts and automation to clean up loaded models.

## Memory Recovery
Check memory before and after stopping (for example with `nvidia-smi` for VRAM or `free -h` for RAM) to confirm the resources were released.

## Automatic Unloading
By default, models unload automatically after the keep-alive timer expires (5 minutes unless configured otherwise), so an explicit `stop` is only needed when you want the memory back immediately.

## Environment Variables
- `OLLAMA_HOST`: Ollama server address
- `OLLAMA_KEEP_ALIVE`: Default keep-alive time (server configuration)
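For example, to stop a model on a non-default server (the address is illustrative):

```shell
# OLLAMA_HOST points the CLI at another Ollama server
OLLAMA_HOST=http://192.168.1.50:11434 ollama stop llama3.2
```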
## Exit Codes
- `0`: Success, model stopped
- `1`: Error occurred
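The exit code makes `stop` easy to check in scripts (model name illustrative):

```shell
if ollama stop llama3.2; then
  echo "model unloaded"
else
  echo "failed to stop model" >&2
  exit 1
fi
```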
## Troubleshooting
### Model Not Found

If the model isn't currently running, there is nothing to stop; check the exact name with `ollama ps`.

### Server Not Running

If the CLI can't reach the server, start it with `ollama serve` (or via your system's service manager) and retry.

### Model Name Mismatch

If you created a model with a custom tag, use the full name, for example `myusername/custom-model`.

## Graceful vs Forced Stop
Ollama performs a graceful stop:

- In-progress requests are cancelled (not completed)
- State is saved if needed
- Resources are released cleanly
- No data corruption
## When to Stop Models
### ✅ When to Use Stop
- Switching to a different model and memory is limited
- Done with a model for the day
- Need to free resources for other applications
- Model is set to `--keepalive -1` (never unload)
- Want to reload a model with different settings
### ❌ When NOT to Use Stop
- Between prompts in the same session (loses context)
- When the model will be used again soon (let auto-unload handle it)
- To interrupt a response (use Ctrl+C instead)
## Performance Impact
Stopping and reloading a model has costs:

| Model Size | Typical Load Time | GPU Memory |
|---|---|---|
| 3B params | 1-3 seconds | ~2 GB |
| 7B params | 3-8 seconds | ~4-5 GB |
| 13B params | 8-15 seconds | ~8-9 GB |
| 34B params | 20-40 seconds | ~20 GB |
| 70B params | 40-90 seconds | ~40 GB |
## Related Commands
- `ollama ps`: See which models are currently running
- `ollama run`: Run a model with custom keep-alive settings
- `ollama serve`: Configure default keep-alive time