Viewing Logs
One of the best ways to figure out what happened is to take a look at the logs. How you view them depends on how Ollama is running: the macOS app, a Linux systemd service, a Docker container, the Windows app, or a manual ollama serve process. Run the command for your platform, as shown below.
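A quick reference for each platform, based on Ollama's default log locations (paths can differ if you have customized your install):

```shell
# macOS (app): the server log is written under ~/.ollama
cat ~/.ollama/logs/server.log

# Linux (systemd service): read the journal for the ollama unit
journalctl -e -u ollama

# Docker: the server log is on the container's stdout/stderr
docker logs <container-name>

# Windows (app): open the log directory, which contains server.log
explorer %LOCALAPPDATA%\Ollama

# Manual serve: logs print to the terminal running `ollama serve`
```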
Enabling Debug Logging
If the standard logs do not contain enough detail, set OLLAMA_DEBUG=1 in the server's environment and restart Ollama.
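For example, to run a one-off foreground server with debug logging (stop any running instance first):

```shell
# macOS / Linux: run the server in the foreground with debug output
OLLAMA_DEBUG=1 ollama serve

# Windows (PowerShell):
#   $env:OLLAMA_DEBUG="1"
#   & "ollama app.exe"
```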
Join the Discord for help interpreting the logs.
LLM Libraries
Ollama includes multiple LLM libraries compiled for different GPUs and CPU vector features. Ollama tries to pick the best one based on the capabilities of your system. In the server log, you will see a message that looks something like this:
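An illustrative log line (the exact set of libraries varies by Ollama version, build, and platform):

```
Dynamic LLM libraries [rocm_v6 cpu cpu_avx cpu_avx2 cuda_v11 rocm_v5]
```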
Overriding Library Selection
If autodetection has problems, or you run into issues (e.g., crashes in your GPU), you can force a specific LLM library. Performance ranking: cpu_avx2 (best) > cpu_avx > cpu (slowest but most compatible).
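A sketch of forcing a library via the OLLAMA_LLM_LIBRARY environment variable (documented in earlier Ollama releases; the library names available depend on your build):

```shell
# Force the AVX2-optimized CPU library, bypassing autodetection
OLLAMA_LLM_LIBRARY=cpu_avx2 ollama serve
```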
Rosetta emulation under macOS will work with the cpu library.

Checking CPU Features
You can see what features your CPU has:
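On Linux, one way to check (look for flags such as avx and avx2):

```shell
# Prints the feature flags of the first CPU core
cat /proc/cpuinfo | grep flags | head -1
```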
Installing Older or Pre-release Versions on Linux
If you run into problems on Linux and want to install an older version, or you’d like to try out a pre-release before it’s officially released, you can tell the install script which version to install.
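For example, pass OLLAMA_VERSION to the install script (0.5.7 here is only an illustrative version number):

```shell
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.5.7 sh
```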
Linux tmp noexec
If your system is configured with the “noexec” flag where Ollama stores its temporary executable files, you can specify an alternate location by setting OLLAMA_TMPDIR to a location writable by the user ollama runs as.
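For the systemd service, one way to set this is a drop-in override (the directory below is an example; any location writable by the ollama user and not mounted noexec works):

```shell
# Open a drop-in override for the service, then restart it
sudo systemctl edit ollama
# Add under [Service]:
#   Environment="OLLAMA_TMPDIR=/usr/share/ollama/tmp"
sudo systemctl restart ollama
```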
Linux Docker
If Ollama initially works on the GPU in a docker container, but then switches to running on CPU after some period of time with errors in the server log reporting GPU discovery failures, this can be resolved by disabling systemd cgroup management in Docker. Edit /etc/docker/daemon.json on the host and add "exec-opts": ["native.cgroupdriver=cgroupfs"] to the docker configuration:
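A minimal sketch of the change (merge the key into any existing daemon.json rather than replacing the file), followed by a Docker restart:

```shell
# /etc/docker/daemon.json should include, alongside any existing settings:
# {
#     "exec-opts": ["native.cgroupdriver=cgroupfs"]
# }
sudo systemctl restart docker
```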
NVIDIA GPU Discovery
When Ollama starts up, it takes inventory of the GPUs present in the system to determine compatibility and how much VRAM is available. Sometimes this discovery can fail to find your GPUs.

Linux NVIDIA Troubleshooting
Verify container runtime (if using Docker)
If you are using a container to run Ollama, make sure you’ve set up the container runtime first as described in the Docker documentation. Test the container runtime:
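A quick smoke test that the NVIDIA container runtime can see your GPU (the standard check from NVIDIA's container toolkit documentation):

```shell
# If the runtime is set up correctly, this prints the host GPU table
docker run --gpus all ubuntu nvidia-smi
```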
Common Error Codes
When you check the server logs, GPU initialization issues can show up as various error codes (a possible remedy is sketched after this list):
- 3 - Not initialized
- 46 - Device unavailable
- 100 - No device
- 999 - Unknown
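On Linux, Ollama's troubleshooting guidance suggests reloading the NVIDIA uvm kernel driver when these initialization errors appear:

```shell
sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm
```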
Gathering Additional Information
If none of the above resolves the problem, gather the logs and error messages and file an issue on the Ollama GitHub repository.
AMD GPU Discovery
Linux Permissions
On Linux, AMD GPU access typically requires video and/or render group membership to access the /dev/kfd device.
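A sketch of adding the service account to those groups (this assumes the server runs as the ollama user, the default for the Linux install script):

```shell
# Add the service account to the GPU device groups, then restart the service
sudo usermod -a -G render ollama
sudo usermod -a -G video ollama
sudo systemctl restart ollama
```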
Container Access
When running in a container, in some Linux distributions and container runtimes, the ollama process may be unable to access the GPU. Use ls -lnd /dev/kfd /dev/dri /dev/dri/* on the host system to determine the numeric group IDs, and pass additional --group-add ... arguments to the container.
For example, if the output shows a device group ID of 44, you would run the container with --group-add 44, as sketched below.
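A sketch of a ROCm container run with the extra group (image tag and mounts follow the standard Ollama Docker instructions; the group ID 44 is the example from above and will differ on your host):

```shell
# Expose the AMD device nodes and the extra host group to the container
docker run -d --device /dev/kfd --device /dev/dri \
  --group-add 44 \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:rocm
```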
Troubleshooting AMD GPUs
The following environment variables can help isolate failures:
- AMD_LOG_LEVEL=3 - Enable info log levels in the AMD HIP/ROCm libraries
- OLLAMA_DEBUG=1 - Additional information during GPU discovery
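For example, running a foreground server with both enabled:

```shell
AMD_LOG_LEVEL=3 OLLAMA_DEBUG=1 ollama serve
```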
Multiple AMD GPUs
If you experience gibberish responses when models load across multiple AMD GPUs on Linux, see the ROCm documentation on multi-GPU known issues and limitations.
Windows Terminal Errors
Older versions of Windows 10 (e.g., 21H1) are known to have a bug where the standard terminal program does not display control characters correctly. This can result in a long string of characters like ←[?25h←[?25l being displayed, sometimes erroring with The parameter is incorrect. Updating Windows, or switching to a different terminal program such as Windows Terminal, resolves this.
Common Issues
Model fails to load or crashes

Possible causes:
- Insufficient memory (RAM or VRAM)
- Corrupted model files
- Incompatible GPU drivers

Solutions:
- Check available memory with ollama ps
- Try a smaller model
- Re-download the model: ollama pull <model>
- Update GPU drivers to the latest version
- Check logs for specific error messages
Slow performance or inference

Possible causes:
- Model running on CPU instead of GPU
- Insufficient context window
- Too many concurrent requests

Solutions:
- Verify GPU usage with ollama ps and check that the Processor column shows GPU usage
- Adjust OLLAMA_NUM_PARALLEL to reduce concurrent requests
- Consider using a quantized model for better performance
- Enable Flash Attention: OLLAMA_FLASH_ATTENTION=1 (see the sketch after this list)
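A sketch of applying those settings; environment variables must be set on the server process, not the client:

```shell
# Enable Flash Attention and serialize requests on a resource-constrained machine
OLLAMA_FLASH_ATTENTION=1 OLLAMA_NUM_PARALLEL=1 ollama serve
```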
Connection refused or server not responding

Possible causes:
- Ollama server not running
- Firewall blocking connections
- Incorrect host/port configuration

Solutions:
- Start the server: ollama serve
- Check if the server is running: curl http://localhost:11434
- Verify firewall settings allow port 11434
- Check the OLLAMA_HOST environment variable (examples below)
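For example, a healthy server answers the root endpoint, and OLLAMA_HOST controls where it listens (binding to 0.0.0.0 below is just an illustration for exposing it on all interfaces):

```shell
# A healthy server replies "Ollama is running"
curl http://localhost:11434

# Listen on all interfaces instead of the default localhost binding
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```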
Model download fails or is very slow

Possible causes:
- Network connectivity issues
- Proxy configuration needed
- Insufficient disk space

Solutions:
- Check network connection
- Configure proxy if needed (see FAQ)
- Verify sufficient disk space in models directory
- Try downloading again: ollama pull <model>
Out of memory errors

Possible causes:
- Model too large for available memory
- Too many models loaded simultaneously
- Large context window

Solutions:
- Use a smaller or more quantized model
- Unload unused models: ollama stop <model> (example below)
- Reduce context window size
- Adjust OLLAMA_MAX_LOADED_MODELS
- Lower OLLAMA_NUM_PARALLEL
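For example, to see what is resident and free memory by unloading a model (the model name here is illustrative):

```shell
# Show loaded models, their size, and where they are running
ollama ps

# Unload a model immediately to free its memory
ollama stop llama3.2
```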
Getting Help
If you’re still experiencing issues after trying the troubleshooting steps above:
- Discord Community - join our Discord server for community support and help interpreting logs
- GitHub Issues - report bugs and request features on GitHub
- FAQ - check the FAQ for answers to common questions
- Documentation - browse the full documentation for detailed guides