Viewing Logs

One of the best ways to figure out what happened is to take a look at the logs.
Run the following command:
cat ~/.ollama/logs/server.log
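On Linux, where Ollama usually runs as a systemd service, the logs go to the journal instead; assuming the default service name used by the install script, you can view them with:
journalctl -e -u ollama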

Enabling Debug Logging

To enable additional debug logging:
1. Quit Ollama

First, quit the running app from the tray menu.

2. Enable debug mode

In a PowerShell terminal, run:
$env:OLLAMA_DEBUG="1"
& "ollama app.exe"
Join the Discord for help interpreting the logs.

LLM Libraries

Ollama includes multiple LLM libraries compiled for different GPUs and CPU vector features. Ollama tries to pick the best one based on the capabilities of your system. In the server log, you will see a message that looks something like this:
Dynamic LLM libraries [rocm_v6 cpu cpu_avx cpu_avx2 cuda_v11 rocm_v5]

Overriding Library Selection

If autodetection has problems, or you run into other issues (e.g., crashes related to your GPU), you can force a specific LLM library.
This is an experimental feature. Use with caution.
Performance ranking: cpu_avx2 (best) > cpu_avx > cpu (slowest but most compatible)
OLLAMA_LLM_LIBRARY="cpu_avx2" ollama serve
Rosetta emulation under macOS will work with the cpu library.

Checking CPU Features

You can see what features your CPU has:
cat /proc/cpuinfo | grep flags | head -1
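To test for one feature in particular, for example AVX2, a quick sketch (prints avx2 if the CPU supports it, nothing otherwise):
grep -m1 -o avx2 /proc/cpuinfo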

Installing Older or Pre-release Versions on Linux

If you run into problems on Linux and want to install an older version, or you’d like to try out a pre-release before it’s officially released, you can tell the install script which version to install.
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.5.7 sh

Linux tmp noexec

If your system is configured with the “noexec” flag where Ollama stores its temporary executable files, you can specify an alternate location by setting OLLAMA_TMPDIR to a location writable by the user ollama runs as.
OLLAMA_TMPDIR=/usr/share/ollama/ ollama serve
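To confirm whether the directory Ollama is using is actually mounted noexec, inspect its mount options; an illustrative check (replace /tmp with whatever location Ollama is using on your system):
findmnt -T /tmp -o TARGET,OPTIONS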

Linux Docker

If Ollama initially works on the GPU in a Docker container but then switches to running on CPU after some period of time, with the server log reporting GPU discovery failures, this can be resolved by disabling systemd cgroup management in Docker. Edit /etc/docker/daemon.json on the host and add "exec-opts": ["native.cgroupdriver=cgroupfs"] to the Docker configuration:
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
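After saving the change, restart the Docker daemon so the new cgroup driver takes effect (assuming a systemd-managed host):
sudo systemctl restart docker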

NVIDIA GPU Discovery

When Ollama starts up, it takes inventory of the GPUs present in the system to determine compatibility and how much VRAM is available. Sometimes this discovery can fail to find your GPUs.
In general, running the latest driver will yield the best results.
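As a first sanity check, confirm the driver itself can see the GPU and note the installed driver version. nvidia-smi ships with the NVIDIA driver; the query fields below are just an illustration:
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv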

Linux NVIDIA Troubleshooting

1. Verify the container runtime (if using Docker)

If you are using a container to run Ollama, make sure you’ve set up the container runtime first as described in the Docker documentation. Then test the container runtime:
docker run --gpus all ubuntu nvidia-smi
If this doesn’t work, Ollama won’t be able to see your NVIDIA GPU.
2. Check if the UVM driver is loaded

sudo nvidia-modprobe -u

3. Try reloading the nvidia_uvm driver

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

4. Try rebooting

Sometimes a simple reboot can resolve GPU discovery issues.

5. Update NVIDIA drivers

Make sure you’re running the latest NVIDIA drivers.

Common Error Codes

When you check the server logs, GPU initialization issues can show up as various error codes:
  • 3 - Not initialized
  • 46 - Device unavailable
  • 100 - No device
  • 999 - Unknown

Gathering Additional Information

If none of the above resolve the problem:
1. Enable detailed CUDA logging

Set CUDA_ERROR_LEVEL=50 and try again to get more diagnostic logs:
CUDA_ERROR_LEVEL=50 ollama serve

2. Check dmesg for errors

sudo dmesg | grep -i nvrm
sudo dmesg | grep -i nvidia

3. File an issue

Gather the logs and error messages and file an issue on the Ollama GitHub repository.

AMD GPU Discovery

Linux Permissions

On Linux, AMD GPU access typically requires video and/or render group membership to access the /dev/kfd device.
If permissions are not set up correctly, Ollama will detect this and report an error in the server log.
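A typical fix is to add the user Ollama runs as to those groups; for example, assuming the default ollama service user created by the Linux install script (group names can vary by distribution, and the service needs to be restarted afterwards):
sudo usermod -a -G render,video ollama
sudo systemctl restart ollama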

Container Access

On some Linux distributions and container runtimes, the ollama process running inside a container may be unable to access the GPU. Use ls -lnd /dev/kfd /dev/dri /dev/dri/* on the host system to determine the numeric group IDs, and pass additional --group-add ... arguments to the container. Example output:
crw-rw---- 1 0  44 226,   0 Sep 16 16:55 /dev/dri/card0
In this case, the group ID is 44, so you would run:
docker run -d --group-add 44 -p 11434:11434 ollama/ollama

Troubleshooting AMD GPUs

The following environment variables can help isolate failures:
  • AMD_LOG_LEVEL=3 - Enable info log levels in the AMD HIP/ROCm libraries
  • OLLAMA_DEBUG=1 - Additional information during GPU discovery
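For example, you might reproduce the failure with both variables set while watching the server output (adjust to however you normally launch the server):
AMD_LOG_LEVEL=3 OLLAMA_DEBUG=1 ollama serve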
Check dmesg for errors:
sudo dmesg | grep -i amdgpu
sudo dmesg | grep -i kfd

Multiple AMD GPUs

If you experience gibberish responses when models load across multiple AMD GPUs on Linux, see the ROCm documentation:

ROCm Multi-GPU Known Issues

AMD ROCm documentation on multi-GPU known issues and limitations

Windows Terminal Errors

Older versions of Windows 10 (e.g., 21H1) are known to have a bug where the standard terminal program does not display control characters correctly. This can result in a long string of characters like ←[?25h←[?25l being displayed, sometimes erroring with The parameter is incorrect.
To resolve this problem, please update to Windows 10 22H2 or newer.

Common Issues

Model crashes or fails to load

Possible causes:
  • Insufficient memory (RAM or VRAM)
  • Corrupted model files
  • Incompatible GPU drivers
Solutions:
  • Check available memory with ollama ps
  • Try a smaller model
  • Re-download the model: ollama pull <model>
  • Update GPU drivers to the latest version
  • Check logs for specific error messages

Slow responses

Possible causes:
  • Model running on CPU instead of GPU
  • Insufficient context window
  • Too many concurrent requests
Solutions:
  • Verify GPU usage with ollama ps (example below)
  • Check that the Processor column shows GPU usage
  • Adjust OLLAMA_NUM_PARALLEL to reduce concurrent requests
  • Consider using a quantized model for better performance
  • Enable Flash Attention: OLLAMA_FLASH_ATTENTION=1
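For example, a quick way to confirm where a loaded model is running (the column layout can differ between versions, but the Processor column should read something like 100% GPU when the model is fully offloaded):
ollama ps
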
Connection errors

Possible causes:
  • Ollama server not running
  • Firewall blocking connections
  • Incorrect host/port configuration
Solutions:
  • Start the server: ollama serve
  • Check if the server is running: curl http://localhost:11434
  • Verify firewall settings allow port 11434
  • Check the OLLAMA_HOST environment variable (example below)
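If clients connect from other machines or you need a non-default port, the server’s bind address is controlled by OLLAMA_HOST; a minimal sketch (0.0.0.0 exposes the server on all interfaces, so only do this on a trusted network):
OLLAMA_HOST=0.0.0.0:11434 ollama serve
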
Model download failures

Possible causes:
  • Network connectivity issues
  • Proxy configuration needed
  • Insufficient disk space
Solutions:
  • Check the network connection
  • Configure a proxy if needed (see the FAQ and the sketch below)
  • Verify sufficient disk space in the models directory
  • Try downloading again: ollama pull <model>
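If you are behind a proxy, the proxy variable generally needs to be visible to the server process, which performs the actual download, rather than the ollama pull client; a hedged sketch with a placeholder proxy URL (the FAQ covers systemd and Docker setups):
HTTPS_PROXY=https://proxy.example.com ollama serve
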
Out of memory errors

Possible causes:
  • Model too large for available memory
  • Too many models loaded simultaneously
  • Large context window
Solutions:
  • Use a smaller or more quantized model
  • Unload unused models: ollama stop <model>
  • Reduce the context window size
  • Adjust OLLAMA_MAX_LOADED_MODELS (example below)
  • Lower OLLAMA_NUM_PARALLEL
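For example, to keep memory use down you might limit the server to a single loaded model and a single parallel request (a sketch; tune the values to your hardware):
OLLAMA_MAX_LOADED_MODELS=1 OLLAMA_NUM_PARALLEL=1 ollama serve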

Getting Help

If you’re still experiencing issues after trying the troubleshooting steps above:

Discord Community

Join our Discord server for community support and help interpreting logs

GitHub Issues

Report bugs and request features on GitHub

FAQ

Check the FAQ for answers to common questions

Documentation

Browse the full documentation for detailed guides
