## Overview
Ollama’s API uses standard HTTP status codes and structured error responses to communicate failures. All errors follow a consistent format that includes both the HTTP status code and a descriptive error message.
From `api/types.go:22-41`, errors are returned as `StatusError` objects:

```go
type StatusError struct {
	StatusCode   int
	Status       string
	ErrorMessage string `json:"error"`
}
```
### JSON Response

```json
{
  "error": "model 'llama3.2' not found"
}
```

When converted to a string, errors follow this pattern:

```
404 Not Found: model 'llama3.2' not found
```
## HTTP Status Codes

### 400 Bad Request

Invalid request parameters or malformed input.

Missing request body:

```json
{
  "error": "missing request body"
}
```

Invalid JSON:

```json
{
  "error": "invalid character '}' looking for beginning of value"
}
```

Parameter validation:

```json
{
  "error": "top_logprobs must be between 0 and 20"
}
```

Model capability errors:

```json
{
  "error": "llama3.2 does not support generate"
}
```

Invalid options:

```json
{
  "error": "raw mode does not support template, system, or context"
}
```
### 401 Unauthorized

Authentication required or invalid credentials. From `api/types.go:43-54`:

```go
type AuthorizationError struct {
	StatusCode int
	Status     string
	SigninURL  string `json:"signin_url"`
}
```

Response:

```json
{
  "error": "unauthorized",
  "signin_url": "https://ollama.com/connect?name=hostname&key=publickey"
}
```
Authorization is only required when connecting to remote Ollama instances or ollama.com.
### 403 Forbidden

Request understood but not allowed (e.g., cloud features disabled).

```json
{
  "error": "remote model is unavailable"
}
```

### 404 Not Found

The requested resource doesn’t exist.

Model not found:

```json
{
  "error": "model 'llama3.2' not found"
}
```

Blob not found:

```json
{
  "error": "blob not found"
}
```
### 500 Internal Server Error

Server-side error during processing.

Template rendering:

```json
{
  "error": "template: undefined function \"invalid\""
}
```

Model loading:

```json
{
  "error": "failed to load model: out of memory"
}
```

Tokenization:

```json
{
  "error": "failed to tokenize prompt"
}
```
## Error Handling by Endpoint

### Generate Endpoint

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "nonexistent",
  "prompt": "Hello"
}'
```

Response (404):

```json
{
  "error": "model 'nonexistent' not found"
}
```

### Chat Endpoint

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": []
}'
```

No error: an empty `messages` array loads the model without generating a response.

### Embed Endpoint

```shell
curl http://localhost:11434/api/embed -d '{
  "model": "llama3.2",
  "input": 123
}'
```

Response (400):

```json
{
  "error": "invalid input type"
}
```
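
The `invalid input type` error above can be caught before the request is ever sent: `/api/embed` accepts a string or a list of strings as `input`. A minimal client-side check (the helper name `validate_embed_input` is our own, not part of any Ollama client library) might look like:

```python
def validate_embed_input(value):
    """Return the input unchanged if it is a string or a list of strings,
    matching what /api/embed accepts; raise TypeError otherwise."""
    if isinstance(value, str):
        return value
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return value
    raise TypeError(
        f"embed input must be a string or list of strings, got {type(value).__name__}"
    )

# Valid inputs pass through unchanged
print(validate_embed_input("hello"))      # hello
print(validate_embed_input(["a", "b"]))   # ['a', 'b']

# An integer fails fast, before any HTTP request is made
try:
    validate_embed_input(123)
except TypeError as e:
    print(e)
```

Failing locally like this turns a round-trip 400 into an immediate, descriptive exception.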
## Client Error Handling

### Go Client

From `api/client.go:43-63`, the client automatically wraps errors:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"

	"github.com/ollama/ollama/api"
)

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		fmt.Println("Error creating client:", err)
		return
	}

	req := &api.GenerateRequest{
		Model:  "llama3.2",
		Prompt: "Hello",
	}

	err = client.Generate(context.Background(), req, func(resp api.GenerateResponse) error {
		fmt.Print(resp.Response)
		return nil
	})
	if err != nil {
		// Check for specific error types
		var statusErr api.StatusError
		if errors.As(err, &statusErr) {
			switch statusErr.StatusCode {
			case http.StatusNotFound:
				fmt.Println("Model not found. Try: ollama pull llama3.2")
			case http.StatusBadRequest:
				fmt.Println("Invalid request:", statusErr.ErrorMessage)
			case http.StatusInternalServerError:
				fmt.Println("Server error:", statusErr.ErrorMessage)
			default:
				fmt.Printf("Error %d: %s\n", statusErr.StatusCode, statusErr.ErrorMessage)
			}
			return
		}

		var authErr api.AuthorizationError
		if errors.As(err, &authErr) {
			fmt.Println("Unauthorized. Sign in at:", authErr.SigninURL)
			return
		}

		// Generic error
		fmt.Println("Error:", err)
	}
}
```
### Python Client

```python
import json
from typing import Optional

import requests


class OllamaError(Exception):
    """Base exception for Ollama errors"""
    def __init__(self, status_code: int, message: str):
        self.status_code = status_code
        self.message = message
        super().__init__(f"{status_code}: {message}")


class ModelNotFoundError(OllamaError):
    """Model doesn't exist"""
    pass


class BadRequestError(OllamaError):
    """Invalid request parameters"""
    pass


class UnauthorizedError(OllamaError):
    """Authentication required"""
    def __init__(self, status_code: int, message: str, signin_url: Optional[str] = None):
        super().__init__(status_code, message)
        self.signin_url = signin_url


def generate(model: str, prompt: str, **kwargs):
    url = "http://localhost:11434/api/generate"
    data = {"model": model, "prompt": prompt, **kwargs}
    try:
        response = requests.post(url, json=data, stream=True)

        # Check for HTTP errors
        if response.status_code == 404:
            error_data = response.json()
            raise ModelNotFoundError(404, error_data.get('error', 'Model not found'))
        elif response.status_code == 400:
            error_data = response.json()
            raise BadRequestError(400, error_data.get('error', 'Bad request'))
        elif response.status_code == 401:
            error_data = response.json()
            raise UnauthorizedError(
                401,
                error_data.get('error', 'Unauthorized'),
                error_data.get('signin_url')
            )
        elif response.status_code >= 400:
            error_data = response.json()
            raise OllamaError(response.status_code, error_data.get('error', 'Unknown error'))

        # Process streaming response
        for line in response.iter_lines():
            if line:
                chunk = json.loads(line)
                # Check for errors in stream
                if 'error' in chunk:
                    raise OllamaError(500, chunk['error'])
                yield chunk
    except requests.RequestException as e:
        raise OllamaError(500, f"Network error: {e}")


# Usage
try:
    for chunk in generate('llama3.2', 'Hello', stream=True):
        print(chunk.get('response', ''), end='', flush=True)
except ModelNotFoundError as e:
    print(f"\nModel not found: {e.message}")
    print("Try: ollama pull llama3.2")
except BadRequestError as e:
    print(f"\nBad request: {e.message}")
except UnauthorizedError as e:
    print(f"\nUnauthorized: {e.message}")
    if e.signin_url:
        print(f"Sign in at: {e.signin_url}")
except OllamaError as e:
    print(f"\nError {e.status_code}: {e.message}")
```
### JavaScript/TypeScript Client

```typescript
// The server's error body uses the "error" key, as shown in the examples above
interface OllamaErrorBody {
  error: string;
  signin_url?: string;
}

class OllamaClient {
  constructor(private baseUrl: string = 'http://localhost:11434') {}

  async generate(model: string, prompt: string, options = {}) {
    const response = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, prompt, ...options })
    });

    if (!response.ok) {
      const body: OllamaErrorBody = await response.json();
      switch (response.status) {
        case 404:
          throw new Error(`Model not found: ${body.error}`);
        case 400:
          throw new Error(`Bad request: ${body.error}`);
        case 401:
          throw new Error(`Unauthorized: ${body.error}${body.signin_url ? ` - Sign in at: ${body.signin_url}` : ''}`);
        case 500:
          throw new Error(`Server error: ${body.error}`);
        default:
          throw new Error(`Error ${response.status}: ${body.error}`);
      }
    }
    return response;
  }
}

// Usage
const client = new OllamaClient();
try {
  const response = await client.generate('llama3.2', 'Hello');
  const reader = response.body?.getReader();
  // Process stream...
} catch (error) {
  console.error('Error:', (error as Error).message);
}
```
## Streaming Errors

Errors can occur during streaming, after the connection is established. From `api/client.go:217-260`:

```json
{"model": "llama3.2", "response": "The sky", "done": false}
{"error": "context length exceeded"}
```

When streaming, always check each chunk for the `error` field, not just the HTTP status code:

```python
for line in response.iter_lines():
    if line:
        chunk = json.loads(line)
        # Check for errors in the stream
        if 'error' in chunk:
            print(f"Error during generation: {chunk['error']}")
            break
        print(chunk.get('response', ''), end='', flush=True)
```
## Common Error Scenarios

### Model Not Downloaded

Error:

```json
{
  "error": "model 'llama3.2' not found"
}
```

Solution: pull the model first with `ollama pull llama3.2`.

### Context Length Exceeded

Error:

```json
{
  "error": "prompt size exceeds context length"
}
```

Solutions:

- Reduce prompt size
- Enable truncation: `"truncate": true`
- Increase the context window: `"options": {"num_ctx": 8192}`
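
The second and third fixes are both request-level changes. A small helper that folds them into the request body (`build_generate_request` is our own sketch, not an official client function):

```python
def build_generate_request(model, prompt, num_ctx=None, truncate=False):
    """Build an /api/generate request body, optionally asking the server
    to truncate the prompt or raising the context window via options.num_ctx."""
    body = {"model": model, "prompt": prompt}
    if truncate:
        body["truncate"] = True
    if num_ctx is not None:
        body["options"] = {"num_ctx": num_ctx}
    return body

req = build_generate_request("llama3.2", "a very long prompt...", num_ctx=8192)
print(req["options"])  # {'num_ctx': 8192}
```

The resulting dictionary can then be POSTed to `http://localhost:11434/api/generate` exactly as in the earlier client examples.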
### Out of Memory

Error:

```json
{
  "error": "failed to load model: out of memory"
}
```

Solutions:

- Use a smaller model or quantized version
- Reduce the number of GPU layers (`num_gpu`)
- Close other applications
- Increase system swap space

### Invalid Model Capability

Error:

```json
{
  "error": "llama3.2 does not support generate"
}
```

Solution: use the `/api/chat` endpoint instead for chat-tuned models.
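
One way to automate this fix is to inspect a 400 response and retry via `/api/chat` when the message indicates the missing capability. The helper below is a sketch of that decision, keyed off the error text shown above:

```python
def should_fallback_to_chat(status_code, error_message):
    """Return True when a generate request failed because the model
    only supports the chat endpoint."""
    return status_code == 400 and "does not support generate" in error_message

# The capability error from above triggers a fallback...
print(should_fallback_to_chat(400, "llama3.2 does not support generate"))  # True
# ...but an ordinary validation error does not
print(should_fallback_to_chat(400, "missing request body"))  # False
```

On fallback, resend the prompt as `{"messages": [{"role": "user", "content": prompt}]}` to `/api/chat`.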
### Connection Refused

Error: `connection refused` (the client cannot reach the server at all, so no HTTP status code is returned).

Solutions:

- Start the Ollama server: `ollama serve`
- Check the server address: `http://localhost:11434`
- Verify firewall settings
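
Because no HTTP response exists in this case, connection failures surface as a transport-level exception rather than a `StatusError`. A standard-library sketch of turning that into an actionable message (the helper names are illustrative, not a library API):

```python
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def friendly_connection_error(url=OLLAMA_URL):
    """Actionable message shown when the server cannot be reached."""
    return (f"Could not reach Ollama at {url}. "
            f"Is the server running? Try: ollama serve")

def get_version(url=OLLAMA_URL):
    """Fetch /api/version, converting a refused connection into a clear error."""
    try:
        with urllib.request.urlopen(f"{url}/api/version", timeout=5) as resp:
            return resp.read().decode()
    except urllib.error.URLError as e:
        raise RuntimeError(friendly_connection_error(url)) from e
```

With `requests` instead of `urllib`, the equivalent exception to catch is `requests.exceptions.ConnectionError`.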
## Error Code Reference

| Code | Name | Description | Common Causes |
|------|------|-------------|---------------|
| 400 | Bad Request | Invalid request format or parameters | Missing fields, invalid JSON, parameter validation |
| 401 | Unauthorized | Authentication required | Missing/invalid credentials for remote models |
| 403 | Forbidden | Request not allowed | Cloud features disabled, insufficient permissions |
| 404 | Not Found | Resource doesn’t exist | Model not downloaded, invalid blob digest |
| 500 | Internal Server Error | Server processing error | Model loading failure, template errors, OOM |
## Debugging Tips

Enable verbose logging in the Ollama server for detailed error context: `OLLAMA_DEBUG=1 ollama serve`

### Check Server Logs

On macOS, server logs are written to `~/.ollama/logs/server.log`. On Linux with systemd, use `journalctl -u ollama`.

On Windows:

```powershell
Get-EventLog -LogName Application -Source Ollama -Newest 50
```

### Test Connectivity

```shell
curl http://localhost:11434/api/version
```

Expected: a JSON object containing the server version, e.g. `{"version":"x.y.z"}`.

### Validate Model Existence

```shell
curl http://localhost:11434/api/tags
```

Lists all available models.
## Best Practices

- Always check HTTP status codes before parsing the response body
- Handle streaming errors by checking each chunk for the `error` field
- Provide helpful error messages to users, with actionable solutions
- Implement retry logic for transient errors (with exponential backoff)
- Log errors with context, including request parameters, for debugging
- Validate inputs client-side before sending, to reduce 400 errors
- Pre-check model availability before making generation requests
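
The retry recommendation can be sketched as a small wrapper: 5xx errors are treated as transient and retried with exponentially growing delays, while 4xx errors fail immediately. `TransientError` below is a stand-in for whichever exception type your client raises for 5xx responses:

```python
import time

class TransientError(Exception):
    """Stand-in for a 5xx StatusError from the client."""

def with_backoff(fn, retries=3, base_delay=0.5):
    """Call fn(), retrying transient errors with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == retries:
                raise  # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt))

# Simulate a server that fails twice with a 500, then recovers
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("500: temporary failure")
    return "ok"

print(with_backoff(flaky, retries=3, base_delay=0.01))  # ok
print(len(attempts))  # 3
```

Keep 400/401/404 outside the retry path: repeating a request with a missing model or invalid parameters will never succeed.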