## Overview
Ollama’s API uses standard HTTP status codes and structured error responses to communicate failures. All errors follow a consistent format that includes both the HTTP status code and a descriptive error message.
From `api/types.go:22-41`, errors are returned as `StatusError` objects:

```go
type StatusError struct {
	StatusCode   int
	Status       string
	ErrorMessage string `json:"error"`
}
```
### JSON Response

```json
{
  "error": "model 'llama3.2' not found"
}
```

When converted to a string, errors follow this pattern:

```
404 Not Found: model 'llama3.2' not found
```
## HTTP Status Codes

### 400 Bad Request

Invalid request parameters or malformed input.

Missing request body:

```json
{
  "error": "missing request body"
}
```

Invalid JSON:

```json
{
  "error": "invalid character '}' looking for beginning of value"
}
```

Parameter validation:

```json
{
  "error": "top_logprobs must be between 0 and 20"
}
```

Model capability errors:

```json
{
  "error": "llama3.2 does not support generate"
}
```

Invalid options:

```json
{
  "error": "raw mode does not support template, system, or context"
}
```
### 401 Unauthorized

Authentication required or invalid credentials. From `api/types.go:43-54`:

```go
type AuthorizationError struct {
	StatusCode int
	Status     string
	SigninURL  string `json:"signin_url"`
}
```

Response:

```json
{
  "error": "unauthorized",
  "signin_url": "https://ollama.com/connect?name=hostname&key=publickey"
}
```
Authorization is only required when connecting to remote Ollama instances or ollama.com.
### 403 Forbidden

Request understood but not allowed (e.g., cloud features disabled).

```json
{
  "error": "remote model is unavailable"
}
```

### 404 Not Found

The requested resource doesn’t exist.

Model not found:

```json
{
  "error": "model 'llama3.2' not found"
}
```

Blob not found:

```json
{
  "error": "blob not found"
}
```
### 500 Internal Server Error

Server-side error during processing.

Template rendering:

```json
{
  "error": "template: undefined function \"invalid\""
}
```

Model loading:

```json
{
  "error": "failed to load model: out of memory"
}
```

Tokenization:

```json
{
  "error": "failed to tokenize prompt"
}
```
## Error Handling by Endpoint

### Generate Endpoint

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "nonexistent",
  "prompt": "Hello"
}'
```

Response (404):

```json
{
  "error": "model 'nonexistent' not found"
}
```

### Chat Endpoint

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": []
}'
```

No error: an empty `messages` array loads the model without generating a response.

### Embed Endpoint

```shell
curl http://localhost:11434/api/embed -d '{
  "model": "llama3.2",
  "input": 123
}'
```

Response (400):

```json
{
  "error": "invalid input type"
}
```
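
The `invalid input type` error above can be caught before the request is ever sent: `/api/embed` accepts a string or a list of strings as `input`. A minimal client-side check (the helper name `validate_embed_input` is our own, not part of any Ollama client library) might look like:

```python
def validate_embed_input(value):
    """Return the input unchanged if it is a string or a list of strings,
    matching what /api/embed accepts; raise TypeError otherwise."""
    if isinstance(value, str):
        return value
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return value
    raise TypeError(
        f"embed input must be a string or list of strings, got {type(value).__name__}"
    )

# Valid inputs pass through unchanged
print(validate_embed_input("hello"))      # hello
print(validate_embed_input(["a", "b"]))   # ['a', 'b']

# An integer fails fast, before any HTTP request is made
try:
    validate_embed_input(123)
except TypeError as e:
    print(e)
```

Failing locally like this turns a round-trip 400 into an immediate, descriptive exception.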
## Client Error Handling

### Go Client

From `api/client.go:43-63`, the client automatically wraps errors:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"

	"github.com/ollama/ollama/api"
)

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		fmt.Println("Error creating client:", err)
		return
	}

	req := &api.GenerateRequest{
		Model:  "llama3.2",
		Prompt: "Hello",
	}

	err = client.Generate(context.Background(), req, func(resp api.GenerateResponse) error {
		fmt.Print(resp.Response)
		return nil
	})
	if err != nil {
		// Check for specific error types
		var statusErr api.StatusError
		if errors.As(err, &statusErr) {
			switch statusErr.StatusCode {
			case http.StatusNotFound:
				fmt.Println("Model not found. Try: ollama pull llama3.2")
			case http.StatusBadRequest:
				fmt.Println("Invalid request:", statusErr.ErrorMessage)
			case http.StatusInternalServerError:
				fmt.Println("Server error:", statusErr.ErrorMessage)
			default:
				fmt.Printf("Error %d: %s\n", statusErr.StatusCode, statusErr.ErrorMessage)
			}
			return
		}

		var authErr api.AuthorizationError
		if errors.As(err, &authErr) {
			fmt.Println("Unauthorized. Sign in at:", authErr.SigninURL)
			return
		}

		// Generic error
		fmt.Println("Error:", err)
	}
}
```
### Python Client

```python
import json
from typing import Optional

import requests


class OllamaError(Exception):
    """Base exception for Ollama errors"""
    def __init__(self, status_code: int, message: str):
        self.status_code = status_code
        self.message = message
        super().__init__(f"{status_code}: {message}")


class ModelNotFoundError(OllamaError):
    """Model doesn't exist"""
    pass


class BadRequestError(OllamaError):
    """Invalid request parameters"""
    pass


class UnauthorizedError(OllamaError):
    """Authentication required"""
    def __init__(self, status_code: int, message: str, signin_url: Optional[str] = None):
        super().__init__(status_code, message)
        self.signin_url = signin_url


def generate(model: str, prompt: str, **kwargs):
    url = "http://localhost:11434/api/generate"
    data = {"model": model, "prompt": prompt, **kwargs}
    try:
        response = requests.post(url, json=data, stream=True)

        # Check for HTTP errors
        if response.status_code == 404:
            error_data = response.json()
            raise ModelNotFoundError(404, error_data.get('error', 'Model not found'))
        elif response.status_code == 400:
            error_data = response.json()
            raise BadRequestError(400, error_data.get('error', 'Bad request'))
        elif response.status_code == 401:
            error_data = response.json()
            raise UnauthorizedError(
                401,
                error_data.get('error', 'Unauthorized'),
                error_data.get('signin_url')
            )
        elif response.status_code >= 400:
            error_data = response.json()
            raise OllamaError(response.status_code, error_data.get('error', 'Unknown error'))

        # Process streaming response
        for line in response.iter_lines():
            if line:
                chunk = json.loads(line)
                # Check for errors in stream
                if 'error' in chunk:
                    raise OllamaError(500, chunk['error'])
                yield chunk
    except requests.RequestException as e:
        raise OllamaError(500, f"Network error: {e}")


# Usage
try:
    for chunk in generate('llama3.2', 'Hello', stream=True):
        print(chunk.get('response', ''), end='', flush=True)
except ModelNotFoundError as e:
    print(f"\nModel not found: {e.message}")
    print("Try: ollama pull llama3.2")
except BadRequestError as e:
    print(f"\nBad request: {e.message}")
except UnauthorizedError as e:
    print(f"\nUnauthorized: {e.message}")
    if e.signin_url:
        print(f"Sign in at: {e.signin_url}")
except OllamaError as e:
    print(f"\nError {e.status_code}: {e.message}")
```
### JavaScript/TypeScript Client

```typescript
// The server's error body uses the "error" key, as shown in the examples above
interface OllamaErrorBody {
  error: string;
  signin_url?: string;
}

class OllamaClient {
  constructor(private baseUrl: string = 'http://localhost:11434') {}

  async generate(model: string, prompt: string, options = {}) {
    const response = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, prompt, ...options })
    });

    if (!response.ok) {
      const body: OllamaErrorBody = await response.json();
      switch (response.status) {
        case 404:
          throw new Error(`Model not found: ${body.error}`);
        case 400:
          throw new Error(`Bad request: ${body.error}`);
        case 401:
          throw new Error(`Unauthorized: ${body.error}${body.signin_url ? ` - Sign in at: ${body.signin_url}` : ''}`);
        case 500:
          throw new Error(`Server error: ${body.error}`);
        default:
          throw new Error(`Error ${response.status}: ${body.error}`);
      }
    }
    return response;
  }
}

// Usage
const client = new OllamaClient();
try {
  const response = await client.generate('llama3.2', 'Hello');
  const reader = response.body?.getReader();
  // Process stream...
} catch (error) {
  console.error('Error:', (error as Error).message);
}
```
## Streaming Errors

Errors can occur during streaming, after the connection is established. From `api/client.go:217-260`:

```json
{"model": "llama3.2", "response": "The sky", "done": false}
{"error": "context length exceeded"}
```

When streaming, always check each chunk for the `error` field, not just the HTTP status code:

```python
for line in response.iter_lines():
    if line:
        chunk = json.loads(line)
        # Check for errors in the stream
        if 'error' in chunk:
            print(f"Error during generation: {chunk['error']}")
            break
        print(chunk.get('response', ''), end='', flush=True)
```
## Common Error Scenarios

### Model Not Downloaded

Error:

```json
{
  "error": "model 'llama3.2' not found"
}
```

Solution: pull the model first with `ollama pull llama3.2`.

### Context Length Exceeded

Error:

```json
{
  "error": "prompt size exceeds context length"
}
```

Solutions:

- Reduce prompt size
- Enable truncation: `"truncate": true`
- Increase the context window: `"options": {"num_ctx": 8192}`
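
The second and third fixes are both request-level changes. A small helper that folds them into the request body (`build_generate_request` is our own sketch, not an official client function):

```python
def build_generate_request(model, prompt, num_ctx=None, truncate=False):
    """Build an /api/generate request body, optionally asking the server
    to truncate the prompt or raising the context window via options.num_ctx."""
    body = {"model": model, "prompt": prompt}
    if truncate:
        body["truncate"] = True
    if num_ctx is not None:
        body["options"] = {"num_ctx": num_ctx}
    return body

req = build_generate_request("llama3.2", "a very long prompt...", num_ctx=8192)
print(req["options"])  # {'num_ctx': 8192}
```

The resulting dictionary can then be POSTed to `http://localhost:11434/api/generate` exactly as in the earlier client examples.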
### Out of Memory

Error:

```json
{
  "error": "failed to load model: out of memory"
}
```

Solutions:

- Use a smaller model or quantized version
- Reduce the number of GPU layers (`num_gpu`)
- Close other applications
- Increase system swap space

### Invalid Model Capability

Error:

```json
{
  "error": "llama3.2 does not support generate"
}
```

Solution: use the `/api/chat` endpoint instead for chat-tuned models.
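
One way to automate this fix is to inspect a 400 response and retry via `/api/chat` when the message indicates the missing capability. The helper below is a sketch of that decision, keyed off the error text shown above:

```python
def should_fallback_to_chat(status_code, error_message):
    """Return True when a generate request failed because the model
    only supports the chat endpoint."""
    return status_code == 400 and "does not support generate" in error_message

# The capability error from above triggers a fallback...
print(should_fallback_to_chat(400, "llama3.2 does not support generate"))  # True
# ...but an ordinary validation error does not
print(should_fallback_to_chat(400, "missing request body"))  # False
```

On fallback, resend the prompt as `{"messages": [{"role": "user", "content": prompt}]}` to `/api/chat`.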
### Connection Refused

Error: `connection refused` (the client cannot reach the server at all, so no HTTP status code is returned).

Solutions:

- Start the Ollama server: `ollama serve`
- Check the server address: `http://localhost:11434`
- Verify firewall settings
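
Because no HTTP response exists in this case, connection failures surface as a transport-level exception rather than a `StatusError`. A standard-library sketch of turning that into an actionable message (the helper names are illustrative, not a library API):

```python
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def friendly_connection_error(url=OLLAMA_URL):
    """Actionable message shown when the server cannot be reached."""
    return (f"Could not reach Ollama at {url}. "
            f"Is the server running? Try: ollama serve")

def get_version(url=OLLAMA_URL):
    """Fetch /api/version, converting a refused connection into a clear error."""
    try:
        with urllib.request.urlopen(f"{url}/api/version", timeout=5) as resp:
            return resp.read().decode()
    except urllib.error.URLError as e:
        raise RuntimeError(friendly_connection_error(url)) from e
```

With `requests` instead of `urllib`, the equivalent exception to catch is `requests.exceptions.ConnectionError`.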
## Error Code Reference

| Code | Name | Description | Common Causes |
|------|------|-------------|---------------|
| 400 | Bad Request | Invalid request format or parameters | Missing fields, invalid JSON, parameter validation |
| 401 | Unauthorized | Authentication required | Missing/invalid credentials for remote models |
| 403 | Forbidden | Request not allowed | Cloud features disabled, insufficient permissions |
| 404 | Not Found | Resource doesn’t exist | Model not downloaded, invalid blob digest |
| 500 | Internal Server Error | Server processing error | Model loading failure, template errors, OOM |
## Debugging Tips

Enable verbose logging in the Ollama server for detailed error context: `OLLAMA_DEBUG=1 ollama serve`

### Check Server Logs

On macOS, server logs are written to `~/.ollama/logs/server.log`. On Linux with systemd, use `journalctl -u ollama`.

On Windows:

```powershell
Get-EventLog -LogName Application -Source Ollama -Newest 50
```

### Test Connectivity

```shell
curl http://localhost:11434/api/version
```

Expected: a JSON object containing the server version, e.g. `{"version":"x.y.z"}`.

### Validate Model Existence

```shell
curl http://localhost:11434/api/tags
```

Lists all available models.
## Best Practices

- Always check HTTP status codes before parsing the response body
- Handle streaming errors by checking each chunk for the `error` field
- Provide helpful error messages to users, with actionable solutions
- Implement retry logic for transient errors (with exponential backoff)
- Log errors with context, including request parameters, for debugging
- Validate inputs client-side before sending, to reduce 400 errors
- Pre-check model availability before making generation requests
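
The retry recommendation can be sketched as a small wrapper: 5xx errors are treated as transient and retried with exponentially growing delays, while 4xx errors fail immediately. `TransientError` below is a stand-in for whichever exception type your client raises for 5xx responses:

```python
import time

class TransientError(Exception):
    """Stand-in for a 5xx StatusError from the client."""

def with_backoff(fn, retries=3, base_delay=0.5):
    """Call fn(), retrying transient errors with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == retries:
                raise  # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt))

# Simulate a server that fails twice with a 500, then recovers
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("500: temporary failure")
    return "ok"

print(with_backoff(flaky, retries=3, base_delay=0.01))  # ok
print(len(attempts))  # 3
```

Keep 400/401/404 outside the retry path: repeating a request with a missing model or invalid parameters will never succeed.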