Base URL
After installation, Ollama’s API is served by default at http://localhost:11434. You can customize the host using the OLLAMA_HOST environment variable.

Quick Start Example

Once Ollama is running, you can immediately start making API requests. Here’s a simple example:
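A minimal sketch using only the standard library (assumptions: the server is reachable at the default host or via OLLAMA_HOST, and llama3.2 stands in for any model you have pulled):

```python
import json
import os
import urllib.request

# Resolve the base URL, honoring OLLAMA_HOST if set (sketch; assumes the
# variable holds a full URL like the default below).
BASE_URL = os.environ.get("OLLAMA_HOST", "http://localhost:11434")

def generate(prompt, model="llama3.2"):
    """Send a non-streaming /api/generate request and return the text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        BASE_URL + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With a server running locally:
# print(generate("Why is the sky blue?"))
```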
Key Features
Chat Completions
Multi-turn conversations with context preservation and streaming support
Text Generation
Generate text from prompts with full control over model parameters
Embeddings
Create vector embeddings for semantic search and RAG applications
Model Management
Pull, create, copy, and delete models programmatically
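Context preservation in chat completions works by resending the prior turns in the messages array on each call. A sketch of such a request body (llama3.2 is an example model name):

```python
import json

# Multi-turn /api/chat request body: the full history is sent each call,
# which is how the model keeps conversational context.
messages = [
    {"role": "user", "content": "What format do streaming responses use?"},
    {"role": "assistant", "content": "Newline-delimited JSON (NDJSON)."},
    {"role": "user", "content": "And the content type?"},
]
body = json.dumps({"model": "llama3.2", "messages": messages, "stream": False})
```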
Core Endpoints
The Ollama API provides the following main endpoints: /api/generate for text generation, /api/chat for chat completions, /api/embed for embeddings, and model management routes such as /api/tags, /api/pull, /api/copy, and /api/delete.

Response Formats
The API supports both streaming and non-streaming responses:

- Streaming (default): responses are sent as newline-delimited JSON (NDJSON) for real-time output
- Non-streaming: set "stream": false in your request to receive a single complete response
Streaming responses use the application/x-ndjson content type, while non-streaming responses use application/json.

Official Client Libraries
Ollama provides official client libraries that make integration even easier:

Python
pip install ollama

JavaScript
npm install ollama

API Versioning
Ollama’s API isn’t strictly versioned but is designed to be stable and backwards compatible. Deprecations are rare and are announced in the release notes. You can check your installed version with ollama --version or the /api/version endpoint.

Next Steps
Authentication
Learn how to authenticate with ollama.com for cloud models
Client Libraries
Use official Python and JavaScript libraries
API Reference
Explore all available endpoints and parameters
Streaming
Implement real-time streaming responses