The Ollama API provides a comprehensive REST interface for running and managing large language models locally. Whether you’re building chatbots, generating text, creating embeddings, or managing models, the Ollama API gives you full control over your AI infrastructure.

Base URL

After installation, Ollama’s API is served by default at:
http://localhost:11434/api
For running cloud models on ollama.com, the same API is available with the following base URL:
https://ollama.com/api
You can customize the host using the OLLAMA_HOST environment variable. The default is http://localhost:11434.
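For example, the variable can be set inline when launching the server in the foreground (`0.0.0.0:8080` is an illustrative value, not a recommendation):

```shell
# Bind the API to all interfaces on port 8080 instead of the default localhost:11434
OLLAMA_HOST=0.0.0.0:8080 ollama serve
```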

Quick Start Example

Once Ollama is running, you can immediately start making API requests. Here’s a simple example using curl:
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
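The same request can be composed with Python's standard library. The sketch below only builds the request object so it can be inspected without a running server; the model name and host mirror the curl example above:

```python
import json
import urllib.request

def build_generate_request(model, prompt, host="http://localhost:11434"):
    """Build a POST request for /api/generate matching the curl example."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("gemma3", "Why is the sky blue?")
print(req.full_url)  # http://localhost:11434/api/generate
```

Passing the request to `urllib.request.urlopen(req)` would dispatch it to a running Ollama server.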

Key Features

Chat Completions

Multi-turn conversations with context preservation and streaming support

Text Generation

Generate text from prompts with full control over model parameters

Embeddings

Create vector embeddings for semantic search and RAG applications

Model Management

Pull, create, copy, and delete models programmatically

Core Endpoints

The Ollama API provides the following main endpoints:

1. Generate: generate text completions from a prompt
   POST /api/generate
2. Chat: create multi-turn conversations with message history
   POST /api/chat
3. Embeddings: generate embeddings for text input
   POST /api/embed
   POST /api/embeddings
4. Models: list, pull, create, copy, and delete models
   GET /api/tags
   POST /api/pull
   POST /api/create
   POST /api/copy
   DELETE /api/delete
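The routes above all hang off the same base URL, so client code can keep them in a small table. This helper is illustrative (not part of any official library) and uses only the methods and paths listed on this page:

```python
# Core routes from this page, mapped to (HTTP method, path).
ENDPOINTS = {
    "generate": ("POST", "/api/generate"),
    "chat": ("POST", "/api/chat"),
    "embed": ("POST", "/api/embed"),
    "tags": ("GET", "/api/tags"),
    "pull": ("POST", "/api/pull"),
    "create": ("POST", "/api/create"),
    "copy": ("POST", "/api/copy"),
    "delete": ("DELETE", "/api/delete"),
}

def route_for(name, base="http://localhost:11434"):
    """Return (method, full URL) for a named core endpoint."""
    method, path = ENDPOINTS[name]
    return method, base + path

print(route_for("chat"))  # ('POST', 'http://localhost:11434/api/chat')
```

Swapping the `base` argument for `https://ollama.com` targets the cloud API with the same routes.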

Response Formats

The API supports both streaming and non-streaming responses:
  • Streaming (default): Responses are sent as newline-delimited JSON (NDJSON) for real-time output
  • Non-streaming: Set "stream": false in your request for a single complete response
Streaming responses use the application/x-ndjson content type, while non-streaming responses use application/json.
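A streaming client therefore reads one JSON object per line and stops when it sees `"done": true`. This is a minimal sketch of that loop, run against canned sample lines rather than a live response (the `response`/`done` fields follow the shape shown in the quick-start example's endpoint):

```python
import json

def collect_stream(ndjson_lines):
    """Concatenate the incremental 'response' chunks of a streamed reply."""
    text = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):  # the final object carries "done": true
            break
    return "".join(text)

# Illustrative sample of an NDJSON stream, one JSON object per line:
sample = [
    '{"response": "The sky ", "done": false}',
    '{"response": "is blue.", "done": true}',
]
print(collect_stream(sample))  # The sky is blue.
```

With a real HTTP response object, the same loop would iterate over the response body line by line instead of a list.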

Official Client Libraries

Ollama provides official client libraries that make integration even easier:

Python

pip install ollama

JavaScript

npm install ollama
See the Libraries page for installation instructions and examples.

API Versioning

Ollama’s API isn’t strictly versioned, but it is designed to be stable and backward compatible. Deprecations are rare and are announced in the release notes. You can check your Ollama version using:
curl http://localhost:11434/api/version

Next Steps

Authentication

Learn how to authenticate with ollama.com for cloud models

Client Libraries

Use official Python and JavaScript libraries

API Reference

Explore all available endpoints and parameters

Streaming

Implement real-time streaming responses
