The Ollama API provides a comprehensive REST interface for running and managing large language models locally. Whether you’re building chatbots, generating text, creating embeddings, or managing models, the Ollama API gives you full control over your AI infrastructure.

Base URL

After installation, Ollama’s API is served by default at:
http://localhost:11434/api
For running cloud models on ollama.com, the same API is available with the following base URL:
https://ollama.com/api
You can customize the host using the OLLAMA_HOST environment variable. The default is http://localhost:11434.
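For example, the variable can be set inline when launching the server in the foreground (`0.0.0.0:8080` is an illustrative value, not a recommendation):

```shell
# Bind the API to all interfaces on port 8080 instead of the default localhost:11434
OLLAMA_HOST=0.0.0.0:8080 ollama serve
```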

Quick Start Example

Once Ollama is running, you can immediately start making API requests. Here’s a simple example using curl:
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
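The same request can be composed with Python's standard library. The sketch below only builds the request object so it can be inspected without a running server; the model name and host mirror the curl example above:

```python
import json
import urllib.request

def build_generate_request(model, prompt, host="http://localhost:11434"):
    """Build a POST request for /api/generate matching the curl example."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("gemma3", "Why is the sky blue?")
print(req.full_url)  # http://localhost:11434/api/generate
```

Passing the request to `urllib.request.urlopen(req)` would dispatch it to a running Ollama server.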

Key Features

Chat Completions

Multi-turn conversations with context preservation and streaming support

Text Generation

Generate text from prompts with full control over model parameters

Embeddings

Create vector embeddings for semantic search and RAG applications

Model Management

Pull, create, copy, and delete models programmatically

Core Endpoints

The Ollama API provides the following main endpoints:

1. Generate: generate text completions from a prompt
   POST /api/generate
2. Chat: create multi-turn conversations with message history
   POST /api/chat
3. Embeddings: generate embeddings for text input
   POST /api/embed
   POST /api/embeddings
4. Models: list, pull, create, copy, and delete models
   GET /api/tags
   POST /api/pull
   POST /api/create
   POST /api/copy
   DELETE /api/delete
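The routes above all hang off the same base URL, so client code can keep them in a small table. This helper is illustrative (not part of any official library) and uses only the methods and paths listed on this page:

```python
# Core routes from this page, mapped to (HTTP method, path).
ENDPOINTS = {
    "generate": ("POST", "/api/generate"),
    "chat": ("POST", "/api/chat"),
    "embed": ("POST", "/api/embed"),
    "tags": ("GET", "/api/tags"),
    "pull": ("POST", "/api/pull"),
    "create": ("POST", "/api/create"),
    "copy": ("POST", "/api/copy"),
    "delete": ("DELETE", "/api/delete"),
}

def route_for(name, base="http://localhost:11434"):
    """Return (method, full URL) for a named core endpoint."""
    method, path = ENDPOINTS[name]
    return method, base + path

print(route_for("chat"))  # ('POST', 'http://localhost:11434/api/chat')
```

Swapping the `base` argument for `https://ollama.com` targets the cloud API with the same routes.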

Response Formats

The API supports both streaming and non-streaming responses:
  • Streaming (default): Responses are sent as newline-delimited JSON (NDJSON) for real-time output
  • Non-streaming: Set "stream": false in your request for a single complete response
Streaming responses use the application/x-ndjson content type, while non-streaming responses use application/json.
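A streaming client therefore reads one JSON object per line and stops when it sees `"done": true`. This is a minimal sketch of that loop, run against canned sample lines rather than a live response (the `response`/`done` fields follow the shape shown in the quick-start example's endpoint):

```python
import json

def collect_stream(ndjson_lines):
    """Concatenate the incremental 'response' chunks of a streamed reply."""
    text = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):  # the final object carries "done": true
            break
    return "".join(text)

# Illustrative sample of an NDJSON stream, one JSON object per line:
sample = [
    '{"response": "The sky ", "done": false}',
    '{"response": "is blue.", "done": true}',
]
print(collect_stream(sample))  # The sky is blue.
```

With a real HTTP response object, the same loop would iterate over the response body line by line instead of a list.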

Official Client Libraries

Ollama provides official client libraries that make integration even easier:

Python

pip install ollama

JavaScript

npm install ollama
See the Libraries page for installation instructions and examples.

API Versioning

Ollama’s API isn’t strictly versioned, but it is designed to be stable and backward compatible. Deprecations are rare and are announced in the release notes. You can check your Ollama version using:
curl http://localhost:11434/api/version

Next Steps

Authentication

Learn how to authenticate with ollama.com for cloud models

Client Libraries

Use official Python and JavaScript libraries

API Reference

Explore all available endpoints and parameters

Streaming

Implement real-time streaming responses
