Request
Endpoint
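A minimal sketch of constructing a request, assuming the standard Ollama endpoint `POST /api/chat` on the default local address `http://localhost:11434` (both inferred from context, not stated in this section):

```python
import json
import urllib.request

# Assumed default for a local Ollama server; adjust host/port as needed.
OLLAMA_URL = "http://localhost:11434/api/chat"

payload = {
    "model": "llama3.2",
    "messages": [
        {"role": "user", "content": "Why is the sky blue?"}
    ],
    "stream": False,  # wait for the complete response
}

# Build (but do not yet send) the POST request.
req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# To send it: response = urllib.request.urlopen(req)
```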
Request Body
- `model`: The model name to use for chat completion (e.g., `llama3.2`, `mistral`)
- `messages`: Array of message objects representing the conversation history
- `stream`: Enable streaming of response chunks. Set to `false` to wait for the complete response.
- `format`: Format to return the response in. Use `"json"` for JSON mode, or provide a JSON schema object.
- `options`: Model-specific options to customize inference behavior
- `keep_alive`: Duration to keep the model loaded in memory (e.g., `"5m"`, `"1h"`, or `-1` for indefinite)
- `tools`: List of tools the model can use for function calling
- `think`: Enable thinking mode for reasoning models. Can be `true`/`false` or `"high"`, `"medium"`, `"low"`
- `truncate`: Truncate chat history if the prompt exceeds the context length
- `shift`: Shift chat history when hitting the context length instead of erroring
- `logprobs`: Return log probabilities for output tokens
- `top_logprobs`: Number of most likely tokens to return at each position (0-20). Requires `logprobs: true`.

Response
Response Fields
- `model`: The model name used for generation
- `created_at`: Timestamp of when the response was created (ISO 8601 format)
- `message`: The generated message
- `done`: Whether the response is complete
- `done_reason`: Reason for completion: `stop`, `length`, `load`, or `unload`
- `total_duration`: Total time spent generating the response (nanoseconds)
- `load_duration`: Time spent loading the model (nanoseconds)
- `prompt_eval_count`: Number of tokens in the prompt
- `prompt_eval_duration`: Time spent evaluating the prompt (nanoseconds)
- `eval_count`: Number of tokens generated
- `eval_duration`: Time spent generating the response (nanoseconds)
- `logprobs`: Log probability information for each token (when `logprobs: true`)

Examples
Example Response
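A representative non-streaming response body using the fields listed above (all values are illustrative, not from a real run):

```json
{
  "model": "llama3.2",
  "created_at": "2024-07-22T20:33:28.123456Z",
  "message": {
    "role": "assistant",
    "content": "The sky appears blue because of Rayleigh scattering."
  },
  "done": true,
  "done_reason": "stop",
  "total_duration": 4935886791,
  "load_duration": 534986708,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 107345000,
  "eval_count": 13,
  "eval_duration": 4289432000
}
```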
Streaming Example
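With streaming enabled, each chunk arrives as one JSON object per line: intermediate chunks carry a partial `message` with `done: false`, and the final chunk sets `done: true` along with `done_reason` and the timing fields (illustrative values):

```json
{"model":"llama3.2","created_at":"2024-07-22T20:33:28.5Z","message":{"role":"assistant","content":"The"},"done":false}
{"model":"llama3.2","created_at":"2024-07-22T20:33:28.6Z","message":{"role":"assistant","content":" sky"},"done":false}
{"model":"llama3.2","created_at":"2024-07-22T20:33:29.0Z","message":{"role":"assistant","content":""},"done":true,"done_reason":"stop","eval_count":13}
```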
Error Responses
- `error`: Error message describing what went wrong
Common Errors
- 400 Bad Request: Invalid request body or model name
- 404 Not Found: Model not found (need to pull it first)
- 500 Internal Server Error: Server error during generation
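A small client-side helper, as a sketch, for surfacing these errors: the status-to-meaning mapping follows the list above, and the `error` body field comes from the error-response description. The function name and shape are illustrative, not part of any API:

```python
import json

def describe_error(status: int, body: str) -> str:
    """Turn an HTTP error status plus an Ollama-style error body into a message."""
    hints = {
        400: "Invalid request body or model name",
        404: "Model not found (pull it first, e.g. `ollama pull <model>`)",
        500: "Server error during generation",
    }
    try:
        # Error responses carry a JSON body with an "error" field.
        detail = json.loads(body).get("error", body)
    except json.JSONDecodeError:
        detail = body  # fall back to the raw body if it is not JSON
    hint = hints.get(status, "Unexpected error")
    return f"{status}: {hint} - {detail}"
```

For example, `describe_error(404, '{"error": "model not found"}')` folds the status hint and the server's message into one string.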
When streaming is disabled (`"stream": false`), the complete response is returned as a single JSON object. With streaming enabled (the default), responses are sent as newline-delimited JSON (NDJSON).
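Assembling a complete reply from the NDJSON stream can be sketched as follows; the chunk shape (partial `message.content` per line, `done` flag on the last) follows the streaming format described above, and the sample lines are illustrative:

```python
import json

def collect_stream(lines):
    """Concatenate message.content across NDJSON chunks until done is true."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip any blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Illustrative chunks mirroring the streaming wire format:
sample = [
    '{"message":{"role":"assistant","content":"Hello"},"done":false}',
    '{"message":{"role":"assistant","content":", world"},"done":false}',
    '{"message":{"role":"assistant","content":"!"},"done":true}',
]
```

In a real client, `lines` would be the response body iterated line by line rather than a hardcoded list.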