Endpoint
Request
- `model`: The model to use for the completion. Supports all 25 LLM providers and 47 models available in AgentOS.
- `messages`: Array of message objects in OpenAI format.
- `temperature`: Sampling temperature (0-2). Higher values make output more random.
- `max_tokens`: Maximum number of tokens to generate.
- `stream`: Whether to stream responses. Currently, all requests are processed through the default agent.

Response
- `id`: Unique identifier for the chat completion (format: `chatcmpl-xxxxxxxx`).
- `object`: Always `"chat.completion"`.
- `created`: Unix timestamp of when the completion was created.
- `model`: The model used for the completion.
- `choices`: Array of completion choices.
- `usage`: Token usage information.
Examples
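A request can be sketched as follows. This is an illustrative sketch: the endpoint URL (host, port, and path) is an assumption, since it is not stated on this page, and the model name is simply one of the supported examples listed under Supported Models below.

```python
import json
import urllib.request

# Assumed endpoint path and host; adjust for your AgentOS deployment.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "claude-sonnet-4-6",   # any of the 47 supported models
    "messages": [
        {"role": "user", "content": "Hello, AgentOS!"}
    ],
    "temperature": 0.7,
    "max_tokens": 256,
    "stream": False,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with a running server
```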
Response Example
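An illustrative response body, assuming the standard OpenAI chat-completion shape described above (all field values here are made up):

```json
{
  "id": "chatcmpl-a1b2c3d4",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```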
Implementation Details
The endpoint:

- Extracts the last message from the `messages` array
- Routes it to the `default` agent via the `agent::chat` function
- Creates a unique session ID with format `api:{timestamp}`
- Returns the response in OpenAI-compatible format
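The steps above can be sketched in Python. This is a hedged sketch of the flow, not the actual AgentOS implementation; `agent_chat` is a placeholder stand-in for the `agent::chat` function.

```python
import time
import uuid

def agent_chat(agent: str, message: str, session_id: str) -> str:
    """Placeholder for agent::chat (assumption, for illustration only)."""
    return f"[{agent}] echo: {message}"

def chat_completions(body: dict) -> dict:
    # Extract the last message from the `messages` array.
    last = body["messages"][-1]["content"]
    # Create a unique session ID with format api:{timestamp}.
    session_id = f"api:{int(time.time())}"
    # Route the message to the default agent.
    reply = agent_chat("default", last, session_id)
    # Return the response in OpenAI-compatible format.
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": body.get("model", ""),
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```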
Requests are processed by the full agent pipeline, which includes:

- Security capability checks
- Memory recall and storage
- LLM routing based on model selection
- Tool execution (60+ tools available)
- Loop guard protection
- Session replay recording
Rate Limiting
Chat completions are limited to 60 requests per hour per IP address.
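One common way to enforce such a limit is a per-IP sliding window. The sketch below illustrates the policy (60 requests per rolling hour); it is not the actual AgentOS implementation, which this page does not show.

```python
import time
from collections import defaultdict, deque
from typing import Optional

LIMIT = 60        # requests allowed per window
WINDOW = 3600.0   # window length in seconds (one hour)

# Per-IP deque of request timestamps (illustrative in-memory store).
_requests = defaultdict(deque)

def allow(ip: str, now: Optional[float] = None) -> bool:
    """Return True if a request from `ip` is within the rate limit."""
    now = time.time() if now is None else now
    window = _requests[ip]
    # Evict timestamps older than one hour.
    while window and now - window[0] >= WINDOW:
        window.popleft()
    if len(window) >= LIMIT:
        return False
    window.append(now)
    return True
```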
Supported Models

AgentOS supports 47 models across 25 providers. Some examples:

- Anthropic: `claude-opus-4`, `claude-sonnet-4-6`, `claude-haiku-4`
- OpenAI: `gpt-4o`, `gpt-4o-mini`, `o1`, `o3-mini`
- Google: `gemini-2.0-flash`, `gemini-2.0-pro`
- DeepSeek: `deepseek-v3`, `deepseek-r1`
- And many more…
For the complete list, run `agentos models list`.