The chat completions endpoint provides OpenAI-compatible API access to AgentOS agents. This allows you to integrate AgentOS with any tool that supports the OpenAI API format.

Endpoint

POST /v1/chat/completions

Request

model
string
default:"claude-sonnet-4-6"
The model to use for completion. Supports all 25 LLM providers and 47 models available in AgentOS.
messages
array
required
Array of message objects in OpenAI format.
temperature
number
Sampling temperature (0-2). Higher values make output more random.
max_tokens
number
Maximum number of tokens to generate.
stream
boolean
default:false
Whether to stream the response. Streaming requests are currently processed through the default agent.

Response

id
string
Unique identifier for the chat completion (format: chatcmpl-xxxxxxxx).
object
string
Always "chat.completion".
created
number
Unix timestamp of when the completion was created.
model
string
The model used for the completion.
choices
array
Array of completion choices.
usage
object
Token usage information.

Request Example

curl -X POST http://localhost:3111/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      {
        "role": "user",
        "content": "What can you help me with?"
      }
    ]
  }'
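The same request can be made from Python with only the standard library. This is a sketch based on the curl example above: the host, port, and API key are the same placeholders, and the commented-out send assumes a running AgentOS server.

```python
import json
import urllib.request

# Placeholders from the curl example above; adjust for your deployment.
url = "http://localhost:3111/v1/chat/completions"
payload = {
    "model": "claude-sonnet-4-6",
    "messages": [
        {"role": "user", "content": "What can you help me with?"}
    ],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
# Requires a running AgentOS server:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, clients built for the OpenAI API (for example the official SDKs, pointed at this base URL) should also work unchanged.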

Response Example

{
  "id": "chatcmpl-a3f2b1c4",
  "object": "chat.completion",
  "created": 1709856000,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I can help you with a wide range of tasks including:\n\n- Answering questions and providing information\n- Writing and editing code\n- Analyzing data and files\n- Searching the web\n- Managing workflows\n- And much more!\n\nWhat would you like assistance with?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 87,
    "total_tokens": 102
  }
}
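A client typically only needs a few fields out of this envelope. The sketch below parses a response shaped like the example (the JSON is abbreviated; only the structure matters):

```python
import json

# Abbreviated copy of the response example above.
raw = r"""
{
  "id": "chatcmpl-a3f2b1c4",
  "object": "chat.completion",
  "created": 1709856000,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "I can help with questions, code, and more."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 15, "completion_tokens": 87, "total_tokens": 102}
}
"""
completion = json.loads(raw)

# The assistant's reply, why generation stopped, and the token cost.
reply = completion["choices"][0]["message"]["content"]
finish_reason = completion["choices"][0]["finish_reason"]
total_tokens = completion["usage"]["total_tokens"]
```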

Implementation Details

The endpoint:
  1. Extracts the last message from the messages array
  2. Routes it to the default agent via agent::chat function
  3. Creates a unique session ID with format api:{timestamp}
  4. Returns the response in OpenAI-compatible format
The actual agent processing includes:
  • Security capability checks
  • Memory recall and storage
  • LLM routing based on model selection
  • Tool execution (60+ tools available)
  • Loop guard protection
  • Session replay recording
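The four request-handling steps above can be sketched as follows. This is not the actual AgentOS implementation: the handler name, the `agent_chat` callable standing in for the `agent::chat` function, and the random-suffix ID scheme are illustrative assumptions; only the session-ID format (`api:{timestamp}`) and the response envelope come from this page.

```python
import time
import secrets

def handle_chat_completion(body, agent_chat):
    """Illustrative sketch of the endpoint flow described above.

    `agent_chat` stands in for the agent::chat function (hypothetical
    signature: takes a session ID and a message, returns the reply text).
    """
    # 1. Extract the last message from the messages array.
    last_message = body["messages"][-1]["content"]

    # 3. Create a unique session ID with format api:{timestamp}.
    created = int(time.time())
    session_id = f"api:{created}"

    # 2. Route it to the default agent.
    reply = agent_chat(session_id, last_message)

    # 4. Return the response in OpenAI-compatible format.
    return {
        "id": f"chatcmpl-{secrets.token_hex(4)}",  # chatcmpl-xxxxxxxx
        "object": "chat.completion",
        "created": created,
        "model": body.get("model", "claude-sonnet-4-6"),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": reply},
            "finish_reason": "stop",
        }],
    }
```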

Rate Limiting

Chat completions are limited to 60 requests per hour per IP address.
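The exact limiting mechanism is not documented here; the sketch below is one way a "60 requests per hour per IP" policy behaves from the client's perspective, implemented as a sliding window:

```python
import time
from collections import defaultdict, deque

WINDOW = 3600  # one hour, in seconds
LIMIT = 60     # requests per window per IP

_hits = defaultdict(deque)  # ip -> timestamps of recent requests

def allow(ip, now=None):
    """Return True if this IP may make a request right now."""
    now = time.monotonic() if now is None else now
    q = _hits[ip]
    # Drop hits older than one hour.
    while q and now - q[0] >= WINDOW:
        q.popleft()
    if len(q) >= LIMIT:
        return False  # over the limit: caller should return HTTP 429
    q.append(now)
    return True
```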

Supported Models

AgentOS supports 47 models across 25 providers. Some examples:
  • Anthropic: claude-opus-4, claude-sonnet-4-6, claude-haiku-4
  • OpenAI: gpt-4o, gpt-4o-mini, o1, o3-mini
  • Google: gemini-2.0-flash, gemini-2.0-pro
  • DeepSeek: deepseek-v3, deepseek-r1
  • And many more…
Run agentos models list to see the full list.
