Endpoint
Request
- `model`: The model to use for the completion. Supports all 25 LLM providers and 47 models available in AgentOS.
- `messages`: Array of message objects in OpenAI format.
- `temperature`: Sampling temperature (0-2). Higher values make output more random.
- `max_tokens`: Maximum number of tokens to generate.
- `stream`: Whether to stream responses. Currently, all requests are processed through the default agent.

Response
- `id`: Unique identifier for the chat completion (format: `chatcmpl-xxxxxxxx`).
- `object`: Always `"chat.completion"`.
- `created`: Unix timestamp of when the completion was created.
- `model`: The model used for the completion.
- `choices`: Array of completion choices.
- `usage`: Token usage information.
Examples
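A request can be sketched as follows. This is an illustrative sketch: the endpoint URL (host, port, and path) is an assumption, since it is not stated on this page, and the model name is simply one of the supported examples listed under Supported Models below.

```python
import json
import urllib.request

# Assumed endpoint path and host; adjust for your AgentOS deployment.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "claude-sonnet-4-6",   # any of the 47 supported models
    "messages": [
        {"role": "user", "content": "Hello, AgentOS!"}
    ],
    "temperature": 0.7,
    "max_tokens": 256,
    "stream": False,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with a running server
```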
Response Example
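An illustrative response body, assuming the standard OpenAI chat-completion shape described above (all field values here are made up):

```json
{
  "id": "chatcmpl-a1b2c3d4",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```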
Implementation Details
The endpoint:

- Extracts the last message from the `messages` array
- Routes it to the `default` agent via the `agent::chat` function
- Creates a unique session ID with format `api:{timestamp}`
- Returns the response in OpenAI-compatible format
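The steps above can be sketched in Python. This is a hedged sketch of the flow, not the actual AgentOS implementation; `agent_chat` is a placeholder stand-in for the `agent::chat` function.

```python
import time
import uuid

def agent_chat(agent: str, message: str, session_id: str) -> str:
    """Placeholder for agent::chat (assumption, for illustration only)."""
    return f"[{agent}] echo: {message}"

def chat_completions(body: dict) -> dict:
    # Extract the last message from the `messages` array.
    last = body["messages"][-1]["content"]
    # Create a unique session ID with format api:{timestamp}.
    session_id = f"api:{int(time.time())}"
    # Route the message to the default agent.
    reply = agent_chat("default", last, session_id)
    # Return the response in OpenAI-compatible format.
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": body.get("model", ""),
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```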
Requests are processed by the full agent pipeline, which includes:

- Security capability checks
- Memory recall and storage
- LLM routing based on model selection
- Tool execution (60+ tools available)
- Loop guard protection
- Session replay recording
Rate Limiting
Chat completions are limited to 60 requests per hour per IP address.
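One common way to enforce such a limit is a per-IP sliding window. The sketch below illustrates the policy (60 requests per rolling hour); it is not the actual AgentOS implementation, which this page does not show.

```python
import time
from collections import defaultdict, deque
from typing import Optional

LIMIT = 60        # requests allowed per window
WINDOW = 3600.0   # window length in seconds (one hour)

# Per-IP deque of request timestamps (illustrative in-memory store).
_requests = defaultdict(deque)

def allow(ip: str, now: Optional[float] = None) -> bool:
    """Return True if a request from `ip` is within the rate limit."""
    now = time.time() if now is None else now
    window = _requests[ip]
    # Evict timestamps older than one hour.
    while window and now - window[0] >= WINDOW:
        window.popleft()
    if len(window) >= LIMIT:
        return False
    window.append(now)
    return True
```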
Supported Models

AgentOS supports 47 models across 25 providers. Some examples:

- Anthropic: `claude-opus-4`, `claude-sonnet-4-6`, `claude-haiku-4`
- OpenAI: `gpt-4o`, `gpt-4o-mini`, `o1`, `o3-mini`
- Google: `gemini-2.0-flash`, `gemini-2.0-pro`
- DeepSeek: `deepseek-v3`, `deepseek-r1`
- And many more…
For the complete list, run `agentos models list`.