Overview
The /v1/chat/completions endpoint provides full OpenAI Chat Completions API compatibility. It accepts chat-formatted messages and maps them internally to the Responses API format while preserving streaming behavior and tool calling capabilities.
Authentication
Pass your API key as a bearer token in the Authorization header. Format: Authorization: Bearer YOUR_API_KEY
Request Body
model (string, required): ID of the model to use. Must be a valid model slug from the /v1/models endpoint. Examples: "gpt-4.1", "gpt-5.2"
messages (array, required): Array of message objects representing the conversation history. Must contain at least one message. Each message object has:
role (string, required): One of "system", "developer", "user", "assistant", or "tool"
content (string | array): Message content. For system/developer roles, must be text-only.
tool_calls (array, optional): For assistant messages, array of tool call objects
tool_call_id (string, required for tool role): ID of the tool call this message responds to
tools (array, optional): Array of tool definitions available to the model. Each tool object has:
type (string): "function" or "web_search"
function (object): For function tools, contains name, description, and parameters
Supported tool types:
function: Custom function calls
web_search or web_search_preview: Web search capability
Unsupported types (will return error):
file_search, code_interpreter, computer_use, computer_use_preview, image_generation
tool_choice (string | object, optional): Controls which tool the model should use. Options:
"none": Model will not call tools
"auto": Model decides whether to call tools
"required": Model must call at least one tool
{"type": "function", "function": {"name": "tool_name"}}: Force a call to the named tool
parallel_tool_calls (boolean, optional): Whether to enable parallel tool calling. When true, the model can call multiple tools simultaneously.
stream (boolean, optional): Whether to stream the response as server-sent events.
true: Returns text/event-stream with chat.completion.chunk objects
false: Returns a single chat.completion object
stream_options (object, optional): Options for streaming responses. Properties:
include_usage (boolean): Include token usage in final chunk
include_obfuscation (boolean): Include obfuscation data in stream
temperature (number, optional): Sampling temperature between 0 and 2. Higher values make output more random.
top_p (number, optional): Nucleus sampling parameter. Alternative to temperature.
max_tokens (integer, optional): Maximum number of tokens to generate. Alias for max_completion_tokens.
max_completion_tokens (integer, optional): Maximum number of tokens in the completion.
response_format (object, optional): Format for the model's output. Options:
{"type": "text"}: Plain text (default)
{"type": "json_object"}: Valid JSON object
{"type": "json_schema", "json_schema": {...}}: JSON matching provided schema
For json_schema type:
json_schema.name (string): Schema name, 1-64 chars, alphanumeric/underscore/hyphen
json_schema.schema (object): JSON Schema definition
json_schema.strict (boolean): Enable strict schema adherence
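The json_schema.name constraint above (1-64 characters, alphanumeric/underscore/hyphen) can be checked client-side before sending a request. A minimal sketch; the helper function is illustrative, not part of the API:

```python
import re

# Illustrative helper (not part of the API): enforces the json_schema.name
# constraint documented above: 1-64 chars of alphanumerics, underscore, hyphen.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def make_json_schema_format(name, schema, strict=True):
    """Build a response_format payload for the json_schema type."""
    if not NAME_PATTERN.match(name):
        raise ValueError(f"invalid json_schema name: {name!r}")
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": strict},
    }

fmt = make_json_schema_format(
    "person_profile",
    {"type": "object", "properties": {"name": {"type": "string"}}},
)
```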
stop (string | array, optional): Stop sequence(s). Generation stops when any of these sequences is encountered.
presence_penalty (number, optional): Penalty for token presence. Range: -2.0 to 2.0.
frequency_penalty (number, optional): Penalty for token frequency. Range: -2.0 to 2.0.
logprobs (boolean, optional): Whether to return log probabilities of output tokens.
top_logprobs (integer, optional): Number of most likely tokens to return at each position (requires logprobs: true).
seed (integer, optional): Random seed for deterministic sampling.
n (integer, optional): Number of completions to generate. Must be 1 (the only supported value).
Response (Non-Streaming)
When stream is false or omitted, returns a chat.completion object:
id (string): Unique identifier for the completion.
object (string): Always "chat.completion".
created (integer): Unix timestamp of creation.
model (string): Model used for the completion.
choices (array): Array of completion choices (always contains one choice). Each choice object:
index (integer): Choice index (always 0)
message (object): The assistant’s message
role (string): Always "assistant"
content (string | null): Text content of the message
refusal (string | null): Refusal message if model declined
tool_calls (array | null): Tool calls made by the model
finish_reason (string): Why generation stopped
"stop": Natural completion
"length": Max tokens reached
"tool_calls": Model called tools
"content_filter": Content filtered
usage (object): Token usage information. Properties:
prompt_tokens (integer): Tokens in the prompt
completion_tokens (integer): Tokens in the completion
total_tokens (integer): Total tokens used
prompt_tokens_details (object | null):
cached_tokens (integer): Cached prompt tokens
completion_tokens_details (object | null):
reasoning_tokens (integer): Tokens used for reasoning
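Reading the fields above from a decoded response body looks like this; the response JSON here is a hand-written example of the documented shape, not a real API reply:

```python
import json

# Hypothetical non-streaming response body, following the shape described above.
body = json.loads("""
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4.1",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Paris.",
                "refusal": null, "tool_calls": null},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15}
}
""")

choice = body["choices"][0]           # always exactly one choice (n must be 1)
reply = choice["message"]["content"]  # assistant text, or None when tools were called
done_reason = choice["finish_reason"]
total = body["usage"]["total_tokens"]
```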
Response (Streaming)
When stream is true, returns text/event-stream with chat.completion.chunk objects:
id (string): Unique identifier for the chunk stream.
object (string): Always "chat.completion.chunk".
created (integer): Unix timestamp of creation.
choices (array): Array of delta choices. Each choice contains:
index (integer): Always 0
delta (object): Incremental content
role (string | null): Role (only in first chunk)
content (string | null): Content delta
refusal (string | null): Refusal delta
tool_calls (array | null): Tool call deltas
finish_reason (string | null): Reason when complete
usage (object | null): Token usage (present only in the final chunk when stream_options.include_usage is true).
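A streaming client reconstructs the full message by accumulating deltas in order. A minimal sketch; the chunk objects are hand-written examples of the delta shape described above:

```python
# Hand-written example chunks following the delta shape described above.
chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""},
                  "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": ", world"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

content_parts = []
finish_reason = None
for chunk in chunks:
    choice = chunk["choices"][0]      # index is always 0
    delta = choice["delta"]
    if delta.get("content"):          # content arrives as incremental fragments
        content_parts.append(delta["content"])
    if choice["finish_reason"] is not None:
        finish_reason = choice["finish_reason"]

message = "".join(content_parts)
```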
Examples
Basic Chat Completion
curl https://api.example.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4.1",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}'
Streaming Response
curl https://api.example.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4.1",
"messages": [
{"role": "user", "content": "Write a haiku about coding"}
],
"stream": true
}'
Streaming with Usage
curl https://api.example.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.2",
"messages": [
{"role": "user", "content": "Explain quantum computing"}
],
"stream": true,
"stream_options": {
"include_usage": true
}
}'
Tool Calling
curl https://api.example.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4.1",
"messages": [
{"role": "user", "content": "What is the weather in San Francisco?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto"
}'
Web Search
curl https://api.example.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4.1",
"messages": [
{"role": "user", "content": "What are the latest news about AI?"}
],
"tools": [
{"type": "web_search"}
]
}'
Structured Output
curl https://api.example.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4.1",
"messages": [
{"role": "user", "content": "Generate a person profile"}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "person_profile",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "number"},
"city": {"type": "string"}
},
"required": ["name", "age"]
},
"strict": true
}
}
}'
Multi-turn Conversation
curl https://api.example.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4.1",
"messages": [
{"role": "system", "content": "You are a helpful math tutor."},
{"role": "user", "content": "What is 25 * 4?"}
]
}'
Content Type Restrictions
System and Developer Messages
- Must contain text-only content
- Cannot include images, files, or other media types
- Violations return 400 with invalid_request_error
User Messages
Supported content types:
- Text: String or {"type": "text", "text": "..."}
- Images: {"type": "image_url", "image_url": {"url": "..."}}
- Data URLs and HTTP(S) URLs supported
- Images over 8MB are automatically dropped
- Files: {"type": "file", "file": {...}}
- file_id is not supported and will return an error
Unsupported:
- Audio input: input_audio type returns a 400 error
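A user message combining the supported content types can be assembled like this; the URL is a placeholder, not a real resource:

```python
# Multimodal user message using the content part shapes listed above.
# The image URL is a placeholder for illustration only.
user_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
    ],
}
```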
Assistant Messages
- Can include content (text) and/or tool_calls
- Tool calls must have a valid id and a function object with a name
Tool Messages
- Must include tool_call_id matching a previous assistant tool call
- Content becomes the tool output
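Putting the assistant and tool message rules together, the follow-up request after a tool call carries the assistant's tool_calls plus a matching tool message. A sketch; the call id and weather payload are invented for illustration:

```python
import json

# Follow-up conversation after the model called get_weather.
# The call id and the tool output values are invented for illustration.
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather",
                 "arguments": "{\"location\": \"San Francisco\"}"},
}

messages = [
    {"role": "user", "content": "What is the weather in San Francisco?"},
    {"role": "assistant", "content": None, "tool_calls": [tool_call]},
    # tool_call_id must match the id of the assistant's tool call above;
    # content carries the tool output back to the model.
    {"role": "tool", "tool_call_id": tool_call["id"],
     "content": json.dumps({"temperature_c": 18, "conditions": "foggy"})},
]
```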
Error Handling
All errors return OpenAI-compatible error envelopes:
{
"error": {
"message": "Error description",
"type": "invalid_request_error",
"code": "error_code",
"param": "field_name"
}
}
Common error codes:
invalid_request_error: Invalid request parameters
model_not_allowed: API key lacks access to requested model
no_accounts: No upstream accounts available
upstream_error: Upstream service error
For streaming requests, errors are emitted as error chunks followed by data: [DONE].
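A stream reader therefore has three cases to handle per data: line: a normal chunk, an error envelope, and the [DONE] sentinel. A sketch over hand-written example lines; the parser function is illustrative:

```python
import json

def parse_sse_line(line):
    """Classify one SSE line as ('chunk'|'error'|'done'|None, payload)."""
    if not line.startswith("data: "):
        return None, None              # blank keep-alives, comments, etc.
    data = line[len("data: "):]
    if data == "[DONE]":
        return "done", None
    obj = json.loads(data)
    if "error" in obj:                 # error envelope emitted as a chunk
        return "error", obj["error"]
    return "chunk", obj

# Hand-written example stream: one normal chunk, then an error, then [DONE].
lines = [
    'data: {"object": "chat.completion.chunk", "choices": []}',
    'data: {"error": {"message": "Upstream service error",'
    ' "type": "invalid_request_error", "code": "upstream_error"}}',
    "data: [DONE]",
]
events = [parse_sse_line(ln) for ln in lines]
```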
Model Restrictions
If your API key has allowed_models configured, only those models can be used. Requests for other models return:
{
"error": {
"message": "This API key does not have access to model 'gpt-5.2'",
"type": "invalid_request_error",
"code": "model_not_allowed"
}
}
Check available models at /v1/models.