Creates a model response for the given chat conversation. The gateway automatically routes requests to the optimal provider based on cost, latency, and availability.

Endpoint

POST https://api.llmgateway.io/v1/chat/completions

Authentication

Requires authentication using a Bearer token or the x-api-key header. See Authentication.

Request Body

model
string
required
The model to use for completion. Can be:
  • Specific model ID (e.g., gpt-4o, claude-3-5-sonnet-20241022)
  • Provider-prefixed model (e.g., openai/gpt-4o, anthropic/claude-3-5-sonnet-20241022)
  • auto for automatic model selection based on cost and capabilities
Example: "gpt-5"
messages
array
required
Array of message objects in the conversation. Each message has:
  • role (string): "user", "assistant", "system", or "tool"
  • content (string | array): Message content or array of content parts for multimodal messages
  • name (string, optional): Name of the message sender
  • tool_call_id (string, optional): ID of the tool call this message is responding to
  • tool_calls (array, optional): Tool calls made by the assistant
Example:
[
  {"role": "user", "content": "Hello!"}
]
temperature
number
default:"1.0"
Sampling temperature between 0 and 2. Higher values make output more random. Example: 0.7
max_tokens
number
Maximum number of tokens to generate in the completion. Example: 1000
top_p
number
default:"1.0"
Nucleus sampling parameter. Alternative to temperature. Example: 0.9
frequency_penalty
number
default:"0.0"
Penalizes tokens in proportion to how often they have already appeared, reducing verbatim repetition. Range: -2.0 to 2.0. Example: 0.0
presence_penalty
number
default:"0.0"
Penalizes tokens that have appeared at least once, encouraging the model to introduce new topics. Range: -2.0 to 2.0. Example: 0.0
response_format
object
Format for the model response. Options:
  • {"type": "text"} - Plain text (default)
  • {"type": "json_object"} - Valid JSON object
  • {"type": "json_schema", "json_schema": {...}} - JSON matching schema
Example:
{"type": "json_object"}
stream
boolean
default:false
Whether to stream the response as Server-Sent Events. Example: false
tools
array
Array of tools the model can use. Each tool has:
  • type: "function" or "web_search"
  • function: For function tools, includes name, description, and parameters
Example:
[{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current weather",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {"type": "string"}
      }
    }
  }
}]
tool_choice
string | object
Controls which tools the model uses:
  • "auto" - Model decides (default)
  • "none" - Never use tools
  • "required" - Must use at least one tool
  • {"type": "function", "function": {"name": "..."}} - Force specific function
reasoning_effort
string
Controls reasoning effort for reasoning-capable models. Options: "minimal", "low", "medium", "high", "xhigh". Example: "medium"
reasoning
object
Unified reasoning configuration. Alternative to reasoning_effort. Properties:
  • effort: Same as reasoning_effort
  • max_tokens: Exact number of tokens for reasoning (overrides effort)
Example:
{"effort": "medium", "max_tokens": 4000}
effort
string
Computational effort for supported models (currently claude-opus-4-5). Options: "low", "medium", "high". Example: "medium"
Enable native web search for models that support it. Example: true
free_models_only
boolean
default:false
When using auto routing, only route to free models. Example: false
no_reasoning
boolean
default:false
When using auto routing, exclude reasoning models from selection. Example: false
plugins
array
Plugins to enable for this request. Example:
[{"id": "response-healing"}]
Available plugins:
  • response-healing: Automatically repairs malformed JSON responses
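The fields above combine into a single JSON request body. As a sketch (all values illustrative), an auto-routed, free-models-only request with the response-healing plugin could be assembled as:

```python
import json

# Illustrative payload combining request-body fields documented above
payload = {
    "model": "auto",                          # automatic model selection
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
    "max_tokens": 1000,
    "free_models_only": True,                 # auto routing: free models only
    "plugins": [{"id": "response-healing"}],  # repair malformed JSON responses
}

body = json.dumps(payload)
```

POST this body to /v1/chat/completions with one of the authentication headers described above.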

Response

id
string
Unique identifier for the completion.
object
string
Object type, always "chat.completion".
created
number
Unix timestamp of when the completion was created.
model
string
The model used for completion.
choices
array
Array of completion choices. Each choice contains:
  • index (number): Choice index
  • message (object): The generated message
    • role (string): Always "assistant"
    • content (string | null): Message content
    • reasoning (string | null, optional): Internal reasoning for reasoning models
    • tool_calls (array, optional): Tool calls made by the model
    • images (array, optional): Generated images
  • finish_reason (string): Why generation stopped ("stop", "length", "tool_calls", etc.)
usage
object
Token usage information. Contains:
  • prompt_tokens (number): Tokens in the prompt
  • completion_tokens (number): Tokens in the completion
  • total_tokens (number): Total tokens used
  • reasoning_tokens (number, optional): Tokens used for reasoning
  • prompt_tokens_details (object, optional): Breakdown of prompt tokens
    • cached_tokens (number): Tokens served from cache
  • cost_usd_total (number, optional): Total cost in USD
  • cost_usd_input (number, optional): Input cost in USD
  • cost_usd_output (number, optional): Output cost in USD
  • cost_usd_cached_input (number, optional): Cached input cost in USD
  • cost_usd_request (number, optional): Per-request cost in USD
metadata
object
Routing and provider information. Contains:
  • requested_model (string): Model requested by client
  • requested_provider (string | null): Provider requested by client
  • used_model (string): Actual model used
  • used_provider (string): Actual provider used
  • underlying_used_model (string): Provider’s native model name
  • routing (array, optional): Routing attempts and errors

Examples

Basic Chat Completion

curl https://api.llmgateway.io/v1/chat/completions \
  -H "Authorization: Bearer $LLMGATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Streaming Response

import os
from openai import OpenAI

# OpenAI-compatible client pointed at the gateway
client = OpenAI(base_url="https://api.llmgateway.io/v1", api_key=os.environ["LLMGATEWAY_API_KEY"])

for chunk in client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
):
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
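Under the hood, a streamed response arrives as Server-Sent Events: `data:` lines each carrying a JSON chunk, terminated by `data: [DONE]` (the OpenAI-compatible convention). A sketch of reassembling text from hypothetical raw SSE lines:

```python
import json

# Hypothetical raw SSE lines as an OpenAI-compatible stream delivers them
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

text = ""
for line in sse_lines:
    payload = line.removeprefix("data: ")
    if payload == "[DONE]":  # end-of-stream sentinel
        break
    chunk = json.loads(payload)
    text += chunk["choices"][0]["delta"].get("content") or ""

# text == "Hello"
```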

JSON Mode

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "List 3 colors in JSON format"}
    ],
    response_format={"type": "json_object"}
)

import json
colors = json.loads(response.choices[0].message.content)
print(colors)
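The json_schema variant of response_format (listed under Request Body above) constrains output to a schema rather than merely to valid JSON. A sketch of the value to pass; the schema name "color" and its fields are illustrative:

```python
import json

# Hypothetical json_schema response_format value
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "color",  # illustrative schema name
        "schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "hex": {"type": "string"}},
            "required": ["name", "hex"],
        },
    },
}

# A conforming model response parses to an object carrying all required keys
sample = '{"name": "teal", "hex": "#008080"}'
parsed = json.loads(sample)
assert all(key in parsed for key in response_format["json_schema"]["schema"]["required"])
```

Pass this dict as `response_format=response_format` in the create call, exactly as in the json_object example above.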

Function Calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
    tool_choice="auto"
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
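To complete the round trip, the tool's result goes back to the model as a "tool" message referencing tool_call_id (as described under messages). A sketch using a hypothetical tool call; `function.arguments` is a JSON-encoded string, so it must be parsed first:

```python
import json

# Hypothetical tool call as it would appear on response.choices[0].message.tool_calls
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "Boston, MA"}'},
}

# Parse the JSON-encoded arguments and run the tool
args = json.loads(tool_call["function"]["arguments"])
result = f"Weather in {args['location']}: 72°F, sunny"  # stand-in for a real lookup

# Send the result back as a tool message referencing the call ID
tool_message = {
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "content": result,
}
```

Append tool_message to the conversation (after the assistant message containing the tool call) and call create again to let the model produce its final answer.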

Auto Routing

# Automatically select the cheapest model
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(f"Used: {response.metadata['used_provider']}/{response.metadata['used_model']}")
print(f"Cost: ${response.usage.cost_usd_total}")

Vision (Multimodal)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

Response Example

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 18,
    "total_tokens": 30,
    "cost_usd_total": 0.00045,
    "cost_usd_input": 0.00018,
    "cost_usd_output": 0.00027
  },
  "metadata": {
    "requested_model": "gpt-4o",
    "requested_provider": null,
    "used_model": "gpt-4o",
    "used_provider": "openai",
    "underlying_used_model": "gpt-4o-2024-08-06"
  }
}
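Assuming the usage fields are additive (token counts, and per-direction costs where present), the example above is internally consistent, which makes a cheap sanity check:

```python
# Usage object from the response example above
usage = {
    "prompt_tokens": 12,
    "completion_tokens": 18,
    "total_tokens": 30,
    "cost_usd_total": 0.00045,
    "cost_usd_input": 0.00018,
    "cost_usd_output": 0.00027,
}

# total_tokens = prompt_tokens + completion_tokens
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]

# Assumed: cost_usd_total sums the per-direction costs (no cached/request costs here)
assert abs(usage["cost_usd_total"] - (usage["cost_usd_input"] + usage["cost_usd_output"])) < 1e-9
```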

Error Responses

Invalid Parameters

{
  "error": {
    "message": "Invalid request parameters",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_parameters"
  }
}

Insufficient Credits

{
  "error": true,
  "status": 402,
  "message": "Organization has insufficient credits"
}

Model Not Available

{
  "error": true,
  "status": 400,
  "message": "No provider key set for any of the providers that support model gpt-4o"
}
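Note that the gateway returns two error shapes: an OpenAI-style nested error object and a flat {"error": true, "status": ..., "message": ...} object. A small helper, as a sketch, to extract the message from either:

```python
def error_message(body: dict):
    """Return the error message from either gateway error shape, or None."""
    err = body.get("error")
    if isinstance(err, dict):  # OpenAI-style: {"error": {"message": ...}}
        return err.get("message")
    if err is True:            # flat style: {"error": true, "status": ..., "message": ...}
        return body.get("message")
    return None
```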
