Providers are the bridge between Goose and AI models. They abstract different LLM APIs behind a common interface, enabling Goose to work with 25+ AI services, including Anthropic, OpenAI, and local model runtimes.

What is a Provider?

A provider in Goose is a component that:
  • Connects to an AI model service (cloud or local)
  • Translates Goose’s conversation format to the model’s API format
  • Handles authentication and API keys
  • Streams responses back to the agent
  • Manages tool calling protocols
  • Tracks token usage and costs
// Core provider trait (simplified)
#[async_trait]
pub trait Provider: Send + Sync {
    /// Send a completion request to the model
    async fn complete(
        &self,
        system: String,
        messages: Vec<Message>,
        tools: Vec<Tool>,
    ) -> Result<BoxStream<ProviderMessage>>;
    
    /// List available models
    fn list_models(&self) -> Vec<ModelInfo>;
    
    /// Generate a session name from conversation
    async fn generate_session_name(
        &self,
        messages: &[Message],
    ) -> Result<String>;
}

Built-in Providers

Goose includes native support for many popular AI providers:

Cloud Providers

Anthropic

Claude models (Sonnet, Opus, Haiku)
  • Native tool calling
  • Prompt caching
  • Extended context windows

OpenAI

GPT models (GPT-4o, GPT-4, GPT-3.5)
  • Function calling
  • Vision support
  • Structured outputs

Google

Gemini models via Vertex AI
  • Multi-modal support
  • Large context windows
  • OAuth authentication

AWS Bedrock

Multiple model families
  • Anthropic Claude
  • Meta Llama
  • AWS credentials

Local & Open Source

Ollama

Local model execution
  • Privacy-first
  • No API keys required
  • Qwen, Llama, Mistral, etc.

LiteLLM

Unified gateway to 100+ models
  • Consistent API across providers
  • Load balancing
  • Fallback handling

OpenAI Compatible

Custom OpenAI-compatible servers
  • vLLM, LocalAI, etc.
  • Self-hosted models
  • Custom endpoints

Local Inference

Direct local model execution
  • llama.cpp integration
  • GGUF model support
  • CPU/GPU acceleration

Enterprise Providers

  • Azure OpenAI: Enterprise OpenAI deployment
  • Databricks: Databricks model serving
  • Snowflake Cortex: Snowflake’s AI models
  • GitHub Copilot: GitHub’s code models
  • OpenRouter: Multi-provider routing
  • Venice.ai: Privacy-focused inference

Provider Architecture

Configuration

Via Environment Variables

The simplest way to configure a provider:
# Anthropic
export GOOSE_PROVIDER=anthropic
export GOOSE_MODEL=claude-sonnet-4-20250514
export ANTHROPIC_API_KEY=sk-ant-...

# OpenAI
export GOOSE_PROVIDER=openai
export GOOSE_MODEL=gpt-4o
export OPENAI_API_KEY=sk-...

# Ollama (local)
export GOOSE_PROVIDER=ollama
export GOOSE_MODEL=qwen3-coder:latest
export OLLAMA_HOST=http://localhost:11434

Via Configuration File

# ~/.config/goose/config.yaml
GOOSE_PROVIDER: anthropic
GOOSE_MODEL: claude-sonnet-4-20250514
API keys are stored separately, in the system keyring or a secrets file:
# ~/.config/goose/secrets.yaml (if GOOSE_DISABLE_KEYRING=1)
ANTHROPIC_API_KEY: sk-ant-...

Via Recipe

# recipe.yaml
title: GPT-4o Code Review
settings:
  goose_provider: openai
  goose_model: gpt-4o
  temperature: 0.2

Provider Implementation

Example: Anthropic Provider

// From crates/goose/src/providers/anthropic.rs
pub struct AnthropicProvider {
    api_key: String,
    base_url: String,
    http_client: reqwest::Client,
}

#[async_trait]
impl Provider for AnthropicProvider {
    async fn complete(
        &self,
        system: String,
        messages: Vec<Message>,
        tools: Vec<Tool>,
    ) -> Result<BoxStream<ProviderMessage>> {
        // Convert to Anthropic API format
        let request = self.build_request(
            system,
            messages,
            tools,
        )?;
        
        // Make streaming API call
        let response = self.http_client
            .post(&format!("{}/v1/messages", self.base_url))
            .header("x-api-key", &self.api_key)
            .header("anthropic-version", "2023-06-01")
            .json(&request)
            .send()
            .await?;
        
        // Stream and parse chunks
        let stream = response
            .bytes_stream()
            .map(|chunk| self.parse_chunk(chunk))
            .boxed();
        
        Ok(stream)
    }
    
    fn list_models(&self) -> Vec<ModelInfo> {
        vec![
            ModelInfo::with_cost(
                "claude-sonnet-4-20250514",
                200_000,  // context window
                0.000003, // input cost per token
                0.000015, // output cost per token
            ),
            // ... other models
        ]
    }
}

Tool Calling Translation

Different providers have different tool calling formats. Goose translates between them:
// Anthropic format
{
  "tools": [{
    "name": "read_file",
    "description": "Read a file",
    "input_schema": {
      "type": "object",
      "properties": {
        "path": {"type": "string"}
      }
    }
  }]
}

// OpenAI format
{
  "tools": [{
    "type": "function",
    "function": {
      "name": "read_file",
      "description": "Read a file",
      "parameters": {
        "type": "object",
        "properties": {
          "path": {"type": "string"}
        }
      }
    }
  }]
}
Goose providers handle these translations automatically.
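The translation above is mostly a matter of renaming keys and adding the OpenAI wrapper object. A minimal sketch, using illustrative struct shapes rather than Goose's actual types (the schema payload is kept as a raw string for simplicity):

```rust
// Hypothetical simplified tool shapes, for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct AnthropicTool {
    name: String,
    description: String,
    input_schema: String, // JSON schema, kept as a raw string here
}

#[derive(Debug, Clone, PartialEq)]
struct OpenAiFunction {
    name: String,
    description: String,
    parameters: String, // the same schema under a different key
}

fn anthropic_to_openai(t: &AnthropicTool) -> OpenAiFunction {
    // The schema payload is identical; only the key name differs
    // ("input_schema" vs "parameters"), plus OpenAI's extra
    // {"type": "function", "function": {...}} wrapper at the JSON level.
    OpenAiFunction {
        name: t.name.clone(),
        description: t.description.clone(),
        parameters: t.input_schema.clone(),
    }
}

fn main() {
    let tool = AnthropicTool {
        name: "read_file".into(),
        description: "Read a file".into(),
        input_schema: r#"{"type":"object","properties":{"path":{"type":"string"}}}"#.into(),
    };
    let f = anthropic_to_openai(&tool);
    println!("{} -> parameters: {}", f.name, f.parameters);
}
```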

Model Capabilities

Providers expose model capabilities through ModelInfo:
pub struct ModelInfo {
    pub name: String,
    pub context_limit: usize,              // Max tokens
    pub input_token_cost: Option<f64>,     // Cost per input token
    pub output_token_cost: Option<f64>,    // Cost per output token
    pub supports_cache_control: Option<bool>,  // Prompt caching
}
Goose uses this information to:
  • Manage context windows
  • Estimate costs
  • Enable/disable features (like prompt caching)
  • Choose appropriate models for subagents
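Because the cost fields are Option<f64>, a cost estimate is only meaningful when both rates are known. A sketch of how such an estimate could be computed (ModelCost is an illustrative stand-in for the cost fields of ModelInfo):

```rust
// Hypothetical mirror of ModelInfo's cost fields, for illustration.
struct ModelCost {
    input_token_cost: Option<f64>,
    output_token_cost: Option<f64>,
}

/// Returns None when either rate is unknown, rather than guessing.
fn estimate_cost(m: &ModelCost, input_tokens: usize, output_tokens: usize) -> Option<f64> {
    Some(input_tokens as f64 * m.input_token_cost?
        + output_tokens as f64 * m.output_token_cost?)
}

fn main() {
    // Rates from the Anthropic example above: $3/M input, $15/M output.
    let sonnet = ModelCost {
        input_token_cost: Some(0.000003),
        output_token_cost: Some(0.000015),
    };
    // 10k input + 2k output tokens: 0.03 + 0.03 = 0.06
    let cost = estimate_cost(&sonnet, 10_000, 2_000).unwrap();
    println!("${:.3}", cost);
}
```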

Custom Providers

You can add custom providers without modifying Goose’s code:

1. Declarative Provider (JSON)

For OpenAI-compatible APIs:
// ~/.config/goose/custom_providers/my-provider.json
{
  "name": "my_provider",
  "engine": "openai",
  "display_name": "My Custom LLM",
  "description": "Internal LLM endpoint",
  "api_key_env": "MY_PROVIDER_API_KEY",
  "base_url": "https://llm.company.internal/v1",
  "models": [
    {
      "name": "company-llm-v1",
      "context_limit": 32768,
      "input_token_cost": 0.000001,
      "output_token_cost": 0.000002
    }
  ],
  "supports_streaming": true,
  "requires_auth": true
}
Supported engines:
  • openai: OpenAI-compatible API
  • anthropic: Anthropic-compatible API
  • ollama: Ollama-compatible API

2. Code-Based Provider (Rust)

For custom protocols:
// 1. Create a new file: crates/goose/src/providers/my_provider.rs

use super::base::{Provider, ModelInfo, ProviderMessage};
use async_trait::async_trait;
use anyhow::Result;

pub struct MyProvider {
    api_key: String,
    endpoint: String,
}

impl MyProvider {
    pub fn new(api_key: String, endpoint: String) -> Self {
        Self { api_key, endpoint }
    }
}

#[async_trait]
impl Provider for MyProvider {
    async fn complete(
        &self,
        system: String,
        messages: Vec<Message>,
        tools: Vec<Tool>,
    ) -> Result<BoxStream<ProviderMessage>> {
        // Your implementation here
        todo!()
    }
    
    fn list_models(&self) -> Vec<ModelInfo> {
        vec![ModelInfo::new("my-model-v1", 8192)]
    }
}

// 2. Register in crates/goose/src/providers/init.rs
pub fn create_provider(config: &Config) -> Result<Box<dyn Provider>> {
    match config.provider.as_str() {
        "anthropic" => Ok(Box::new(AnthropicProvider::new(config)?)),
        "openai" => Ok(Box::new(OpenAIProvider::new(config)?)),
        "my_provider" => Ok(Box::new(MyProvider::new(
            config.get_secret("MY_PROVIDER_API_KEY")?,
            config.get("MY_PROVIDER_ENDPOINT")?,
        ))),
        // ... other providers
        other => Err(anyhow::anyhow!("unknown provider: {other}")),
    }
}

Provider Selection

Goose determines which provider to use via configuration precedence:
  1. Subagent settings (highest priority)
    subagent(settings: {provider: "openai", model: "gpt-4o-mini"})
    
  2. Recipe settings
    settings:
      goose_provider: anthropic
      goose_model: claude-sonnet-4-20250514
    
  3. Environment variables
    GOOSE_PROVIDER=ollama
    GOOSE_MODEL=qwen3-coder:latest
    
  4. Config file
    GOOSE_PROVIDER: anthropic
    
  5. Default (Anthropic Claude)
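The precedence chain above maps naturally onto Option chaining. A sketch with each layer as an Option (names are illustrative, not Goose's actual API):

```rust
// Sketch of configuration precedence: first layer with a value wins.
// Each parameter stands in for one of the five sources listed above.
fn resolve_provider(
    subagent: Option<&str>,
    recipe: Option<&str>,
    env: Option<&str>,
    config_file: Option<&str>,
) -> String {
    subagent
        .or(recipe)
        .or(env)
        .or(config_file)
        .unwrap_or("anthropic") // default: Anthropic Claude
        .to_string()
}

fn main() {
    // Subagent settings beat everything else.
    assert_eq!(
        resolve_provider(Some("openai"), Some("anthropic"), Some("ollama"), None),
        "openai"
    );
    // With nothing set anywhere, fall back to the default.
    assert_eq!(resolve_provider(None, None, None, None), "anthropic");
    println!("ok");
}
```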

Streaming

All providers support streaming responses:
pub enum ProviderMessage {
    Text(String),              // Text chunk
    ToolUse(ToolRequest),      // Tool call request
    Thinking(String),          // Model reasoning (if supported)
    Usage(TokenUsage),         // Token counts
    Done,                      // Stream complete
}

// Agent consumes stream:
let mut stream = provider.complete(system, messages, tools).await?;
while let Some(msg) = stream.next().await {
    match msg {
        ProviderMessage::Text(text) => {
            // Stream to user immediately
            send_to_user(text).await?;
        }
        ProviderMessage::ToolUse(tool) => {
            // Execute tool
            execute_tool(tool).await?;
        }
        ProviderMessage::Done => break,
        _ => {} // Thinking, Usage, etc.
    }
}

Token Usage Tracking

Providers report token usage for cost estimation:
pub struct TokenUsage {
    pub input_tokens: usize,
    pub output_tokens: usize,
    pub cache_read_tokens: Option<usize>,   // For prompt caching
    pub cache_write_tokens: Option<usize>,
}

// Stored in session
session.accumulated_input_tokens += usage.input_tokens;
session.accumulated_output_tokens += usage.output_tokens;

// Calculate cost (per-token rates are Option<f64>; treat missing as zero)
let cost = (usage.input_tokens as f64 * model_info.input_token_cost.unwrap_or(0.0))
         + (usage.output_tokens as f64 * model_info.output_token_cost.unwrap_or(0.0));

Error Handling

Providers return standardized errors:
pub enum ProviderError {
    AuthenticationError(String),   // Invalid API key
    RateLimitError(String),        // Rate limit hit
    InvalidRequestError(String),   // Bad request
    ModelNotFoundError(String),    // Unknown model
    NetworkError(String),          // Connection issues
    StreamError(String),           // Streaming failure
    // ...
}
The agent’s retry manager handles transient errors automatically:
loop {
    match provider.complete(...).await {
        Ok(stream) => return Ok(stream),
        Err(ProviderError::RateLimitError(_)) => {
            // Wait and retry
            sleep(backoff.next_delay()).await;
        }
        Err(ProviderError::NetworkError(_)) => {
            // Retry with backoff
            sleep(backoff.next_delay()).await;
        }
        Err(e) => return Err(e), // Don't retry auth errors, etc.
    }
}

Multi-Provider Workflows

You can use different providers for different tasks:
title: Multi-Provider Analysis
instructions: |
  Use different models for different tasks:
  - GPT-4o-mini for simple file operations
  - Claude Sonnet for complex analysis
  - Local Ollama for privacy-sensitive data

settings:
  goose_provider: anthropic
  goose_model: claude-sonnet-4-20250514

prompt: |
  # Use cheap model for file listing
  subagent(
    instructions: "List all Python files",
    settings: {provider: "openai", model: "gpt-4o-mini"}
  )
  
  # Use powerful model for analysis
  subagent(
    instructions: "Analyze the architecture for security issues",
    settings: {provider: "anthropic", model: "claude-sonnet-4-20250514"}
  )
  
  # Use local model for sensitive data
  subagent(
    instructions: "Process customer data locally",
    settings: {provider: "ollama", model: "qwen3-coder:latest"}
  )

Provider Comparison

Provider         Tool Calling      Streaming  Vision             Local  Cost
Anthropic        Native            Yes        Yes (Claude 3.5+)  No     $$$
OpenAI           Function calling  Yes        Yes (GPT-4V)       No     $$$
Ollama           Via toolshim      Yes        Some models        Yes    Free
Google Gemini    Native            Yes        Yes                No     $$
AWS Bedrock      Model-dependent   Yes        Model-dependent    No     $$
LiteLLM          Pass-through      Yes        Model-dependent    No     Varies
Local Inference  Via toolshim      Yes        No                 Yes    Free

Tool Calling Approaches

Native: Provider API has built-in tool calling support
  • Anthropic, OpenAI, Google Gemini
  • Best accuracy and performance
Toolshim: Goose adds tool calling via system prompts
  • Ollama, local models
  • Works but less reliable
  • Good for experimentation
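At its core, a toolshim renders the tool list into the system prompt and asks the model to answer with a structured line that Goose can parse back into a tool call. A minimal sketch of the prompt-building half (the format shown is illustrative, not Goose's actual shim protocol):

```rust
// Minimal toolshim sketch: describe tools in the system prompt and ask
// the model to emit a single JSON line when it wants to call one.
struct ToolSpec {
    name: &'static str,
    description: &'static str,
}

fn toolshim_prompt(tools: &[ToolSpec]) -> String {
    let mut p = String::from(
        "You can call tools. To call one, reply with exactly one JSON line:\n\
         {\"tool\": \"<name>\", \"arguments\": { ... }}\n\nAvailable tools:\n",
    );
    for t in tools {
        p.push_str(&format!("- {}: {}\n", t.name, t.description));
    }
    p
}

fn main() {
    let tools = [ToolSpec { name: "read_file", description: "Read a file" }];
    let prompt = toolshim_prompt(&tools);
    println!("{prompt}");
}
```

The unreliable part is the other half: parsing the model's free-form reply back into a tool call, which is why native tool calling is preferred when the provider supports it.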

Best Practices

  • Simple tasks: Use cheaper/faster models (GPT-4o-mini, Claude Haiku)
  • Complex reasoning: Use powerful models (Claude Sonnet, GPT-4o)
  • Code generation: Use code-specialized models (Qwen Coder, Claude)
  • Privacy-sensitive: Use local models (Ollama)
// Check model context before sending
let token_count = count_tokens(&messages);
let model_info = provider.list_models()
    .into_iter()
    .find(|m| m.name == model_name)?;

if token_count as f64 > model_info.context_limit as f64 * 0.75 {
    // Compact messages to stay within the context window
    messages = compact_messages(messages);
}
# Configure retry behavior
retry:
  max_retries: 3
  initial_delay_ms: 1000
  max_delay_ms: 30000
  backoff_multiplier: 2.0
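Under the assumed semantics of these settings (delay = initial_delay_ms × backoff_multiplier^attempt, capped at max_delay_ms), the retry schedule can be sketched as:

```rust
// Exponential backoff matching the retry settings above.
// Assumed semantics: delay = initial * multiplier^attempt, capped at max.
fn backoff_delays(max_retries: u32, initial_ms: u64, max_ms: u64, mult: f64) -> Vec<u64> {
    (0..max_retries)
        .map(|attempt| {
            let d = initial_ms as f64 * mult.powi(attempt as i32);
            (d as u64).min(max_ms)
        })
        .collect()
}

fn main() {
    // max_retries: 3, initial 1000ms, cap 30000ms, multiplier 2.0
    let delays = backoff_delays(3, 1000, 30_000, 2.0);
    assert_eq!(delays, vec![1000, 2000, 4000]);
    println!("{delays:?}");
}
```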
Anthropic and some other providers support caching system prompts:
// Automatically enabled for supported models
// Significantly reduces cost for repeated requests
if model_info.supports_cache_control {
    // System prompt is cached
    // Only pay for new user messages
}

Troubleshooting

Common Issues

“Authentication failed”
# Check API key is set
echo $ANTHROPIC_API_KEY

# Verify in config
goose configure get ANTHROPIC_API_KEY
“Model not found”
# List available models
goose configure list-models

# Check provider documentation for exact model names
“Rate limit exceeded”
  • Wait and retry (automatic)
  • Upgrade API tier
  • Use multiple API keys with load balancing (via LiteLLM)
“Context length exceeded”
# Reduce max_turns or enable aggressive compaction
settings:
  max_turns: 20  # Limit conversation length

Next Steps

Extensions

Learn about the tools providers can use

Recipes

Configure providers in recipes

Configuration

Advanced provider configuration

Custom Distributions

Bundle custom providers
