The embedding model can be configured via the embeddings.model configuration key, the DOCS_MCP_EMBEDDING_MODEL environment variable, or the --embedding-model CLI flag.
Using an embedding model is optional but dramatically improves search quality by enabling semantic vector search.
Model Selection
If you leave the model empty but provide OPENAI_API_KEY, the server defaults to text-embedding-3-small.
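For instance, either of the following shell snippets selects a model; this is a minimal sketch using the environment variables described on this page (key values are placeholders):

```shell
# Option 1: name the model explicitly via the environment
export DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-small"
export OPENAI_API_KEY="sk-..."

# Option 2: set only the API key; the server falls back to text-embedding-3-small
export OPENAI_API_KEY="sk-..."
# (the model can also be passed with the --embedding-model CLI flag instead)
```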
Supported Options
OpenAI
- text-embedding-3-small (default)
- text-embedding-3-large

Ollama (Local)

- openai:nomic-embed-text
- openai:snowflake-arctic-embed2

Google Vertex AI

- vertex:text-embedding-004

Google Gemini

- gemini:embedding-001

AWS Bedrock

- aws:amazon.titan-embed-text-v1

Azure OpenAI

- microsoft:text-embedding-ada-002

Provider Configuration
Provider credentials use the provider-specific environment variables listed below.

Environment Variables

- DOCS_MCP_EMBEDDING_MODEL: Embedding model to use (e.g., text-embedding-3-small, openai:nomic-embed-text)
- OPENAI_API_KEY: OpenAI API key for embeddings
- OPENAI_API_BASE: Custom OpenAI-compatible API endpoint (e.g., http://localhost:11434/v1 for Ollama)
- GOOGLE_API_KEY: Google API key for Gemini embeddings
- GOOGLE_APPLICATION_CREDENTIALS: Path to Google service account JSON for Vertex AI
- AWS_ACCESS_KEY_ID: AWS key for Bedrock embeddings
- AWS_SECRET_ACCESS_KEY: AWS secret for Bedrock embeddings
- AWS_REGION: AWS region for Bedrock (e.g., us-east-1)
- AZURE_OPENAI_API_KEY: Azure OpenAI API key
- AZURE_OPENAI_API_INSTANCE_NAME: Azure OpenAI instance name
- AZURE_OPENAI_API_DEPLOYMENT_NAME: Azure OpenAI deployment name
- AZURE_OPENAI_API_VERSION: Azure OpenAI API version (e.g., 2024-02-01)

Provider Examples
- OpenAI
- Ollama
- LM Studio
- Google Gemini
- Vertex AI
- AWS Bedrock
- Azure OpenAI
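As a sketch of two common setups, assuming the environment variables listed under Provider Configuration (your key values and model choices will differ):

```shell
# OpenAI (hosted): API key plus an optional explicit model name
export OPENAI_API_KEY="sk-..."
export DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-small"

# Ollama (local): point the OpenAI-compatible client at the local server
export OPENAI_API_KEY="ollama"                      # any non-empty string satisfies the client
export OPENAI_API_BASE="http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint
export DOCS_MCP_EMBEDDING_MODEL="openai:nomic-embed-text"
```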
OpenAI (Default)
text-embedding-3-small is the default and provides the best balance of quality and cost.

Choosing an Embedding Model
- Best for Cost: Ollama (Local). Free, runs on your machine, no API costs.
- Best for Quality: OpenAI text-embedding-3-large. Highest-quality embeddings, more expensive.
- Best Balance: OpenAI text-embedding-3-small. Default option, good quality, reasonable cost.
- Enterprise: Azure OpenAI or Vertex AI. Private deployments with SLAs.
Performance Considerations
Vector Dimensions
Different models produce embeddings with different dimensions:

- text-embedding-3-small: 1536 dimensions
- text-embedding-3-large: 3072 dimensions
- nomic-embed-text: 768 dimensions
Batch Processing
The server batches embedding requests to optimize API usage. The default batch size is 100 documents and is configurable.
Local vs Cloud
Local models (Ollama, LM Studio):

- Free, no rate limits
- Slower on CPU, faster with GPU
- No internet required

Cloud models (OpenAI, Gemini, Bedrock, etc.):

- Fast processing
- Rate limits and costs
- Internet required
Switching Models
To switch models, update the configuration and re-index your documentation: embeddings produced by different models have different dimensions and are not interchangeable, so existing indexes must be rebuilt with the new model.

Troubleshooting
Authentication Errors
Problem: 401 Unauthorized or Invalid API key

Solution:

- Verify your API key is correct
- Check environment variable names match exactly
- For Ollama, use any non-empty string as the API key
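One quick way to distinguish a bad key from a misconfigured variable is to call the provider directly; for example, against OpenAI's models endpoint:

```shell
# Prints the HTTP status: 200 means the key is accepted, 401 means it is not
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models
```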
Connection Errors
Problem: ECONNREFUSED or connection timeout

Solution:

- For Ollama: Ensure ollama serve is running
- Check that OPENAI_API_BASE is correct
- Verify firewall settings
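To rule out a connection problem with a local Ollama instance, you can probe its OpenAI-compatible endpoint directly:

```shell
# Returns a JSON model list when `ollama serve` is running;
# fails with "Connection refused" when it is not
curl -s http://localhost:11434/v1/models
```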
Model Not Found
Problem: Model name not recognized

Solution:

- Check the model name matches exactly (case-sensitive)
- For Ollama: Pull the model first with ollama pull <model-name>
- Verify the provider prefix (e.g., openai:, vertex:, gemini:)
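For the Ollama case, pulling and verifying the model looks like this:

```shell
ollama pull nomic-embed-text   # download the embedding model locally
ollama list                    # confirm it appears in the local model list
# then reference it as openai:nomic-embed-text in your configuration
```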
Next Steps
Scraping Sources
Learn how to index documentation from various sources
Search Documentation
Master search queries to leverage your embeddings
Configuration
Explore all configuration options
CLI Reference
Complete CLI command reference
