The embedding model can be configured via the embeddings.model configuration key, the DOCS_MCP_EMBEDDING_MODEL environment variable, or the --embedding-model CLI flag.
Using an embedding model is optional but dramatically improves search quality by enabling semantic vector search.
Model Selection
If you leave the model empty but provide OPENAI_API_KEY, the server defaults to text-embedding-3-small.
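For instance, either of the following shell snippets selects a model; this is a minimal sketch using the environment variables described on this page (key values are placeholders):

```shell
# Option 1: name the model explicitly via the environment
export DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-small"
export OPENAI_API_KEY="sk-..."

# Option 2: set only the API key; the server falls back to text-embedding-3-small
export OPENAI_API_KEY="sk-..."
# (the model can also be passed with the --embedding-model CLI flag instead)
```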
Supported Options
OpenAI
- text-embedding-3-small (default)
- text-embedding-3-large

Ollama (Local)

- openai:nomic-embed-text
- openai:snowflake-arctic-embed2

Google Vertex AI

- vertex:text-embedding-004

Google Gemini

- gemini:embedding-001

AWS Bedrock

- aws:amazon.titan-embed-text-v1

Azure OpenAI

- microsoft:text-embedding-ada-002

Provider Configuration
Provider credentials use the provider-specific environment variables listed below.

Environment Variables

- DOCS_MCP_EMBEDDING_MODEL: Embedding model to use (e.g., text-embedding-3-small, openai:nomic-embed-text)
- OPENAI_API_KEY: OpenAI API key for embeddings
- OPENAI_API_BASE: Custom OpenAI-compatible API endpoint (e.g., http://localhost:11434/v1 for Ollama)
- GOOGLE_API_KEY: Google API key for Gemini embeddings
- GOOGLE_APPLICATION_CREDENTIALS: Path to Google service account JSON for Vertex AI
- AWS_ACCESS_KEY_ID: AWS key for Bedrock embeddings
- AWS_SECRET_ACCESS_KEY: AWS secret for Bedrock embeddings
- AWS_REGION: AWS region for Bedrock (e.g., us-east-1)
- AZURE_OPENAI_API_KEY: Azure OpenAI API key
- AZURE_OPENAI_API_INSTANCE_NAME: Azure OpenAI instance name
- AZURE_OPENAI_API_DEPLOYMENT_NAME: Azure OpenAI deployment name
- AZURE_OPENAI_API_VERSION: Azure OpenAI API version (e.g., 2024-02-01)

Provider Examples
- OpenAI
- Ollama
- LM Studio
- Google Gemini
- Vertex AI
- AWS Bedrock
- Azure OpenAI
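As a sketch of two common setups, assuming the environment variables listed under Provider Configuration (your key values and model choices will differ):

```shell
# OpenAI (hosted): API key plus an optional explicit model name
export OPENAI_API_KEY="sk-..."
export DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-small"

# Ollama (local): point the OpenAI-compatible client at the local server
export OPENAI_API_KEY="ollama"                      # any non-empty string satisfies the client
export OPENAI_API_BASE="http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint
export DOCS_MCP_EMBEDDING_MODEL="openai:nomic-embed-text"
```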
OpenAI (Default)
text-embedding-3-small is the default and provides the best balance of quality and cost.

Choosing an Embedding Model
- Best for Cost: Ollama (Local). Free, runs on your machine, no API costs.
- Best for Quality: OpenAI text-embedding-3-large. Highest-quality embeddings, more expensive.
- Best Balance: OpenAI text-embedding-3-small. Default option, good quality, reasonable cost.
- Enterprise: Azure OpenAI or Vertex AI. Private deployments with SLAs.
Performance Considerations
Vector Dimensions
Different models produce embeddings with different dimensions:

- text-embedding-3-small: 1536 dimensions
- text-embedding-3-large: 3072 dimensions
- nomic-embed-text: 768 dimensions
Batch Processing
The server batches embedding requests to optimize API usage. The default batch size is 100 documents and is configurable.
Local vs Cloud
Local models (Ollama, LM Studio):

- Free, no rate limits
- Slower on CPU, faster with GPU
- No internet required

Cloud models (OpenAI, Gemini, Bedrock, etc.):

- Fast processing
- Rate limits and costs
- Internet required
Switching Models
To switch models, update the configuration and re-index your documentation: embeddings produced by different models have different dimensions and are not interchangeable, so existing indexes must be rebuilt with the new model.

Troubleshooting
Authentication Errors
Problem: 401 Unauthorized or Invalid API key

Solution:

- Verify your API key is correct
- Check environment variable names match exactly
- For Ollama, use any non-empty string as the API key
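One quick way to distinguish a bad key from a misconfigured variable is to call the provider directly; for example, against OpenAI's models endpoint:

```shell
# Prints the HTTP status: 200 means the key is accepted, 401 means it is not
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models
```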
Connection Errors
Problem: ECONNREFUSED or connection timeout

Solution:

- For Ollama: Ensure ollama serve is running
- Check that OPENAI_API_BASE is correct
- Verify firewall settings
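To rule out a connection problem with a local Ollama instance, you can probe its OpenAI-compatible endpoint directly:

```shell
# Returns a JSON model list when `ollama serve` is running;
# fails with "Connection refused" when it is not
curl -s http://localhost:11434/v1/models
```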
Model Not Found
Problem: Model name not recognized

Solution:

- Check the model name matches exactly (case-sensitive)
- For Ollama: Pull the model first with ollama pull <model-name>
- Verify the provider prefix (e.g., openai:, vertex:, gemini:)
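For the Ollama case, pulling and verifying the model looks like this:

```shell
ollama pull nomic-embed-text   # download the embedding model locally
ollama list                    # confirm it appears in the local model list
# then reference it as openai:nomic-embed-text in your configuration
```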
Next Steps
Scraping Sources
Learn how to index documentation from various sources
Search Documentation
Master search queries to leverage your embeddings
Configuration
Explore all configuration options
CLI Reference
Complete CLI command reference
