
Overview

Filebright uses OpenRouter as the AI provider for:
  • Embeddings: Converting document chunks into vector representations for semantic search
  • Chat: Answering questions using RAG (Retrieval-Augmented Generation)
OpenRouter provides unified access to multiple AI models from different providers (OpenAI, Anthropic, Meta, etc.) through a single API.

Why OpenRouter?

  • Model flexibility: Switch between different models without changing code
  • Cost optimization: Choose models based on your budget and performance needs
  • Reliability: Automatic fallbacks if a model is unavailable
  • Simple billing: One account for all AI providers

Getting started

Create an OpenRouter account

  1. Visit openrouter.ai
  2. Sign up for a free account
  3. Add credits to your account (pay-as-you-go pricing)
  4. Navigate to Keys in the dashboard
  5. Create a new API key
  6. Copy the key and add it to your .env file
.env
OPENROUTER_API_KEY=sk-or-v1-...
Keep your API key secure. Never commit it to version control or share it publicly.
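As a quick sanity check before wiring the key into the app, you can validate its shape at startup. This is an illustrative Python sketch (the helper name `check_openrouter_key` is ours, not part of Filebright):

```python
import os

def check_openrouter_key(env=os.environ) -> str:
    """Fail fast if OPENROUTER_API_KEY is missing or malformed."""
    key = env.get("OPENROUTER_API_KEY", "").strip()  # strip stray spaces/newlines
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    if not key.startswith("sk-or-v1-"):
        raise RuntimeError("OPENROUTER_API_KEY should start with sk-or-v1-")
    return key
```

Catching a missing or truncated key here is much cheaper than debugging 401 responses later.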

Configuration

Environment variables

OPENROUTER_API_KEY
string
required
Your OpenRouter API key. Example: sk-or-v1-1234567890abcdef
OPENROUTER_EMBEDDING_MODEL
string
default:"text-embedding-3-small"
The model used to generate vector embeddings for document chunks. Recommended models:
  • text-embedding-3-small - Fast, cost-effective (1536 dimensions)
  • text-embedding-3-large - Higher quality (3072 dimensions)
  • text-embedding-ada-002 - Legacy OpenAI model (1536 dimensions)
OPENROUTER_CHAT_MODEL
string
default:"openai/gpt-3.5-turbo"
The model used for RAG-based question answering. Recommended models:
  • openai/gpt-3.5-turbo - Fast, cost-effective
  • openai/gpt-4-turbo - Higher quality reasoning
  • openai/gpt-4o - Balanced performance and cost
  • anthropic/claude-3.5-sonnet - Excellent for long context
  • meta-llama/llama-3.1-70b-instruct - Open source alternative

Configuration file

The OpenRouter configuration is defined in backend/config/services.php:
backend/config/services.php
'openrouter' => [
    'key' => env('OPENROUTER_API_KEY'),
    'embedding_model' => env('OPENROUTER_EMBEDDING_MODEL', 'text-embedding-3-small'),
    'chat_model' => env('OPENROUTER_CHAT_MODEL', 'openai/gpt-3.5-turbo'),
],

Embedding models

Embedding models convert text into vector representations for semantic search.

Available models

Model                     Dimensions   Cost              Best for
text-embedding-3-small    1536         $0.02/1M tokens   General use, cost-effective
text-embedding-3-large    3072         $0.13/1M tokens   High accuracy, larger docs
text-embedding-ada-002    1536         $0.10/1M tokens   Legacy compatibility

Choosing an embedding model

For most use cases, text-embedding-3-small provides excellent performance at the lowest cost.
Consider text-embedding-3-large if:
  • You need higher accuracy for complex queries
  • Your documents contain specialized or technical content
  • Cost is not a primary concern
If you change the embedding model after documents are already processed, you must:
  1. Delete all existing embeddings from MongoDB
  2. Update the vector index dimensions
  3. Reprocess all documents to generate new embeddings
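The mismatch this procedure guards against is easy to check mechanically. A small sketch (illustrative Python; the dimension table comes from the models listed above):

```python
# Expected vector widths for the embedding models documented above
EXPECTED_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def index_matches_model(model: str, index_dims: int) -> bool:
    """True if the MongoDB vector index width matches the model's output size."""
    return EXPECTED_DIMS.get(model) == index_dims
```

Running a check like this after changing OPENROUTER_EMBEDDING_MODEL catches the dimension mismatch before vector searches start silently failing.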

How embeddings work

The EmbeddingService handles embedding generation:
backend/app/Services/EmbeddingService.php
public function getBulkEmbeddings(array $texts): array
{
    $response = Http::withHeaders([
        'Authorization' => 'Bearer ' . $this->apiKey,
        'HTTP-Referer' => config('app.url'),
    ])->post('https://openrouter.ai/api/v1/embeddings', [
        'model' => $this->model,
        'input' => $texts,
    ])->throw(); // surface HTTP errors instead of mapping over a null body

    // Each item in `data` carries the embedding for the matching input text
    return array_map(fn ($item) => $item['embedding'], $response->json('data'));
}
Embeddings are generated in bulk for efficiency when processing documents.
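Bulk requests still benefit from an upper bound on batch size, since very large payloads can hit request-size or token limits. A batching helper sketched in Python (the batch size of 100 is an illustrative choice, not a Filebright or OpenRouter constant):

```python
def batch_texts(texts: list[str], size: int = 100) -> list[list[str]]:
    """Split chunk texts into request-sized batches for the embeddings endpoint."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    return [texts[i:i + size] for i in range(0, len(texts), size)]
```

Each batch then becomes one `input` array in a single embeddings request.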

Chat models

Chat models generate answers to user questions using retrieved context from your documents.

Available models

Model                               Context window   Cost per 1M tokens (input / output)   Best for
openai/gpt-3.5-turbo                16K              $0.50 / $1.50                         Fast, cost-effective
openai/gpt-4-turbo                  128K             $10 / $30                             High quality, complex queries
openai/gpt-4o                       128K             $2.50 / $10                           Balanced option
anthropic/claude-3.5-sonnet         200K             $3 / $15                              Long documents, analysis
anthropic/claude-3-haiku            200K             $0.25 / $1.25                         Fast, efficient
meta-llama/llama-3.1-70b-instruct   128K             $0.88 / $0.88                         Open source, privacy

Choosing a chat model

Start with openai/gpt-3.5-turbo for development, then upgrade to openai/gpt-4o or anthropic/claude-3.5-sonnet for production.
Consider factors:
  • Budget: GPT-3.5-turbo is most cost-effective
  • Quality: GPT-4-turbo and Claude-3.5-sonnet provide better reasoning
  • Context length: Claude models support longer contexts for large documents
  • Speed: GPT-3.5-turbo and Claude-3-haiku are fastest

How RAG works

The RAGService orchestrates the retrieval and generation process:
backend/app/Services/RAGService.php
public function answer(string $query, int $userId): string
{
    // 1. Generate embedding for the query
    $queryEmbedding = $this->embeddingService->getEmbedding($query);
    
    // 2. Retrieve relevant chunks from MongoDB
    $chunks = $this->retrieveContext($queryEmbedding, $userId);
    
    // 3. Build context from retrieved chunks
    $context = $chunks->pluck('content')->implode("\n\n---\n\n");
    
    // 4. Generate answer using chat model
    return $this->getLLMResponse($query, $context);
}
The process:
  1. User asks a question
  2. Question is converted to an embedding
  3. MongoDB vector search finds similar document chunks
  4. Chunks are combined as context
  5. Chat model generates an answer based on the context
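The five steps above can be sketched end to end. This is an illustrative Python mirror of `RAGService::answer` (the real service is PHP; `embed`, `search`, and `complete` stand in for the embedding service, MongoDB vector search, and chat model):

```python
def answer(query, embed, search, complete, separator="\n\n---\n\n"):
    """Embed the query, retrieve chunks, build context, generate an answer."""
    query_vec = embed(query)                  # steps 1-2: question -> embedding
    chunks = search(query_vec)                # step 3: vector search over chunks
    context = separator.join(c["content"] for c in chunks)  # step 4: combine
    return complete(query, context)           # step 5: chat model answers
```

The separator string matches the `"\n\n---\n\n"` delimiter used in the PHP excerpt above, so the chat model sees clearly divided chunks.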

Vector search parameters

The RAG system uses these parameters for retrieval:
'$vectorSearch' => [
    'index' => 'vector_index',      // MongoDB index name
    'path' => 'embedding',          // Field containing embeddings
    'queryVector' => $embedding,    // Query embedding
    'numCandidates' => 100,         // Number of candidates to consider
    'limit' => 3,                   // Top results to return
    'filter' => [
        'metadata.user_id' => $userId  // Only search user's documents
    ]
]
You can adjust these in backend/app/Services/RAGService.php:
  • numCandidates: Higher = more thorough search, slower
  • limit: More chunks = more context, higher cost
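For experimenting with those two knobs, it helps to build the stage programmatically. A Python sketch of the same aggregation stage (the index and field names match the excerpt above):

```python
def vector_search_stage(embedding, user_id, num_candidates=100, limit=3):
    """Build the $vectorSearch stage with tunable candidate and result counts."""
    return {"$vectorSearch": {
        "index": "vector_index",           # MongoDB index name
        "path": "embedding",               # field containing embeddings
        "queryVector": embedding,
        "numCandidates": num_candidates,   # higher = more thorough, slower
        "limit": limit,                    # more chunks = more context, higher cost
        "filter": {"metadata.user_id": user_id},
    }}
```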

API endpoints

OpenRouter provides these endpoints:

Embeddings

POST https://openrouter.ai/api/v1/embeddings
Request:
{
  "model": "text-embedding-3-small",
  "input": ["text to embed", "another text"]
}
Response:
{
  "data": [
    {"embedding": [0.123, -0.456, ...], "index": 0},
    {"embedding": [0.789, -0.012, ...], "index": 1}
  ],
  "model": "text-embedding-3-small",
  "usage": {"prompt_tokens": 10, "total_tokens": 10}
}
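Assembling that request in code looks like this. An illustrative Python sketch (`build_embedding_request` is our name; send the result with any HTTP client):

```python
import json

EMBEDDINGS_URL = "https://openrouter.ai/api/v1/embeddings"

def build_embedding_request(api_key: str, texts: list[str],
                            model: str = "text-embedding-3-small"):
    """Assemble URL, headers, and JSON body for the embeddings endpoint."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "input": texts})
    return EMBEDDINGS_URL, headers, body
```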

Chat completions

POST https://openrouter.ai/api/v1/chat/completions
Request:
{
  "model": "openai/gpt-3.5-turbo",
  "messages": [
    {"role": "user", "content": "Your question with context"}
  ]
}
Response:
{
  "choices": [
    {
      "message": {"role": "assistant", "content": "The answer"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 50, "completion_tokens": 100}
}
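In the RAG flow, the "question with context" in that request is built by stuffing the retrieved chunks into the user message. An illustrative Python sketch (the prompt wording is ours, not Filebright's exact template):

```python
def build_rag_messages(question: str, context: str) -> list[dict]:
    """Combine retrieved context and the user's question into a chat message."""
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return [{"role": "user", "content": prompt}]
```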

Cost optimization

Tips for reducing costs

  1. Use efficient models
    • text-embedding-3-small instead of text-embedding-3-large
    • openai/gpt-3.5-turbo for simple queries
  2. Optimize chunk size
    • Larger chunks = fewer embeddings to generate and store
    • Smaller chunks = more precise retrieval
    • Default: 1000 characters with 200 character overlap
  3. Reduce retrieved chunks
    • Lower limit in vector search (default: 3)
    • Fewer chunks = less context sent to chat model
  4. Cache responses
    • Implement caching for common queries
    • Reuse answers for identical questions
  5. Monitor usage
    • Check OpenRouter dashboard regularly
    • Set up usage alerts
    • Review which models are being used most
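Tip 4 (caching) can be as simple as keying answers on a normalized query hash. An illustrative in-memory sketch (a production setup would use Laravel's cache with a TTL):

```python
import hashlib

_cache: dict[str, str] = {}

def cached_answer(query: str, generate) -> str:
    """Reuse the stored answer when an identical question is asked again."""
    key = hashlib.sha256(query.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = generate(query)  # only pay for the model call on a miss
    return _cache[key]
```

Normalizing case and whitespace before hashing means trivially different phrasings of the same question hit the cache.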

Example costs

For a typical document upload and query:

Upload a 10-page PDF:
  • Text extraction: Free
  • Generate embeddings: ~5,000 tokens = $0.0001
  • Store in MongoDB: Free
  • Total: ~$0.0001
Ask a question:
  • Query embedding: ~20 tokens = $0.000001
  • Retrieve 3 chunks: Free
  • Generate answer: ~500 tokens = $0.00075
  • Total: ~$0.00075
With the default configuration, 1,000 document queries cost approximately $0.75.
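The arithmetic behind those estimates, using the per-million-token prices from the tables above:

```python
def cost(tokens: int, usd_per_million: float) -> float:
    """Token cost at a given per-1M-token price."""
    return tokens / 1_000_000 * usd_per_million

upload = cost(5_000, 0.02)                    # embeddings for a 10-page PDF
per_query = cost(20, 0.02) + cost(500, 1.50)  # query embedding + generated answer
```

At these rates the upload costs about $0.0001 and each query about $0.00075, so 1,000 queries come to roughly $0.75.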

Testing the integration

Verify your OpenRouter integration:

Test embeddings

php artisan tinker
$service = app(App\Services\EmbeddingService::class);
$embedding = $service->getEmbedding('test text');
count($embedding); // Should return 1536 for text-embedding-3-small

Test chat

php artisan tinker
$service = app(App\Services\RAGService::class);
$answer = $service->answer('What is AI?', 1);
echo $answer; // Should return a response (or "no relevant info" if no docs)

Troubleshooting

Invalid API key

  • Verify the API key is correct in .env
  • Check for extra spaces or newlines
  • Ensure the key starts with sk-or-v1-
  • Generate a new key from the OpenRouter dashboard

Insufficient credits

  • Check your balance at openrouter.ai
  • Add credits to your account
  • Review recent usage to identify unexpected costs

Model not found

  • Verify the model name is correct
  • Check available models at openrouter.ai/models
  • Some models require special access

Embedding dimension mismatch

  • Check MongoDB vector index dimensions
  • text-embedding-3-small = 1536 dimensions
  • text-embedding-3-large = 3072 dimensions
  • Update the index or change the model to match

Rate limits

  • OpenRouter enforces rate limits per model
  • Implement exponential backoff in your code
  • Reduce concurrent requests
  • Contact OpenRouter for higher limits
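Exponential backoff for rate-limited calls can be sketched like this (illustrative Python; `RuntimeError` stands in for whatever exception your HTTP client raises on a 429):

```python
import time

def with_backoff(call, retries: int = 4, base_delay: float = 1.0):
    """Retry a rate-limited call with exponentially growing waits (1s, 2s, 4s, ...)."""
    for attempt in range(retries):
        try:
            return call()
        except RuntimeError:                  # stand-in for an HTTP 429 error
            if attempt == retries - 1:
                raise                         # out of retries; let the caller handle it
            time.sleep(base_delay * 2 ** attempt)
```

Adding random jitter to the delay is a common refinement that avoids synchronized retry bursts from multiple workers.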

Security best practices

Follow these security guidelines to protect your API key and data.
  • Never commit API keys: Use .env files and add to .gitignore
  • Use environment variables: Never hardcode keys in source code
  • Rotate keys regularly: Generate new keys periodically
  • Monitor usage: Set up alerts for unusual activity
  • Restrict key access: Use separate keys for development and production
  • Sanitize user input: Always validate and sanitize user queries before sending to the API

Next steps

RAG system

Learn how the RAG system works

Document management

Upload and manage documents
