Environment Variables

Finance Agent uses environment variables for all configuration. Copy .env.example to .env and configure:

Required Variables

# OpenAI API Key (required for embeddings and LLM)
OPENAI_API_KEY=sk-your-openai-api-key-here

# API Ninjas Key (required for downloading earnings transcripts)
API_NINJAS_KEY=your-api-ninjas-key-here
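As a minimal sketch, required keys like these can be validated at startup so a missing value fails fast with a clear message rather than surfacing later as a cryptic API error (the require_env helper is illustrative, not part of the codebase):

```python
import os

def require_env(name: str) -> str:
    """Return a required environment variable, or fail fast with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Example usage at startup:
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
# API_NINJAS_KEY = require_env("API_NINJAS_KEY")
```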

Optional Variables

# Cerebras API Key (optional - for fast inference)
CEREBRAS_API_KEY=your-cerebras-api-key-here

# Which LLM to use: openai | cerebras | auto
# auto = Cerebras if CEREBRAS_API_KEY set, else OpenAI
RAG_LLM_PROVIDER=cerebras

# Max completion tokens for OpenAI (lower = faster; default 8000)
RAG_OPENAI_MAX_TOKENS=8000

Application Settings

# Environment: 'development' for local, 'production' for deployed
ENVIRONMENT=development

# Server configuration
PORT=8000
HOST=0.0.0.0
BASE_URL=http://localhost:8000

# Logging
LOG_LEVEL=INFO

RAG Agent Configuration

The RAG system is configured in agent/rag/config.py:

Core RAG Settings

{
    # Chunking parameters
    "chunk_size": 1000,           # Characters per chunk
    "chunk_overlap": 200,         # Overlap between chunks
    "chunks_per_quarter": 15,     # Number of chunks to retrieve per quarter
    
    # Similarity threshold for vector search
    "similarity_threshold": 0,    # 0 = include all results (ranking handles quality)
    
    # Embedding model
    "embedding_model": "all-MiniLM-L6-v2",  # 384-dimensional embeddings
}
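The chunking parameters above can be illustrated with a simple character-based splitter. This is a sketch of the general technique, not the project's actual chunker, which may also respect sentence or token boundaries:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks, with each chunk's tail
    repeated at the head of the next so context is not cut mid-thought."""
    step = chunk_size - chunk_overlap  # advance 800 chars per chunk by default
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

With the defaults, a 2,500-character transcript yields three chunks, and the last 200 characters of each chunk reappear at the start of the next.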

LLM Configuration

{
    "openai_model": "gpt-5-nano-2025-08-07",
    "openai_max_tokens": 8000,  # Lower = faster responses
    "openai_temperature": 1,
}

Iterative Improvement Settings

{
    # Standard RAG iterations
    "max_iterations": 3,
    
    # SEC/10-K queries (more complex, need more iterations)
    "sec_max_iterations": 5,
}
These can be overridden via environment variables:
RAG_MAX_ITERATIONS=4
SEC_MAX_ITERATIONS=6
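A hedged sketch of how such integer overrides are typically read (int_from_env is a hypothetical helper, shown only to make the precedence explicit: environment value if set, config default otherwise):

```python
import os

def int_from_env(name: str, default: int) -> int:
    """Read an integer override from the environment, falling back to the default."""
    raw = os.environ.get(name)
    return int(raw) if raw else default

max_iterations = int_from_env("RAG_MAX_ITERATIONS", 3)
sec_max_iterations = int_from_env("SEC_MAX_ITERATIONS", 5)
```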

Hybrid Search Configuration

{
    "hybrid_search_enabled": True,
    "keyword_weight": 0.3,      # Weight for keyword (BM25) search
    "vector_weight": 0.7,       # Weight for vector (semantic) search
    "keyword_max_results": 10,  # Max results from keyword search
}
How hybrid search works:

1. Vector Search: semantic similarity using cosine distance on embeddings (70% weight)
2. Keyword Search: BM25 full-text search using PostgreSQL's tsvector (30% weight)
3. Score Combination: results are combined using weighted Reciprocal Rank Fusion (RRF)

Processing Limits

{
    # Maximum number of tickers to process in a single query
    "max_tickers": 8,
    
    # Maximum number of quarters to process (3 years of quarterly data)
    "max_quarters": 12,
}
Queries exceeding these limits are automatically capped with a user-friendly message explaining what was processed.
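The capping behavior might look roughly like this (cap_scope and its message wording are illustrative, not the project's actual code):

```python
def cap_scope(tickers: list[str], quarters: list[str],
              max_tickers: int = 8, max_quarters: int = 12):
    """Trim an oversized query to the configured limits and, when anything
    was dropped, build a note explaining what was actually processed."""
    capped_t, capped_q = tickers[:max_tickers], quarters[:max_quarters]
    note = None
    if len(tickers) > max_tickers or len(quarters) > max_quarters:
        note = (f"Processed the first {len(capped_t)} tickers and "
                f"{len(capped_q)} quarters; narrow the query for full coverage.")
    return capped_t, capped_q, note
```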

Answer Mode Configuration

Defined in agent/rag/config.py as ANSWER_MODE_CONFIG:
from enum import Enum

class AnswerMode(str, Enum):
    DIRECT = "direct"          # Simple factual lookup
    STANDARD = "standard"      # Moderate analysis
    DETAILED = "detailed"      # Full research report
    DEEP_SEARCH = "deep_search"  # Exhaustive search

ANSWER_MODE_CONFIG = {
    AnswerMode.DIRECT:      {"max_iterations": 2, "max_tokens": 2000, "confidence_threshold": 0.7},
    AnswerMode.STANDARD:    {"max_iterations": 3, "max_tokens": 6000, "confidence_threshold": 0.8},
    AnswerMode.DETAILED:    {"max_iterations": 4, "max_tokens": 16000, "confidence_threshold": 0.9},
    AnswerMode.DEEP_SEARCH: {"max_iterations": 10, "max_tokens": 20000, "confidence_threshold": 0.95},
}
The agent automatically selects the appropriate mode based on question complexity (analyzed by GPT-5-nano).
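The real classification is an LLM call, but a fallback sketch conveys the shape of the decision: map the question to a mode, then look up that mode's iteration and token budget. The keyword heuristic below is purely illustrative and not how the agent actually classifies complexity:

```python
from enum import Enum

class AnswerMode(str, Enum):
    DIRECT = "direct"
    STANDARD = "standard"
    DETAILED = "detailed"
    DEEP_SEARCH = "deep_search"

def fallback_mode(question: str) -> AnswerMode:
    """Crude heuristic stand-in for the LLM classifier: analytical keywords
    suggest a full report, short questions suggest a direct lookup."""
    q = question.lower()
    if any(w in q for w in ("compare", "trend", "analyze", "why")):
        return AnswerMode.DETAILED
    if len(q.split()) <= 8:
        return AnswerMode.DIRECT
    return AnswerMode.STANDARD
```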

Agent Configuration

Advanced agentic behavior settings in agent/agent_config.py:
{
    # Iterative improvement
    "max_iterations": 4,
    "sec_max_iterations": 5,
    
    # Quality thresholds (very strict - favor iteration)
    "min_confidence_threshold": 0.90,
    "min_completeness_threshold": 0.90,
    
    # Evaluation model for quality assessment
    "evaluation_model": "gpt-4.1-mini-2025-04-14",
    "evaluation_temperature": 0.05,
    "evaluation_max_tokens": 1500,
    
    # Quality thresholds for iteration decisions
    "excellent_threshold": 0.95,  # Near perfect to stop early
    "good_threshold": 0.80,
    "poor_threshold": 0.60,
    
    # Search expansion during iterations
    "max_new_chunks_per_iteration": 5,
    "similarity_threshold_expansion": 0.25,
    
    # Iteration strategy
    "aggressive_iteration": True,
    "prefer_max_iterations": True,
}
The agent uses all available iterations by default to ensure comprehensive answers. Early stopping only occurs if confidence exceeds 0.95.

Database Configuration

PostgreSQL connection pool settings in config.py:
{
    "PRODUCTION_MIN_SIZE": 5,
    "PRODUCTION_MAX_SIZE": 30,
    "PRODUCTION_COMMAND_TIMEOUT": 20,
    "PRODUCTION_TIMEOUT": 15,
}
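Assuming the pool is asyncpg (not confirmed by this page), these settings map onto asyncpg.create_pool keyword arguments roughly as follows; the pool_kwargs helper is illustrative:

```python
POOL_SETTINGS = {
    "PRODUCTION_MIN_SIZE": 5,
    "PRODUCTION_MAX_SIZE": 30,
    "PRODUCTION_COMMAND_TIMEOUT": 20,
    "PRODUCTION_TIMEOUT": 15,
}

def pool_kwargs(settings: dict) -> dict:
    """Translate the project's setting names into create_pool keyword arguments:
    pool bounds, per-query command timeout, and connection-acquire timeout."""
    return {
        "min_size": settings["PRODUCTION_MIN_SIZE"],
        "max_size": settings["PRODUCTION_MAX_SIZE"],
        "command_timeout": settings["PRODUCTION_COMMAND_TIMEOUT"],
        "timeout": settings["PRODUCTION_TIMEOUT"],
    }
```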

Database Timeouts

{
    "STATEMENT_TIMEOUT_MS": 30000,           # 30 seconds
    "IDLE_IN_TRANSACTION_TIMEOUT_MS": 60000, # 1 minute
}
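One subtlety worth noting: PostgreSQL exposes these as the server parameters statement_timeout and idle_in_transaction_session_timeout, which take string millisecond values when passed as session settings. A sketch of the mapping (the helper name is illustrative):

```python
DB_TIMEOUTS = {
    "STATEMENT_TIMEOUT_MS": 30_000,            # 30 seconds
    "IDLE_IN_TRANSACTION_TIMEOUT_MS": 60_000,  # 1 minute
}

def timeout_server_settings(timeouts: dict) -> dict:
    """Build the server-settings dict PostgreSQL expects: GUC names mapped
    to string millisecond values."""
    return {
        "statement_timeout": str(timeouts["STATEMENT_TIMEOUT_MS"]),
        "idle_in_transaction_session_timeout": str(timeouts["IDLE_IN_TRANSACTION_TIMEOUT_MS"]),
    }
```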

Rate Limiting

{
    "PER_MINUTE": 30,
    "PER_MONTH": 10000,
    "ADMIN_PER_MONTH": 100000,
    "COST_PER_REQUEST": 0.02,  # $0.02 per request
}

Debug Mode

Enable query optimization logging:
RAG_DEBUG_MODE=true
When enabled, the system logs EXPLAIN ANALYZE output for database queries to help optimize performance.

Conversation Memory

Configured in agent/rag/question_analyzer.py:
{
    "max_exchanges": 5,           # Last 5 conversation turns
    "max_chars_per_message": 4000 # Max 4000 chars per message
}
This sliding window maintains recent context for follow-up questions while controlling memory usage.
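The sliding window amounts to two truncations, roughly like this (trim_history is an illustrative helper; exchanges are assumed to be question/answer pairs):

```python
def trim_history(exchanges: list[tuple[str, str]],
                 max_exchanges: int = 5,
                 max_chars_per_message: int = 4000) -> list[tuple[str, str]]:
    """Keep only the last N exchanges, and truncate any oversized message
    so a single long answer cannot blow out the context budget."""
    recent = exchanges[-max_exchanges:]
    return [(q[:max_chars_per_message], a[:max_chars_per_message]) for q, a in recent]
```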

Data Paths (Optional)

Override default data directories:
DATA_DIR=./data
TRANSCRIPTS_DIR=./data/transcripts
EMBEDDINGS_DIR=./data/embeddings
10K_FILINGS_DIR=./data/10k_filings
DUCKDB_PATH=./data/duckdb/financial_data_new.duckdb

External Services

# Optional - for WebSocket session management
REDIS_URL=redis://localhost:6379

Configuration Best Practices

1. Start with Defaults
The default configuration works well for most use cases. Only customize if you have specific requirements.

2. LLM Provider Selection
  • Cerebras: Fastest inference (recommended for production)
  • OpenAI: Higher quality for complex reasoning
  • Auto: Use Cerebras if available, fall back to OpenAI

3. Iteration Tuning
  • More iterations = better answers but slower
  • Start with defaults (3-4), increase for complex queries
  • Use DEEP_SEARCH mode for exhaustive research

4. Database Pooling
  • Increase pool size for high concurrency
  • Monitor connection usage and adjust timeouts

5. Hybrid Search Weights
  • 70/30 (vector/keyword) works well for most queries
  • Increase keyword weight for exact-match queries
  • Increase vector weight for conceptual queries

Monitoring Configuration

Check configuration at runtime:
from agent.rag.config import Config
from config import settings

config = Config()
print(f"LLM Provider: {config.get('llm_provider')}")
print(f"Max Iterations: {config.get('max_iterations')}")
print(f"Hybrid Search: {config.get('hybrid_search_enabled')}")
print(f"Environment: {settings.ENVIRONMENT}")
