# Environment Variables

Finance Agent uses environment variables for all configuration. Copy `.env.example` to `.env` and configure:
## Required Variables

```bash
# OpenAI API Key (required for embeddings and LLM)
OPENAI_API_KEY=sk-your-openai-api-key-here

# API Ninjas Key (required for downloading earnings transcripts)
API_NINJAS_KEY=your-api-ninjas-key-here
```
## Optional Variables

### LLM Providers
```bash
# Cerebras API Key (optional - for fast inference)
CEREBRAS_API_KEY=your-cerebras-api-key-here

# Which LLM to use: openai | cerebras | auto
# auto = Cerebras if CEREBRAS_API_KEY set, else OpenAI
RAG_LLM_PROVIDER=cerebras

# Max completion tokens for OpenAI (lower = faster; default 8000)
RAG_OPENAI_MAX_TOKENS=8000
```
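The `auto` provider rule described above can be sketched as follows. This is an illustrative helper, not the project's actual code; the function name `resolve_llm_provider` is hypothetical, but the selection logic matches the documented behavior:

```python
import os

def resolve_llm_provider(setting: str = "auto") -> str:
    """Resolve RAG_LLM_PROVIDER: explicit values win; 'auto' prefers Cerebras."""
    if setting in ("openai", "cerebras"):
        return setting
    # auto = Cerebras if CEREBRAS_API_KEY set, else OpenAI
    return "cerebras" if os.environ.get("CEREBRAS_API_KEY") else "openai"
```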
### Application Settings

```bash
# Environment: 'development' for local, 'production' for deployed
ENVIRONMENT=development

# Server configuration
PORT=8000
HOST=0.0.0.0
BASE_URL=http://localhost:8000

# Logging
LOG_LEVEL=INFO
```
## RAG Agent Configuration

The RAG system is configured in `agent/rag/config.py`:
### Core RAG Settings

```python
{
    # Chunking parameters
    "chunk_size": 1000,        # Characters per chunk
    "chunk_overlap": 200,      # Overlap between chunks
    "chunks_per_quarter": 15,  # Number of chunks to retrieve per quarter

    # Similarity threshold for vector search
    "similarity_threshold": 0,  # 0 = include all results (ranking handles quality)

    # Embedding model
    "embedding_model": "all-MiniLM-L6-v2",  # 384-dimensional embeddings
}
```
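The chunking parameters above can be illustrated with a minimal character-based splitter. This is a sketch of the technique, not the project's actual implementation, which may split on sentence or token boundaries:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks; neighbors share an overlap."""
    step = chunk_size - chunk_overlap  # advance 800 chars per chunk by default
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

With the defaults, a 2,000-character document yields three chunks, and the last 200 characters of each chunk repeat at the start of the next, so context spanning a chunk boundary is never lost.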
### LLM Configuration

#### OpenAI Settings
```python
{
    "openai_model": "gpt-5-nano-2025-08-07",
    "openai_max_tokens": 8000,  # Lower = faster responses
    "openai_temperature": 1,
}
```
### Iterative Improvement Settings

```python
{
    # Standard RAG iterations
    "max_iterations": 3,

    # SEC/10-K queries (more complex, need more iterations)
    "sec_max_iterations": 5,
}
```

These can be overridden via environment variables:

```bash
RAG_MAX_ITERATIONS=4
SEC_MAX_ITERATIONS=6
```
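The override precedence (environment variable beats config default) can be sketched like this; the helper name `get_max_iterations` is hypothetical:

```python
import os

def get_max_iterations(default: int = 3) -> int:
    """RAG_MAX_ITERATIONS in the environment overrides the config default."""
    raw = os.environ.get("RAG_MAX_ITERATIONS")
    return int(raw) if raw and raw.isdigit() else default
```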
### Hybrid Search Configuration

```python
{
    "hybrid_search_enabled": True,
    "keyword_weight": 0.3,      # Weight for keyword (BM25) search
    "vector_weight": 0.7,       # Weight for vector (semantic) search
    "keyword_max_results": 10,  # Max results from keyword search
}
```
How hybrid search works:

1. **Vector search** - semantic similarity using cosine distance on embeddings (70% weight)
2. **Keyword search** - BM25 full-text search using PostgreSQL's `tsvector` (30% weight)
3. **Score combination** - results are combined using weighted RRF (Reciprocal Rank Fusion)
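Weighted RRF can be sketched as follows: each result contributes `weight / (k + rank)` from every list it appears in, and the sums are ranked. This is an illustrative implementation under the 70/30 weights above; the constant `k = 60` is the conventional RRF damping value, not a documented project setting:

```python
def weighted_rrf(vector_ids: list[str], keyword_ids: list[str],
                 vector_weight: float = 0.7, keyword_weight: float = 0.3,
                 k: int = 60) -> list[str]:
    """Fuse two ranked result lists with weighted Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for weight, ranked in ((vector_weight, vector_ids), (keyword_weight, keyword_ids)):
        for rank, doc_id in enumerate(ranked, start=1):
            # Each appearance adds weight / (k + rank); sums rank the fused list
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked moderately in both lists (like `"b"` below) can outscore the top vector-only hit, which is exactly the behavior hybrid search is after.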
### Processing Limits

```python
{
    # Maximum number of tickers to process in a single query
    "max_tickers": 8,

    # Maximum number of quarters to process (3 years of quarterly data)
    "max_quarters": 12,
}
```
Queries exceeding these limits are automatically capped with a user-friendly message explaining what was processed.
## Answer Mode Configuration

Defined in `agent/rag/config.py` as `ANSWER_MODE_CONFIG`:

```python
from enum import Enum

class AnswerMode(str, Enum):
    DIRECT = "direct"            # Simple factual lookup
    STANDARD = "standard"        # Moderate analysis
    DETAILED = "detailed"        # Full research report
    DEEP_SEARCH = "deep_search"  # Exhaustive search

ANSWER_MODE_CONFIG = {
    AnswerMode.DIRECT:      {"max_iterations": 2,  "max_tokens": 2000,  "confidence_threshold": 0.7},
    AnswerMode.STANDARD:    {"max_iterations": 3,  "max_tokens": 6000,  "confidence_threshold": 0.8},
    AnswerMode.DETAILED:    {"max_iterations": 4,  "max_tokens": 16000, "confidence_threshold": 0.9},
    AnswerMode.DEEP_SEARCH: {"max_iterations": 10, "max_tokens": 20000, "confidence_threshold": 0.95},
}
```
The agent automatically selects the appropriate mode based on question complexity (analyzed by GPT-5-nano).
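In the real agent this decision is made by GPT-5-nano; a purely illustrative stand-in that maps a 0-1 complexity score to a mode (the thresholds here are invented for the sketch) might look like:

```python
from enum import Enum

class AnswerMode(str, Enum):
    DIRECT = "direct"
    STANDARD = "standard"
    DETAILED = "detailed"
    DEEP_SEARCH = "deep_search"

def select_mode(complexity: float) -> AnswerMode:
    """Hypothetical heuristic mapping question complexity to an answer mode."""
    if complexity < 0.25:
        return AnswerMode.DIRECT
    if complexity < 0.5:
        return AnswerMode.STANDARD
    if complexity < 0.8:
        return AnswerMode.DETAILED
    return AnswerMode.DEEP_SEARCH
```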
## Agent Configuration

Advanced agentic behavior settings in `agent/agent_config.py`:
```python
{
    # Iterative improvement
    "max_iterations": 4,
    "sec_max_iterations": 5,

    # Quality thresholds (very strict - favor iteration)
    "min_confidence_threshold": 0.90,
    "min_completeness_threshold": 0.90,

    # Evaluation model for quality assessment
    "evaluation_model": "gpt-4.1-mini-2025-04-14",
    "evaluation_temperature": 0.05,
    "evaluation_max_tokens": 1500,

    # Quality thresholds for iteration decisions
    "excellent_threshold": 0.95,  # Near perfect to stop early
    "good_threshold": 0.80,
    "poor_threshold": 0.60,

    # Search expansion during iterations
    "max_new_chunks_per_iteration": 5,
    "similarity_threshold_expansion": 0.25,

    # Iteration strategy
    "aggressive_iteration": True,
    "prefer_max_iterations": True,
}
```
The agent uses all available iterations by default to ensure comprehensive answers. Early stopping only occurs if confidence exceeds 0.95.
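The stopping rule described above reduces to a small predicate. This is a sketch with a hypothetical function name, matching the documented behavior (stop early only when confidence exceeds `excellent_threshold`, otherwise run out the iteration budget):

```python
def should_stop(iteration: int, confidence: float,
                max_iterations: int = 4,
                excellent_threshold: float = 0.95) -> bool:
    """Stop early only on near-perfect confidence; otherwise use all iterations."""
    return confidence > excellent_threshold or iteration >= max_iterations
```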
## Database Configuration

PostgreSQL connection pool settings in `config.py`:

```python
{
    "PRODUCTION_MIN_SIZE": 5,
    "PRODUCTION_MAX_SIZE": 30,
    "PRODUCTION_COMMAND_TIMEOUT": 20,
    "PRODUCTION_TIMEOUT": 15,
}
```
### Database Timeouts

```python
{
    "STATEMENT_TIMEOUT_MS": 30000,            # 30 seconds
    "IDLE_IN_TRANSACTION_TIMEOUT_MS": 60000,  # 1 minute
}
```
## Rate Limiting

```python
{
    "PER_MINUTE": 30,
    "PER_MONTH": 10000,
    "ADMIN_PER_MONTH": 100000,
    "COST_PER_REQUEST": 0.02,  # $0.02 per request
}
```
## Debug Mode

Enable query optimization logging. When enabled, the system logs `EXPLAIN ANALYZE` output for database queries to help optimize performance.
## Conversation Memory

Configured in `agent/rag/question_analyzer.py`:

```python
{
    "max_exchanges": 5,            # Last 5 conversation turns
    "max_chars_per_message": 4000  # Max 4000 chars per message
}
```
This sliding window maintains recent context for follow-up questions while controlling memory usage.
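The sliding window amounts to keeping the last N exchanges and truncating overlong messages; a minimal sketch (hypothetical helper, modeling each exchange as a question/answer pair):

```python
def trim_history(exchanges: list[tuple[str, str]],
                 max_exchanges: int = 5,
                 max_chars_per_message: int = 4000) -> list[tuple[str, str]]:
    """Keep only the last N turns, truncating each message to the char limit."""
    recent = exchanges[-max_exchanges:]
    return [(q[:max_chars_per_message], a[:max_chars_per_message])
            for q, a in recent]
```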
## Data Paths (Optional)

Override default data directories:

```bash
DATA_DIR=./data
TRANSCRIPTS_DIR=./data/transcripts
EMBEDDINGS_DIR=./data/embeddings
10K_FILINGS_DIR=./data/10k_filings
DUCKDB_PATH=./data/duckdb/financial_data_new.duckdb
```
## External Services

### Redis (Caching)
```bash
# Optional - for WebSocket session management
REDIS_URL=redis://localhost:6379
```
## Configuration Best Practices

### Start with Defaults

The default configuration works well for most use cases. Only customize if you have specific requirements.

### LLM Provider Selection

- **Cerebras**: fastest inference (recommended for production)
- **OpenAI**: higher quality for complex reasoning
- **Auto**: use Cerebras if available, fall back to OpenAI

### Iteration Tuning

- More iterations = better answers but slower
- Start with the defaults (3-4) and increase for complex queries
- Use DEEP_SEARCH mode for exhaustive research

### Database Pooling

- Increase pool size for high concurrency
- Monitor connection usage and adjust timeouts

### Hybrid Search Weights

- 70/30 (vector/keyword) works well for most queries
- Increase the keyword weight for exact-match queries
- Increase the vector weight for conceptual queries
## Monitoring Configuration

Check configuration at runtime:

```python
from agent.rag.config import Config
from config import settings

config = Config()
print(f"LLM Provider: {config.get('llm_provider')}")
print(f"Max Iterations: {config.get('max_iterations')}")
print(f"Hybrid Search: {config.get('hybrid_search_enabled')}")
print(f"Environment: {settings.ENVIRONMENT}")
```
## Next Steps