## What is RAG?

Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language models by:

- Retrieving relevant information from a knowledge base
- Augmenting the LLM prompt with retrieved context
- Generating responses grounded in actual data
- **Factual Accuracy**: responses are grounded in real financial data, not LLM hallucinations
- **Citations**: every claim can be traced back to source documents
- **Up-to-date**: the knowledge base can be updated without retraining the model
- **Domain-Specific**: specialized for financial analysis with structured data
## Hybrid Search Strategy

Finance Agent uses hybrid search that combines two complementary approaches.

### Vector Search (70% weight)

Semantic similarity using embeddings:

- Model: `all-MiniLM-L6-v2` (384 dimensions)
- Database: PostgreSQL with the pgvector extension
- Similarity: cosine distance between query and document embeddings
- Advantages: understands meaning, handles synonyms, works with natural language
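The ranking step can be illustrated with a toy cosine-similarity search. This is a conceptual sketch only: tiny hand-made 3-dimensional vectors stand in for the real 384-dimensional `all-MiniLM-L6-v2` embeddings, and `vector_search` stands in for the pgvector-backed query.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def vector_search(query_vec, chunks, top_k=2):
    """Rank chunks by cosine similarity to the query embedding."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]

# Toy corpus: (chunk text, pretend embedding)
chunks = [
    ("capex rose 12% year over year", [0.9, 0.1, 0.0]),
    ("the CEO discussed hiring plans", [0.1, 0.8, 0.2]),
    ("capital expenditures totaled $2.5B", [0.8, 0.2, 0.1]),
]
results = vector_search([1.0, 0.1, 0.0], chunks)
```

Both capex chunks outrank the hiring chunk even though the query vector matches neither exactly, which is the behavior the bullet list above describes.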
### Keyword Search (30% weight)

Traditional text matching using TF-IDF:

- Method: PostgreSQL full-text search with `ts_rank`
- Preprocessing: extract keywords, build search vectors
- Advantages: exact phrase matching, handles technical terms, fast execution
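A query for this leg might look like the following sketch. The table and column names (`chunks`, `search_vector`) are hypothetical placeholders, not the actual schema; only `ts_rank`, `plainto_tsquery`, and the `@@` match operator are standard PostgreSQL full-text machinery.

```python
# Hypothetical shape of the keyword-search query; real table/column
# names may differ.
KEYWORD_QUERY = """
SELECT id, content,
       ts_rank(search_vector, plainto_tsquery('english', %(query)s)) AS score
FROM chunks
WHERE search_vector @@ plainto_tsquery('english', %(query)s)
ORDER BY score DESC
LIMIT %(top_k)s;
"""
```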
## Why Hybrid?

Combining both approaches provides the best of both worlds.

**Vector Search Strengths**

- Understands “capex” and “capital expenditures” are related
- Handles questions phrased differently than the source text
- Captures semantic meaning beyond exact words

**Keyword Search Strengths**

- Finds exact numbers and technical terms (“$2.5B”, “EBITDA”)
- Better for precise phrase matching
- Faster execution on large datasets

**Combined Power**

- Higher recall: finds more relevant chunks
- Better precision: ranks the most relevant chunks first
- Robust to different query styles
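The fusion step can be sketched as follows: run both legs concurrently, then combine per-chunk scores with the 70/30 weighting. The two search functions here are stand-ins returning `{chunk_id: score}` dicts normalized to [0, 1]; the real legs query PostgreSQL.

```python
from concurrent.futures import ThreadPoolExecutor

VECTOR_WEIGHT, KEYWORD_WEIGHT = 0.7, 0.3

def vector_scores(query):
    # Stand-in for the pgvector leg.
    return {"c1": 0.9, "c2": 0.4, "c3": 0.7}

def keyword_scores(query):
    # Stand-in for the ts_rank leg.
    return {"c2": 1.0, "c3": 0.5}

def hybrid_search(query):
    # Run both legs in parallel, then fuse with the 70/30 weighting.
    with ThreadPoolExecutor(max_workers=2) as pool:
        vec = pool.submit(vector_scores, query)
        kw = pool.submit(keyword_scores, query)
        v, k = vec.result(), kw.result()
    ids = set(v) | set(k)
    combined = {i: VECTOR_WEIGHT * v.get(i, 0.0) + KEYWORD_WEIGHT * k.get(i, 0.0)
                for i in ids}
    return sorted(combined, key=combined.get, reverse=True)

ranking = hybrid_search("capex guidance")
# c3 scores 0.7*0.7 + 0.3*0.5 = 0.64 and edges out c1 (0.63): a chunk
# strong in both legs beats one strong in only the vector leg.
```

Note how a chunk missing from one leg simply contributes 0 for that leg, so both legs raise recall without either being able to veto a result.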
**Configuration**: The 70/30 split is configurable in `rag/config.py`.

## Database Schema

The RAG system uses PostgreSQL with the pgvector extension.

**Chunking strategy**: Text is split into 1000-character chunks with 200-character overlap to ensure context continuity across chunk boundaries.
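The chunking strategy can be sketched as a sliding window: each chunk starts 800 characters after the previous one, so adjacent chunks share a 200-character overlap. This is a minimal sketch of the stated 1000/200 parameters, not the actual splitter.

```python
CHUNK_SIZE, OVERLAP = 1000, 200
STEP = CHUNK_SIZE - OVERLAP  # 800 new characters per chunk

def chunk_text(text: str) -> list[str]:
    """Split text into CHUNK_SIZE windows with OVERLAP shared characters."""
    chunks = []
    for start in range(0, len(text), STEP):
        chunks.append(text[start:start + CHUNK_SIZE])
        if start + CHUNK_SIZE >= len(text):
            break
    return chunks

# 2500 characters -> chunks starting at offsets 0, 800, 1600.
doc = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(doc)
```

The overlap means a sentence straddling a chunk boundary appears whole in at least one chunk, which is the context-continuity property described above.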
## Parallel Retrieval

For performance, Finance Agent executes searches in parallel:

- **Multi-Ticker Parallelization**: when comparing multiple companies, each ticker is searched concurrently.
- **Multi-Quarter Parallelization**: for time-range queries, each quarter is searched concurrently.

Parallel execution uses `ThreadPoolExecutor` with up to 10 workers for maximum throughput.

## Iterative Improvement Loop
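The multi-ticker fan-out can be sketched as one retrieval task per ticker on a thread pool capped at 10 workers. `retrieve` is a stand-in for the real per-ticker search; multi-quarter parallelization follows the same pattern with quarters instead of tickers.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 10

def retrieve(ticker: str, query: str) -> list[str]:
    # Stand-in: the real version runs the hybrid search for this ticker.
    return [f"{ticker}: chunk for {query!r}"]

def parallel_retrieve(tickers, query):
    """Run one retrieval per ticker concurrently; return {ticker: chunks}."""
    workers = min(MAX_WORKERS, len(tickers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: retrieve(t, query), tickers))
    return dict(zip(tickers, results))

out = parallel_retrieve(["AAPL", "MSFT", "GOOG"], "capex")
```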
The agent doesn’t just generate one answer and stop. It performs iterative self-improvement until quality thresholds are met.

### Evaluation Metrics

The agent evaluates each answer on four dimensions (0-100 scale):

| Metric | What It Measures |
|---|---|
| Completeness | Does the answer fully address the question? |
| Specificity | Does it include specific numbers, quotes, dates? |
| Accuracy | Is the information factually correct? |
| Clarity | Is the response well-structured and readable? |
### Iteration Actions

During iteration, the agent can:

**Generate Follow-up Keywords**

Create search-optimized keyword phrases (NOT verbose questions) for missing information. The old approach issued verbose questions (e.g. “What did management say about capital expenditure plans?”); the new approach issues compact, search-optimized keyword phrases (e.g. “capital expenditure guidance”).

**Search All Quarters**

Each keyword phrase searches ALL target quarters in parallel for comprehensive coverage.
### Stop Conditions

Iteration stops when any of these conditions is met:

- Confidence ≥ threshold (varies by answer mode: 70-95%)
- Max iterations reached (2-10 depending on answer mode)
- Agent decides the answer is sufficient (explicit satisfaction signal)
- No follow-up keyword phrases generated (nothing left to search)

The agent automatically adjusts iteration depth based on question complexity (answer mode: direct/standard/detailed).
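The stop-condition logic above can be sketched as a simple loop. The per-mode thresholds and caps here are illustrative values chosen within the stated 70-95% and 2-10 ranges, not the actual configuration.

```python
# Illustrative (threshold, max_iterations) per answer mode.
MODES = {"direct": (70, 2), "standard": (85, 5), "detailed": (95, 10)}

def run_iterations(evaluations, mode="standard"):
    """evaluations: list of (confidence, follow_up_keywords) per round.

    Returns (rounds_used, stop_reason)."""
    threshold, max_iters = MODES[mode]
    for i, (confidence, keywords) in enumerate(evaluations, start=1):
        if confidence >= threshold:
            return i, "confident"          # quality bar cleared
        if i >= max_iters:
            return i, "max_iterations"     # iteration cap hit
        if not keywords:
            return i, "no_followups"       # nothing left to search
    return len(evaluations), "exhausted"

# Round 1 scores 60 with follow-ups pending; round 2 scores 90 and stops.
outcome = run_iterations([(60, ["capex q3 guidance"]), (90, [])])
```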
## Answer Generation

Once sufficient context is retrieved, the agent generates answers.

**Single-Ticker Responses**: for questions about one company, the answer is grounded in that company’s retrieved context.

**Multi-Ticker Synthesis**: for comparative questions, the agent synthesizes retrieved context across all companies involved.

### Citation System

Every claim is backed by citations:

- Transcript citations: `[1]`, `[2]`, `[3]`
- 10-K citations: `[10K1]`, `[10K2]`, `[10K3]`
- News citations: `[N1]`, `[N2]`, `[N3]`

Each citation resolves to:

- Source type (earnings call, 10-K filing, news article)
- Company ticker
- Time period (Q1 2025, FY 2024, date)
- URL or document reference
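The labelling scheme can be sketched as a small formatter: transcripts get bare numbers, 10-K filings a `10K` prefix, news an `N` prefix. The function and key names are illustrative, not the actual API.

```python
# Prefix per source type, matching the citation formats listed above.
PREFIXES = {"transcript": "", "10k": "10K", "news": "N"}

def citation_label(source_type: str, index: int) -> str:
    """Render a bracketed citation label, e.g. [1], [10K2], [N3]."""
    return f"[{PREFIXES[source_type]}{index}]"

labels = [
    citation_label("transcript", 1),
    citation_label("10k", 2),
    citation_label("news", 3),
]
# labels == ["[1]", "[10K2]", "[N3]"]
```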
## Configuration

The RAG pipeline is highly configurable.
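A hypothetical shape for `rag/config.py`, using only values stated on this page (70/30 weights, 1000/200 chunking, 10 workers, 2-10 iterations, 70-95% confidence). The field names are illustrative assumptions, not the real ones.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RAGConfig:
    # Hybrid search weighting (must sum to 1.0).
    vector_weight: float = 0.7
    keyword_weight: float = 0.3
    # Chunking strategy.
    chunk_size: int = 1000
    chunk_overlap: int = 200
    # Parallelism.
    max_workers: int = 10
    # Iteration bounds across answer modes.
    min_confidence: int = 70   # lower bound of the 70-95% range
    max_confidence: int = 95   # upper bound of the 70-95% range
    max_iterations: int = 10   # upper bound of the 2-10 range

config = RAGConfig()
```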
## Performance Optimizations

**Parallel Execution**

Vector and keyword searches run in parallel using `ThreadPoolExecutor` for a 2x speedup.

**Async Embedding**

Query embeddings are generated asynchronously to avoid blocking the event loop.
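One common way to keep a CPU-bound embedding call off the event loop is `asyncio.to_thread`; the sketch below assumes that pattern, with `embed` standing in for the real `all-MiniLM-L6-v2` encoder.

```python
import asyncio

def embed(text: str) -> list[float]:
    # Stand-in: real code would call the sentence-transformers model here.
    return [float(len(text)), 0.0]

async def embed_query(text: str) -> list[float]:
    # Run the blocking embed() in a worker thread so the event loop
    # stays free to serve other requests meanwhile.
    return await asyncio.to_thread(embed, text)

vec = asyncio.run(embed_query("what was capex?"))
```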
**Database Connection Pooling**

Reuses database connections across requests to avoid connection overhead.

**Chunk Deduplication**

Removes duplicate chunks across multiple searches to reduce context size.
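Deduplication can be sketched as a first-occurrence filter keyed on chunk text (the real system may key on chunk IDs instead), preserving retrieval order so the best-ranked copy survives.

```python
def dedupe_chunks(chunks: list[str]) -> list[str]:
    """Keep the first occurrence of each chunk, preserving order."""
    seen: set[str] = set()
    unique = []
    for chunk in chunks:
        if chunk not in seen:
            seen.add(chunk)
            unique.append(chunk)
    return unique

# The same chunk surfaced by both search legs is sent to the LLM once.
merged = dedupe_chunks([
    "capex rose 12%",
    "EBITDA was $2.5B",
    "capex rose 12%",
])
```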
**Smart Iteration**

Stops early when confidence thresholds are met, avoiding unnecessary LLM calls.
## Next Steps

- **Data Sources**: learn about the three specialized data source tools.
- **Architecture**: understand the complete six-stage pipeline.