Overview

The agent executes a 6-stage pipeline for each question, with strategic parallelization and semantic routing to optimize performance and accuracy.
┌─────────────────────────────────────────────────────────────────────────┐
│                    COMPLETE PIPELINE FLOW                                │
└─────────────────────────────────────────────────────────────────────────┘

Stage 1: Setup & Initialization

1. Initialize RAG components

  • Load search engine (hybrid vector + keyword)
  • Initialize response generator
  • Connect to vector database (pgvector)
2. Load configuration

  • Answer mode thresholds
  • LLM provider settings (Cerebras/OpenAI)
  • Hybrid search weights (70% semantic, 30% keyword)
3. Fetch available quarters

  • Query database for available transcript quarters
  • Per-company quarter availability (not global)
# Internal initialization
def __init__(self):
    self.search_engine = SearchEngine()  # Hybrid search
    self.response_generator = ResponseGenerator()
    self.sec_service = SECFilingsService()  # 10-K agent
    self.tavily_service = TavilyService()  # News

Stage 2: Combined Reasoning + Analysis

Single LLM call via ReasoningPlanner that performs comprehensive question understanding.

Extracted Information

  • Tickers - Company identifiers ($AAPL, $MSFT)
  • Time references - Temporal phrases preserved exactly (“Q4 2024”, “last 3 quarters”, “latest”)
  • Intent - What is the user trying to learn?
  • Topic - Main subject (e.g., “cloud revenue growth”)
  • Question type - Single company, multiple companies, or comparison
  • Answer mode - direct | standard | detailed
  • Validation - Reject off-topic/invalid questions

Semantic Data Source Routing

Routes based on intent, not keywords:
{
  "data_sources": ["earnings_transcripts"],  // or "10k", "news", "hybrid"
  "needs_latest_news": false,
  "needs_10k": false
}
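A minimal sketch of how the routing output above might be consumed downstream; the function name and exact flag handling are assumptions for illustration, not the actual API:

```python
# Hypothetical consumer of the routing JSON; names are illustrative.
def select_services(routing: dict) -> dict:
    sources = set(routing.get("data_sources", []))
    return {
        # "hybrid" implies the transcript search also runs
        "transcripts": bool({"earnings_transcripts", "hybrid"} & sources),
        "sec_10k": "10k" in sources or routing.get("needs_10k", False),
        "news": "news" in sources or routing.get("needs_latest_news", False),
    }

# The routing example above activates only the transcript search:
routing = {"data_sources": ["earnings_transcripts"],
           "needs_latest_news": False, "needs_10k": False}
print(select_services(routing))  # {'transcripts': True, 'sec_10k': False, 'news': False}
```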

Research Reasoning

Generates 2-3 sentence research approach:
{
  "reasoning": "The user is asking about Microsoft's cloud business strategy and Azure performance. I need to find Azure revenue figures and growth rates (quarterly), management commentary on competitive positioning vs AWS/Google Cloud, margin trends and profitability metrics, and forward guidance for cloud segment."
}
This reasoning:
  • Makes agent thinking transparent
  • Guides evaluation (did we find what we planned to find?)
  • Improves answer quality through structured research

Implementation Reference

# From agent/rag/reasoning_planner.py (ReasoningPlanner)
analysis = {
    "reasoning": "2-3 sentence research approach",
    "tickers": ["AAPL", "MSFT"],
    "time_refs": ["last 3 quarters"],
    "topic": "cloud revenue growth",
    "question_type": "multiple_companies",
    "data_sources": ["earnings_transcripts", "news"],
    "answer_mode": "standard",
    "is_valid": true,
    "confidence": 0.95
}
This single LLM call replaces what used to be multiple sequential calls, significantly reducing latency.

Stage 2.1: Search Planning

SearchPlanner converts temporal references into concrete search plans.
Each company gets its own most recent quarters (not global):
# Database query per company
SELECT DISTINCT year, quarter 
FROM transcript_chunks 
WHERE ticker = %s 
ORDER BY year DESC, quarter DESC
Examples:
  • "latest"get_last_n_quarters_for_company(ticker, 1)
  • "last 3 quarters"get_last_n_quarters_for_company(ticker, 3)
  • "Q4 2024" → Specific quarter validation
Companies have different fiscal year calendars. Apple’s Q4 2024 may not align with Microsoft’s Q4 2024.
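A sketch of the per-company quarter resolution, assuming `rows` holds the (year, quarter) pairs returned by the DISTINCT query above, already ordered DESC (the real helper lives in SearchPlanner):

```python
# Sketch only: maps a company's n most recent (year, quarter) rows to
# the "YYYY_qN" labels used by the search plan.
def get_last_n_quarters_for_company(rows, n):
    """Label the company's n most recent quarters, e.g. (2024, 4) -> '2024_q4'."""
    return [f"{year}_q{quarter}" for year, quarter in rows[:n]]

rows = [(2025, 1), (2024, 4), (2024, 3)]  # newest first, per the ORDER BY
print(get_last_n_quarters_for_company(rows, 1))  # ['2025_q1']  ("latest")
print(get_last_n_quarters_for_company(rows, 3))  # ['2025_q1', '2024_q4', '2024_q3']
```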
Builds search plan for each data source:
{
  "search_plan": {
    "earnings_transcripts": [
      {
        "ticker": "AAPL",
        "quarters": ["2024_q4", "2025_q1"],
        "query": "iPhone sales revenue"
      }
    ],
    "10k": [
      {
        "ticker": "AAPL",
        "fiscal_year": 2024
      }
    ]
  },
  "reasoning": "Searching last 2 quarters for iPhone sales discussion"
}
Stage 2.5: News Search

Conditional execution: only if needs_latest_news=true
1. Query Tavily API

Real-time web search for current events
class TavilyService:
    def search_news(self, query: str, max_results: int = 5):
        """Returns an AI-generated summary plus source articles."""
2. Format with citations

Uses [N1], [N2] citation markers
def format_news_context(self, news_results):
    """Formats with [N1], [N2] citation markers"""
3. Stream to frontend

Event type: news_search
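A hedged sketch of the citation formatting step; the field names (`title`, `summary`) are assumptions about the news payload, not the documented Tavily schema:

```python
# Illustrative only: field names are assumed, not the real payload schema.
def format_news_context(news_results):
    """Formats articles with [N1], [N2] citation markers."""
    return "\n".join(
        f"[N{i}] {article['title']}: {article['summary']}"
        for i, article in enumerate(news_results, start=1)
    )

articles = [{"title": "MSFT earnings beat", "summary": "Azure grew 30%."},
            {"title": "Cloud outlook", "summary": "Guidance raised."}]
print(format_news_context(articles))
```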

Stage 2.6: SEC 10-K Retrieval Agent

Conditional execution: only if data_source="10k" or needs_10k=true. Invokes a specialized retrieval agent for SEC 10-K annual filings.
See SEC Agent for complete documentation of this stage.

Key Features

Planning-Driven

Generates targeted sub-questions for retrieval

Section Routing

LLM-based routing to Item 1, Item 7, Item 8, etc.

Table Selection

LLM selects relevant tables from financial statements

Iterative Retrieval

Up to 5 iterations with self-evaluation

Flow Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                         10-K SEARCH FLOW (max 5 iterations)                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────┐                                                        │
│  │ PHASE 0: PLAN   │   Generate sub-questions + search plan                │
│  │ • Sub-questions │   "What is inventory turnover?" →                     │
│  │ • Search plan   │     - "What is COGS?" [TABLE]                         │
│  └────────┬────────┘     - "What is inventory?" [TABLE]                    │
│           │              - "Inventory valuation?" [TEXT]                   │
│           ▼                                                                 │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ PHASE 1: PARALLEL RETRIEVAL                                         │   │
│  │ ├── Execute ALL searches in parallel (6 workers)                    │   │
│  │ │   ├── TABLE: "cost of goods sold" → LLM selects tables            │   │
│  │ │   ├── TABLE: "inventory balance" → LLM selects tables             │   │
│  │ │   └── TEXT: "inventory valuation" → hybrid search                 │   │
│  │ └── Deduplicate and combine chunks                                  │   │
│  └────────┬────────────────────────────────────────────────────────────┘   │
│           │                                                                 │
│           ▼                                                                 │
│  ┌─────────────────┐                                                        │
│  │ PHASE 2: ANSWER │   Generate answer with ALL retrieved chunks          │
│  └────────┬────────┘                                                        │
│           │                                                                 │
│           ▼                                                                 │
│  ┌─────────────────┐                                                        │
│  │ PHASE 3: EVAL   │   If quality >= 90% → DONE                            │
│  │                 │   Else → Replan and loop back                         │
│  └─────────────────┘                                                        │
└─────────────────────────────────────────────────────────────────────────────┘
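The phases in the diagram can be condensed into a loop sketch; the four callables stand in for the agent's real phases, and the 90% quality bar comes from PHASE 3 above:

```python
# Condensed sketch of the plan -> retrieve -> answer -> evaluate loop.
# The callables are placeholders, not the agent's actual functions.
def run_10k_loop(plan, retrieve, answer, evaluate, question, max_iterations=5):
    search_plan = plan(question, feedback=None)   # Phase 0: sub-questions + plan
    chunks, result = [], None
    for _ in range(max_iterations):
        chunks.extend(retrieve(search_plan))      # Phase 1: parallel retrieval
        result = answer(question, chunks)         # Phase 2: answer with ALL chunks
        quality = evaluate(question, result)      # Phase 3: self-evaluation
        if quality >= 0.90:
            break                                 # quality bar met -> done
        search_plan = plan(question, feedback=result)  # else replan and loop
    return result
```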

Citation Format

Results are formatted with [10K1], [10K2] citation markers for source attribution.

Stage 3: Transcript Search

Hybrid vector + keyword search over earnings call transcripts.
Direct search with quarter filtering:
def search_similar_chunks(query, top_k=15, quarter="2024_q4"):
    """
    Hybrid search combining:
    - Vector search: 70% weight (semantic similarity via pgvector)
    - Keyword search: 30% weight (TF-IDF)
    """
Scoring:
final_score = (0.7 × semantic_similarity) + (0.3 × keyword_match)
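The scoring formula above, written out as code with the 70/30 weights from the configuration:

```python
# Weighted hybrid score; weights match the 70% semantic / 30% keyword config.
SEMANTIC_WEIGHT, KEYWORD_WEIGHT = 0.7, 0.3

def hybrid_score(semantic_similarity: float, keyword_match: float) -> float:
    return SEMANTIC_WEIGHT * semantic_similarity + KEYWORD_WEIGHT * keyword_match

# A chunk with strong semantic but weak keyword overlap still ranks well:
print(round(hybrid_score(0.9, 0.3), 2))  # 0.72
```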

Database Query

SELECT chunk_text, ticker, year, quarter,
       1 - (embedding <=> query_embedding) AS similarity
FROM transcript_chunks
WHERE ticker = %s AND quarter = %s
ORDER BY similarity DESC
LIMIT 15;

Stage 4: Initial Answer Generation

def generate_openai_response(
    question: str,
    chunks: List[str],
    reasoning: str,
    model: str
):
    """
    Generates answer with:
    - Specific numbers and quotes
    - Citation markers [1], [2]
    - Period metadata (Q1 2025, FY 2024)
    """
Prompt includes:
  • Original question
  • Research reasoning from Stage 2
  • All retrieved chunks
  • Citation instructions
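The prompt contents listed above can be sketched as an assembly function; the exact wording of the real prompt template is not documented here, so this is illustrative only:

```python
# Illustrative prompt assembly; the real template's wording is assumed.
def build_prompt(question: str, reasoning: str, chunks: list[str]) -> str:
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        f"Research plan: {reasoning}\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer with specific numbers and quotes, cite sources as [1], [2], "
        "and include period metadata (e.g. Q1 2025, FY 2024)."
    )

print(build_prompt("What was Q4 revenue?", "Find revenue figures.",
                   ["Revenue was $90B in Q4."]))
```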

Stage 5: Iterative Improvement

Self-reflection loop with configurable depth based on answer mode.
┌─────────────────────────────────────────────────────────────────┐
│                    ITERATION LOOP                                │
│                                                                  │
│  ┌──────────────────┐                                           │
│  │ Generate Answer  │◄──────────────────────────────────┐       │
│  └────────┬─────────┘                                   │       │
│           │                                             │       │
│           ▼                                             │       │
│  ┌──────────────────┐                                   │       │
│  │ Evaluate Quality │                                   │       │
│  │ • completeness   │                                   │       │
│  │ • specificity    │                                   │       │
│  │ • accuracy       │                                   │       │
│  │ • vs. reasoning  │ ← Checks if reasoning goals met   │       │
│  └────────┬─────────┘                                   │       │
│           │                                             │       │
│           ▼                                             │       │
│  ┌──────────────────┐    YES    ┌─────────────────┐    │       │
│  │ Confidence < 90% │─────────► │ Search for more │────┘       │
│  │ & iterations left│           │ context (tools) │            │
│  └────────┬─────────┘           └─────────────────┘            │
│           │ NO                                                  │
│           ▼                                                     │
│     ┌───────────┐                                               │
│     │  OUTPUT   │                                               │
│     └───────────┘                                               │
└─────────────────────────────────────────────────────────────────┘

Evaluation Metrics

Scores (0-100 scale unless noted):
  • completeness_score (integer, required): Does the answer fully address the question?
  • specificity_score (integer, required): Does it include specific numbers, quotes, and details?
  • accuracy_score (integer, required): Is the information factually correct based on sources?
  • clarity_score (integer, required): Is the response well-structured and easy to understand?
  • overall_confidence (float, required): Weighted combination of the above, on a 0-1 scale
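A sketch of collapsing the four 0-100 scores into overall_confidence; the real weights are not documented here, so equal weights are assumed for illustration:

```python
# Hypothetical equal weighting; the agent's actual weights are not documented.
SCORE_KEYS = ["completeness_score", "specificity_score",
              "accuracy_score", "clarity_score"]

def overall_confidence(scores: dict) -> float:
    """Collapse the four 0-100 scores into a single 0-1 confidence."""
    return sum(scores[key] for key in SCORE_KEYS) / (100 * len(SCORE_KEYS))

scores = {"completeness_score": 90, "specificity_score": 85,
          "accuracy_score": 95, "clarity_score": 90}
print(overall_confidence(scores))  # 0.9
```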

Follow-Up Actions

During iteration, the agent can:

Generate Keyword Phrases

Search-optimized keywords (NOT verbose questions).
Example: "capex guidance 2025 AI allocation"
Not: "What guidance did they provide for capex..."

Request Transcript Search

needs_transcript_search: true
Searches ALL target quarters in parallel

Request News Search

needs_news_search: true
Fetches real-time news updates

Evaluate Progress

Check if reasoning goals are met

Termination Conditions

1. Confidence threshold met

overall_confidence >= threshold (varies by answer mode: 70-95%)
2. Max iterations reached

2-10 depending on answer mode
3. Agent satisfaction

Agent decides answer is sufficient
4. No follow-ups

No additional keyword phrases generated
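The four conditions can be expressed as a single check; this is a sketch with assumed parameter names, not the agent's actual function:

```python
# Sketch of the termination check; parameter names are assumptions.
def should_stop(confidence, threshold, iteration, max_iterations,
                agent_satisfied, followup_phrases):
    return (confidence >= threshold          # 1. threshold met
            or iteration >= max_iterations   # 2. iteration budget spent
            or agent_satisfied               # 3. agent decides it is sufficient
            or not followup_phrases)         # 4. nothing left to search
```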

Answer Mode Configuration

Mode        | Max Iterations | Confidence Threshold | Use Case
direct      | 2              | 70%                  | “What was Q4 revenue?”
standard    | 3              | 80%                  | “Explain cloud strategy”
detailed    | 4              | 90%                  | “Analyze margin trends”
deep_search | 10             | 95%                  | Reserved for future use
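The mode settings above, written as a config mapping (the key names are assumptions; only the values come from the table):

```python
# Answer-mode settings from the table; dict/key names are illustrative.
ANSWER_MODES = {
    "direct":      {"max_iterations": 2,  "confidence_threshold": 0.70},
    "standard":    {"max_iterations": 3,  "confidence_threshold": 0.80},
    "detailed":    {"max_iterations": 4,  "confidence_threshold": 0.90},
    "deep_search": {"max_iterations": 10, "confidence_threshold": 0.95},
}
```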

Stage 6: Final Response Assembly

1. Stream final answer

Event type: result
Includes complete answer with all citations
2. Include source attributions

  • Transcript citations: [1], [2]
  • 10-K citations: [10K1], [10K2]
  • News citations: [N1], [N2]
3. Return metadata

{
  "confidence": 0.92,
  "chunks_used": 28,
  "iterations": 2,
  "timing": {
    "reasoning": 1.2,
    "retrieval": 3.5,
    "generation": 2.1,
    "total": 6.8
  },
  "sources": {
    "earnings_transcripts": 15,
    "10k": 8,
    "news": 5
  }
}

Performance Optimization

Parallelization — multiple independent operations run concurrently:
  • Multi-ticker searches (one per company)
  • 10-K sub-question searches (6 workers)
  • Quarter searches (all target quarters)
  • Follow-up keyword phrase searches

Caching:
  • Embedding cache for frequent queries
  • Quarter availability cache (30 min TTL)
  • LLM response caching for identical questions

Early termination:
  • Stop iteration when confidence ≥ threshold
  • 10-K agent stops at 90% quality (avg 2.4 iterations vs max 5)
  • Avoid unnecessary searches when answer is complete

Deduplication:
  • Deduplicate chunks by citation marker
  • Avoid retrieving same content multiple times
  • Merge overlapping context windows
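The multi-ticker fan-out can be sketched with a thread pool; `search_fn` stands in for the real transcript search, and the 6-worker pool mirrors the 10-K agent's setting:

```python
# Sketch of parallel per-ticker search; search_fn is a placeholder
# for the actual transcript search call.
from concurrent.futures import ThreadPoolExecutor

def search_all_tickers(tickers, query, search_fn, max_workers=6):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one search per ticker, then collect results as they finish.
        futures = {ticker: pool.submit(search_fn, ticker, query) for ticker in tickers}
        return {ticker: future.result() for ticker, future in futures.items()}

results = search_all_tickers(["AAPL", "MSFT"], "cloud revenue",
                             lambda t, q: f"{t}:{q}")
print(results)  # {'AAPL': 'AAPL:cloud revenue', 'MSFT': 'MSFT:cloud revenue'}
```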

Next Steps

Iterative Improvement

Deep dive into self-reflection and evaluation

SEC Agent

Learn about the specialized 10-K retrieval agent