Self-contained building blocks for routing, query enhancement, and context compression
The components directory contains reusable, self-contained LangChain building blocks that implement specific retrieval and generation sub-tasks. Feature pipelines compose these components rather than reimplementing the same logic.
The router chooses one of three actions at each iteration:
search: Retrieve more documents from the vector store. Selected when no documents have been retrieved yet or when reflection identified information gaps.
reflect: Evaluate and improve the current answer. Selected when documents exist but the answer’s quality, completeness, or grounding is uncertain.
generate: Produce the final answer. Selected when sufficient information has been gathered or when max_iterations is reached (forced fallback).
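Feature pipelines drive these actions in a loop until the router settles on `generate`. A minimal sketch of that loop (the `run_pipeline`, `retrieve`, and `answer` names are hypothetical; the repo's pipelines may wire this differently):

```python
def run_pipeline(query, router, retrieve, answer, max_iterations=3):
    """Drive the router's search/reflect/generate loop (illustrative sketch)."""
    docs: list = []
    current = None
    for iteration in range(1, max_iterations + 1):
        decision = router.route(
            query,
            has_documents=bool(docs),
            current_answer=current,
            iteration=iteration,
            max_iterations=max_iterations,
        )
        if decision["action"] == "search":
            docs.extend(retrieve(query))   # gather more context
        elif decision["action"] == "reflect":
            current = answer(query, docs)  # revise the draft answer
        else:  # "generate" (including the forced max_iterations fallback)
            return answer(query, docs)
    return answer(query, docs)


# Toy stand-ins to show the control flow without an LLM:
class _FixedRouter:
    def __init__(self):
        self.plan = iter(["search", "reflect", "generate"])

    def route(self, query, **state):
        return {"action": next(self.plan), "reasoning": "scripted"}


result = run_pipeline(
    "what is HyDE?",
    router=_FixedRouter(),
    retrieve=lambda q: [f"doc about {q}"],
    answer=lambda q, docs: f"answer from {len(docs)} doc(s)",
)
```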
```python
import json
from typing import Any

from langchain_core.prompts import PromptTemplate
from langchain_groq import ChatGroq


class AgenticRouter:
    """Route queries to search, reflect, or generate actions using LLM reasoning.

    The router implements an agentic decision-making pattern where an LLM
    evaluates the current pipeline state and determines the optimal next action.
    """

    ROUTING_TEMPLATE = """You are a query routing agent. Given a query and optional current answer, decide what action to take next.

Current State:
- Query: {query}
- Has Retrieved Documents: {has_documents}
- Current Answer: {current_answer}
- Iteration: {iteration}/{max_iterations}

Your task is to decide ONE of the following actions:
1. 'search': Retrieve documents from vector database (choose this if you need more information)
2. 'reflect': Verify and improve the current answer (choose this to validate answer quality)
3. 'generate': Create final answer (choose this when you have enough information)

Return a JSON object with this exact format:
{{"action": "search|reflect|generate", "reasoning": "brief explanation"}}

Do NOT include any other text. Return ONLY the JSON object."""

    def __init__(self, llm: ChatGroq) -> None:
        """Initialize AgenticRouter with a LangChain LLM instance.

        Args:
            llm: ChatGroq instance for routing decisions. Should be configured
                with low temperature (0.0-0.3) for consistent routing.
        """
        self.llm = llm

    def route(
        self,
        query: str,
        has_documents: bool = False,
        current_answer: str | None = None,
        iteration: int = 1,
        max_iterations: int = 3,
    ) -> dict[str, Any]:
        """Route a query to the appropriate action.

        Args:
            query: The user's original query text.
            has_documents: Whether documents have been retrieved.
            current_answer: The answer generated so far, if any.
            iteration: Current iteration number (1-indexed).
            max_iterations: Maximum iterations allowed.

        Returns:
            Dictionary with 'action' and 'reasoning' keys.

        Raises:
            ValueError: If LLM response is invalid JSON or contains an invalid action.
        """
        # Enforce iteration limit as safety mechanism
        if iteration >= max_iterations:
            return {
                "action": "generate",
                "reasoning": f"Reached maximum iterations ({max_iterations})",
            }

        answer_str = current_answer if current_answer else "No answer yet"

        # Construct prompt using LangChain's PromptTemplate
        prompt = PromptTemplate(
            template=self.ROUTING_TEMPLATE,
            input_variables=[
                "query",
                "has_documents",
                "current_answer",
                "iteration",
                "max_iterations",
            ],
        )
        formatted_prompt = prompt.format(
            query=query,
            has_documents=has_documents,
            current_answer=answer_str,
            iteration=iteration,
            max_iterations=max_iterations,
        )

        # Invoke LLM to get routing decision
        response = self.llm.invoke(formatted_prompt)
        response_text = response.content.strip()

        # Parse JSON response
        try:
            decision = json.loads(response_text)
        except json.JSONDecodeError as e:
            raise ValueError(f"Invalid JSON from router: {response_text}") from e

        # Validate response structure
        if "action" not in decision or "reasoning" not in decision:
            raise ValueError(f"Router response missing required fields: {decision}")

        # Validate action value
        action = decision["action"].lower().strip()
        if action not in ("search", "reflect", "generate"):
            raise ValueError(
                f"Invalid action: {action}. Must be 'search', 'reflect', or 'generate'"
            )

        return {
            "action": action,
            "reasoning": decision["reasoning"],
        }
```
Generates 5 alternative phrasings of the original query. Returns a list of up to 5 query strings. The original query is NOT included.

Best for: Simple factual queries where different phrasings might match different documents.
HyDE (Hypothetical Document Embeddings)
Generates a hypothetical 2-3 sentence document answer. Returns [original_query, hypothetical_answer].

Best for: Very short queries or when query/document distributions differ significantly.
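The intuition behind HyDE can be seen with a toy lexical similarity (illustrative only; the real pipeline compares dense embeddings, and the strings below are made up): a terse query shares little surface form with a relevant passage, while a hypothetical answer resembles it much more closely.

```python
def jaccard(a: str, b: str) -> float:
    """Toy word-overlap similarity standing in for embedding cosine similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)


passage = "transformers use self attention to weigh tokens in a sequence"
query = "what is attention"
hypothetical = "attention lets transformers weigh tokens in a sequence by relevance"

# The hypothetical answer overlaps the passage far more than the raw query,
# so embedding it tends to land nearer the relevant documents.
```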
```python
from langchain_core.prompts import PromptTemplate
from langchain_groq import ChatGroq


class QueryEnhancer:
    """Generate multiple query perspectives for enhanced retrieval.

    Implements three complementary strategies: multi-query generation,
    HyDE, and step-back prompting.
    """

    MULTI_QUERY_TEMPLATE = """You are an AI language model assistant. Your task is to generate 5 different search queries that would help answer the given question. Provide only the queries, one per line, without numbering or bullet points.

Original question: {query}

Alternative queries:"""

    HYDE_TEMPLATE = """You are an AI language model assistant. Your task is to generate a hypothetical document that would answer the given question. Write a brief, focused response (2-3 sentences) that directly answers the question.

Question: {query}

Hypothetical document:"""

    STEP_BACK_TEMPLATE = """You are an AI language model assistant. Your task is to generate 3 step-back questions that would provide broader context for answering the given question. These are more general, foundational questions. Provide only the questions, one per line, without numbering.

Original question: {query}

Step-back questions:"""

    def __init__(self, llm: ChatGroq) -> None:
        """Initialize QueryEnhancer with a LangChain LLM instance.

        Args:
            llm: ChatGroq instance for query generation.
                Recommended temperature: 0.3-0.7.
        """
        self.llm = llm

    def generate_multi_queries(self, query: str) -> list[str]:
        """Generate alternative query formulations.

        Args:
            query: The original user query text.

        Returns:
            List of alternative query strings (up to 5). Original query NOT included.
        """
        prompt = PromptTemplate(
            template=self.MULTI_QUERY_TEMPLATE,
            input_variables=["query"],
        )
        formatted_prompt = prompt.format(query=query)
        response = self.llm.invoke(formatted_prompt)
        queries = response.content.strip().split("\n")
        queries = [q.strip() for q in queries if q.strip()]
        return queries[:5]

    def generate_hyde_queries(self, query: str) -> list[str]:
        """Generate hypothetical document for HyDE-based retrieval.

        Args:
            query: The original user query text.

        Returns:
            List containing [original_query, hypothetical_answer].
        """
        prompt = PromptTemplate(
            template=self.HYDE_TEMPLATE,
            input_variables=["query"],
        )
        formatted_prompt = prompt.format(query=query)
        response = self.llm.invoke(formatted_prompt)
        hyde_response = response.content.strip()
        return [query, hyde_response]

    def generate_step_back_queries(self, query: str) -> list[str]:
        """Generate step-back questions for broader context retrieval.

        Args:
            query: The original user query text.

        Returns:
            List of [step_back_q1, step_back_q2, step_back_q3, original_query].
        """
        prompt = PromptTemplate(
            template=self.STEP_BACK_TEMPLATE,
            input_variables=["query"],
        )
        formatted_prompt = prompt.format(query=query)
        response = self.llm.invoke(formatted_prompt)
        step_back_queries = response.content.strip().split("\n")
        step_back_queries = [q.strip() for q in step_back_queries if q.strip()]
        return step_back_queries[:3] + [query]

    def generate_queries(self, query: str, mode: str = "multi_query") -> list[str]:
        """Generate enhanced queries based on the specified mode.

        Args:
            query: The original user query text.
            mode: Enhancement mode ('multi_query', 'hyde', or 'step_back').

        Returns:
            List of enhanced query strings.

        Raises:
            ValueError: If mode is not recognized.
        """
        if mode == "multi_query":
            return self.generate_multi_queries(query)
        if mode == "hyde":
            return self.generate_hyde_queries(query)
        if mode == "step_back":
            return self.generate_step_back_queries(query)
        raise ValueError(
            f"Unknown mode: {mode}. Must be 'multi_query', 'hyde', or 'step_back'"
        )
```
Over-composing before baseline validation: Build and validate the simplest pipeline first, then add components.
Inconsistent LLM temperature: Use low temperature (0.0) for routing decisions and higher (0.3-0.7) for creative tasks like query generation.
Not logging routing decisions: All components log at INFO level. Set LOG_LEVEL=DEBUG to see full prompts and responses for debugging routing and compression behavior.