What is GraphRAG?
GraphRAG is a data pipeline and transformation suite designed to extract meaningful, structured data from unstructured text using the power of Large Language Models (LLMs). Built by Microsoft Research as an open-source project, it takes a structured, hierarchical approach to Retrieval Augmented Generation (RAG), as opposed to naive semantic-search approaches that retrieve plain text snippets. Read the research paper and blog post to learn more about the methodology.
How GraphRAG works
The GraphRAG process involves four key steps:
Extract knowledge graph
Extract entities, relationships, and key claims from your raw text documents to build a comprehensive knowledge graph.
Build community hierarchy
Use the Leiden algorithm to perform hierarchical clustering, organizing entities into meaningful semantic communities.
Generate summaries
Create bottom-up summaries for each community and its constituents to enable holistic understanding of your dataset.
Leverage graph structures
Use these structures at query time to ground the LLM's answers, from entity-level lookups to dataset-wide reasoning.
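The indexing flow above can be pictured with toy stand-ins. In real GraphRAG, extraction is done with LLM prompts and clustering with the Leiden algorithm; the functions below (`extract_graph`, `build_communities`, `summarize`) are deliberate simplifications, not GraphRAG's API:

```python
from collections import defaultdict

# Toy stand-in for LLM-based entity/relationship extraction.
def extract_graph(docs):
    entities, relations = set(), []
    for doc in docs:
        names = [w for w in doc.split() if w.istitle()]  # naive "entity" spotting
        entities.update(names)
        # relate entities co-occurring in the same document
        relations += [(a, b) for a, b in zip(names, names[1:])]
    return entities, relations

# Toy stand-in for Leiden clustering: connected components via union-find.
def build_communities(entities, relations):
    parent = {e: e for e in entities}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in relations:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for e in entities:
        groups[find(e)].add(e)
    return list(groups.values())

# Toy stand-in for LLM-written community summaries.
def summarize(community):
    return "Community of: " + ", ".join(sorted(community))

docs = ["Alice works with Bob", "Bob mentors Carol", "Dave paints landscapes"]
entities, relations = extract_graph(docs)
communities = build_communities(entities, relations)
summaries = [summarize(c) for c in communities]
print(summaries)
```

The shape is what matters: documents become a graph, the graph becomes communities, and each community gets a summary that later grounds query-time answers.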
GraphRAG vs baseline RAG
While traditional vector-based RAG (baseline RAG) uses semantic similarity search over text chunks, it struggles in two key scenarios:
Connecting the dots
Baseline RAG fails when answers require traversing disparate pieces of information through their shared attributes to synthesize new insights.
Holistic understanding
Baseline RAG performs poorly when asked to understand summarized semantic concepts over large data collections or singular large documents.
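To illustrate the first gap, a knowledge graph makes a multi-hop connection a simple traversal, whereas chunk-similarity search would have to retrieve both facts separately and join them itself. The mini graph and entities below are hypothetical:

```python
from collections import deque

# Hypothetical mini knowledge graph: entity -> [(relation, entity), ...]
graph = {
    "Acme Corp": [("acquired", "Widget Inc")],
    "Widget Inc": [("founded_by", "Jane Doe")],
    "Jane Doe": [],
}

# Breadth-first traversal collects the chain of facts linking two entities.
def connect(graph, start, goal):
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None

print(connect(graph, "Acme Corp", "Jane Doe"))
```

The two-hop path (Acme Corp acquired Widget Inc, which Jane Doe founded) is exactly the kind of "shared attribute" chain that baseline RAG must reconstruct implicitly from disconnected chunks.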
Key capabilities
Global search
Reason about holistic questions by leveraging community summaries across your entire dataset.
Local search
Answer specific questions about entities by exploring their neighbors and associated concepts.
DRIFT search
Enhanced local search that includes community information for broader, more comprehensive answers.
Prompt tuning
Fine-tune extraction prompts to optimize GraphRAG performance for your specific data.
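Local search's neighborhood expansion can be sketched with a toy adjacency map. The entities below are made up, and real GraphRAG local search also pulls in related text units, covariates, and community reports when assembling the LLM context:

```python
# Hypothetical entity graph for a local-search sketch.
neighbors = {
    "chamomile": {"apigenin", "herbal tea"},
    "apigenin": {"chamomile", "flavonoid"},
    "herbal tea": {"chamomile"},
    "flavonoid": {"apigenin"},
}

# Gather the entities within `hops` steps of the query entity;
# local search builds its answer context from this neighborhood.
def neighborhood(entity, hops=1):
    frontier, found = {entity}, {entity}
    for _ in range(hops):
        frontier = {n for e in frontier for n in neighbors.get(e, set())} - found
        found |= frontier
    return found

print(sorted(neighborhood("chamomile", hops=2)))
```

Widening `hops` trades precision for coverage, which is roughly the dial DRIFT search turns by folding in community-level information around the entity.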
The GraphRAG indexing pipeline
GraphRAG’s indexing engine transforms your raw documents through a series of workflows that chunk the text, extract the knowledge graph, detect communities, and generate summaries.
Query modes
At query time, GraphRAG provides multiple search modes optimized for different question types:
- Global Search - Best for questions requiring understanding of the entire dataset (e.g., “What are the top themes?”)
- Local Search - Best for questions about specific entities (e.g., “What are the healing properties of chamomile?”)
- DRIFT Search - Enhanced local search with community context for comprehensive entity-based queries
- Basic Search - Standard vector RAG for comparison and baseline queries
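Global search is often described as a map-reduce over community summaries: each summary contributes a rated partial answer (map), and the best partials are combined into one context (reduce). A toy sketch, with keyword overlap standing in for the LLM's relevance rating and invented summaries:

```python
# Hypothetical community summaries produced by indexing.
summaries = [
    "Community 1 centers on renewable energy and solar policy.",
    "Community 2 covers supply chains and battery manufacturing.",
    "Community 3 discusses local sports clubs.",
]

def words(text):
    return {w.strip(".?,").lower() for w in text.split()}

# Map: score each summary's relevance (an LLM rates helpfulness in real GraphRAG).
def map_step(question, summary):
    return len(words(question) & words(summary)), summary

# Reduce: keep the most relevant partials and combine them into one context.
def global_search(question, summaries, top_k=2):
    scored = sorted((map_step(question, s) for s in summaries), reverse=True)
    return " | ".join(s for score, s in scored[:top_k] if score > 0)

print(global_search("What are the themes around energy and manufacturing?", summaries))
```

Because the map step touches every community summary rather than a handful of retrieved chunks, dataset-wide questions stay answerable even when no single text chunk contains the answer.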
Use cases
GraphRAG excels at reasoning about private datasets - data that the LLM has never seen before:
- Enterprise research documents
- Business intelligence and reporting
- Legal document analysis
- Scientific literature review
- Customer feedback analysis
- Knowledge base construction
Getting started
Quickstart
Get up and running with GraphRAG in minutes using our command-line quickstart guide.
Installation
Detailed installation instructions and system requirements for all supported platforms.
Configuration
Learn how to configure GraphRAG for your specific data and use case.
Architecture highlights
GraphRAG is built with extensibility and customization in mind:
- Factory pattern - Register custom implementations for models, storage, vector stores, and workflows
- Provider support - Built-in support for OpenAI, Azure OpenAI, and 100+ models via LiteLLM
- Storage flexibility - File, blob storage, and CosmosDB support out of the box
- Vector store options - LanceDB, Azure AI Search, and CosmosDB with extensible interface
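The factory pattern above can be sketched as a simple name-to-class registry. The class and method names below are illustrative, not GraphRAG's actual registration API; consult the project documentation for the real hooks:

```python
# Minimal registry sketch of the factory pattern: register custom
# implementations by name, then construct them from configuration.
class StorageFactory:
    _registry = {}

    @classmethod
    def register(cls, name, impl):
        cls._registry[name] = impl

    @classmethod
    def create(cls, name, **kwargs):
        return cls._registry[name](**kwargs)

# A custom backend a user might plug in (hypothetical).
class MemoryStorage:
    def __init__(self, prefix=""):
        self.prefix = prefix
        self.data = {}

    def put(self, key, value):
        self.data[self.prefix + key] = value

    def get(self, key):
        return self.data[self.prefix + key]

StorageFactory.register("memory", MemoryStorage)
store = StorageFactory.create("memory", prefix="run1/")
store.put("entities.parquet", b"example bytes")
print(store.get("entities.parquet"))
```

The same registry shape applies to the model, vector store, and workflow extension points: configuration supplies a name, and the factory resolves it to whichever implementation was registered under it.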
Community and support
GitHub Discussions
Join the conversation and get help from the community.
GitHub Issues
Report bugs and request features.
GraphRAG is provided as demonstration code and is not an officially supported Microsoft product. Always review the Responsible AI FAQ before deployment.