This tutorial will walk you through the complete GraphRAG workflow, from installation to querying your indexed data. You’ll learn how to set up GraphRAG, index a sample dataset, and perform both global and local searches.
GraphRAG can consume significant LLM resources. Start with the tutorial dataset and inexpensive models before scaling up to larger datasets.

Prerequisites

Before you begin, ensure you have:
  • Python 3.10, 3.11, or 3.12 installed
  • An OpenAI API key or Azure OpenAI credentials
  • Basic familiarity with command line operations

Installation

1. Create a project directory

Create a new directory for your GraphRAG project and navigate to it:
mkdir graphrag_tutorial
cd graphrag_tutorial
2. Set up a virtual environment

Create and activate a Python virtual environment:
python -m venv .venv
source .venv/bin/activate
On Windows, run .venv\Scripts\activate instead of the source command.
3. Install GraphRAG

Install the GraphRAG package using pip:
python -m pip install graphrag

Initialize your workspace

1. Run initialization

Initialize your GraphRAG workspace:
graphrag init
When prompted, specify your preferred chat and embedding models. This command creates:
  • .env - Environment variables file
  • settings.yaml - Pipeline configuration
  • input/ - Directory for your source documents
2. Configure API credentials

Open the .env file and add your API key:
GRAPHRAG_API_KEY=sk-your-openai-api-key-here
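The generated settings.yaml references this key. For a standard OpenAI setup, its model definitions look roughly like the following sketch, which mirrors the Azure example below; exact key names and defaults depend on your GraphRAG version, so compare against the file that graphrag init actually generated:

```yaml
completion_models:
  default_completion_model:
    type: chat
    model_provider: openai
    model: gpt-4.1
    api_key: ${GRAPHRAG_API_KEY}   # resolved from the .env file

embedding_models:
  default_embedding_model:
    type: embedding
    model_provider: openai
    model: text-embedding-3-small
    api_key: ${GRAPHRAG_API_KEY}
```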
3. Configure Azure settings (if applicable)

If using Azure OpenAI, edit settings.yaml to add Azure-specific configuration:
completion_models:
  default_completion_model:
    type: chat
    model_provider: azure
    model: gpt-4.1
    deployment_name: <AZURE_DEPLOYMENT_NAME>
    api_base: https://<instance>.openai.azure.com
    api_version: 2024-02-15-preview

embedding_models:
  default_embedding_model:
    type: embedding
    model_provider: azure
    model: text-embedding-3-small
    deployment_name: <AZURE_EMBEDDING_DEPLOYMENT_NAME>
    api_base: https://<instance>.openai.azure.com
    api_version: 2024-02-15-preview

Add sample data

1. Download sample text

Download a sample text file to process. We’ll use “A Christmas Carol” by Charles Dickens:
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./input/book.txt
2. Verify the input

Confirm the file was downloaded successfully:
ls -lh input/
You should see book.txt in the input directory.

Run the indexing pipeline

1. Start indexing

Execute the indexing pipeline to process your documents:
graphrag index
This process will:
  • Extract entities and relationships from your text
  • Build a knowledge graph structure
  • Generate community summaries
  • Create embeddings for semantic search
The indexing process typically takes several minutes depending on document size and API rate limits. Progress will be displayed in your terminal.
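Conceptually, the first three steps turn LLM-extracted (entity, relationship, entity) triples into a graph and then group related entities. Here is a toy sketch in plain Python with made-up triples; GraphRAG itself uses the Leiden algorithm for community detection, not the simple connected-components pass shown here:

```python
from collections import defaultdict

# Illustrative triples such as the extraction step might emit (not real output).
triples = [
    ("Scrooge", "employs", "Bob Cratchit"),
    ("Bob Cratchit", "father_of", "Tiny Tim"),
    ("Scrooge", "visited_by", "Ghost of Christmas Past"),
]

# Build an undirected adjacency list: the knowledge-graph structure.
graph = defaultdict(set)
for a, _, b in triples:
    graph[a].add(b)
    graph[b].add(a)

def communities(graph):
    """Group entities into clusters; here, simply connected components."""
    seen, result = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n] - comp)
        seen |= comp
        result.append(comp)
    return result
```

In the real pipeline, each detected community is then summarized by the LLM into the community reports that global search reads.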
2. Review output

After completion, check the ./output directory for generated parquet files:
ls -lh output/
Key output files include:
  • entities.parquet - Extracted entities
  • relationships.parquet - Entity relationships
  • communities.parquet - Detected communities
  • community_reports.parquet - Community summaries
  • text_units.parquet - Chunked text segments

Query your data

Now that your data is indexed, you can query it using two different search methods. Global search answers high-level questions by analyzing community reports across the entire dataset.
1. Run a global query

Ask a broad question about the entire dataset:
graphrag query "What are the main themes in this story?"
Global search is ideal for questions like:
  • “What are the top themes?”
  • “What is the overall narrative?”
  • “What are the key events?”
Local search answers specific questions by combining knowledge graph data with relevant text chunks.
1. Run a local query

Ask a specific question about entities or details:
graphrag query \
  "Who is Scrooge and what are his main relationships?" \
  --method local
Local search is ideal for questions like:
  • “Who is [character] and what do they do?”
  • “What is the relationship between X and Y?”
  • “What are the properties of [entity]?”
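The retrieval side of local search can be illustrated with a toy ranker over text units. This sketch uses bag-of-words cosine similarity purely for illustration; GraphRAG actually retrieves via vector embeddings and expands the context with connected entities and relationships from the graph:

```python
import re
from collections import Counter
from math import sqrt

# Toy "text units" standing in for GraphRAG's chunked output (illustrative only).
text_units = [
    "Scrooge was a squeezing, grasping, covetous old sinner.",
    "Bob Cratchit worked in a dismal little cell for Scrooge.",
    "The Ghost of Christmas Past showed Scrooge his boyhood.",
]

def bow(text):
    """Bag-of-words term counts, a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def local_search(query, units, top_k=2):
    """Return the text units most similar to the query; GraphRAG would then
    hand these, plus graph context, to the LLM to compose the answer."""
    q = bow(query)
    return sorted(units, key=lambda u: cosine(q, bow(u)), reverse=True)[:top_k]
```

Global search skips this per-chunk retrieval entirely and instead synthesizes its answer from the pre-computed community reports.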

Understanding the results

Each query returns an AI-generated answer synthesized from the indexed knowledge graph: global search builds its answer from the community reports, while local search draws on specific entities, their relationships, and the text units that mention them.

Next steps

  • Custom prompts: Learn how to customize prompts for better domain-specific results
  • Azure deployment: Deploy GraphRAG with Azure OpenAI and Azure Storage
  • Configuration: Explore advanced configuration options
  • Query engine: Deep dive into search methods and parameters

Troubleshooting

If you encounter rate limit errors:
  • Reduce the number of concurrent requests in settings.yaml
  • Add rate limiting configuration
  • Use a higher-tier API plan
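For example, throttling is typically configured on each model definition in settings.yaml. The key names below are a sketch based on common GraphRAG model settings and may differ in your version, so verify them against your generated file:

```yaml
completion_models:
  default_completion_model:
    # Keep request volume under your API tier's limits
    concurrent_requests: 5
    tokens_per_minute: 50000
    requests_per_minute: 500
```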
For large datasets:
  • Reduce chunk size in the chunking configuration
  • Process documents in smaller batches
  • Increase system memory allocation
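Chunking is also controlled in settings.yaml. A sketch of the relevant section follows; key names and defaults vary by GraphRAG version, so check your generated file before editing:

```yaml
chunks:
  size: 800      # tokens per text unit; smaller chunks mean more, cheaper calls
  overlap: 100   # tokens shared between adjacent chunks to preserve context
```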
To improve query results:
  • Run prompt tuning to adapt prompts to your domain
  • Verify your input data is properly formatted
  • Adjust community detection parameters
