GraphRAG can create domain-adapted prompts for generating the knowledge graph. This step is optional, but running it is highly encouraged: tuned prompts yield better results when executing an index run.
Auto tuning analyzes your input data and generates customized prompts adapted to your domain, resulting in significantly better entity extraction and knowledge graph quality.
These prompts are generated by loading the inputs, splitting them into chunks (text units), and then running a series of LLM invocations and template substitutions to produce the final prompts. We suggest using the default values provided by the script, but on this page you'll find the details of each parameter in case you want to explore and tweak the prompt tuning algorithm further.
Auto Tuning Conceptual Diagram

Prerequisites

Before running auto tuning, ensure you have already initialized your workspace with the graphrag init command. This will create the necessary configuration files and the default prompts.

Initialization

Learn how to initialize your GraphRAG workspace

Basic usage

You can run the auto tuning command with minimal configuration (recommended):
graphrag prompt-tune --root /path/to/project --no-discover-entity-types
For most use cases, running graphrag prompt-tune with minimal options is sufficient and recommended.

Command syntax

The full command syntax with all available options:
graphrag prompt-tune [--root ROOT] [--domain DOMAIN] [--selection-method METHOD] \
  [--limit LIMIT] [--language LANGUAGE] [--max-tokens MAX_TOKENS] \
  [--chunk-size CHUNK_SIZE] [--n-subset-max N_SUBSET_MAX] [--k K] \
  [--min-examples-required MIN_EXAMPLES_REQUIRED] \
  [--discover-entity-types | --no-discover-entity-types] \
  [--output OUTPUT]

Command-line options

--root
string
default:"current directory"
Path to the project directory that contains the config file (settings.yaml).
--domain
string
default:"auto-detected"
The domain related to your input data, such as ‘space science’, ‘microbiology’, or ‘environmental news’. If left empty, the domain will be inferred from the input data.
--selection-method
string
default:"random"
The method to select documents. Options are:
  • random - Select text units randomly (recommended)
  • top - Select the first n text units
  • all - Use all text units (only for small datasets)
  • auto - Use embedding-based selection for representative samples
--limit
number
default:"15"
The number of text units to load when using random or top selection methods.
--language
string
default:"auto-detected"
The language to use for input processing. If it differs from the language of the inputs, the LLM will translate. By default this is empty, meaning the language is detected automatically from the inputs.
--max-tokens
number
default:"2000"
Maximum token count for prompt generation.
--chunk-size
number
default:"200"
The size in tokens to use for generating text units from input documents.
--n-subset-max
number
default:"300"
The number of text chunks to embed when using the auto selection method.
--k
number
default:"15"
The number of documents to select when using the auto selection method.
--min-examples-required
number
default:"2"
The minimum number of examples required for entity extraction prompts.
--discover-entity-types
boolean
default:"true"
Allow the LLM to discover and extract entity types automatically. This is enabled by default and recommended when your data covers many topics or is highly randomized; pass --no-discover-entity-types to disable it (as in the basic usage example above).
--output
string
default:"prompts"
The folder to save the generated prompts.

Advanced usage example

For advanced users who want to customize the auto tuning process:
graphrag prompt-tune \
  --root /path/to/project \
  --domain "environmental news" \
  --selection-method random \
  --limit 10 \
  --language English \
  --max-tokens 2048 \
  --chunk-size 256 \
  --min-examples-required 3 \
  --no-discover-entity-types \
  --output /path/to/output

Document selection methods

The auto tuning feature ingests the input data and then divides it into text units the size of the chunk size parameter. After that, it uses one of the following selection methods to pick a sample to work with for prompt generation:
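The chunking step can be sketched as follows. Note this is a simplified illustration: GraphRAG counts tokens with a tokenizer, while the hypothetical `chunk_text` helper below approximates tokens with whitespace-separated words.

```python
# Simplified sketch of splitting input documents into fixed-size text units.
# GraphRAG uses a token-based splitter; here we approximate tokens with
# whitespace-separated words, so `chunk_text` is an illustrative stand-in.

def chunk_text(text: str, chunk_size: int = 200) -> list[str]:
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

units = chunk_text("one two three four five six", chunk_size=2)
# units == ["one two", "three four", "five six"]
```

The default chunk size of 200 mirrors the --chunk-size default listed above.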

random

Select text units randomly from your dataset.
When to use: This is the default and recommended option for most use cases.
Parameters: --limit controls how many text units to select.

top

Select the first n text units from your dataset.
When to use: When your data is already ordered in a meaningful way.
Parameters: --limit controls how many text units to select.

all

Use all text units for prompt generation.
When to use: Only with small datasets; this option is not usually recommended as it can be slow and expensive.
Warning: This will process your entire dataset and may incur significant LLM costs.

auto

Embed text units in a lower-dimensional space and select the k nearest neighbors to the centroid.
When to use: When you have a large dataset and want to select a representative sample automatically.
Parameters:
  • --n-subset-max controls how many chunks to embed (default: 300)
  • --k controls how many documents to select (default: 15)
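The centroid-based idea behind the auto method can be sketched as below. The embeddings here are made up for illustration; GraphRAG computes real embeddings with your configured embedding model, and `select_representative` is a hypothetical helper, not the library's API.

```python
# Sketch of the `auto` selection idea: embed up to n-subset-max chunks,
# then pick the k chunks whose embeddings lie closest to the centroid.
import numpy as np

def select_representative(embeddings: np.ndarray, k: int) -> list[int]:
    # Centroid of all chunk embeddings.
    centroid = embeddings.mean(axis=0)
    # Euclidean distance of each chunk embedding to the centroid.
    distances = np.linalg.norm(embeddings - centroid, axis=1)
    # Indices of the k nearest chunks.
    return np.argsort(distances)[:k].tolist()

# Toy 2-D "embeddings": the outlier at (10, 10) is never picked first.
emb = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [1.1, 0.9]])
picked = select_representative(emb, k=2)
# picked == [1, 3]
```

Selecting around the centroid favors chunks that are typical of the corpus, which is why this method suits large, heterogeneous datasets.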

Update configuration

After running auto tuning, you need to modify your settings.yaml file to use the newly generated prompts. Update the following configuration variables:
entity_extraction:
  prompt: "prompts/extract_graph.txt"

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"

claim_extraction:
  prompt: "prompts/extract_claims.txt"

community_reports:
  prompt: "prompts/community_report.txt"
Make sure the prompt paths point to your generated files. If you used the --output parameter, update the paths accordingly.
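Before running the index, it can help to sanity-check that every prompt file referenced in settings.yaml actually exists. A minimal sketch (the `missing_prompts` helper is hypothetical, and the file list assumes the default prompts output folder):

```python
from pathlib import Path

# Hypothetical sanity check: confirm the generated prompt files exist
# before running `graphrag index`.
def missing_prompts(prompt_dir: str, filenames: list[str]) -> list[str]:
    root = Path(prompt_dir)
    return [name for name in filenames if not (root / name).is_file()]

expected = [
    "extract_graph.txt",
    "summarize_descriptions.txt",
    "extract_claims.txt",
    "community_report.txt",
]
# missing = missing_prompts("prompts", expected)  # empty list means all present
```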

Generated prompt files

The auto tuning process generates the following prompt files:

extract_graph.txt

Entity and relationship extraction prompt

summarize_descriptions.txt

Entity and relationship description summarization prompt

extract_claims.txt

Claim extraction prompt (if enabled)

community_report.txt

Community report generation prompt

Next steps

After generating your domain-adapted prompts:
1. Review generated prompts

Check the generated prompt files in your output directory to understand what was created.

2. Update settings.yaml

Add the prompt file paths to your configuration as shown above.

3. Run indexing

Execute graphrag index to build your knowledge graph with the tuned prompts.

4. Evaluate results

Compare the quality of entities and relationships extracted with your tuned prompts.

Run indexing

Learn how to run the indexing pipeline with your tuned prompts
