GraphRAG can create domain-adapted prompts for generating the knowledge graph. This step is optional, but running it is highly encouraged: tuned prompts yield better results when executing an index run.
Auto tuning analyzes your input data and generates customized prompts adapted to your domain, resulting in significantly better entity extraction and knowledge graph quality.
These prompts are generated by loading the inputs, splitting them into chunks (text units), and then running a series of LLM invocations and template substitutions to produce the final prompts. We suggest using the default values provided by the script, but on this page you'll find the details of each parameter in case you want to explore and tweak the prompt tuning algorithm further.
Auto Tuning Conceptual Diagram

Prerequisites

Before running auto tuning, ensure you have already initialized your workspace with the graphrag init command. This will create the necessary configuration files and the default prompts.

Initialization

Learn how to initialize your GraphRAG workspace

Basic usage

You can run the auto tuning command with minimal configuration (recommended):
graphrag prompt-tune --root /path/to/project --no-discover-entity-types
For most use cases, running graphrag prompt-tune with minimal options is sufficient and recommended.

Command syntax

The full command syntax with all available options:
graphrag prompt-tune [--root ROOT] [--domain DOMAIN] [--selection-method METHOD] \
  [--limit LIMIT] [--language LANGUAGE] [--max-tokens MAX_TOKENS] \
  [--chunk-size CHUNK_SIZE] [--n-subset-max N_SUBSET_MAX] [--k K] \
  [--min-examples-required MIN_EXAMPLES_REQUIRED] \
  [--discover-entity-types | --no-discover-entity-types] \
  [--output OUTPUT]

Command-line options

--root
string
default:"current directory"
Path to the project directory that contains the config file (settings.yaml).
--domain
string
default:"auto-detected"
The domain related to your input data, such as ‘space science’, ‘microbiology’, or ‘environmental news’. If left empty, the domain will be inferred from the input data.
--selection-method
string
default:"random"
The method to select documents. Options are:
  • random - Select text units randomly (recommended)
  • top - Select the first n text units
  • all - Use all text units (only for small datasets)
  • auto - Use embedding-based selection for representative samples
--limit
number
default:"15"
The number of text units to load when using random or top selection methods.
--language
string
default:"auto-detected"
The language to use for input processing. If it differs from the language of the inputs, the LLM will translate. By default this is empty, meaning the language is detected automatically from the inputs.
--max-tokens
number
default:"2000"
Maximum token count for prompt generation.
--chunk-size
number
default:"200"
The size in tokens to use for generating text units from input documents.
--n-subset-max
number
default:"300"
The number of text chunks to embed when using the auto selection method.
--k
number
default:"15"
The number of documents to select when using the auto selection method.
--min-examples-required
number
default:"2"
The minimum number of examples required for entity extraction prompts.
--discover-entity-types
boolean
default:"true"
Allow the LLM to discover and extract entity types automatically. This is enabled by default and recommended when your data covers many topics or is highly randomized; pass --no-discover-entity-types to disable it (as in the basic usage example above).
--output
string
default:"prompts"
The folder to save the generated prompts.

Advanced usage example

For advanced users who want to customize the auto tuning process:
graphrag prompt-tune \
  --root /path/to/project \
  --domain "environmental news" \
  --selection-method random \
  --limit 10 \
  --language English \
  --max-tokens 2048 \
  --chunk-size 256 \
  --min-examples-required 3 \
  --no-discover-entity-types \
  --output /path/to/output

Document selection methods

The auto tuning feature ingests the input data and then divides it into text units the size of the chunk size parameter. After that, it uses one of the following selection methods to pick a sample to work with for prompt generation:
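The chunking step can be sketched as follows. Note this is a simplified illustration: GraphRAG counts tokens with a tokenizer, while the hypothetical `chunk_text` helper below approximates tokens with whitespace-separated words.

```python
# Simplified sketch of splitting input documents into fixed-size text units.
# GraphRAG uses a token-based splitter; here we approximate tokens with
# whitespace-separated words, so `chunk_text` is an illustrative stand-in.

def chunk_text(text: str, chunk_size: int = 200) -> list[str]:
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

units = chunk_text("one two three four five six", chunk_size=2)
# units == ["one two", "three four", "five six"]
```

The default chunk size of 200 mirrors the --chunk-size default listed above.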

random

Select text units randomly from your dataset.
When to use: This is the default and recommended option for most use cases.
Parameters: --limit controls how many text units to select.

top

Select the first n text units from your dataset.
When to use: When your data is already ordered in a meaningful way.
Parameters: --limit controls how many text units to select.

all

Use all text units for prompt generation.
When to use: Only with small datasets; this option is not usually recommended as it can be slow and expensive.
Warning: This will process your entire dataset and may incur significant LLM costs.

auto

Embed text units in a lower-dimensional space and select the k nearest neighbors to the centroid.
When to use: When you have a large dataset and want to select a representative sample automatically.
Parameters:
  • --n-subset-max controls how many chunks to embed (default: 300)
  • --k controls how many documents to select (default: 15)
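The centroid-based idea behind the auto method can be sketched as below. The embeddings here are made up for illustration; GraphRAG computes real embeddings with your configured embedding model, and `select_representative` is a hypothetical helper, not the library's API.

```python
# Sketch of the `auto` selection idea: embed up to n-subset-max chunks,
# then pick the k chunks whose embeddings lie closest to the centroid.
import numpy as np

def select_representative(embeddings: np.ndarray, k: int) -> list[int]:
    # Centroid of all chunk embeddings.
    centroid = embeddings.mean(axis=0)
    # Euclidean distance of each chunk embedding to the centroid.
    distances = np.linalg.norm(embeddings - centroid, axis=1)
    # Indices of the k nearest chunks.
    return np.argsort(distances)[:k].tolist()

# Toy 2-D "embeddings": the outlier at (10, 10) is never picked first.
emb = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [1.1, 0.9]])
picked = select_representative(emb, k=2)
# picked == [1, 3]
```

Selecting around the centroid favors chunks that are typical of the corpus, which is why this method suits large, heterogeneous datasets.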

Update configuration

After running auto tuning, you need to modify your settings.yaml file to use the newly generated prompts. Update the following configuration variables:
entity_extraction:
  prompt: "prompts/extract_graph.txt"

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"

claim_extraction:
  prompt: "prompts/extract_claims.txt"

community_reports:
  prompt: "prompts/community_report.txt"
Make sure the prompt paths point to your generated files. If you used the --output parameter, update the paths accordingly.
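Before running the index, it can help to sanity-check that every prompt file referenced in settings.yaml actually exists. A minimal sketch (the `missing_prompts` helper is hypothetical, and the file list assumes the default prompts output folder):

```python
from pathlib import Path

# Hypothetical sanity check: confirm the generated prompt files exist
# before running `graphrag index`.
def missing_prompts(prompt_dir: str, filenames: list[str]) -> list[str]:
    root = Path(prompt_dir)
    return [name for name in filenames if not (root / name).is_file()]

expected = [
    "extract_graph.txt",
    "summarize_descriptions.txt",
    "extract_claims.txt",
    "community_report.txt",
]
# missing = missing_prompts("prompts", expected)  # empty list means all present
```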

Generated prompt files

The auto tuning process generates the following prompt files:

extract_graph.txt

Entity and relationship extraction prompt

summarize_descriptions.txt

Entity and relationship description summarization prompt

extract_claims.txt

Claim extraction prompt (if enabled)

community_report.txt

Community report generation prompt

Next steps

After generating your domain-adapted prompts:
1. Review generated prompts

Check the generated prompt files in your output directory to understand what was created.

2. Update settings.yaml

Add the prompt file paths to your configuration as shown above.

3. Run indexing

Execute graphrag index to build your knowledge graph with the tuned prompts.

4. Evaluate results

Compare the quality of entities and relationships extracted with your tuned prompts.

Run indexing

Learn how to run the indexing pipeline with your tuned prompts
