Auto tuning analyzes your input data and generates customized prompts that are specifically adapted to your domain, resulting in significantly better entity extraction and knowledge graph quality.

Prerequisites
Before running auto tuning, ensure you have already initialized your workspace with the graphrag init command. This creates the necessary configuration files and the default prompts.
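If you have not initialized a workspace yet, a typical invocation looks like the following; the ./ragtest path is illustrative:

```shell
# Create a workspace (settings.yaml plus default prompts) under ./ragtest
graphrag init --root ./ragtest
```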
Initialization
Learn how to initialize your GraphRAG workspace
Basic usage
You can run the auto tuning command with minimal configuration (recommended).

Command syntax
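A minimal run, assuming a workspace at ./ragtest (path illustrative), might look like this:

```shell
# Minimal invocation: defaults are used and the domain is inferred from the input data
graphrag prompt-tune --root ./ragtest --config ./ragtest/settings.yaml
```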
The full command syntax with all available options is described below.

Command-line options
--root
Path to the project directory that contains the config file (settings.yaml).

--domain
The domain related to your input data, such as ‘space science’, ‘microbiology’, or ‘environmental news’. If left empty, the domain will be inferred from the input data.

--selection-method
The method used to select documents. Options are:

random - Select text units randomly (recommended)
top - Select the first n text units
all - Use all text units (only for small datasets)
auto - Use embedding-based selection for representative samples

--limit
The number of text units to load when using the random or top selection methods.

--language
The language to use for input processing. If it differs from the inputs’ language, the LLM will translate. The default is "", meaning the language is detected automatically from the inputs.

--max-tokens
Maximum token count for prompt generation.

--chunk-size
The size, in tokens, of the text units generated from the input documents.

--n-subset-max
The number of text chunks to embed when using the auto selection method.

--k
The number of documents to select when using the auto selection method.

--min-examples-required
The minimum number of examples required for entity extraction prompts.

--discover-entity-types
Allow the LLM to discover and extract entities automatically. We recommend using this when your data covers many topics or is highly varied.

--output
The folder to save the generated prompts.
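Putting these options together, the full syntax looks roughly like the sketch below. The flag names follow the GraphRAG CLI, but flags can change between versions, so confirm against graphrag prompt-tune --help:

```shell
graphrag prompt-tune \
  --root <PATH> \
  --config <PATH> \
  --domain <DOMAIN> \
  --selection-method <random|top|all|auto> \
  --limit <N> \
  --language <LANGUAGE> \
  --max-tokens <N> \
  --chunk-size <N> \
  --n-subset-max <N> \
  --k <N> \
  --min-examples-required <N> \
  --discover-entity-types \
  --output <PATH>
```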
Advanced usage example
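By way of illustration, here is one possible advanced invocation combining several of the options above; all values are made up and should be adapted to your own project:

```shell
graphrag prompt-tune \
  --root ./ragtest \
  --config ./ragtest/settings.yaml \
  --domain "environmental news" \
  --selection-method random \
  --limit 15 \
  --language English \
  --max-tokens 2048 \
  --chunk-size 256 \
  --min-examples-required 3 \
  --discover-entity-types \
  --output ./ragtest/prompts
```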
For advanced users who want to customize the auto tuning process, any of the options above can be combined in a single invocation.

Document selection methods

The auto tuning feature ingests the input data and divides it into text units whose size is set by the chunk-size parameter. It then uses one of the following selection methods to pick a sample to work with for prompt generation:
random
Select text units randomly from your dataset.

When to use: This is the default and recommended option for most use cases.

Parameters: --limit controls how many text units to select.
top
Select the first n text units from your dataset.

When to use: When your data is already ordered in a meaningful way.

Parameters: --limit controls how many text units to select.
all
Use all text units for prompt generation.

When to use: Only with small datasets; this option is not usually recommended as it can be slow and expensive.

Warning: This will process your entire dataset and may incur significant LLM costs.
auto
Embed text units in a lower-dimensional space and select the k nearest neighbors to the centroid.

When to use: When you have a large dataset and want to select a representative sample automatically.

Parameters: --n-subset-max controls how many chunks to embed (default: 300); --k controls how many documents to select (default: 15).
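Conceptually, the auto method works like the sketch below. This is illustrative only, not GraphRAG's actual implementation, and it uses toy 2-D vectors where the real pipeline would use LLM-generated embeddings:

```python
import numpy as np

def select_representative(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k text units whose embeddings lie
    closest to the centroid of all embeddings (a simplified sketch
    of centroid-based representative selection)."""
    centroid = embeddings.mean(axis=0)
    dists = np.linalg.norm(embeddings - centroid, axis=1)
    return np.argsort(dists)[:k]  # indices of the k nearest units

# Toy example: six 2-D "embeddings"; the first, fourth, and sixth
# cluster near the origin, two outliers sit far away
emb = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0],
                [0.2, 0.0], [4.0, 4.0], [0.0, 0.2]])
print(select_representative(emb, k=3))
```

The outliers pull the centroid toward them, but the dense cluster still dominates, so the selected sample reflects the bulk of the data.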
Update configuration
After running auto tuning, you need to modify your settings.yaml file to use the newly generated prompts, updating the prompt path configured for each relevant step.
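Assuming the prompts were written to the default prompts/ folder, the relevant entries look roughly like this sketch; exact key names vary between GraphRAG versions, so check them against your generated settings.yaml:

```yaml
extract_graph:
  prompt: "prompts/extract_graph.txt"

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"

extract_claims:
  enabled: true
  prompt: "prompts/extract_claims.txt"

community_reports:
  prompt: "prompts/community_report.txt"
```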
Generated prompt files
The auto tuning process generates the following prompt files:

extract_graph.txt
Entity and relationship extraction prompt
summarize_descriptions.txt
Entity and relationship description summarization prompt
extract_claims.txt
Claim extraction prompt (if enabled)
community_report.txt
Community report generation prompt
Next steps
After generating your domain-adapted prompts:

Review generated prompts
Check the generated prompt files in your output directory to understand what was created.
Run indexing
Learn how to run the indexing pipeline with your tuned prompts