The llms.txt format is a proposed standard for presenting website content to Large Language Models (LLMs). This guide explains the specification and how the llms.txt Generator implements it.

What is llms.txt?

llms.txt is a lightweight, human-readable format that describes website content in a form LLMs can parse easily. It is similar in spirit to robots.txt or sitemap.xml, but designed specifically for AI consumption. Learn more at llmstxt.org.

Format Overview

An llms.txt file is a Markdown document with a specific structure:
Structure
# [Site Name]

> [Brief site description]

## [Section Name]

- [Page Title](url): Page description [Tags]
- [Page Title](url): Page description [Tags]

## [Section Name]

- [Page Title](url): Page description [Tags]

## Optional

- [Secondary pages without descriptions]

Specification Components

1. Header Section

# Site Title

> Brief description of the website
Implementation Details:
1

Extract site title

The generator uses the homepage <title> tag as the site name.
formatter.py:75-84
def get_site_title(homepage: PageInfo, base_url: str) -> str:
    title = homepage.title.strip() if homepage.title else ""
    
    generic_titles = {'home', 'welcome', 'index', ''}
    if title.lower() in generic_titles:
        domain = urlparse(base_url).netloc
        domain = domain.replace('www.', '').replace('.com', '').replace('.org', '')
        return domain.replace('.', ' ').title()
    
    return truncate(title, 80)
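If the title is generic (“Home”, “Welcome”, “Index”, or empty), the generator derives a title from the domain name instead. Otherwise the page title is used, truncated to 80 characters.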
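To make the domain fallback concrete, here is a minimal sketch that isolates the three domain-handling lines from the excerpt above. The function name `derive_title_from_domain` is hypothetical and used only for illustration.

```python
from urllib.parse import urlparse


def derive_title_from_domain(base_url: str) -> str:
    """Mirror the domain fallback from the excerpt above (hypothetical helper name)."""
    domain = urlparse(base_url).netloc
    # Only "www.", ".com" and ".org" are stripped; other suffixes are kept.
    domain = domain.replace('www.', '').replace('.com', '').replace('.org', '')
    # Remaining dots become spaces, then each word is capitalized.
    return domain.replace('.', ' ').title()


print(derive_title_from_domain("https://www.example.com"))  # Example
print(derive_title_from_domain("https://docs.example.io"))  # Docs Example Io
```

As the second call shows, suffixes other than .com and .org stay in the derived title, so such sites may want to set a meaningful homepage title.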
2

Generate site description

The description comes from the homepage’s meta description or OpenGraph tags.
formatter.py:86-88
def get_summary(homepage: PageInfo) -> str:
    desc = homepage.description or homepage.snippet or "No description available"
    return truncate(desc.strip(), 200)
Fallback priority:
  1. <meta name="description">
  2. <meta property="og:description">
  3. First paragraph of body text
  4. “No description available”
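The fallback order above can be sketched as a simple chain. This is an illustration under assumptions, not the generator's code: `pick_description` and its parameters are hypothetical names, and unlike the excerpt it also skips whitespace-only values.

```python
from typing import Optional


def pick_description(
    meta: Optional[str],
    og: Optional[str],
    first_paragraph: Optional[str],
    limit: int = 200,
) -> str:
    """Return the first usable candidate in priority order (hypothetical helper)."""
    for candidate in (meta, og, first_paragraph):
        if candidate and candidate.strip():
            text = candidate.strip()
            # Same fixed-length cut as the site summary: 200 characters plus "...".
            return f"{text[:limit]}..." if len(text) > limit else text
    return "No description available"


print(pick_description(None, "OpenGraph text", "Body text"))  # OpenGraph text
print(pick_description("", None, None))  # No description available
```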

2. Section Organization

Content is organized into sections based on URL structure:
Section Example
## Documentation

- [Getting Started](https://example.com/docs/getting-started): Quick introduction...
- [API Reference](https://example.com/docs/api): Complete API documentation...

## Guides

- [Authentication](https://example.com/guides/auth): Learn how to authenticate...
- [Best Practices](https://example.com/guides/best-practices): Recommended patterns...
Implementation Details:
1

Extract sections from URLs

Sections are derived from the first path segment:
formatter.py:124-128
for page in pages[1:]:
    clean = clean_url(page.url)
    path_parts = clean.replace(base_url, "").strip("/").split("/")
    section = path_parts[0] if path_parts and path_parts[0] else "main"
Examples:
  • https://example.com/docs/intro → “docs” section
  • https://example.com/api/users → “api” section
  • https://example.com/about → “about” section
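The loop fragment above can be wrapped into a runnable sketch that groups URLs by section. The names `section_for` and `group_by_section` are hypothetical; `clean_url` follows the definition shown later in this guide.

```python
from collections import defaultdict
from urllib.parse import urlparse, urlunparse


def clean_url(url: str) -> str:
    """Drop params, query string and fragment, keeping scheme, host and path."""
    parsed = urlparse(url)
    return urlunparse((parsed.scheme, parsed.netloc, parsed.path, '', '', ''))


def section_for(url: str, base_url: str) -> str:
    """Return the first path segment, or "main" for the site root."""
    clean = clean_url(url)
    path_parts = clean.replace(base_url, "").strip("/").split("/")
    return path_parts[0] if path_parts and path_parts[0] else "main"


def group_by_section(urls: list[str], base_url: str) -> dict[str, list[str]]:
    """Bucket URLs by their section key, preserving input order per bucket."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[section_for(url, base_url)].append(url)
    return dict(groups)


urls = [
    "https://example.com/docs/intro?ref=nav",
    "https://example.com/api/users",
    "https://example.com/docs/setup#install",
]
print(group_by_section(urls, "https://example.com"))
```

Note that the section key depends on `base_url` matching the URL prefix exactly; a base URL with a trailing slash still works here because leading and trailing slashes are stripped before splitting.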
2

Clean section names

Section names are formatted for readability:
formatter.py:90-103
def clean_section_name(name: str) -> str:
    if not name or not name.strip():
        return "Main"
    
    name = name.strip()
    abbrevs = {'api', 'rest', 'graphql', 'sdk', 'cli', 'ui', 'ux', 'faq', 'rss'}
    
    name = name.replace('-', ' ').replace('_', ' ')
    words = name.split()
    
    return ' '.join(
        w.upper() if w.lower() in abbrevs else w.capitalize()
        for w in words
    )
Transformations:
  • api-reference → “API Reference”
  • getting_started → “Getting Started”
  • faq → “FAQ”
3

Separate primary and secondary content

Secondary content (privacy, terms, etc.) is grouped into an “Optional” section:
formatter.py:7-14
SECONDARY_PATH_PATTERNS = [
    '/privacy', '/terms', '/legal', '/cookie', '/disclaimer',
    '/sitemap', '/changelog', '/release',
    '/contributing', '/code-of-conduct', '/governance', '/license',
    '/about', '/team', '/career', '/job', '/contact', '/company',
    '/twitter', '/github', '/linkedin', '/facebook', '/social',
    '/archive', '/old', '/legacy', '/deprecated',
]
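How these patterns are matched is not shown in this guide. One plausible reading, sketched here purely as an assumption, treats each pattern as a path prefix. `looks_secondary` is a hypothetical name, and the pattern list is abbreviated.

```python
from urllib.parse import urlparse

# Abbreviated copy of the list above, for illustration only.
SECONDARY_PATH_PATTERNS = ['/privacy', '/terms', '/legal', '/career', '/job', '/changelog']


def looks_secondary(url_or_path: str) -> bool:
    """Assumed matching rule: the URL path starts with one of the patterns."""
    path = urlparse(url_or_path).path.lower()
    return any(path.startswith(pattern) for pattern in SECONDARY_PATH_PATTERNS)


print(looks_secondary("https://example.com/careers/open-roles"))  # True
print(looks_secondary("https://example.com/docs/intro"))          # False
```

Under this assumption `/career` also covers `/careers`, which would explain why the list uses singular forms such as `/job` and `/release`.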

3. Page Entries

Each page is represented as a list item with:
  • Title (linked to URL)
  • Description
  • Optional tags
Page Entry Format
- [Title](url): Description [Tag1] [Tag2]
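As a minimal sketch of how such a line could be assembled, the hypothetical helper `format_entry` below builds an entry from its parts and omits the description and tags when they are absent. It illustrates the format only and is not taken from the generator.

```python
from typing import Sequence


def format_entry(title: str, url: str, description: str = "", tags: Sequence[str] = ()) -> str:
    """Build one llms.txt list item (hypothetical helper)."""
    line = f"- [{title}]({url})"
    if description:
        line += f": {description}"
    if tags:
        # Each tag is wrapped in square brackets and separated by a space.
        line += " " + " ".join(f"[{tag}]" for tag in tags)
    return line


print(format_entry(
    "Authentication",
    "https://example.com/guides/auth",
    "Learn how to authenticate",
    ["Guide", "Security"],
))
print(format_entry("Privacy Policy", "https://example.com/privacy"))
```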
Implementation Details:
1

Format URLs

URLs are stripped of query strings and fragments, and the Markdown version of a page is preferred when one is available:
formatter.py:16-31
def clean_url(url: str) -> str:
    parsed = urlparse(url)
    return urlunparse((parsed.scheme, parsed.netloc, parsed.path, '', '', ''))

def get_md_url(url: str) -> str:
    parsed = urlparse(url)
    path = parsed.path
    
    if path.endswith('.html'):
        md_path = f"{path}.md"
    elif path.endswith('/') or not path:
        md_path = f"{path}index.html.md" if path.endswith('/') else f"{path}/index.html.md"
    else:
        md_path = f"{path}.md"
    
    return urlunparse((parsed.scheme, parsed.netloc, md_path, '', '', ''))
The generator checks if .md versions exist via HEAD requests and prefers them for better LLM parsing.
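The HEAD check itself is not shown in this guide. The following is a sketch of one way it could work with the standard library, assuming a 2xx response means the Markdown version exists. `md_candidate`, `head_ok` and `prefer_markdown` are hypothetical names.

```python
from typing import Callable
from urllib.parse import urlparse, urlunparse
from urllib.request import Request, urlopen


def md_candidate(url: str) -> str:
    """Append .md, or index.html.md for directory-style URLs."""
    parsed = urlparse(url)
    path = parsed.path
    if path.endswith('/') or not path:
        md_path = f"{path}index.html.md" if path.endswith('/') else "/index.html.md"
    else:
        md_path = f"{path}.md"
    return urlunparse((parsed.scheme, parsed.netloc, md_path, '', '', ''))


def head_ok(url: str, timeout: float = 5.0) -> bool:
    """Send a HEAD request and report whether the server answered with 2xx."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # Covers HTTP errors, connection failures, timeouts and malformed URLs.
        return False


def prefer_markdown(url: str, exists: Callable[[str], bool] = head_ok) -> str:
    """Return the Markdown URL when it exists, otherwise the original URL."""
    candidate = md_candidate(url)
    return candidate if exists(candidate) else url


# Offline demonstration with a stubbed existence check:
print(prefer_markdown("https://example.com/docs/intro", exists=lambda _: True))
```

Passing the existence check as a parameter keeps the URL logic testable without network access.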
2

Extract and truncate descriptions

Descriptions are extracted from page metadata and truncated:
formatter.py:69-73
def truncate(text: str, length: int = 150) -> str:
    if not text:
        return ""
    text = text.strip()
    return f"{text[:length]}..." if len(text) > length else text
Truncation limits:
  • Page descriptions: 150 characters (the function default)
  • Site summary: 200 characters
  • Site title: 80 characters
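The `truncate` function above cuts at a fixed character count, which can split a word. If word-boundary truncation is preferred, one possible variant is sketched below. `truncate_at_word` is a hypothetical alternative, not part of the generator.

```python
def truncate_at_word(text: str, length: int = 150) -> str:
    """Truncate to at most `length` characters, backing up to the last space."""
    text = text.strip() if text else ""
    if len(text) <= length:
        return text
    cut = text[:length]
    # If the cut lands in the middle of a word, drop the partial word.
    if not text[length].isspace() and " " in cut:
        cut = cut[: cut.rfind(" ")]
    return f"{cut.rstrip()}..."


print(truncate_at_word("hello world again", 8))  # hello...
print(truncate_at_word("short", 10))             # short
```

When the text contains no space within the limit, the sketch falls back to a hard cut, matching the behavior of the original function.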
3

Assign content tags

Tags are automatically assigned based on page content:
tagger.py:4-23
TAG_PATTERNS = {
    'API': ['api', 'rest-api', 'graphql', 'endpoint', '/api/', 'api-reference'],
    'Guide': ['guide', 'tutorial', 'how-to', 'walkthrough', 'learn'],
    'Quickstart': ['getting-started', 'quickstart', 'quick-start', 'quick start',
                   'get-started', 'getting started', 'start'],
    'Reference': ['reference', 'documentation', '/docs/', 'reference-guide'],
    'Example': ['example', 'sample', 'demo', 'code-sample', 'examples'],
    'SDK': ['sdk', 'library', 'client-library', 'package'],
    'CLI': ['cli', 'command-line', 'terminal', 'commands'],
    'Blog': ['blog', 'article', 'post', '/blog/', 'news'],
    'Changelog': ['changelog', 'release-notes', 'releases', 'updates', 'what-new'],
    
    'Beginner': ['getting-started', 'intro', 'introduction', 'basics', 'fundamentals'],
    'Advanced': ['advanced', 'expert', 'in-depth', 'deep-dive'],
    
    'Security': ['security', 'auth', 'authentication', 'authorization', 'oauth'],
    'Performance': ['performance', 'optimization', 'speed', 'caching'],
    'Integration': ['integration', 'webhook', 'third-party', 'connect'],
    'Troubleshooting': ['troubleshoot', 'debug', 'error', 'faq', 'common-issues'],
}
Tag categories:
  • Content Type: API, Guide, Quickstart, Reference, Example, SDK, CLI, Blog, Changelog
  • Complexity: Beginner, Advanced
  • Topic: Security, Performance, Integration, Troubleshooting
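The matching step is not shown in this guide. As a rough sketch, assuming patterns are matched as case-insensitive substrings of the page URL and title, tag assignment could look like this. `match_tags` is a hypothetical name and the pattern table is abbreviated.

```python
# Abbreviated copy of TAG_PATTERNS, for illustration only.
TAG_PATTERNS = {
    'API': ['api', 'rest-api', 'graphql', 'endpoint'],
    'Guide': ['guide', 'tutorial', 'how-to', 'walkthrough'],
    'Reference': ['reference', 'documentation', '/docs/'],
    'Security': ['security', 'auth', 'oauth'],
}


def match_tags(url: str, title: str) -> list[str]:
    """Return every tag with at least one pattern found in the URL or title."""
    haystack = f"{url} {title}".lower()
    return [
        tag
        for tag, patterns in TAG_PATTERNS.items()
        if any(pattern in haystack for pattern in patterns)
    ]


print(match_tags("https://example.com/docs/api/users", "Users API"))
# ['API', 'Reference']
```

Substring matching is simple but broad: a short pattern such as `start` or `post` can match unrelated words, so custom patterns are best kept specific.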

4. Optional Section

Secondary pages are grouped at the end without descriptions:
Optional Section
## Optional

- [Privacy Policy](https://example.com/privacy)
- [Terms of Service](https://example.com/terms)
- [Code of Conduct](https://example.com/code-of-conduct)
Implementation:
formatter.py:146-173
if not sections:
    return "\n".join(lines)

primary = {}
secondary = {}

for section_name, links in sections.items():
    if is_secondary_section(section_name):
        secondary[section_name] = links
    else:
        primary[section_name] = links

# Add primary sections
for section_name in sorted(primary.keys()):
    clean_name = clean_section_name(section_name)
    lines.extend([
        f"## {clean_name}",
        "",
        *primary[section_name],
        ""
    ])

# Add optional section
if secondary:
    lines.extend([
        "## Optional",
        "",
    ])
    
    for section_name in sorted(secondary.keys()):
        lines.extend(secondary[section_name])
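The fragment above depends on variables defined elsewhere in the file. A self-contained sketch of the same assembly step follows, assuming the header lines are built first and that the secondary check is passed in as a predicate. `render` and its parameters are hypothetical names.

```python
from typing import Callable


def render(
    title: str,
    summary: str,
    sections: dict[str, list[str]],
    is_secondary: Callable[[str], bool],
) -> str:
    """Assemble header, primary sections (sorted by name), then Optional."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    primary = {name: links for name, links in sections.items() if not is_secondary(name)}
    secondary = {name: links for name, links in sections.items() if is_secondary(name)}

    for name in sorted(primary):
        lines.extend([f"## {name}", "", *primary[name], ""])

    if secondary:
        lines.extend(["## Optional", ""])
        for name in sorted(secondary):
            lines.extend(secondary[name])

    return "\n".join(lines)


sections = {
    "Guides": ["- [Auth](https://example.com/guides/auth): Learn how to authenticate"],
    "Docs": ["- [Intro](https://example.com/docs/intro): Quick introduction"],
    "privacy": ["- [Privacy Policy](https://example.com/privacy)"],
}
text = render("Example", "An example site", sections, lambda name: name == "privacy")
print(text)
```

Because section names are sorted, "Docs" is emitted before "Guides" regardless of crawl order, and all secondary links are merged under a single "Optional" heading.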

Complete Example

llms.txt
# FastAPI Documentation

> FastAPI is a modern, fast web framework for building APIs with Python based on standard Python type hints

## Tutorial

- [First Steps](https://fastapi.tiangolo.com/tutorial/first-steps): Create your first FastAPI application [Quickstart] [Beginner]
- [Path Parameters](https://fastapi.tiangolo.com/tutorial/path-params): Declare path parameters in routes [Guide]
- [Query Parameters](https://fastapi.tiangolo.com/tutorial/query-params): Handle query parameters in requests [Guide]
- [Request Body](https://fastapi.tiangolo.com/tutorial/body): Define request body with Pydantic models [Guide]

## Advanced

- [Security](https://fastapi.tiangolo.com/advanced/security): Implement OAuth2 and JWT authentication [Security] [Advanced]
- [Custom Response](https://fastapi.tiangolo.com/advanced/custom-response): Return custom response types [Advanced]
- [Testing](https://fastapi.tiangolo.com/advanced/testing): Write tests for your API [Guide]

## Deployment

- [Docker](https://fastapi.tiangolo.com/deployment/docker): Deploy with Docker containers [Guide]
- [Server Workers](https://fastapi.tiangolo.com/deployment/server-workers): Configure Gunicorn and Uvicorn [Guide]

## Optional

- [Release Notes](https://fastapi.tiangolo.com/release-notes)
- [Contributing](https://fastapi.tiangolo.com/contributing)
- [Privacy Policy](https://fastapi.tiangolo.com/privacy)

Specification Compliance

The generator adheres to the official llmstxt.org specification:
All output is valid Markdown that can be parsed by standard Markdown processors.
  • Uses standard heading syntax (#, ##)
  • Uses standard link syntax ([text](url))
  • Uses standard list syntax (-)
  • Uses standard blockquote syntax (>)
Content is organized in a clear hierarchy:
  1. Site title (H1)
  2. Site description (blockquote)
  3. Sections (H2)
  4. Pages (list items)
Pages are grouped logically:
  • Primary content by URL structure
  • Secondary content in “Optional” section
  • Sections emitted in alphabetical order by name
URLs are normalized and cleaned:
  • Query parameters removed
  • Fragments removed
  • Prefers .md versions when available
  • Uses HTTPS when available
Descriptions are truncated to fixed length limits:
  • Cuts at a character count; the truncate helper shown above slices the string and does not look for word boundaries
  • Adds ellipsis when truncated
  • Configurable length limits
Pages include contextual metadata:
  • Content type tags (API, Guide, etc.)
  • Complexity tags (Beginner, Advanced)
  • Topic tags (Security, Performance, etc.)

Best Practices

Keep Descriptions Concise

Aim for descriptions of 100-200 characters. The generator enforces the upper bound automatically by truncating page descriptions at 150 characters and the site summary at 200.

Use Semantic Sections

Organize content by user journey (Getting Started, Guides, API Reference) rather than technical structure.

Include Key Pages

Prioritize documentation, guides, and API references over marketing pages.

Update Regularly

Use auto-update to keep llms.txt synchronized with website changes.

Customization

While the specification is standardized, you can customize the generator’s behavior:

Section Patterns

Modify secondary content detection:
formatter.py:7-14
SECONDARY_PATH_PATTERNS = [
    '/privacy', '/terms', '/legal',
    # Add your patterns
    '/disclaimer', '/cookie-policy',
]

Tag Patterns

Add custom tag detection:
tagger.py:4-23
TAG_PATTERNS = {
    'API': ['api', 'rest-api', 'graphql'],
    'Custom': ['custom-pattern', 'special'],  # Add custom tags
}

Truncation Limits

Adjust description lengths:
formatter.py:135-136
tags = assign_tags(page, section_name=section)
desc = truncate(page.description, 150)  # Change limit here

Validation

Validate your llms.txt file:
1

Check Markdown syntax

Validate with markdownlint
npm install -g markdownlint-cli
markdownlint llms.txt
2

Verify link accessibility

Check links
npm install -g markdown-link-check
markdown-link-check llms.txt
3

Test LLM parsing

Ask an LLM to summarize your llms.txt:
Given this llms.txt file, what are the main sections and key pages?

[paste your llms.txt content]
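Beyond generic Markdown linting, a small structural check can catch format-specific mistakes. The sketch below is a hypothetical helper, not an official validator. It checks only that the file opens with a single H1 and that list items follow the `- [Title](url): Description` shape.

```python
import re

# A list item: "- [Title](url)" optionally followed by ": description".
ENTRY = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)(: .+)?$")


def check_llms_txt(text: str) -> list[str]:
    """Return a list of human-readable problems; empty means no issues found."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_empty = [line for line in lines if line]

    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first non-empty line should be an H1 title")
    if sum(1 for line in lines if line.startswith("# ")) > 1:
        problems.append("more than one H1 heading")

    for number, line in enumerate(lines, start=1):
        if line.startswith("- ") and not ENTRY.match(line):
            problems.append(f"line {number}: malformed list entry")
    return problems


sample = """# Example

> An example site

## Docs

- [Intro](https://example.com/docs/intro): Start here [Guide]

## Optional

- [Privacy Policy](https://example.com/privacy)
"""
print(check_llms_txt(sample))  # []
```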

FAQ

Why Markdown?

Markdown is:
  • Human-readable and editable
  • LLM-friendly (models train on Markdown)
  • Version control friendly
  • Simpler than structured formats

How is llms.txt different from a sitemap?

Sitemaps are for search engine crawlers. llms.txt is optimized for LLM understanding:
  • Includes descriptions and context
  • Organized by user journey
  • Includes content type hints
  • Filters out irrelevant pages

Can I edit the generated file manually?

Yes. The generator provides a starting point. You can:
  • Reorder sections
  • Edit descriptions
  • Add/remove pages
  • Customize tags
If you make manual edits, use auto-update with caution, since regenerating the file may overwrite them.

Should I include every page?

No. Focus on content valuable to LLMs:
  • Documentation and guides
  • API references
  • Conceptual content
  • Examples and tutorials
Exclude:
  • Marketing pages
  • Legal pages (or put in Optional)
  • Duplicate content
  • Internal tools/admin pages

Resources

llmstxt.org

Official specification and guidelines

Example Sites

Real-world llms.txt implementations

Formatter Code

Implementation details in the codebase

Web Interface

Generate your own llms.txt file

