Overview
AI Crawler Control helps you manage how AI-powered bots (PerplexityBot, GPTBot, CCBot, anthropic-ai) access your content. Generate robots.txt rules and HTTP header snippets to allow or block AI crawlers.
Supported AI Crawlers
PerplexityBot
Used by: Perplexity AI
Crawls content for Perplexity’s answer engine
GPTBot
Used by: OpenAI (ChatGPT)
Collects training data for GPT models
CCBot
Used by: Common Crawl
Archives web content; used by multiple AI systems
anthropic-ai
Used by: Anthropic (Claude)
Gathers data for Claude AI training
Configuration
Select Bots to Block
Check the boxes for AI crawlers you want to block:
- PerplexityBot
- GPTBot (ChatGPT)
- CCBot (Common Crawl)
- anthropic-ai
GEO AI does not write to your robots.txt file automatically. You must manually add the rules to your server.
Implementation Details
Admin Interface
includes/class-geoai-admin.php
Robots.txt Preview Generation
includes/class-geoai-admin.php
Generated Rules Example
When blocking all AI crawlers, GEO AI generates:
robots.txt
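As a rough illustration of what site-wide blocking rules look like in standard robots.txt syntax, the Python sketch below assembles one `User-agent` / `Disallow: /` group for each of the four crawlers listed above. It is not the plugin's PHP code, `build_block_rules` is a hypothetical helper, and the exact text GEO AI produces may differ.

```python
AI_CRAWLERS = ["PerplexityBot", "GPTBot", "CCBot", "anthropic-ai"]


def build_block_rules(bots):
    """Return robots.txt text that disallows the whole site for each bot.

    Hypothetical helper for illustration; not part of GEO AI.
    """
    groups = []
    for bot in bots:
        # One group per crawler: a User-agent line followed by its rule.
        groups.append(f"User-agent: {bot}\nDisallow: /")
    # A blank line separates consecutive groups.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_block_rules(AI_CRAWLERS))
```

Each group names a single crawler, so removing one bot from the list leaves the rules for the others unchanged.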
Adding Rules to robots.txt
- Manual Edit
- WordPress Plugin
- Create New File
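For the manual-edit and new-file routes, the operation amounts to appending the rules to robots.txt, or creating the file if it does not exist yet. The Python sketch below shows one way to do that without adding the same block twice. The marker comment and the `add_rules` helper are assumptions made for this example, and it only applies when robots.txt is a physical file in the site root.

```python
from pathlib import Path

# Hypothetical marker comment used to detect an earlier run.
MARKER = "# AI crawler rules"


def add_rules(robots_path, rules):
    """Append rules to robots.txt once; create the file if it is missing.

    Returns True if the file was written, False if the marker was
    already present and the file was left untouched.
    """
    path = Path(robots_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if MARKER in existing:
        return False

    parts = []
    if existing.strip():
        parts.append(existing.rstrip("\n"))
        parts.append("")  # blank line between the old and new rules
    parts.append(MARKER)
    parts.append(rules.rstrip("\n"))
    path.write_text("\n".join(parts) + "\n", encoding="utf-8")
    return True
```

Existing rules are preserved as-is, so rules for other crawlers and sitemap lines already in the file stay in place.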
Selective Blocking
Block crawlers from specific sections only:
robots.txt
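To illustrate section-level blocking, the Python sketch below builds rules that disallow only selected paths and then checks them with the standard library's `urllib.robotparser`. The section paths `/premium/` and `/research/` and the helper `build_section_rules` are hypothetical and chosen only for this example.

```python
from urllib.robotparser import RobotFileParser


def build_section_rules(bots, paths):
    """Return robots.txt text that blocks each bot from the given paths only."""
    groups = []
    for bot in bots:
        lines = [f"User-agent: {bot}"]
        lines += [f"Disallow: {path}" for path in paths]
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"


# Hypothetical section paths, chosen only for illustration.
rules = build_section_rules(["GPTBot", "CCBot"], ["/premium/", "/research/"])

# Parse the generated text and ask what a given crawler may fetch.
parser = RobotFileParser()
parser.parse(rules.splitlines())
print(parser.can_fetch("GPTBot", "/premium/report.html"))  # inside a blocked section
print(parser.can_fetch("GPTBot", "/blog/hello-world/"))    # outside the blocked sections
```

Crawlers that are not named in any group, and paths that match no `Disallow` line, remain crawlable.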
HTTP Headers (Advanced)
For more control, block AI crawlers with HTTP headers:
- Apache (.htaccess)
- Nginx
- PHP (functions.php)
.htaccess
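One common way to enforce blocking at the server is to inspect the `User-Agent` request header and refuse matching requests with a 403 response. The Python WSGI sketch below illustrates that logic only. It assumes this user-agent-matching approach and is not the Apache, Nginx, or PHP snippet GEO AI generates; `BlockAICrawlers` and `is_ai_crawler` are hypothetical names.

```python
AI_CRAWLERS = ("PerplexityBot", "GPTBot", "CCBot", "anthropic-ai")


def is_ai_crawler(user_agent):
    """Case-insensitive substring match against the known crawler names."""
    ua = (user_agent or "").lower()
    return any(name.lower() in ua for name in AI_CRAWLERS)


class BlockAICrawlers:
    """WSGI middleware that answers 403 Forbidden to matching user agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_ai_crawler(environ.get("HTTP_USER_AGENT")):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else is passed through to the wrapped application.
        return self.app(environ, start_response)
```

Unlike robots.txt, this check runs on every request, but it still depends on the crawler sending an honest `User-Agent` string.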
Verification
Verify your robots.txt is working:
Test with Google
Use the robots.txt report in Google Search Console (the successor to the retired robots.txt Tester) to validate syntax
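The rules can also be checked locally. The sketch below uses Python's standard `urllib.robotparser` to report, for each crawler named in this guide, whether a given path is allowed. The `crawl_permissions` helper and the sample rules are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["PerplexityBot", "GPTBot", "CCBot", "anthropic-ai"]


def crawl_permissions(robots_text, path="/"):
    """Map each AI crawler name to True (allowed) or False (blocked) for path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in AI_CRAWLERS}


# Sample rules for illustration: only GPTBot and CCBot are blocked.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

for bot, allowed in crawl_permissions(sample).items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

To test a live site instead of a string, `RobotFileParser.set_url()` followed by `read()` fetches the file from a URL such as yoursite.com/robots.txt before calling `can_fetch()`.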
Important Caveats
Not Guaranteed
Crawlers can ignore robots.txt. Only honest bots comply.
Reduces AI Visibility
Blocking prevents your content from appearing in AI answer engines.
May Not Stop Training
Some AI models may already have your content from past crawls.
No Legal Protection
robots.txt is a suggestion, not a legally binding restriction.
When to Block AI Crawlers
Premium/Paywalled Content
Proprietary Research/Data
Consider blocking if you have unique research or data you don’t want in AI training sets.
E-commerce Product Descriptions
Consider blocking to prevent AI from generating competing product descriptions.
Legal/Compliance Requirements
Block if required by industry regulations or contractual obligations.
When to Allow AI Crawlers
Marketing Content
Allow crawlers to increase visibility in AI answer engines.
Blog Posts
Get free exposure through AI-powered search results.
Educational Content
Help AI systems provide accurate information.
Public Information
Content meant to be widely accessible benefits from AI indexing.
Alternative Approaches
AI-TXT Standard
Some organizations are developing an ai.txt specification:
ai.txt
ai.txt is not yet a widely adopted standard. Most crawlers still use robots.txt.
Monitoring Crawler Activity
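Server access logs record the user-agent string of each request, so visits from the crawlers named in this guide can be counted by matching those names. A minimal Python sketch, assuming one request per log line with the user agent somewhere on that line; `count_crawler_hits` is a hypothetical helper.

```python
import re
from collections import Counter

AI_CRAWLERS = ["PerplexityBot", "GPTBot", "CCBot", "anthropic-ai"]

# One case-insensitive pattern that matches any of the crawler names.
PATTERN = re.compile(
    "|".join(re.escape(name) for name in AI_CRAWLERS), re.IGNORECASE
)


def count_crawler_hits(log_lines):
    """Count log lines that mention each known AI crawler name."""
    canonical = {name.lower(): name for name in AI_CRAWLERS}
    hits = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            # Normalize to the canonical spelling used in AI_CRAWLERS.
            hits[canonical[match.group(0).lower()]] += 1
    return hits
```

The function accepts any iterable of lines, so an open log file can be passed directly. The log location depends on the server and is not specified here.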
Track AI crawler visits in your server logs to confirm whether crawlers honor your rules.
Best Practices
Default: Allow
Unless you have specific reasons, allow AI crawlers for better visibility.
Be Selective
Block crawlers only from sensitive sections, not the entire site.
Monitor Impact
Track referral traffic from AI engines before/after blocking.
Understand Limitations
Remember robots.txt is advisory, not enforceable.
Document Decisions
Keep notes on why you blocked certain crawlers.
Review Regularly
Re-evaluate your blocking strategy quarterly.
Troubleshooting
Rules not working
Check:
- Rules are correctly added to robots.txt in the root directory
- Syntax is correct (user-agent names spelled exactly as listed above; paths are case-sensitive)
- File is accessible at yoursite.com/robots.txt
- No caching plugin is serving an old robots.txt
Crawlers still visiting
Remember:
- Crawlers may ignore robots.txt (not enforceable)
- Check user agent string in logs to confirm bot identity
- Consider HTTP header blocking for stricter control
robots.txt not found
Solution:
- Create a robots.txt file in the WordPress root directory
- Ensure the file has correct permissions (644)
- Clear the cache in any caching plugins
Related Features
AI Audit
Optimize content for AI visibility
Sitemaps
Help crawlers find your content
Meta Tags
Robots meta for indexing control