Honeypot Mode

Anubis includes a sophisticated honeypot system designed to detect automated scanners and malicious bots. The honeypot generates fake pages that lure bots into revealing themselves.

How It Works

The honeypot system works by:
  1. Generating fake content: Creates plausible-looking pages using spintax (spinner text) technology
  2. Tracking access patterns: Monitors how frequently specific user agents and networks access honeypot pages
  3. Incrementing reputation scores: Increases weight for IPs and user agents that repeatedly access honeypot URLs
  4. Automatically blocking repeat offenders: Once threshold limits are reached, traffic is weighted heavily or blocked
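The scoring loop above can be sketched in a few lines of Go. This is an illustrative model, not Anubis's actual code: the `recordHit` and `weightFor` names are hypothetical, and a plain map stands in for the TTL-backed store.

```go
package main

import "fmt"

// hits counts honeypot accesses per key (a clamped network range or a
// hashed user agent). Anubis keeps this in its store backend with a
// TTL; a plain map stands in for it here.
var hits = map[string]int{}

// recordHit increments the counter for a key and returns the new count.
func recordHit(key string) int {
	hits[key]++
	return hits[key]
}

// weightFor returns the extra weight a request earns once its network
// or user agent has crossed the documented 25-hit threshold.
func weightFor(key string) int {
	if hits[key] >= 25 {
		return 30 // matches the honeypot/network and honeypot/user-agent rules
	}
	return 0
}

func main() {
	for i := 0; i < 26; i++ {
		recordHit("192.0.2.0/24")
	}
	fmt.Println(weightFor("192.0.2.0/24"))   // 30: threshold reached
	fmt.Println(weightFor("203.0.113.0/24")) // 0: never seen
}
```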

Architecture

The honeypot is automatically initialized when Anubis starts. It creates special endpoints at:
/api/honeypot/{id}/{stage}
These URLs are not linked from legitimate pages but may be discovered by:
  • Automated scanners probing for common paths
  • Bots following every link indiscriminately
  • Crawlers that don’t respect robots.txt

Automatic Rule Creation

When the honeypot initializes successfully, Anubis automatically adds two policy rules:

Network-based Detection

- name: honeypot/network
  action: WEIGH
  weight:
    adjust: 30
This rule adds 30 weight points to requests from networks (IP ranges) that have accessed honeypot pages 25 or more times.

User-Agent Detection

- name: honeypot/user-agent
  action: WEIGH
  weight:
    adjust: 30
This rule adds 30 weight points to requests from user agents that have accessed honeypot pages 25 or more times.

Content Generation

The honeypot uses spintax (spinner text) technology to generate unique, plausible-looking content for each request. This is the same technology spammers use, now repurposed for defense. Spintax allows the system to:
  • Generate thousands of unique page variations at negligible computational cost
  • Create convincing fake blog posts, articles, and affirmations
  • Avoid pattern detection by sophisticated bots

Tracking and Metrics

The honeypot tracks:
  • IP addresses: Clamped to network ranges for better pattern detection
  • User agents: SHA256 hashed for privacy
  • Hit counts: How many times each network/UA has accessed honeypot pages
  • Access patterns: Timing and frequency of honeypot access
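Hashing a user agent before storage is straightforward with the standard library; this sketch shows the idea (the `hashUA` helper is illustrative, not Anubis's API):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashUA returns the hex-encoded SHA-256 digest of a user agent
// string, so raw user agents never have to be written to the store.
func hashUA(ua string) string {
	sum := sha256.Sum256([]byte(ua))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(hashUA("BadBot/1.0"))
}
```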

Prometheus Metrics

Honeypot performance is tracked via Prometheus:
anubis_honeypot_pagegen_timings
This histogram tracks page generation timing with the label method="naive".

Storage

Honeypot data is stored in your configured Anubis store backend with the following prefixes:
  • honeypot:info - General honeypot information
  • honeypot:user-agent - User agent reputation scores
  • honeypot:network - Network reputation scores
Data is stored with a 1-hour TTL and automatically expires.

Detection Thresholds

The honeypot uses the following thresholds:
Threshold   Action        Description
25 hits     Add weight    Network or UA that has accessed the honeypot 25+ times gets +30 weight
256 hits    Log warning   System logs a warning about possible crawler activity
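Both thresholds can be expressed as one small function. This is a sketch of the documented behavior, not Anubis's internals; the `evaluate` name and shape are made up for illustration:

```go
package main

import (
	"fmt"
	"log/slog"
)

// evaluate applies the documented thresholds to a hit count: +30
// weight from 25 hits on, plus a warning log once 256 hits are
// reached. The warning does not change the weight.
func evaluate(network string, hitCount int) (extraWeight int) {
	if hitCount >= 25 {
		extraWeight = 30
	}
	if hitCount >= 256 {
		slog.Warn("found possible crawler", "network", network)
	}
	return extraWeight
}

func main() {
	fmt.Println(evaluate("192.0.2.0/24", 10))  // 0
	fmt.Println(evaluate("192.0.2.0/24", 25))  // 30
	fmt.Println(evaluate("192.0.2.0/24", 256)) // 30, and a warning is logged
}
```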

Implementation Details

The honeypot is implemented in /internal/honeypot/naive/ and uses:
  • Spintax parsing: Pre-parsed templates for titles, bodies, and affirmations
  • UUID generation: Random IDs for honeypot sessions and fake links
  • SHA256 hashing: Privacy-preserving storage of user agent strings
  • IP clamping: Converts individual IPs to network ranges for better pattern detection

Example Honeypot Flow

  1. Bot accesses /api/honeypot/init/start
  2. System generates unique content with random links
  3. Bot follows links to /api/honeypot/{uuid}/{stage}
  4. Each access increments network and UA counters
  5. After 25 accesses, the bot’s future requests get +30 weight
  6. At 256 accesses, system logs a warning about the crawler

Benefits

  • Automatic detection: No manual configuration needed
  • Low overhead: Spintax generation is computationally cheap
  • Privacy-preserving: User agents are hashed, not stored in plaintext
  • Self-cleaning: Data expires after 1 hour
  • Adaptive: Learns patterns specific to your traffic

Monitoring

To monitor honeypot activity:
  1. Check logs: Look for “found new entrance point” and “found possible crawler” messages
  2. Query metrics: Check the anubis_honeypot_pagegen_timings histogram
  3. Inspect store: Query the honeypot:* prefixes in your store backend

Example Log Output

level=DEBUG msg="found new entrance point" id=abc123 stage=init userAgent="BadBot/1.0" clampedIP=192.0.2.0/24
level=WARN msg="found possible crawler" id=abc123 network=192.0.2.0/24

Advanced Usage

Integrating with Custom Policies

You can create additional rules that work with honeypot data:
- name: block-persistent-honeypot-visitors
  action: DENY
  expression:
    all:
      - 'weight >= 60'  # Two honeypot violations
      - 'headers["User-Agent"].contains("bot")'

Combining with Thoth

When used with Thoth integration, honeypot detection becomes even more powerful:
  1. Local honeypot detects bot behavior
  2. Thoth provides ASN and GeoIP context
  3. Combined data creates sophisticated bot profiles
  4. Shared intelligence benefits all Anubis deployments

Limitations

  • Sophisticated bots: Advanced bots may avoid honeypot links
  • False positives: Legitimate crawlers may trigger detection
  • Storage requirements: High-traffic sites may generate significant honeypot data
  • 1-hour TTL: Data expires quickly and may miss slow crawlers

Best Practices

  1. Monitor logs: Regularly review honeypot detection logs
  2. Adjust thresholds: If you get too many false positives, increase the threshold from 25
  3. Combine with other rules: Use honeypot as one signal among many
  4. Test your bots: Ensure your legitimate crawlers avoid honeypot URLs
  5. Use with robots.txt: Enable SERVE_ROBOTS_TXT to help legitimate crawlers
