Honeypot Mode
Anubis includes a sophisticated honeypot system designed to detect automated scanners and malicious bots. The honeypot generates fake pages that lure bots into revealing themselves.
How It Works
The honeypot system works by:
- Generating fake content: Creates plausible-looking pages using spintax (spinner text) technology
- Tracking access patterns: Monitors how frequently specific user agents and networks access honeypot pages
- Incrementing reputation scores: Increases weight for IPs and user agents that repeatedly access honeypot URLs
- Automatically blocking repeat offenders: Once threshold limits are reached, traffic is weighted heavily or blocked
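The counting-and-weighting logic above can be sketched as follows. The type and function names are illustrative stand-ins, not Anubis's actual code, though the 25-hit, 256-hit, and +30 values match the thresholds documented in this page:

```go
package main

import "fmt"

// Thresholds as documented for the honeypot; names are illustrative.
const (
	weightThreshold = 25  // hits before extra weight is applied
	warnThreshold   = 256 // hits before a crawler warning is logged
	weightAdjust    = 30  // weight added to repeat offenders
)

// hitTracker counts honeypot hits per key (a clamped network range
// or a hashed user agent).
type hitTracker struct {
	hits map[string]int
}

func newHitTracker() *hitTracker {
	return &hitTracker{hits: make(map[string]int)}
}

// record registers one honeypot access and returns the extra request
// weight the caller should apply, plus whether to log a crawler warning.
func (t *hitTracker) record(key string) (extraWeight int, warn bool) {
	t.hits[key]++
	n := t.hits[key]
	if n >= weightThreshold {
		extraWeight = weightAdjust
	}
	return extraWeight, n >= warnThreshold
}

func main() {
	t := newHitTracker()
	var w int
	for i := 0; i < 25; i++ {
		w, _ = t.record("198.51.100.0/24")
	}
	fmt.Println(w) // the 25th hit crosses the threshold: prints 30
}
```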
Architecture
The honeypot is automatically initialized when Anubis starts. It creates special endpoints under /api/honeypot/ that catch:
- Automated scanners probing for common paths
- Bots following every link indiscriminately
- Crawlers that don’t respect robots.txt
Automatic Rule Creation
When the honeypot initializes successfully, Anubis automatically adds two policy rules: one for network-based detection and one for user-agent detection.
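These rules are registered programmatically at startup, not via your policy file. For comparison, a hand-written rule with a similar effect might look roughly like this in botPolicies.yaml; the name and the regex matcher here are illustrative, since the built-in honeypot rules consult reputation data directly rather than a static pattern:

```yaml
# Illustrative sketch, not the literal auto-generated rules: a WEIGH
# action that adds weight to a suspicious client, similar in spirit to
# what the honeypot applies to flagged networks and user agents.
bots:
  - name: honeypot-style-weight
    user_agent_regex: "ExampleScanner/"
    action: WEIGH
    weight:
      adjust: 30
```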
Content Generation
The honeypot uses spintax (spinner text) technology to generate unique, plausible-looking content for each request. This is the same technology spammers use, now repurposed for defense. Spintax allows the system to:
- Generate thousands of unique page variations at negligible computational cost
- Create convincing fake blog posts, articles, and affirmations
- Avoid pattern detection by sophisticated bots
Tracking and Metrics
The honeypot tracks:
- IP addresses: Clamped to network ranges for better pattern detection
- User agents: SHA256 hashed for privacy
- Hit counts: How many times each network/UA has accessed honeypot pages
- Access patterns: Timing and frequency of honeypot access
Prometheus Metrics
Honeypot performance is tracked via Prometheus: page-generation latency is recorded in the `anubis_honeypot_pagegen_timings` histogram, labeled with method="naive".
Storage
Honeypot data is stored in your configured Anubis store backend with the following prefixes:
- `honeypot:info` - General honeypot information
- `honeypot:user-agent` - User agent reputation scores
- `honeypot:network` - Network reputation scores
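A sketch of how store keys might be built from these documented prefixes; the exact suffix layout inside Anubis's store is an assumption:

```go
package main

import "fmt"

// Assumed key layout: documented prefix + per-entity suffix.
// uaKey takes an already-hashed user agent; netKey takes a clamped
// network range in CIDR form.
func uaKey(hashedUA string) string { return "honeypot:user-agent:" + hashedUA }
func netKey(network string) string { return "honeypot:network:" + network }

func main() {
	fmt.Println(netKey("203.0.113.0/24")) // honeypot:network:203.0.113.0/24
	fmt.Println(uaKey("9f86d0…"))         // illustrative truncated digest
}
```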
Detection Thresholds
The honeypot uses the following thresholds:

| Threshold | Action | Description |
|---|---|---|
| 25 hits | Add weight | Network or UA that has accessed honeypot 25+ times gets +30 weight |
| 256 hits | Log warning | System logs a warning about possible crawler activity |
Implementation Details
The honeypot is implemented in `/internal/honeypot/naive/` and uses:
- Spintax parsing: Pre-parsed templates for titles, bodies, and affirmations
- UUID generation: Random IDs for honeypot sessions and fake links
- SHA256 hashing: Privacy-preserving storage of user agent strings
- IP clamping: Converts individual IPs to network ranges for better pattern detection
Example Honeypot Flow
- Bot accesses `/api/honeypot/init/start`
- System generates unique content with random links
- Bot follows links to `/api/honeypot/{uuid}/{stage}`
- Each access increments network and UA counters
- After 25 accesses, the bot’s future requests get +30 weight
- At 256 accesses, the system logs a warning about the crawler
Benefits
- Automatic detection: No manual configuration needed
- Low overhead: Spintax generation is computationally cheap
- Privacy-preserving: User agents are hashed, not stored in plaintext
- Self-cleaning: Data expires after 1 hour
- Adaptive: Learns patterns specific to your traffic
Monitoring
To monitor honeypot activity:
- Check logs: Look for “found new entrance point” and “found possible crawler” messages
- Query metrics: Check the `anubis_honeypot_pagegen_timings` histogram
- Inspect store: Query the `honeypot:*` prefixes in your store backend
Advanced Usage
Integrating with Custom Policies
You can create additional rules that work with honeypot data.
Combining with Thoth
When used with Thoth integration, honeypot detection becomes even more powerful:
- Local honeypot detects bot behavior
- Thoth provides ASN and GeoIP context
- Combined data creates sophisticated bot profiles
- Shared intelligence benefits all Anubis deployments
Limitations
- Sophisticated bots: Advanced bots may avoid honeypot links
- False positives: Legitimate crawlers may trigger detection
- Storage requirements: High-traffic sites may generate significant honeypot data
- 1-hour TTL: Data expires quickly, may miss slow crawlers
Best Practices
- Monitor logs: Regularly review honeypot detection logs
- Adjust thresholds: If you get too many false positives, increase the threshold from 25
- Combine with other rules: Use honeypot as one signal among many
- Test your bots: Ensure your legitimate crawlers avoid honeypot URLs
- Use with robots.txt: Enable `SERVE_ROBOTS_TXT` to help legitimate crawlers