Anubis exposes Prometheus metrics and health check endpoints for monitoring and observability.

Metrics Server

Anubis runs a separate metrics server on port 9090 by default.

Configuration

# Default: localhost:9090 on tcp
anubis --metrics-bind :9090

# Custom address
anubis --metrics-bind :8080

# Bind to specific interface
anubis --metrics-bind 192.168.1.10:9090

# Unix socket
anubis --metrics-bind unix:///var/run/anubis/metrics.sock \
       --metrics-bind-network unix

# Disable metrics
anubis --metrics-bind ""

Endpoints

Path        Description
/metrics    Prometheus metrics (text format)
/healthz    Health check (HTTP 200 = OK)

Prometheus Metrics

Policy Results

Tracks how many requests matched each rule and what action was taken:
anubis_policy_results{rule="browsers",action="CHALLENGE"} 1234
anubis_policy_results{rule="googlebot",action="ALLOW"} 567
anubis_policy_results{rule="scrapers",action="DENY"} 89
Labels:
  • rule - Bot rule name from policy file
  • action - Action taken: ALLOW, DENY, CHALLENGE, WEIGH
Example queries:
# Total requests by action
sum by (action) (rate(anubis_policy_results[5m]))

# Top rules by request count
topk(10, sum by (rule) (rate(anubis_policy_results[5m])))

# Challenge rate
sum(rate(anubis_policy_results{action="CHALLENGE"}[5m]))

# Deny rate (potential attacks)
sum(rate(anubis_policy_results{action="DENY"}[5m]))

Available Metrics

Anubis exposes these metric families:
anubis_policy_results           # Policy rule matches (counter)
go_*                            # Go runtime metrics
process_*                       # Process metrics (CPU, memory, FDs)
promhttp_metric_handler_*       # Metrics endpoint stats
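The /metrics payload uses the Prometheus text exposition format, which is simple to inspect programmatically. A minimal parsing sketch (the sample lines mirror the anubis_policy_results examples above; a real scrape would fetch the text over HTTP first):

```python
def parse_prom_text(text):
    """Parse Prometheus text exposition format into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        # the value is after the last space; labels may contain spaces safely
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples

sample = """\
# TYPE anubis_policy_results counter
anubis_policy_results{rule="browsers",action="CHALLENGE"} 1234
anubis_policy_results{rule="googlebot",action="ALLOW"} 567
"""

metrics = parse_prom_text(sample)
print(metrics['anubis_policy_results{rule="browsers",action="CHALLENGE"}'])  # 1234.0
```

This is a sketch for ad-hoc inspection, not a replacement for a Prometheus client library; it ignores timestamps and escaping edge cases.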

Health Checks

HTTP Health Check

# Check if Anubis is serving
curl http://localhost:9090/healthz

# Response:
# HTTP/1.1 200 OK
# OK
Status codes:
  • 200 OK - Anubis is serving traffic
  • 500 Internal Server Error - Anubis is not ready
  • 424 Failed Dependency - Unknown health state
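A probe script can map these status codes to a verdict. A short sketch (the meanings come from the list above; fetching the code over HTTP is left to your client of choice):

```python
def interpret_healthz(status_code):
    """Map /healthz HTTP status codes to a health verdict."""
    if status_code == 200:
        return "healthy"     # Anubis is serving traffic
    if status_code == 500:
        return "not ready"   # Anubis is not ready
    if status_code == 424:
        return "unknown"     # unknown health state
    return "unexpected"

print(interpret_healthz(200))  # healthy
```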

CLI Health Check

Anubis includes a built-in health check command:
anubis --healthcheck

# Exit codes:
# 0 = healthy
# 1 = unhealthy or error
Use cases:
  • Docker HEALTHCHECK
  • Kubernetes liveness/readiness probes
  • Systemd watchdog
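For the systemd case, one common pattern is a timer-driven oneshot unit that runs the health check and triggers a restart on failure. A sketch (unit names and the binary path are illustrative, not from the Anubis docs; the companion anubis-restart.service and a timer with OnUnitActiveSec=30s are left out for brevity):

```ini
# anubis-healthcheck.service (illustrative)
[Unit]
Description=Periodic Anubis health check
# On a non-zero exit code, systemd activates the restart unit
OnFailure=anubis-restart.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/anubis --healthcheck
```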

Integration Examples

Docker

FROM ghcr.io/techarohq/anubis:latest

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD ["/usr/local/bin/anubis", "--healthcheck"]

Docker Compose

services:
  anubis:
    image: ghcr.io/techarohq/anubis:latest
    healthcheck:
      test: ["/usr/local/bin/anubis", "--healthcheck"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 5s

Kubernetes

Liveness Probe

apiVersion: apps/v1
kind: Deployment
metadata:
  name: anubis
spec:
  template:
    spec:
      containers:
      - name: anubis
        image: ghcr.io/techarohq/anubis:latest
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9090
          initialDelaySeconds: 10
          periodSeconds: 30
        readinessProbe:
          httpGet:
            path: /healthz
            port: 9090
          initialDelaySeconds: 5
          periodSeconds: 10

ServiceMonitor (Prometheus Operator)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: anubis
spec:
  selector:
    matchLabels:
      app: anubis
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
---
apiVersion: v1
kind: Service
metadata:
  name: anubis-metrics
  labels:
    app: anubis
spec:
  selector:
    app: anubis
  ports:
  - name: metrics
    port: 9090
    targetPort: 9090

Prometheus

Scrape Config

scrape_configs:
  - job_name: 'anubis'
    static_configs:
      - targets: ['anubis.example.com:9090']
    scrape_interval: 15s

Alert Rules

groups:
  - name: anubis
    rules:
      # High challenge rate (possible attack)
      - alert: AnubisHighChallengeRate
        expr: |
          sum(rate(anubis_policy_results{action="CHALLENGE"}[5m])) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High Anubis challenge rate"
          description: "Challenge rate is {{ $value }}/sec (threshold: 100/sec)"

      # High deny rate (active attack)
      - alert: AnubisHighDenyRate
        expr: |
          sum(rate(anubis_policy_results{action="DENY"}[5m])) > 50
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High Anubis deny rate"
          description: "Deny rate is {{ $value }}/sec (possible attack)"

      # Anubis not serving
      - alert: AnubisDown
        expr: up{job="anubis"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Anubis is down"
          description: "Anubis instance {{ $labels.instance }} is not responding"

      # Low allow rate (too strict)
      - alert: AnubisLowAllowRate
        expr: |
          sum(rate(anubis_policy_results{action="ALLOW"}[10m])) 
          / sum(rate(anubis_policy_results[10m])) < 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Anubis allow rate is low"
          description: "Only {{ $value | humanizePercentage }} of traffic is allowed"

Grafana Dashboard

Example dashboard panels:

Request Rate by Action

sum by (action) (rate(anubis_policy_results[5m]))
Visualization: Time series (stacked area)

Top Rules

topk(10, sum by (rule) (
  increase(anubis_policy_results[1h])
))
Visualization: Bar gauge

Challenge Success Rate

Requires application-level instrumentation (not built-in).

Logging

Anubis uses structured logging (JSON format) with configurable levels.

Log Levels

# Default: INFO
anubis --slog-level INFO

# Options: DEBUG, INFO, WARN, ERROR
anubis --slog-level DEBUG

# Fine-grained levels
anubis --slog-level "INFO+1"
Or via policy file:
logging:
  sink: stdio
  level: INFO

Log Sinks

Standard Error (Default)

logging:
  sink: stdio
Logs to stderr. Captured by Docker, Kubernetes, systemd.

File with Rotation

logging:
  sink: file
  parameters:
    file: /var/log/anubis/anubis.log
    maxBackups: 3
    maxBytes: 67108864  # 64 MiB
    maxAge: 7           # days
    compress: true
    useLocalTime: false
Old logs are compressed: anubis.log.2026-03-03T12:00:00Z.gz

Structured Log Fields

{
  "time": "2026-03-03T12:34:56Z",
  "level": "INFO",
  "msg": "request processed",
  "rule": "browsers",
  "action": "CHALLENGE",
  "remote_addr": "1.2.3.4",
  "user_agent": "Mozilla/5.0...",
  "path": "/api/users",
  "subsystem": "anubis"
}
Key fields:
  • rule - Matched rule name
  • action - Action taken
  • remote_addr - Client IP
  • subsystem - Component (anubis, metrics, config-validate)
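Since each log line is a self-contained JSON object, ad-hoc analysis is straightforward. A sketch that tallies actions across log lines (field names follow the example above):

```python
import json

def count_actions(lines):
    """Tally the 'action' field across JSON log lines."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (startup banners, panics)
        action = entry.get("action")
        if action:
            counts[action] = counts.get(action, 0) + 1
    return counts

logs = [
    '{"level":"INFO","rule":"browsers","action":"CHALLENGE"}',
    '{"level":"INFO","rule":"googlebot","action":"ALLOW"}',
    '{"level":"INFO","rule":"browsers","action":"CHALLENGE"}',
]
print(count_actions(logs))  # {'CHALLENGE': 2, 'ALLOW': 1}
```

The same per-action counts are available as proper time series via anubis_policy_results; this is only for spelunking in raw log files.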

Log Aggregation

Loki (Grafana)

# promtail.yml
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: anubis
    static_configs:
      - targets:
          - localhost
        labels:
          job: anubis
          __path__: /var/log/anubis/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            msg: msg
            rule: rule
            action: action
      - labels:
          level:
          rule:
          action:

Elasticsearch

Use Filebeat or Fluentd to ship JSON logs to Elasticsearch.
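A minimal Filebeat sketch for this route (paths and the output host are illustrative; the ndjson parser decodes each JSON log line into fields):

```yaml
# filebeat.yml (illustrative)
filebeat.inputs:
  - type: filestream
    id: anubis-logs
    paths:
      - /var/log/anubis/*.log
    parsers:
      - ndjson:
          target: ""
          add_error_key: true

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
```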

Observability Best Practices

Metrics

Do:
  • Monitor challenge/deny rates for attack detection
  • Set alerts for abnormal traffic patterns
  • Track per-rule metrics to optimize policy
  • Monitor Go runtime metrics (memory, goroutines)
Don’t:
  • Ignore sustained high deny rates (possible attack)
  • Set metrics scrape interval too low (< 15s)
  • Expose metrics endpoint to the internet

Logging

Do:
  • Use structured logging (JSON) for easy parsing
  • Set appropriate log level (INFO for production)
  • Rotate log files to prevent disk space issues
  • Aggregate logs to centralized system
Don’t:
  • Use DEBUG level in production (too verbose)
  • Log to files without rotation
  • Disable logging entirely
  • Ignore error-level log messages

Health Checks

Do:
  • Configure liveness and readiness probes
  • Use /healthz for automated monitoring
  • Set reasonable timeout/retry values
  • Monitor health check endpoint availability
Don’t:
  • Set health check interval too low (< 10s)
  • Use main application port for health checks
  • Ignore health check failures

Troubleshooting

Metrics Not Scraped

Symptom: Prometheus shows up{job="anubis"} == 0
Check:
# Verify metrics endpoint
curl http://localhost:9090/metrics

# Check if metrics server is bound
ss -tlnp | grep 9090
Fix: Ensure --metrics-bind is accessible from Prometheus.

High Memory Usage

Symptom: process_resident_memory_bytes growing unbounded
Possible causes:
  • Memory storage backend without limits
  • DNS cache growing too large
  • Log file handles not closed
Fix: Switch to persistent storage backend (bbolt, valkey).

Missing Metrics

Symptom: No anubis_policy_results metrics
Cause: No traffic matching policy rules
Verify:
# Send test request
curl -A "Mozilla/5.0" http://localhost:8923/

# Check metrics
curl -s http://localhost:9090/metrics | grep anubis_policy_results
