Anubis exposes Prometheus metrics and health check endpoints for monitoring and observability.

Metrics Server

Anubis runs a separate metrics server on port 9090 by default.

Configuration

# Default: localhost:9090 on tcp
anubis --metrics-bind :9090

# Custom address
anubis --metrics-bind :8080

# Bind to specific interface
anubis --metrics-bind 192.168.1.10:9090

# Unix socket
anubis --metrics-bind unix:///var/run/anubis/metrics.sock \
       --metrics-bind-network unix

# Disable metrics
anubis --metrics-bind ""

Endpoints

Path        Description
/metrics    Prometheus metrics (text format)
/healthz    Health check (HTTP 200 = OK)

Prometheus Metrics

Policy Results

Tracks how many requests matched each rule and what action was taken:
anubis_policy_results{rule="browsers",action="CHALLENGE"} 1234
anubis_policy_results{rule="googlebot",action="ALLOW"} 567
anubis_policy_results{rule="scrapers",action="DENY"} 89
Labels:
  • rule - Bot rule name from policy file
  • action - Action taken: ALLOW, DENY, CHALLENGE, WEIGH
Example queries:
# Total requests by action
sum by (action) (rate(anubis_policy_results[5m]))

# Top rules by request count
topk(10, sum by (rule) (rate(anubis_policy_results[5m])))

# Challenge rate
sum(rate(anubis_policy_results{action="CHALLENGE"}[5m]))

# Deny rate (potential attacks)
sum(rate(anubis_policy_results{action="DENY"}[5m]))

Available Metrics

Anubis exposes these metric families:
anubis_policy_results           # Policy rule matches (counter)
go_*                            # Go runtime metrics
process_*                       # Process metrics (CPU, memory, FDs)
promhttp_metric_handler_*       # Metrics endpoint stats
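The /metrics payload uses the Prometheus text exposition format, which is simple to inspect programmatically. A minimal parsing sketch (the sample lines mirror the anubis_policy_results examples above; a real scrape would fetch the text over HTTP first):

```python
def parse_prom_text(text):
    """Parse Prometheus text exposition format into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        # the value is after the last space; labels may contain spaces safely
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples

sample = """\
# TYPE anubis_policy_results counter
anubis_policy_results{rule="browsers",action="CHALLENGE"} 1234
anubis_policy_results{rule="googlebot",action="ALLOW"} 567
"""

metrics = parse_prom_text(sample)
print(metrics['anubis_policy_results{rule="browsers",action="CHALLENGE"}'])  # 1234.0
```

This is a sketch for ad-hoc inspection, not a replacement for a Prometheus client library; it ignores timestamps and escaping edge cases.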

Health Checks

HTTP Health Check

# Check if Anubis is serving
curl http://localhost:9090/healthz

# Response:
# HTTP/1.1 200 OK
# OK
Status codes:
  • 200 OK - Anubis is serving traffic
  • 500 Internal Server Error - Anubis is not ready
  • 424 Failed Dependency - Unknown health state
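A probe script can map these status codes to a verdict. A short sketch (the meanings come from the list above; fetching the code over HTTP is left to your client of choice):

```python
def interpret_healthz(status_code):
    """Map /healthz HTTP status codes to a health verdict."""
    if status_code == 200:
        return "healthy"     # Anubis is serving traffic
    if status_code == 500:
        return "not ready"   # Anubis is not ready
    if status_code == 424:
        return "unknown"     # unknown health state
    return "unexpected"

print(interpret_healthz(200))  # healthy
```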

CLI Health Check

Anubis includes a built-in health check command:
anubis --healthcheck

# Exit codes:
# 0 = healthy
# 1 = unhealthy or error
Use cases:
  • Docker HEALTHCHECK
  • Kubernetes liveness/readiness probes
  • Systemd watchdog
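For the systemd case, one common pattern is a timer-driven oneshot unit that runs the health check and triggers a restart on failure. A sketch (unit names and the binary path are illustrative, not from the Anubis docs; the companion anubis-restart.service and a timer with OnUnitActiveSec=30s are left out for brevity):

```ini
# anubis-healthcheck.service (illustrative)
[Unit]
Description=Periodic Anubis health check
# On a non-zero exit code, systemd activates the restart unit
OnFailure=anubis-restart.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/anubis --healthcheck
```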

Integration Examples

Docker

FROM ghcr.io/techarohq/anubis:latest

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD ["/usr/local/bin/anubis", "--healthcheck"]

Docker Compose

services:
  anubis:
    image: ghcr.io/techarohq/anubis:latest
    healthcheck:
      test: ["/usr/local/bin/anubis", "--healthcheck"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 5s

Kubernetes

Liveness Probe

apiVersion: apps/v1
kind: Deployment
metadata:
  name: anubis
spec:
  template:
    spec:
      containers:
      - name: anubis
        image: ghcr.io/techarohq/anubis:latest
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9090
          initialDelaySeconds: 10
          periodSeconds: 30
        readinessProbe:
          httpGet:
            path: /healthz
            port: 9090
          initialDelaySeconds: 5
          periodSeconds: 10

ServiceMonitor (Prometheus Operator)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: anubis
spec:
  selector:
    matchLabels:
      app: anubis
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
---
apiVersion: v1
kind: Service
metadata:
  name: anubis-metrics
  labels:
    app: anubis
spec:
  selector:
    app: anubis
  ports:
  - name: metrics
    port: 9090
    targetPort: 9090

Prometheus

Scrape Config

scrape_configs:
  - job_name: 'anubis'
    static_configs:
      - targets: ['anubis.example.com:9090']
    scrape_interval: 15s

Alert Rules

groups:
  - name: anubis
    rules:
      # High challenge rate (possible attack)
      - alert: AnubisHighChallengeRate
        expr: |
          sum(rate(anubis_policy_results{action="CHALLENGE"}[5m])) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High Anubis challenge rate"
          description: "Challenge rate is {{ $value }}/sec (threshold: 100/sec)"

      # High deny rate (active attack)
      - alert: AnubisHighDenyRate
        expr: |
          sum(rate(anubis_policy_results{action="DENY"}[5m])) > 50
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High Anubis deny rate"
          description: "Deny rate is {{ $value }}/sec (possible attack)"

      # Anubis not serving
      - alert: AnubisDown
        expr: up{job="anubis"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Anubis is down"
          description: "Anubis instance {{ $labels.instance }} is not responding"

      # Low allow rate (too strict)
      - alert: AnubisLowAllowRate
        expr: |
          sum(rate(anubis_policy_results{action="ALLOW"}[10m])) 
          / sum(rate(anubis_policy_results[10m])) < 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Anubis allow rate is low"
          description: "Only {{ $value | humanizePercentage }} of traffic is allowed"

Grafana Dashboard

Example dashboard panels:

Request Rate by Action

sum by (action) (rate(anubis_policy_results[5m]))
Visualization: Time series (stacked area)

Top Rules

topk(10, sum by (rule) (
  increase(anubis_policy_results[1h])
))
Visualization: Bar gauge

Challenge Success Rate

Requires application-level instrumentation (not built-in).

Logging

Anubis uses structured logging (JSON format) with configurable levels.

Log Levels

# Default: INFO
anubis --slog-level INFO

# Options: DEBUG, INFO, WARN, ERROR
anubis --slog-level DEBUG

# Fine-grained levels
anubis --slog-level "INFO+1"
Or via policy file:
logging:
  sink: stdio
  level: INFO

Log Sinks

Standard Error (Default)

logging:
  sink: stdio
Logs to stderr. Captured by Docker, Kubernetes, systemd.

File with Rotation

logging:
  sink: file
  parameters:
    file: /var/log/anubis/anubis.log
    maxBackups: 3
    maxBytes: 67108864  # 64 MiB
    maxAge: 7           # days
    compress: true
    useLocalTime: false
Old logs are compressed: anubis.log.2026-03-03T12:00:00Z.gz

Structured Log Fields

{
  "time": "2026-03-03T12:34:56Z",
  "level": "INFO",
  "msg": "request processed",
  "rule": "browsers",
  "action": "CHALLENGE",
  "remote_addr": "1.2.3.4",
  "user_agent": "Mozilla/5.0...",
  "path": "/api/users",
  "subsystem": "anubis"
}
Key fields:
  • rule - Matched rule name
  • action - Action taken
  • remote_addr - Client IP
  • subsystem - Component (anubis, metrics, config-validate)
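Since each log line is a self-contained JSON object, ad-hoc analysis is straightforward. A sketch that tallies actions across log lines (field names follow the example above):

```python
import json

def count_actions(lines):
    """Tally the 'action' field across JSON log lines."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (startup banners, panics)
        action = entry.get("action")
        if action:
            counts[action] = counts.get(action, 0) + 1
    return counts

logs = [
    '{"level":"INFO","rule":"browsers","action":"CHALLENGE"}',
    '{"level":"INFO","rule":"googlebot","action":"ALLOW"}',
    '{"level":"INFO","rule":"browsers","action":"CHALLENGE"}',
]
print(count_actions(logs))  # {'CHALLENGE': 2, 'ALLOW': 1}
```

The same per-action counts are available as proper time series via anubis_policy_results; this is only for spelunking in raw log files.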

Log Aggregation

Loki (Grafana)

# promtail.yml
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: anubis
    static_configs:
      - targets:
          - localhost
        labels:
          job: anubis
          __path__: /var/log/anubis/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            msg: msg
            rule: rule
            action: action
      - labels:
          level:
          rule:
          action:

Elasticsearch

Use Filebeat or Fluentd to ship JSON logs to Elasticsearch.
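A minimal Filebeat sketch for this route (paths and the output host are illustrative; the ndjson parser decodes each JSON log line into fields):

```yaml
# filebeat.yml (illustrative)
filebeat.inputs:
  - type: filestream
    id: anubis-logs
    paths:
      - /var/log/anubis/*.log
    parsers:
      - ndjson:
          target: ""
          add_error_key: true

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
```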

Observability Best Practices

Metrics

Do:
  • Monitor challenge/deny rates for attack detection
  • Set alerts for abnormal traffic patterns
  • Track per-rule metrics to optimize policy
  • Monitor Go runtime metrics (memory, goroutines)
Don’t:
  • Ignore sustained high deny rates (possible attack)
  • Set metrics scrape interval too low (< 15s)
  • Expose metrics endpoint to the internet

Logging

Do:
  • Use structured logging (JSON) for easy parsing
  • Set appropriate log level (INFO for production)
  • Rotate log files to prevent disk space issues
  • Aggregate logs to centralized system
Don’t:
  • Use DEBUG level in production (too verbose)
  • Log to files without rotation
  • Disable logging entirely
  • Ignore error-level log messages

Health Checks

Do:
  • Configure liveness and readiness probes
  • Use /healthz for automated monitoring
  • Set reasonable timeout/retry values
  • Monitor health check endpoint availability
Don’t:
  • Set health check interval too low (< 10s)
  • Use main application port for health checks
  • Ignore health check failures

Troubleshooting

Metrics Not Scraped

Symptom: Prometheus shows up{job="anubis"} == 0
Check:
# Verify metrics endpoint
curl http://localhost:9090/metrics

# Check if metrics server is bound
ss -tlnp | grep 9090
Fix: Ensure --metrics-bind is accessible from Prometheus.

High Memory Usage

Symptom: process_resident_memory_bytes growing unbounded
Possible causes:
  • Memory storage backend without limits
  • DNS cache growing too large
  • Log file handles not closed
Fix: Switch to persistent storage backend (bbolt, valkey).

Missing Metrics

Symptom: No anubis_policy_results metrics
Cause: No traffic matching policy rules
Verify:
# Send test request
curl -A "Mozilla/5.0" http://localhost:8923/

# Check metrics
curl -s http://localhost:9090/metrics | grep anubis_policy_results
