Metrics Endpoint
The metrics server runs on a separate port from the main node API and exposes the following endpoints:
/metrics
Prometheus-formatted metrics endpoint with OpenMetrics support.
The default metrics port is 15000. Configure this in your config.yaml with the MetricsAPIPort setting.
/health
Health check endpoint that validates node operational status.
- 200 OK: Node is healthy
- 500 Internal Server Error: Node health check failed; error details are returned in JSON format
/debug/pprof/*
Go profiling endpoints (enabled with EnableProfile: true).
Available endpoints:
- /debug/pprof/ - Index of available profiles
- /debug/pprof/heap - Memory allocation profiling
- /debug/pprof/goroutine - Goroutine stack traces
- /debug/pprof/threadcreate - Thread creation profiling
- /debug/pprof/block - Blocking profiling
- /debug/pprof/mutex - Mutex contention profiling
- /debug/pprof/profile - CPU profiling (30s default)
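The two settings named in this section can sit side by side in config.yaml. The fragment below is a minimal sketch: the key names MetricsAPIPort and EnableProfile come from the text above, but their placement at the top level of the file is an assumption to check against your node's configuration reference.

```yaml
# config.yaml (fragment) - sketch only; top-level placement of both keys is assumed
MetricsAPIPort: 15000   # default metrics port; serves /metrics and /health
EnableProfile: true     # turns on the /debug/pprof/* endpoints
```

Profiling endpoints expose internal runtime details, so it may be reasonable to enable EnableProfile only while actively debugging.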
/database/count-by-collection
Query database key counts by collection prefix.
Key Metrics Categories
Validator Metrics
Metrics tracking validator status and lifecycle (source: operator/validator/observability.go:39).
ssv.validator.validators.per_status
- Type: Gauge
- Description: Total number of validators by status
- Labels:
  - ssv.validator.status - Status values:
    - active - Validator is active on beacon chain
    - attesting - Validator is performing attestation duties
    - participating - Validator is participating in SSV cluster
    - pending - Validator activation pending
    - exiting - Validator is exiting
    - slashed - Validator has been slashed
    - not_activated - Validator not yet activated
    - no_index - Validator index not found
    - not_found - Validator not found on beacon chain
    - unknown - Status could not be determined
ssv.validator.validators.removed
- Type: Counter
- Unit: {validator}
- Description: Total number of validators removed from the node
ssv.validator.errors
- Type: Counter
- Unit: {validator}
- Description: Total number of validator-related errors
Runner Metrics
Metrics for duty execution performance (source: protocol/v2/ssv/runner/observability.go:83).
ssv.runner.consensus.duration
- Type: Histogram
- Unit: seconds
- Buckets: [0, 0.001, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10]
- Description: Time spent in consensus phase
- Labels: ssv.runner.role (attester, proposer, aggregator, sync_committee, sync_committee_contribution)
ssv.runner.pre_consensus.duration
- Type: Histogram
- Unit: seconds
- Description: Time spent in pre-consensus phase (signature collection)
- Labels: ssv.runner.role
ssv.runner.post_consensus.duration
- Type: Histogram
- Unit: seconds
- Description: Time spent in post-consensus phase (reconstruction and submission)
- Labels: ssv.runner.role
ssv.runner.duty.duration
- Type: Histogram
- Unit: seconds
- Description: Total duty execution time from start to completion
- Labels: ssv.runner.role, ssv.duty.round
ssv.runner.submissions
- Type: Gauge
- Unit: {submission}
- Description: Number of duty submissions per epoch by role
- Labels: ssv.beacon.role
ssv.runner.submissions.failed
- Type: Counter
- Unit: {submission}
- Description: Total number of failed duty submissions
- Labels: ssv.beacon.role
QBFT Instance Metrics
Metrics for consensus instance performance (source: protocol/v2/qbft/instance/observability.go:41).
ssv.qbft.validator_stage.duration
- Type: Histogram
- Unit: seconds
- Description: Time validators spend in different consensus stages
- Buckets: Same as runner duration metrics
ssv.qbft.rounds.changed
- Type: Counter
- Description: Number of consensus round changes (indicates slower consensus)
Duty Scheduling Metrics
Metrics for duty scheduling and slot timing (source: operator/duties/observability.go:24).
ssv.scheduler.slot_delay
- Type: Histogram
- Unit: seconds
- Description: Delay between slot start time and duty processing
- Buckets: Same as runner duration metrics
ssv.scheduler.duties.scheduled
- Type: Counter
- Description: Total number of duties scheduled
- Labels: ssv.runner.role
Queue Metrics
Metrics for duty queue sizes (source: protocol/v2/ssv/queue/observability.go:28).
ssv.queue.inbox.size
- Type: Gauge
- Description: Current size of duty queue inbox
- Labels: queue_type, queue_id
Duty Tracer Metrics
Metrics for duty tracing and message tracking (source: operator/dutytracer/observability.go:20).
ssv.tracer.in_flight_messages
- Type: Counter
- Description: Number of messages being tracked in the duty tracer
ssv.tracer.processing.duration
- Type: Histogram
- Unit: seconds
- Description: Time spent processing traced messages
ssv.tracer.db.duration
- Type: Histogram
- Unit: seconds
- Description: Database operation duration for duty tracer storage
Grafana Dashboard Setup
SSV Node includes Grafana dashboards for comprehensive monitoring.
Available Dashboards
According to the roadmap (source: ROADMAP.md:118), SSV provides:
- V2 Grafana Dashboards for node health and performance monitoring
- Prometheus and Grafana support for production deployments
Prometheus Configuration
Add SSV Node as a scrape target in your prometheus.yml:
prometheus.yml
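A minimal sketch of such a scrape target, assuming the node is reachable on localhost at the default metrics port. The job name, host, and scrape interval are placeholders rather than SSV recommendations.

```yaml
# prometheus.yml (fragment) - sketch only
scrape_configs:
  - job_name: ssv-node            # hypothetical job name
    metrics_path: /metrics        # endpoint described in this document
    scrape_interval: 15s          # illustrative value
    static_configs:
      - targets:
          - "localhost:15000"     # hypothetical host; 15000 is the default MetricsAPIPort
```

If MetricsAPIPort is changed in config.yaml, the target port needs to change with it.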
Key Queries for Alerts
Validator Status Monitoring
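One way to watch validator status is to aggregate the ssv.validator.validators.per_status gauge by its status label. The rule below is a sketch. The Prometheus-side names (ssv_validator_validators_per_status, ssv_validator_status) assume the exporter replaces dots with underscores, and the group and record names are hypothetical. Compare the names with your node's /metrics output before relying on them.

```yaml
# Recording rule sketch; metric and label names are assumed translations
groups:
  - name: ssv_validator_status                   # hypothetical group name
    rules:
      - record: ssv:validators:sum_by_status     # hypothetical series name
        expr: sum by (ssv_validator_status) (ssv_validator_validators_per_status)
```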
Performance Monitoring
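For latency, the histograms documented above can be reduced to a percentile per role. This is a sketch only: it assumes the exporter appends a _seconds unit suffix and exposes _bucket series (ssv_runner_duty_duration_seconds_bucket, ssv_scheduler_slot_delay_seconds_bucket), and the 95th percentile, 5-minute window, and rule names are illustrative choices.

```yaml
# Recording rule sketch; metric names are assumed translations
groups:
  - name: ssv_performance                        # hypothetical group name
    rules:
      # 95th percentile of total duty execution time, per runner role
      - record: ssv:runner_duty_duration_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, ssv_runner_role) (
              rate(ssv_runner_duty_duration_seconds_bucket[5m])
            )
          )
      # 95th percentile of delay between slot start and duty processing
      - record: ssv:scheduler_slot_delay_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(ssv_scheduler_slot_delay_seconds_bucket[5m]))
          )
```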
Duty Execution Monitoring
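Duty throughput can be followed by comparing how many duties are scheduled with how many submissions are reported per role. The sketch below assumes the counter is exported with a _total suffix (ssv_scheduler_duties_scheduled_total) and the gauge without one (ssv_runner_submissions). Rule and group names are hypothetical.

```yaml
# Recording rule sketch; metric and label names are assumed translations
groups:
  - name: ssv_duty_execution                     # hypothetical group name
    rules:
      # Per-second rate of scheduled duties, per runner role
      - record: ssv:scheduler_duties_scheduled:rate5m
        expr: sum by (ssv_runner_role) (rate(ssv_scheduler_duties_scheduled_total[5m]))
      # Submissions reported for the current epoch, per beacon role
      - record: ssv:runner_submissions:sum_by_role
        expr: sum by (ssv_beacon_role) (ssv_runner_submissions)
```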
Monitoring Best Practices
Set Up Alerts
Configure Prometheus alerts for:
- Failed submissions
- Validator status changes
- High consensus durations
- Round changes
Track Trends
Monitor historical trends:
- Submission success rates
- Consensus performance over time
- Queue sizes and backlogs
Correlate Events
Cross-reference metrics with:
- Beacon chain events
- Network connectivity
- System resource usage
Regular Review
Periodically review:
- Dashboard accuracy
- Alert thresholds
- Metric retention policies
Recommended Alert Rules
alerting_rules.yml
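The alert topics listed under Set Up Alerts could be expressed roughly as follows. This is a sketch under stated assumptions, not a vetted rule set: the metric and label names assume the usual dot-to-underscore translation with _total and _seconds suffixes, and every threshold, duration, and severity is a placeholder to tune for your cluster.

```yaml
# alerting_rules.yml - sketch only; names and thresholds are assumptions
groups:
  - name: ssv_node_alerts                        # hypothetical group name
    rules:
      - alert: SSVFailedSubmissions
        expr: sum by (ssv_beacon_role) (increase(ssv_runner_submissions_failed_total[10m])) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Failed duty submissions for role {{ $labels.ssv_beacon_role }}"

      - alert: SSVSlashedValidator
        expr: sum(ssv_validator_validators_per_status{ssv_validator_status="slashed"}) > 0
        labels:
          severity: critical
        annotations:
          summary: "At least one validator is reported as slashed"

      - alert: SSVHighConsensusDuration
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, ssv_runner_role) (
              rate(ssv_runner_consensus_duration_seconds_bucket[5m])
            )
          ) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 consensus duration above 2s for role {{ $labels.ssv_runner_role }}"

      - alert: SSVFrequentRoundChanges
        expr: sum(increase(ssv_qbft_rounds_changed_total[10m])) > 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "QBFT round changes are elevated, consensus may be slow"
```

To load the file, reference it from the rule_files section of prometheus.yml, then confirm that each expression returns data in the Prometheus expression browser before enabling notifications.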
Next Steps
Logging Configuration
Configure structured logging and log analysis
Troubleshooting Guide
Common issues and debugging techniques
