Quick Diagnostics
Start with these quick checks when troubleshooting.

Common Issues
Node Startup Issues
Node fails to start - Configuration errors
Symptoms:
- Node exits immediately on startup
- Fatal error logs about configuration
- Port already in use errors

Solutions:
- Validate your configuration file
- Check for port conflicts
- Verify beacon node connectivity
- Review the required configuration fields:
  - eth2.BeaconNodeAddr
  - OperatorPrivateKey or KeyStore
  - Valid network configuration (ports, discovery)
Ensure your beacon node is fully synced before starting SSV Node.
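A minimal configuration sketch covering the required fields. Only eth2.BeaconNodeAddr, OperatorPrivateKey, and KeyStore are named in this guide; the remaining keys, paths, and addresses are assumptions — verify them against the reference config for your node version.

```yaml
# Sketch only — check key names against your version's reference config.
db:
  Path: ./data/db                            # assumed key; must be writable by the node user
eth2:
  BeaconNodeAddr: http://localhost:5052      # example address; node must be synced
eth1:
  ETH1Addr: ws://localhost:8546/ws           # assumed key; example execution-layer address
# Exactly one of the two key options:
KeyStore:
  PrivateKeyFile: ./encrypted_private_key.json   # example path
  PasswordFile: ./password                       # example path
# OperatorPrivateKey: "<base64-encoded key>"     # alternative to KeyStore
```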
Database initialization failures
Symptoms:
- Error logs about database access
- Permission denied errors
- Corrupted database errors

Solutions:
- Check directory permissions
- If the database is corrupted, restore from a backup
- Verify sufficient disk space
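The permission and disk-space checks above can be sketched as follows. The database path is an example — use the path from your own configuration.

```shell
# Check the database directory and free disk space.
# DB_DIR is an example default; point it at your configured db path.
DB_DIR="${DB_DIR:-./data/db}"
if [ -d "$DB_DIR" ] && [ -w "$DB_DIR" ]; then
  echo "db directory is writable"
else
  echo "db directory missing or not writable: $DB_DIR"
fi
# Free space on the volume holding the database (falls back to cwd):
df -h "$DB_DIR" 2>/dev/null || df -h .
```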
Operator key issues
Symptoms:
- Authentication failures
- Unable to participate in duties
- “Invalid operator key” errors
Solutions:
- Verify the operator key format
- Re-generate operator keys if needed
- Ensure the operator is registered on the SSV contract
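A sketch of the key-format check, assuming the operator public key is a base64-encoded RSA PEM. Here KEY_B64 is generated on the spot as an example — substitute your actual operator key.

```shell
# Sketch: confirm a key decodes from base64 to a valid RSA public key.
# KEY_B64 is a freshly generated example, not a real operator key.
KEY_B64=$(openssl genrsa 2048 2>/dev/null | openssl rsa -pubout 2>/dev/null | base64 | tr -d '\n')
if echo "$KEY_B64" | base64 -d 2>/dev/null | openssl rsa -pubin -noout 2>/dev/null; then
  RESULT="valid RSA public key"
else
  RESULT="invalid key format"
fi
echo "$RESULT"
```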
Validator Issues
Validator not participating in duties
Symptoms:
- Validator shows as not_participating or no_index
- No attestations or proposals being submitted
- Missing validator duties

Solutions:
- Verify the validator is registered
- Check validator shares are loaded
- Verify validator activation:
  - Check the beacon chain validator status
  - Ensure sufficient balance (32 ETH)
  - Verify the activation queue position
- Check operator cluster membership:
  - Ensure the operator is part of the validator's cluster
  - Verify the cluster has the minimum number of operators (4/7/10/13 scheme)
  - Check operator IDs match the cluster configuration
Failed duty submissions
Symptoms:
- Increasing ssv_runner_submissions_failed counter
- Missed attestations or proposals
- Error logs about submission failures

Solutions:
- Check beacon node connectivity
- Verify the beacon node is synced
- Check for slashing protection issues:
  - Review the slashing protection database
  - Ensure no duplicate validator instances are running
  - Check for clock synchronization issues
- Monitor consensus duration
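Two of the checks above sketched as commands. The beacon port and the ssvnode process name are assumptions — adjust them to your deployment.

```shell
# 1) Beacon node reachability via the standard Beacon API health endpoint.
#    Port 5052 is an example (Lighthouse default).
BEACON="${BEACON:-http://localhost:5052}"
curl -s --max-time 5 "$BEACON/eth/v1/node/health" >/dev/null 2>&1 \
  && echo "beacon node reachable" || echo "beacon node unreachable"
# 2) Guard against duplicate instances: count running node processes.
#    "ssvnode" is an assumed process name; match your actual binary.
COUNT=$(pgrep -f ssvnode | wc -l)
echo "ssvnode processes: $COUNT"
```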
Validator slashing
Symptoms:
- ssv_validator_validators_per_status{status="slashed"} > 0
- Validator balance decreasing dramatically
- Slashing event on the beacon chain

Solutions:
- STOP THE NODE IMMEDIATELY
- Investigate the cause:
  - Check if multiple instances were running
  - Review logs for double-signing evidence
  - Check system clock synchronization
  - Verify slashing protection database integrity
- DO NOT RESTART until the root cause is identified and resolved

Common causes:
- Running duplicate validator instances
- Clock drift causing timing issues
- Corrupted slashing protection database
- Restored old database state

Prevention:
- Never run multiple instances of the same validator
- Maintain accurate system time (use NTP)
- Take regular database backups
- Follow proper shutdown procedures before migrations
- Test failover procedures on a testnet first
Network and P2P Issues
No peers connected
Symptoms:
- Zero or very few peers connected
- Unable to participate in consensus
- Isolated from the network

Solutions:
- Verify P2P ports are open
- Check firewall rules
- Verify NAT configuration:
  - Configure port forwarding for 12001/udp and 13001/tcp
  - Set the correct external IP in the config if behind NAT
- Check bootnode connectivity
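The port check above can be sketched like this (Linux, GNU tools); adjust the port numbers if you changed the p2p settings in your config.

```shell
# Check whether the SSV P2P ports are listening (defaults: 13001/tcp, 12001/udp).
if command -v ss >/dev/null 2>&1; then
  OUT=$(ss -tuln | grep -E ':(13001|12001)\b' || echo "SSV p2p ports not listening")
else
  OUT="ss not available; install iproute2 or use netstat -tuln"
fi
echo "$OUT"
```

For NAT setups, remember the check must also pass from outside your network, e.g. from a remote host or an online port checker.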
High message drop rate
Symptoms:
- Logs showing "subscriber channel full, dropping the message"
- Delayed consensus
- Performance degradation

Solutions:
- Check system resources
- Reduce log verbosity if using debug level
- Optimize database performance:
  - Ensure SSD storage for the database
  - Check disk I/O wait times
  - Consider BadgerDB tuning parameters
- Scale hardware resources:
  - Increase CPU cores
  - Add more RAM
  - Use faster storage (NVMe)
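The system-resource check above can be sketched as follows (Linux-only: reads /proc/meminfo); the thresholds mirror the minimum hardware recommendations later in this guide.

```shell
# Snapshot CPU and memory against the recommended minimums.
CORES=$(nproc)
MEM_MB=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
echo "cores=$CORES mem_mb=$MEM_MB"
[ "$CORES" -ge 4 ]     || echo "warning: below recommended 4 CPU cores"
[ "$MEM_MB" -ge 8192 ] || echo "warning: below recommended 8 GB RAM"
```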
Performance Issues
Slow consensus / High round changes
Symptoms:
- High ssv_qbft_rounds_changed counter
- Consensus taking more than 2 seconds
- Frequent round timeouts

Solutions:
- Check network latency to other operators
- Verify system clock synchronization
- Check for underperforming cluster members:
  - Review operator performance in the cluster
  - Consider replacing slow or unreliable operators
  - Verify all operators are online
- Monitor duty timing
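The clock-synchronization check can be sketched as below for systemd hosts; on other systems, query your NTP daemon directly (for example chronyc tracking or ntpq -p).

```shell
# Check whether the system clock is NTP-synchronized (systemd systems).
if command -v timedatectl >/dev/null 2>&1; then
  SYNCED=$(timedatectl show -p NTPSynchronized --value 2>/dev/null || echo unknown)
else
  SYNCED="unknown (timedatectl not available)"
fi
echo "ntp synchronized: $SYNCED"
```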
High memory or CPU usage
Symptoms:
- Node consuming excessive resources
- System becoming unresponsive
- OOM killer terminating the node

Solutions:
- Check for memory leaks:
  - Review the heap profile for growing allocations
  - Monitor memory usage over time
  - Report findings to the SSV team if a leak is detected
- Reduce load
- Optimize the database
- Hardware recommendations:
  - Minimum: 4 CPU cores, 8 GB RAM
  - Recommended: 8+ CPU cores, 16 GB+ RAM
  - Storage: SSD/NVMe with 100 GB+ free space
Beacon Node Integration Issues
Beacon node connection failures
Symptoms:
- Repeated connection errors in the logs
- Unable to fetch duties
- No duty execution

Solutions:
- Verify the beacon node is running and accessible
- Check network connectivity
- Verify the beacon node is synced
- Configure multiple beacon nodes for redundancy
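The accessibility and sync checks can be sketched with the standard Beacon API syncing endpoint; the port is an example (Lighthouse default), so adjust for your consensus client.

```shell
# Query the beacon node's sync status via the standard Beacon API.
BEACON="${BEACON:-http://localhost:5052}"
SYNC=$(curl -s --max-time 5 "$BEACON/eth/v1/node/syncing" || echo "unreachable")
echo "syncing response: $SYNC"
# A synced node reports "is_syncing":false in the JSON response.
```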
Chain reorg handling issues
Symptoms:
- Frequent reorg event logs
- Duty execution errors after reorgs
- Inconsistent state

Analysis:
- Check reorg frequency and depth:
  - Shallow reorgs (1-2 blocks) are normal
  - Deep reorgs (more than 3 blocks) indicate beacon chain issues
- Ensure the beacon node is well connected to the network
- Verify the beacon node is following the correct chain:
  - Check beacon node peers and sync status
  - Monitor beacon chain health with external tools
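One way to watch for chain issues is to compare the head with the finality checkpoints, both standard Beacon API endpoints; the port is an example. A head slot that keeps jumping backwards, or finality lagging far behind the head, points at beacon chain trouble.

```shell
# Inspect the chain head and finality to gauge reorg behavior.
BEACON="${BEACON:-http://localhost:5052}"
HEAD=$(curl -s --max-time 5 "$BEACON/eth/v1/beacon/headers/head" || echo "unreachable")
FINALITY=$(curl -s --max-time 5 "$BEACON/eth/v1/beacon/states/head/finality_checkpoints" || echo "unreachable")
echo "head: $HEAD"
echo "finality: $FINALITY"
```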
Debugging Tools
Health Check Endpoint
Metrics Inspection
Log Analysis Tools
Database Inspection
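A sketch of the first three tools listed above. The metrics port (15000) and the /health path are assumptions — check the metrics settings in your node config; the log path and JSON log format are also examples. Database inspection depends on BadgerDB-specific tooling and is not shown.

```shell
# Health check and metrics inspection (port/path are assumptions):
METRICS="${METRICS:-http://localhost:15000}"
HEALTH=$(curl -s --max-time 5 "$METRICS/health" || echo "health endpoint unreachable")
echo "health: $HEALTH"
# Pull one counter from the Prometheus metrics endpoint:
curl -s --max-time 5 "$METRICS/metrics" | grep ssv_runner_submissions_failed \
  || echo "metrics endpoint unreachable or counter absent"
# Log analysis: count error-level lines (path and format are examples):
LOG="${LOG:-./ssv.log}"
[ -f "$LOG" ] && grep -c '"level":"error"' "$LOG" || echo "no log file at $LOG"
```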
Getting Help
Information to Collect
When seeking help, gather:
- Node information
- Configuration (sanitized: remove private keys)
- Recent logs
- Metrics snapshot
- System info
Support Channels
Discord Community
Join the SSV community for real-time support and discussions
GitHub Issues
Report bugs and track known issues
Documentation
Review comprehensive documentation and guides
API Reference
Explore API endpoints for monitoring and management
Preventive Maintenance
Regular Checks
- Daily: Review error logs and metrics dashboards
- Weekly: Check disk space, database size, system updates
- Monthly: Review performance trends, backup verification, security updates
Monitoring Best Practices
- Set up alerts for:
  - Node health check failures
  - High failed-submission rates
  - Validator status changes
  - Resource exhaustion (disk, memory)
- Maintain backups:
  - Database backups (before upgrades)
  - Configuration backups
  - Operator key backups (encrypted, secure storage)
- Keep software updated:
  - Monitor SSV releases
  - Test updates on a testnet first
  - Follow upgrade procedures carefully
Next Steps
Metrics Setup
Configure Prometheus and Grafana for comprehensive monitoring
Logging Configuration
Optimize logging for better debugging and analysis
