Swarm Architecture
Docker Swarm provides native clustering capabilities for Docker.
Manager Nodes
Control the cluster, maintain state, and schedule services across worker nodes.
Worker Nodes
Execute tasks and run containerized applications as assigned by managers.
Swarm Initialization
When setting up a deploy server, Dokploy automatically initializes Docker Swarm. Dokploy detects whether to use IPv4 or IPv6 based on the server’s network configuration.
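As a rough illustration of the address-family decision, the sketch below builds a docker swarm init command from a server address. It is not Dokploy's actual implementation: the function name swarm_init_command and the choice of wildcard listen address are assumptions made for this example. Only the --advertise-addr and --listen-addr flags come from the Docker CLI.

```python
import ipaddress


def swarm_init_command(server_ip: str) -> list[str]:
    """Build a `docker swarm init` command for the given server address.

    Hypothetical helper: it only assembles the argument list, it does not
    run Docker.
    """
    ip = ipaddress.ip_address(server_ip)  # raises ValueError on invalid input
    # Assumption: listen on the wildcard address of the same address family.
    listen = "[::]:2377" if ip.version == 6 else "0.0.0.0:2377"
    return [
        "docker", "swarm", "init",
        "--advertise-addr", str(ip),
        "--listen-addr", listen,
    ]
```

The returned list can be passed to subprocess.run on the target server.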
Swarm Validation
Check whether a server is already part of a Swarm.
Node Management
Viewing Cluster Nodes
Retrieve all nodes in the cluster. Each node entry includes:
- Node ID
- Hostname
- Role (manager/worker)
- Availability (active/pause/drain)
- Status (ready/down)
- Manager status
- Platform information
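Most of the fields above can be read from docker node ls --format '{{json .}}', which prints one JSON object per node. The parser below is a minimal sketch under that assumption; parse_node_ls and the SAMPLE data are made up for illustration, and the role is inferred from whether ManagerStatus is empty. Platform information is not part of this output and would typically come from docker node inspect.

```python
import json


def parse_node_ls(output: str) -> list[dict]:
    """Parse line-delimited JSON from `docker node ls --format '{{json .}}'`."""
    nodes = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        raw = json.loads(line)
        manager_status = raw.get("ManagerStatus", "")
        nodes.append({
            "id": raw["ID"],
            "hostname": raw["Hostname"],
            # Assumption: workers report an empty manager status.
            "role": "manager" if manager_status else "worker",
            "availability": raw["Availability"].lower(),
            "status": raw["Status"].lower(),
            "manager_status": manager_status or None,
        })
    return nodes


# Illustrative sample only, not output captured from a real cluster.
SAMPLE = "\n".join([
    '{"ID": "abc123", "Hostname": "node-1", "Status": "Ready", '
    '"Availability": "Active", "ManagerStatus": "Leader"}',
    '{"ID": "def456", "Hostname": "node-2", "Status": "Ready", '
    '"Availability": "Active", "ManagerStatus": ""}',
])
```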
Adding Worker Nodes
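Joining a node generally comes down to running docker swarm join on the new server with a token issued by an existing manager (docker swarm join-token -q worker prints the worker token, and the same command with manager prints the manager token). The helper below is a sketch of how that command could be assembled; join_command is a hypothetical name and the token value in the test is a placeholder.

```python
def join_command(token: str, manager_addr: str, port: int = 2377) -> list[str]:
    """Build a `docker swarm join` command targeting an existing manager."""
    # IPv6 literals must be bracketed when combined with a port.
    host = f"[{manager_addr}]" if ":" in manager_addr else manager_addr
    return ["docker", "swarm", "join", "--token", token, f"{host}:{port}"]
```

Whether the node joins as a worker or a manager depends only on which token is supplied.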
Adding Manager Nodes
For high availability, add additional manager nodes.
Removing Nodes
Safely remove nodes from the cluster. The removal:
- Drains the node (stops accepting new tasks)
- Removes the node from the cluster
Draining ensures running tasks are gracefully migrated to other nodes before removal.
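The drain-then-remove sequence maps onto two Docker CLI commands. The sketch below only returns them in order; removal_commands is a hypothetical helper, and how Dokploy issues these calls internally is not shown here.

```python
def removal_commands(node: str) -> list[list[str]]:
    """Return the commands for a safe node removal, in execution order."""
    return [
        # Step 1: stop scheduling new tasks and migrate running ones away.
        ["docker", "node", "update", "--availability", "drain", node],
        # Step 2: remove the node from the cluster, run on a manager.
        ["docker", "node", "rm", node],
    ]
```

Note that docker node rm refuses to remove a node that is still up unless --force is given, so the node usually runs docker swarm leave between the two steps.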
Service Deployment
Creating Swarm Services
Dokploy deploys applications as Docker Swarm services for orchestration.
Placement Constraints
Control where services run using constraints.
Service Scaling
Scale services across the cluster.
Traefik with Swarm
Dokploy deploys Traefik as a Swarm service for load balancing. Traefik runs on manager nodes and automatically discovers services across the entire Swarm cluster.
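Pinning a service to manager nodes is done with a placement constraint. The sketch below assembles a docker service create command that does this for a Traefik-like service; the function name, the service name, and the published ports are assumptions for illustration and do not reproduce Dokploy's real Traefik configuration.

```python
def traefik_service_command(image: str, network: str) -> list[str]:
    """Build a `docker service create` command pinned to manager nodes."""
    return [
        "docker", "service", "create",
        "--name", "traefik",                    # assumed service name
        "--constraint", "node.role==manager",   # run only on managers
        "--network", network,
        # Host-mode publishing binds the port directly on the node.
        "--publish", "mode=host,target=80,published=80",
        "--publish", "mode=host,target=443,published=443",
        image,
    ]
```

The same --constraint flag accepts other expressions, such as node.labels.<key>==<value> for custom node labels.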
Networking
Overlay Network
Dokploy creates an overlay network for cross-node communication:
- Overlay Driver: Enables multi-host networking
- Attachable: Allows standalone containers to connect
- Automatic Encryption: Optional encryption for sensitive data
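The three properties above correspond to options of docker network create. The following is a minimal sketch of how such a command could be built; overlay_network_command is a hypothetical helper, and the network name in the test is a placeholder rather than the name Dokploy uses.

```python
def overlay_network_command(name: str, encrypted: bool = False) -> list[str]:
    """Build a `docker network create` command for an attachable overlay."""
    cmd = ["docker", "network", "create", "--driver", "overlay", "--attachable"]
    if encrypted:
        # Encrypts overlay traffic between nodes, at some throughput cost.
        cmd += ["--opt", "encrypted"]
    return cmd + [name]
```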
Service Discovery
Services can reach each other by service name through Swarm’s built-in DNS.
Container Queries in Swarm
Service Containers
Retrieve containers for a Swarm service.
Stack Containers
Get all containers in a stack deployment.
Application Labels
Query containers by Dokploy labels.
Service Updates
Update running services without downtime. Incrementing ForceUpdate forces a rolling update even if the configuration hasn’t changed.
Rolling Updates
Swarm automatically performs rolling updates:
- Stop old task on one node
- Start new task with updated configuration
- Wait for health check to pass
- Repeat for next node
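The pace of a rolling update can be tuned with docker service update flags. The helper below is a sketch that builds such a command; its name and default values are assumptions, while the flags themselves are standard Docker CLI options. The stop-first order matches the sequence listed above, and --force triggers a redeploy even when nothing else has changed.

```python
def rolling_update_command(
    service: str,
    parallelism: int = 1,
    delay: str = "10s",
    force: bool = False,
) -> list[str]:
    """Build a `docker service update` command for a rolling update."""
    cmd = [
        "docker", "service", "update",
        "--update-parallelism", str(parallelism),  # tasks updated at once
        "--update-delay", delay,                   # pause between batches
        "--update-order", "stop-first",            # stop old task, then start new
        "--update-failure-action", "rollback",     # revert if the update fails
    ]
    if force:
        cmd.append("--force")  # redeploy even if the spec is unchanged
    cmd.append(service)
    return cmd
```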
High Availability
Manager Quorum
Maintain cluster consensus with a proper manager count:
| Managers | Fault Tolerance | Recommended For |
|---|---|---|
| 1 | 0 | Development |
| 3 | 1 | Small Production |
| 5 | 2 | Large Production |
| 7 | 3 | Enterprise |
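The fault tolerance column follows from majority quorum: a cluster of N managers needs more than half of them reachable, so it can lose (N - 1) // 2. A small sketch of that arithmetic, with illustrative function names:

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    if managers < 1:
        raise ValueError("a Swarm needs at least one manager")
    return managers // 2 + 1


def fault_tolerance(managers: int) -> int:
    """How many managers can fail while a majority remains."""
    return managers - quorum(managers)
```

This also shows why odd counts are preferred: fault_tolerance(4) equals fault_tolerance(3), so a fourth manager adds overhead without tolerating another failure.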
Service Restart Policies
Configure automatic restart behavior.
Monitoring Swarm Services
Dokploy’s monitoring service tracks Swarm services. When monitoring Docker Compose or Swarm stacks, use the -p flag to properly identify all services within the stack.
Best Practices
Node Distribution
- Spread manager nodes across different availability zones
- Use at least 3 manager nodes for production
- Keep critical services on manager nodes
- Distribute worker nodes for load balancing
Resource Allocation
- Set resource reservations for critical services
- Define resource limits to prevent overconsumption
- Use node labels for heterogeneous clusters
- Monitor node resource utilization
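Reservations and limits are set per service through Docker CLI flags. The sketch below turns optional values into the matching flags for docker service create or docker service update; resource_flags is a hypothetical helper and the values in the usage note are arbitrary examples.

```python
from typing import Optional


def resource_flags(
    reserve_cpu: Optional[str] = None,
    reserve_memory: Optional[str] = None,
    limit_cpu: Optional[str] = None,
    limit_memory: Optional[str] = None,
) -> list[str]:
    """Translate resource settings into `docker service` flags."""
    pairs = [
        ("--reserve-cpu", reserve_cpu),        # guaranteed CPU share
        ("--reserve-memory", reserve_memory),  # guaranteed memory
        ("--limit-cpu", limit_cpu),            # hard CPU ceiling
        ("--limit-memory", limit_memory),      # hard memory ceiling
    ]
    flags: list[str] = []
    for flag, value in pairs:
        if value is not None:
            flags += [flag, value]
    return flags
```

For example, resource_flags(reserve_memory="256M", limit_memory="512M") yields flags that reserve 256 MB and cap the service at 512 MB per task.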
Network Design
- Use overlay networks for multi-host communication
- Enable encryption for sensitive traffic
- Implement proper firewall rules (port 2377/TCP for cluster management, 7946/TCP and UDP for node communication, 4789/UDP for overlay traffic)
- Use host mode publishing for Traefik
Service Configuration
- Use rolling updates for zero-downtime deployments
- Configure health checks for all services
- Set appropriate restart policies
- Implement placement constraints strategically
Backup and Recovery
- Regular backups of Swarm state on manager nodes
- Document cluster topology
- Test disaster recovery procedures
- Maintain quorum during maintenance
Troubleshooting
Node Not Joining
Problem: Node fails to join the Swarm
Solutions:
- Verify port 2377 is open on manager nodes
- Check network connectivity between nodes
- Ensure token hasn’t expired
- Verify Docker is running on both nodes
- Check for firewall blocking connections
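The port and connectivity checks can be automated from the joining node with a plain TCP connection attempt. This is a minimal sketch; port_is_open is a hypothetical helper, and a successful connection only shows that something is listening on the port, not that the Swarm manager is healthy.

```python
import socket


def port_is_open(host: str, port: int = 2377, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Running port_is_open with a manager's address from the node that fails to join helps separate firewall problems from token problems.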
Service Not Starting
Problem: Service tasks fail to start
Solutions:
- Check service logs: docker service logs <service>
- Verify image is accessible on all nodes
- Check placement constraints are satisfiable
- Ensure sufficient resources available
- Review service configuration
Split-Brain Scenario
Problem: Manager nodes lose quorum
Solutions:
- Check network connectivity between managers
- Verify time synchronization (NTP)
- Ensure odd number of managers
- Review manager node status
- Force new cluster if necessary (last resort)
Service Not Updating
Problem: Service update doesn’t take effect
Solutions:
- Increment the ForceUpdate counter
- Check service version number
- Verify no deployment conflicts
- Review update logs
- Try manual service removal and recreation
Network Issues
Problem: Services can’t communicate
Solutions:
- Verify overlay network exists: docker network ls
- Check service network configuration
- Ensure both services are on the same network
- Test DNS resolution within containers
- Review network firewall rules
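To test DNS resolution from inside a container, a short script can try to resolve the target service name. The sketch below assumes Python is available in the container; resolve is a hypothetical helper. On a Swarm overlay network a service name normally resolves to the service's virtual IP, while tasks.<service> resolves to the individual task addresses.

```python
import socket


def resolve(name: str) -> list[str]:
    """Return the sorted, de-duplicated IP addresses a name resolves to."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []  # name could not be resolved
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})
```

An empty result for a service name usually means the two services do not share a network.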