Dokploy leverages Docker Swarm for container orchestration, enabling you to deploy and scale applications across multiple nodes in a cluster.

Swarm Architecture

Docker Swarm provides native clustering capabilities for Docker:

Manager Nodes

Control the cluster, maintain state, and schedule services across worker nodes.

Worker Nodes

Execute tasks and run containerized applications as assigned by managers.

Swarm Initialization

When setting up a deploy server, Dokploy automatically initializes Docker Swarm:
# Automatic IP detection and Swarm initialization
get_ip() {
  # Try IPv4 first
  ip=$(curl -4s --connect-timeout 5 https://ifconfig.io 2>/dev/null)
  
  # Fallback to IPv6 if needed
  if [ -z "$ip" ]; then
    ip=$(curl -6s --connect-timeout 5 https://ifconfig.io 2>/dev/null)
  fi
  
  echo "$ip"
}

advertise_addr=$(get_ip)
docker swarm init --advertise-addr "$advertise_addr"
Dokploy automatically detects whether to use IPv4 or IPv6 based on server network configuration.

Swarm Validation

Check if a server is already part of a Swarm:
if docker info | grep -q 'Swarm: active'; then
  echo "Already part of a Docker Swarm"
fi
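The same check can be done programmatically. Below is a minimal sketch (not Dokploy's actual implementation): in practice you would run `docker info --format '{{.Swarm.LocalNodeState}}'` and pass its stdout to this helper.

```typescript
// Possible values of .Swarm.LocalNodeState reported by `docker info`:
// "inactive", "pending", "active", "error", "locked".
type SwarmState = "inactive" | "pending" | "active" | "error" | "locked";

// Returns true only when the daemon is a full Swarm member.
function isSwarmActive(state: string): boolean {
  return state.trim() === "active";
}
```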

Node Management

Viewing Cluster Nodes

Retrieve all nodes in the cluster:
const nodes = await getNodes(serverId);
Each node includes:
  • Node ID
  • Hostname
  • Role (manager/worker)
  • Availability (active/pause/drain)
  • Status (ready/down)
  • Manager status
  • Platform information
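These fields map onto Docker's `/nodes` API response. The interface below is a hypothetical sketch of that shape (Dokploy's own types may differ):

```typescript
// Hypothetical node record, modeled on Docker's /nodes API response.
interface SwarmNode {
  ID: string;
  Description: {
    Hostname: string;
    Platform: { Architecture: string; OS: string };
  };
  Spec: { Role: "manager" | "worker"; Availability: "active" | "pause" | "drain" };
  Status: { State: "ready" | "down" };
  ManagerStatus?: { Leader: boolean; Reachability: string }; // only on managers
}

const exampleNode: SwarmNode = {
  ID: "abc123",
  Description: { Hostname: "node-1", Platform: { Architecture: "x86_64", OS: "linux" } },
  Spec: { Role: "manager", Availability: "active" },
  Status: { State: "ready" },
  ManagerStatus: { Leader: true, Reachability: "reachable" },
};
```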

Adding Worker Nodes

1. Get Worker Join Token

Retrieve the join command with worker token:
const { command, version } = await addWorker(serverId);
// Returns: "docker swarm join --token SWMTKN-1-xxx... 192.168.1.100:2377"
2. Execute on New Node

Run the join command on the machine you want to add as a worker:
docker swarm join --token SWMTKN-1-xxx... 192.168.1.100:2377
3. Verify Node Joined

Check the node appears in the cluster:
docker node ls
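If you need the token and manager address separately (for example, to run the join over SSH), the returned command string can be parsed. The helper name below is hypothetical:

```typescript
// Hypothetical helper: extract the token and manager address from the
// join command string returned by addWorker()/addManager().
function parseJoinCommand(command: string): { token: string; managerAddr: string } | null {
  const match = command.match(/--token\s+(\S+)\s+(\S+)/);
  if (!match) return null;
  return { token: match[1], managerAddr: match[2] };
}
```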

Adding Manager Nodes

For high availability, add additional manager nodes:
1. Get Manager Join Token

const { command, version } = await addManager(serverId);
// Returns: "docker swarm join --token SWMTKN-1-yyy... 192.168.1.100:2377"
2. Join as Manager

Execute the command on the new manager node:
docker swarm join --token SWMTKN-1-yyy... 192.168.1.100:2377
Maintain an odd number of manager nodes (1, 3, 5, or 7) to ensure proper quorum for cluster decisions.

Removing Nodes

Safely remove nodes from the cluster:
await removeWorker(nodeId, serverId);
The removal process:
  1. Drains the node (stops accepting new tasks)
  2. Removes the node from the cluster
# Executed automatically
docker node update --availability drain <nodeId>
docker node rm <nodeId> --force
Draining ensures running tasks are gracefully migrated to other nodes before removal.

Service Deployment

Creating Swarm Services

Dokploy deploys applications as Docker Swarm services for orchestration:
const settings: CreateServiceOptions = {
  Name: appName,
  TaskTemplate: {
    ContainerSpec: {
      Image: imageName,
      Env: environmentVariables,
      Mounts: volumeMounts
    },
    Networks: [{ Target: "dokploy-network" }],
    Placement: {
      Constraints: ["node.role==manager"]
    }
  },
  Mode: {
    Replicated: {
      Replicas: 3
    }
  },
  EndpointSpec: {
    Ports: [{
      TargetPort: 3000,
      PublishedPort: 3000,
      PublishMode: "host",
      Protocol: "tcp"
    }]
  }
};
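A spec like this is typically assembled from the application's configuration and then passed to the Docker API (with dockerode, `docker.createService(spec)`). The builder below is a simplified sketch, not Dokploy's actual code:

```typescript
// Hypothetical builder producing a minimal service spec like the one above.
// With dockerode, the result would be passed to docker.createService(spec).
function buildServiceSpec(appName: string, image: string, replicas: number) {
  return {
    Name: appName,
    TaskTemplate: {
      ContainerSpec: { Image: image },
      Networks: [{ Target: "dokploy-network" }],
    },
    Mode: { Replicated: { Replicas: replicas } },
  };
}
```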

Placement Constraints

Control where services run using constraints:
Placement: {
  Constraints: ["node.role==manager"]
}

Service Scaling

Scale services across the cluster:
Mode: {
  Replicated: {
    Replicas: 5  // Run 5 instances across cluster
  }
}
Or use global mode to run one instance per node:
Mode: {
  Global: {}
}
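A service uses exactly one of these modes. One way to sketch the choice in TypeScript (the helper is illustrative, not part of Dokploy's API):

```typescript
// Sketch: a service is either replicated (fixed count) or global
// (one task per node). Omitting the replica count selects global mode.
type ServiceMode = { Replicated: { Replicas: number } } | { Global: {} };

function serviceMode(replicas?: number): ServiceMode {
  return replicas === undefined
    ? { Global: {} }
    : { Replicated: { Replicas: replicas } };
}
```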

Traefik with Swarm

Dokploy deploys Traefik as a Swarm service for load balancing:
const settings: CreateServiceOptions = {
  Name: "dokploy-traefik",
  TaskTemplate: {
    ContainerSpec: {
      Image: "traefik:v3.6.7",
      Mounts: [
        {
          Type: "bind",
          Source: "/etc/dokploy/traefik/traefik.yml",
          Target: "/etc/traefik/traefik.yml"
        },
        {
          Type: "bind",
          Source: "/var/run/docker.sock",
          Target: "/var/run/docker.sock"
        }
      ]
    },
    Networks: [{ Target: "dokploy-network" }],
    Placement: {
      Constraints: ["node.role==manager"]
    }
  },
  EndpointSpec: {
    Ports: [
      {
        TargetPort: 80,
        PublishedPort: 80,
        PublishMode: "host",
        Protocol: "tcp"
      },
      {
        TargetPort: 443,
        PublishedPort: 443,
        PublishMode: "host",
        Protocol: "tcp"
      },
      {
        TargetPort: 443,
        PublishedPort: 443,
        PublishMode: "host",
        Protocol: "udp"  // HTTP/3
      }
    ]
  }
};
Traefik runs on manager nodes and automatically discovers services across the entire Swarm cluster.

Networking

Overlay Network

Dokploy creates an overlay network for cross-node communication:
docker network create \
  --driver overlay \
  --attachable \
  dokploy-network
Key Features:
  • Overlay Driver: Enables multi-host networking
  • Attachable: Allows standalone containers to connect
  • Automatic Encryption: Optional encryption for sensitive data
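The equivalent definition through the Docker API (e.g. dockerode's `docker.createNetwork(opts)`) might look like the sketch below; the `encrypted` driver option enables IPSec encryption of overlay traffic between nodes:

```typescript
// Sketch of the overlay network definition via the Docker API.
// Field names follow the Docker Engine network-create API.
const networkOpts = {
  Name: "dokploy-network",
  Driver: "overlay",
  Attachable: true, // standalone containers may connect
  Options: { encrypted: "" }, // omit to leave overlay traffic unencrypted
};
```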

Service Discovery

Services can communicate using DNS:
# Service name becomes DNS hostname
curl http://my-service:3000

# tasks.<service> resolves to the IPs of all tasks, bypassing the service VIP
curl http://tasks.my-service:3000

Container Queries in Swarm

Service Containers

Retrieve containers for a Swarm service:
const containers = await getServiceContainersByAppName(
  appName,
  serverId
);

Stack Containers

Get all containers in a stack deployment:
const containers = await getStackContainersByAppName(
  appName,
  serverId
);

Application Labels

Query containers by Dokploy labels:
const containers = await getContainersByAppLabel(
  appName,
  "swarm",  // Specify swarm type
  serverId
);
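Under the hood, queries like these resolve to Docker label filters. Swarm itself labels every task container with `com.docker.swarm.service.name`; Dokploy's own label keys are internal, so the helper below is an assumption-laden sketch of how such a filter could be built (e.g. for dockerode's `docker.listContainers({ filters })`):

```typescript
// Sketch: build the filters argument for a container listing call.
// com.docker.swarm.service.name is set by Swarm on every task container.
function swarmServiceFilter(appName: string): string {
  return JSON.stringify({
    label: [`com.docker.swarm.service.name=${appName}`],
  });
}
```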

Service Updates

Update running services without downtime:
const service = docker.getService(appName);
const inspect = await service.inspect();

await service.update({
  version: parseInt(inspect.Version.Index),
  ...newSettings,
  TaskTemplate: {
    ...newSettings.TaskTemplate,
    ForceUpdate: inspect.Spec.TaskTemplate.ForceUpdate + 1
  }
});
Incrementing ForceUpdate forces a rolling update even if configuration hasn’t changed.

Rolling Updates

Swarm automatically performs rolling updates:
  1. Stop old task on one node
  2. Start new task with updated configuration
  3. Wait for health check to pass
  4. Repeat for next node
This ensures zero-downtime deployments.
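The rollout behavior is controlled by the service's `UpdateConfig`. The fragment below uses field names from the Docker service API with illustrative values; it is a sketch, not Dokploy's default configuration:

```typescript
// Sketch of an UpdateConfig controlling the rolling update described above.
const updateConfig = {
  Parallelism: 1,            // update one task at a time
  Delay: 10_000_000_000,     // 10 s between batches, in nanoseconds
  FailureAction: "rollback", // revert automatically if the update fails
  Order: "start-first",      // start the new task before stopping the old
};
```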

High Availability

Manager Quorum

Maintain cluster consensus with proper manager count:
Managers | Fault Tolerance | Recommended For
1        | 0               | Development
3        | 1               | Small Production
5        | 2               | Large Production
7        | 3               | Enterprise
Never use an even number of managers (2, 4, 6) as it doesn’t improve fault tolerance and can cause split-brain scenarios.
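The table follows from Raft quorum math: a cluster of n managers needs floor(n/2) + 1 reachable managers to make decisions, so it tolerates the loss of floor((n - 1) / 2). This is also why even counts add nothing: 4 managers tolerate the same single failure as 3.

```typescript
// Raft quorum math behind the manager-count table.
function quorum(managers: number): number {
  return Math.floor(managers / 2) + 1;
}

function faultTolerance(managers: number): number {
  return Math.floor((managers - 1) / 2);
}
```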

Service Restart Policies

Configure automatic restart behavior:
TaskTemplate: {
  RestartPolicy: {
    Condition: "on-failure",
    Delay: 5000000000,  // 5 seconds in nanoseconds
    MaxAttempts: 3
  }
}
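The Docker API expresses durations in nanoseconds, so a small conversion helper avoids hard-coding values like 5000000000:

```typescript
// Convert seconds to the nanosecond durations the Docker API expects.
const NS_PER_SECOND = 1_000_000_000;

function seconds(n: number): number {
  return n * NS_PER_SECOND;
}
```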

Monitoring Swarm Services

Dokploy’s monitoring service tracks Swarm services:
metricsConfig: {
  containers: {
    refreshRate: 60,
    services: {
      include: ["my-swarm-service"],
      exclude: []
    }
  }
}
When monitoring Docker Compose or Swarm stacks, use the -p flag to properly identify all services within the stack.

Best Practices

  • Spread manager nodes across different availability zones
  • Use at least 3 manager nodes for production
  • Keep critical services on manager nodes
  • Distribute worker nodes for load balancing
  • Set resource reservations for critical services
  • Define resource limits to prevent overconsumption
  • Use node labels for heterogeneous clusters
  • Monitor node resource utilization
  • Use overlay networks for multi-host communication
  • Enable encryption for sensitive traffic
  • Implement proper firewall rules (port 2377)
  • Use host mode publishing for Traefik
  • Use rolling updates for zero-downtime deployments
  • Configure health checks for all services
  • Set appropriate restart policies
  • Implement placement constraints strategically
  • Regular backups of Swarm state on manager nodes
  • Document cluster topology
  • Test disaster recovery procedures
  • Maintain quorum during maintenance

Troubleshooting

Node Not Joining

Problem: Node fails to join the Swarm
Solutions:
  • Verify port 2377 is open on manager nodes
  • Check network connectivity between nodes
  • Ensure token hasn’t expired
  • Verify Docker is running on both nodes
  • Check for firewall blocking connections

Service Not Starting

Problem: Service tasks fail to start
Solutions:
  • Check service logs: docker service logs <service>
  • Verify image is accessible on all nodes
  • Check placement constraints are satisfiable
  • Ensure sufficient resources available
  • Review service configuration

Split-Brain Scenario

Problem: Manager nodes lose quorum
Solutions:
  • Check network connectivity between managers
  • Verify time synchronization (NTP)
  • Ensure odd number of managers
  • Review manager node status
  • Force new cluster if necessary (last resort)

Service Not Updating

Problem: Service update doesn’t take effect
Solutions:
  • Increment ForceUpdate counter
  • Check service version number
  • Verify no deployment conflicts
  • Review update logs
  • Try manual service removal and recreation

Network Issues

Problem: Services can’t communicate
Solutions:
  • Verify overlay network exists: docker network ls
  • Check service network configuration
  • Ensure both services on same network
  • Test DNS resolution within containers
  • Review network firewall rules
