
Overview

The Ubicloud control plane is a Ruby application that manages cloud infrastructure. It connects to PostgreSQL for persistence and communicates with bare metal servers via SSH.
The control plane runs two main processes: the web application (Puma/Roda) and the respirate background job processor.

Core Components

Database Layer: PostgreSQL + Sequel

All state is stored in a single PostgreSQL database. The control plane uses Sequel as the ORM.

Key Database Models

From model/*.rb:
# model/vm.rb - Virtual machine resource
class Vm < Sequel::Model
  one_to_one :strand, key: :id
  many_to_one :vm_host
  many_to_one :project
  one_to_many :nics
  # ...
end

# model/vm_host.rb - Bare metal host
class VmHost < Sequel::Model
  one_to_one :sshable, key: :id
  one_to_many :vms
  many_to_one :location
  # ...
end

# model/project.rb - Tenant isolation
class Project < Sequel::Model
  one_to_many :vms
  one_to_many :private_subnets
  many_to_many :accounts
  # ...
end
Sequel models are defined in model/*.rb and inherit from Sequel::Model. They use plugins for common behaviors like pagination, soft deletes, and resource methods.

Web Layer: Roda + Rodauth

The web console and API are served by Roda, a lightweight routing framework.

Request Flow

Authentication with Rodauth

From DEVELOPERS.md:
Web authentication is managed with Rodauth.
Rodauth provides:
  • Session management
  • Multi-factor authentication
  • Password reset flows
  • OAuth/OIDC integration
  • Account verification

SSH Communication Layer

The control plane communicates with data plane hosts using the net-ssh library.

Sshable Model

From model/sshable.rb:
class Sshable < Sequel::Model
  # Stores SSH connection info
  # - host: hostname or IP
  # - raw_private_key_1: Ed25519 private key
  # - unix_user: SSH username
  
  def cmd(command, stdin: nil, **kwargs)
    # Execute command via SSH
  end
  
  def write_file(path, content, user: :root)
    # Write file to remote host
  end
end
Example from prog/vm/host_nexus.rb:
# Reboot a host
last_boot_id = vm_host.last_boot_id
new_boot_id = sshable.cmd(
  "sudo host/bin/reboot-host :last_boot_id", 
  last_boot_id: last_boot_id
).strip
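The `:last_boot_id` placeholder in the command above suggests that `cmd` substitutes keyword arguments into the command string. A minimal sketch of how such substitution could be done safely, assuming shell escaping via Ruby's standard `Shellwords` module; `interpolate_command` is a hypothetical helper for illustration, not the actual `Sshable#cmd` implementation:

```ruby
require "shellwords"

# Hypothetical helper: replaces each :name placeholder with the
# shell-escaped value of the matching keyword argument.
# Raises KeyError if a placeholder has no matching argument.
def interpolate_command(template, **params)
  template.gsub(/:(\w+)/) do
    key = Regexp.last_match(1).to_sym
    Shellwords.escape(params.fetch(key).to_s)
  end
end

safe = interpolate_command(
  "sudo host/bin/reboot-host :last_boot_id",
  last_boot_id: "abc; rm -rf /"
)

# The hostile value stays a single argument after shell word splitting.
words = Shellwords.split(safe)
```

Escaping at the point of substitution means a value containing spaces or shell metacharacters cannot break out into a second command.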

The Strand Model

The Strand is Ubicloud’s background job system. It’s a lightweight, database-driven orchestration engine that powers all asynchronous operations.
The Strand model eliminates the need for external message queues like RabbitMQ or Redis. Everything is coordinated through PostgreSQL.

What is a Strand?

A Strand represents a long-running workflow. From model/strand.rb:
class Strand < Sequel::Model
  # Core fields:
  # - id: unique identifier
  # - prog: program class (e.g., "Vm::HostNexus")
  # - label: current state/step (e.g., "wait", "bootstrap_rhizome")
  # - stack: JSONB array of frames (context data)
  # - schedule: when to run next
  # - lease: lease timestamp for distributed locking
  # - parent_id: optional parent strand
  # - exitval: exit value when complete
end

How Strands Work

1. Strand Creation

When a resource is created, a Strand is spawned:
# From prog/vm/nexus.rb
Strand.create(
  prog: "Vm::Metal::Nexus",
  label: "start",
  stack: [{
    "storage_volumes" => storage_volumes,
    "force_host_id" => force_host_id,
    "gpu_count" => gpu_count,
    # ... configuration ...
  }]
) { it.id = vm.id }

2. Lease-Based Execution

The respirate process picks up scheduled Strands using optimistic locking:
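# Based on model/strand.rb (simplified; the statement name is illustrative)
TAKE_LEASE_PS = DB[:strand]
  .where(id: :$id, exitval: nil)
  # Only take the lease if the previous one has expired
  .where(Sequel.lit("lease < CURRENT_TIMESTAMP"))
  .prepare(:update, :strand_take_lease,
    lease: Sequel.lit("CURRENT_TIMESTAMP + interval '120 seconds'"),
    try: Sequel[:try] + 1)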
Strands use a lease-based locking system to ensure only one worker processes a Strand at a time. The lease expires after 120 seconds, allowing recovery from crashed workers.
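To make the lease semantics concrete, here is a toy in-memory model of the same idea. It is only a sketch: the real system does this with a single UPDATE in PostgreSQL, and `LeaseTable`, its methods, and the choice to start every row with an already expired lease are assumptions made for illustration.

```ruby
# Toy model of lease-based locking (hypothetical; not Ubicloud code).
class LeaseTable
  LEASE_SECONDS = 120

  def initialize
    @rows = {}
    @mutex = Mutex.new
  end

  # New rows start with a lease in the past, so they can be taken at once.
  def add(id)
    @rows[id] = {lease: Time.at(0), try: 0, exitval: nil}
  end

  # Returns true if the caller now holds the lease, false otherwise.
  def take_lease(id, now: Time.now)
    @mutex.synchronize do
      row = @rows[id]
      return false if row.nil? || !row[:exitval].nil?
      return false unless row[:lease] < now

      row[:lease] = now + LEASE_SECONDS
      row[:try] += 1
      true
    end
  end

  def try_count(id)
    @rows.fetch(id)[:try]
  end
end

table = LeaseTable.new
table.add("strand-1")
t0 = Time.at(1_000)
first = table.take_lease("strand-1", now: t0)        # lease was free
second = table.take_lease("strand-1", now: t0 + 60)  # still held by the first worker
third = table.take_lease("strand-1", now: t0 + 121)  # expired, e.g. after a crash
```

The second attempt fails because the first lease is still valid; the third succeeds because 120 seconds have passed, which is how a crashed worker's Strand becomes available again.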

3. Program Execution

Each Strand runs a Prog (program) class from prog/*.rb:
# From prog/base.rb
class Prog::Base
  def initialize(strand, snap = nil)
    @strand = strand
  end
  
  # Flow control methods (signatures only; bodies omitted):
  #   hop(label)         Jump to a different label
  #   nap(seconds)       Sleep and reschedule
  #   pop(message)       Exit current frame
  #   push(prog, frame)  Call another program
  #   bud(prog, frame)   Spawn child strand
end

4. State Machine Labels

Programs define labels (states) using label declarations:
# From prog/vm/host_nexus.rb
class Prog::Vm::HostNexus < Prog::Base
  label def start
    hop_setup_ssh_keys
  end
  
  label def setup_ssh_keys
    sshable.update(raw_private_key_1: SshKey.generate.keypair)
    hop_bootstrap_rhizome
  end
  
  label def bootstrap_rhizome
    hop_prep if retval&.dig("msg") == "rhizome user bootstrapped"
    push Prog::BootstrapRhizome, {"target_folder" => "host"}
  end
  
  label def prep
    bud Prog::Vm::PrepHost
    bud Prog::LearnNetwork
    bud Prog::LearnMemory
    hop_wait_prep
  end
  
  label def wait_prep
    reap(:setup_hugepages, reaper: ->(st) {
      # Process completed child strands
    })
  end
  
  label def wait
    when_reboot_set? { hop_prep_reboot }
    nap 6 * 60 * 60  # Sleep for 6 hours
  end
end
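The `label def start` form works because `def` returns the method name as a Symbol, which is then passed to a class-level `label` method. The following is a minimal sketch of how such a macro could register labels and generate `hop_<label>` helpers. `MiniProg`, `HostProg`, and the use of `throw`/`catch` for control flow are assumptions for illustration, not Ubicloud's `Prog::Base`.

```ruby
# Hypothetical miniature of a Prog base class.
class MiniProg
  Hop = Struct.new(:label)
  Nap = Struct.new(:seconds)

  # Registers a label and defines a matching hop_<name> instance method.
  def self.label(name)
    labels << name
    define_method(:"hop_#{name}") { throw :flow, Hop.new(name) }
    name
  end

  def self.labels
    @labels ||= []
  end

  def nap(seconds)
    throw :flow, Nap.new(seconds)
  end

  # Runs one label and returns the Hop or Nap it produced.
  def step(label)
    raise ArgumentError, "unknown label #{label}" unless self.class.labels.include?(label)
    catch(:flow) do
      public_send(label)
      raise "label #{label} finished without hop or nap"
    end
  end
end

class HostProg < MiniProg
  label def start
    hop_wait
  end

  label def wait
    nap 6 * 60 * 60
  end
end

prog = HostProg.new
first = prog.step(:start)        # a Hop to :wait
second = prog.step(first.label)  # a Nap of six hours
```

Because every label must end in a hop or a nap, a label that simply falls off the end is treated as an error in this sketch.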

Flow Control Primitives

| Method | Purpose | Example |
|---|---|---|
| hop(label) | Jump to a different label in the current program | hop_wait |
| nap(seconds) | Sleep and reschedule the Strand | nap 30 |
| pop(msg) | Exit current stack frame | pop "vm created" |
| push(prog, frame) | Call another program (adds to stack) | push Prog::BootstrapRhizome |
| bud(prog, frame) | Spawn a child Strand | bud Prog::LearnCpu |
| reap(label) | Wait for child Strands to complete | reap(:next_step) |
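
The push and pop primitives behave like a call and a return on a stack of frames. Here is a small sketch of that idea, assuming the newest frame sits at the end of the array and that the popped message is handed back as a `retval` hash with a `"msg"` key (the shape used by `retval&.dig("msg")` in the host example). `FrameStack` is a hypothetical class; the real `stack` column layout may differ.

```ruby
# Hypothetical stand-in for a Strand's stack of frames.
class FrameStack
  attr_reader :frames, :retval

  def initialize(prog, frame = {})
    @frames = [frame.merge("prog" => prog)]
    @retval = nil
  end

  # The frame of the program that is currently running.
  def current
    @frames.last
  end

  # push: call another program by adding its frame on top.
  def push(prog, frame = {})
    @frames.push(frame.merge("prog" => prog))
    @retval = nil
  end

  # pop: leave the current frame and hand a message back to the caller.
  def pop(message)
    raise "last frame: the strand itself would exit here" if @frames.size == 1
    @frames.pop
    @retval = {"msg" => message}
  end
end

stack = FrameStack.new("Vm::HostNexus")
stack.push("BootstrapRhizome", {"target_folder" => "host"})
inner = stack.current["prog"]
stack.pop("rhizome user bootstrapped")
outer = stack.current["prog"]
```

After the pop, the calling program is current again and can inspect `stack.retval` to decide where to hop next.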

Parent-Child Strands

Strands can spawn children using bud:
label def prep
  # Spawn multiple child strands in parallel
  bud Prog::LearnCpu
  bud Prog::LearnMemory
  bud Prog::LearnStorage
  hop_wait_prep
end

label def wait_prep
  # Reap (wait for) all children to complete
  reap(:setup_hugepages, reaper: ->(child) {
    case child.prog
    when "LearnCpu"
      vm_host.update(arch: child.exitval["arch"])
    when "LearnMemory"
      vm_host.update(total_mem_gib: child.exitval["mem_gib"])
    end
  })
end
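
The reap step above can be approximated in plain Ruby. This sketch assumes that reaping collects children that have an exit value, passes each one to the reaper callback once, and hops only when no children remain; otherwise it naps and tries again. The `Child` struct, the return values, and the five-second nap are hypothetical choices for illustration.

```ruby
# Hypothetical in-memory children; real children are Strand rows
# linked to their parent through parent_id.
Child = Struct.new(:prog, :exitval)

# Returns [:hop, label] once every child has finished,
# otherwise [:nap, seconds] so the parent is retried later.
def reap(children, next_label, nap_seconds: 5, reaper: nil)
  finished, running = children.partition { |c| !c.exitval.nil? }
  finished.each { |c| reaper&.call(c) }
  children.replace(running)
  running.empty? ? [:hop, next_label] : [:nap, nap_seconds]
end

host = {}
children = [
  Child.new("LearnCpu", {"arch" => "x64"}),
  Child.new("LearnMemory", nil) # still running
]
reaper = ->(child) {
  case child.prog
  when "LearnCpu" then host[:arch] = child.exitval["arch"]
  when "LearnMemory" then host[:total_mem_gib] = child.exitval["mem_gib"]
  end
}

first = reap(children, :setup_hugepages, reaper: reaper)
children.first.exitval = {"mem_gib" => 64} # the remaining child finishes
second = reap(children, :setup_hugepages, reaper: reaper)
```

The first call processes the finished CPU child and naps; the second call sees the memory child complete and hops to the next label.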

Semaphores for Signaling

Strands can be signaled using semaphores:
# From model/vm_host.rb
plugin SemaphoreMethods, :checkup, :reboot, :destroy

# Trigger a reboot from anywhere:
vm_host.incr_reboot

# In the Strand:
label def wait
  when_reboot_set? do
    decr_reboot
    hop_prep_reboot
  end
  nap 6 * 60 * 60
end
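
A rough sketch of how a semaphore plugin could generate the `incr_`, `decr_`, and `when_..._set?` methods used above. This version keeps counters in memory and treats `decr_` as clearing the signal entirely; both are assumptions, since the real `SemaphoreMethods` plugin coordinates through the database. `ToySemaphores` and `ToyHost` are hypothetical names.

```ruby
# Hypothetical in-memory semaphore generator.
module ToySemaphores
  # Builds a module with one set of methods per semaphore name.
  def self.for(*names)
    Module.new do
      names.each do |name|
        define_method(:"incr_#{name}") { semaphores[name] += 1 }
        define_method(:"decr_#{name}") { semaphores[name] = 0 }
        define_method(:"when_#{name}_set?") { |&blk| blk.call if semaphores[name] > 0 }
      end
      define_method(:semaphores) { @semaphores ||= Hash.new(0) }
    end
  end
end

class ToyHost
  include ToySemaphores.for(:checkup, :reboot, :destroy)
end

host = ToyHost.new
events = []
host.when_reboot_set? { events << :unexpected } # not set, block is skipped
host.incr_reboot                                 # signal from anywhere
host.when_reboot_set? do
  host.decr_reboot
  events << :hop_prep_reboot
end
host.when_reboot_set? { events << :unexpected } # already cleared
```

Clearing the semaphore before hopping ensures one signal triggers the reboot path once rather than on every wakeup.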

Respirate: The Job Runner

The respirate process is a daemon that continuously picks up and executes Strands.

How It Works

From model/strand.rb:288:
def run(seconds = 0)
  take_lease_and_reload do
    loop do
      ret = unsynchronized_run  # Execute current label
      if ret.is_a?(Prog::Base::Nap) && ret.seconds != 0
        return ret  # Reschedule
      elsif ret.is_a?(Prog::Base::Exit)
        return ret  # Complete
      end
      # Otherwise, hop occurred - continue loop
    end
  end
end
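
The loop above keeps executing labels while the lease is held and only returns on a real nap or an exit. A self-contained sketch of that control flow, with labels modeled as lambdas; the `Hop`, `Nap`, and `Exit` structs and the `LABELS` table are hypothetical stand-ins for the real Prog classes:

```ruby
Hop = Struct.new(:label)
Nap = Struct.new(:seconds)
Exit = Struct.new(:exitval)

# Hypothetical label table: each label returns a Hop, Nap, or Exit.
LABELS = {
  start: -> { Hop.new(:setup) },
  setup: -> { Hop.new(:wait) },
  wait: -> { Nap.new(30) }
}

# Runs labels back to back until one asks to sleep or exit.
# Returns the final result and the list of labels visited.
def run(label, labels)
  trace = []
  loop do
    trace << label
    ret = labels.fetch(label).call
    if ret.is_a?(Hop)
      label = ret.label      # hop: continue within the same lease
    elsif ret.is_a?(Nap) && ret.seconds == 0
      next                   # zero-second nap: rerun the label right away
    elsif ret.is_a?(Nap) || ret.is_a?(Exit)
      return [ret, trace]    # hand control back to the scheduler
    else
      raise "unexpected return #{ret.inspect}"
    end
  end
end

result, trace = run(:start, LABELS)
```

Chaining hops inside one lease avoids a database round trip per state transition; only naps and exits give the Strand back to the scheduler.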

Error Handling

Strands have built-in error handling and retry logic:
# From model/strand.rb
rescue => ex
  Clog.emit("exception terminates strand run", 
    Util.exception_to_hash(ex, into: {
      strand_error: {strand: ubid, try: try, duration: duration}
    })
  )
  raise RunError.new(self)
end
Retry handling includes:
  • Exponential backoff on retries
  • Logging to a centralized system
  • Deadline tracking for SLA monitoring
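
As an illustration of exponential backoff, the delay can be derived from the Strand's `try` counter. The formula, base delay, and cap below are assumptions chosen for the example, not values taken from Ubicloud; `backoff_seconds` is a hypothetical helper.

```ruby
# Hypothetical backoff: doubles the delay on each try, up to a cap.
def backoff_seconds(try, base: 1, cap: 3600)
  [base * (2**try), cap].min
end

delays = (0..5).map { |t| backoff_seconds(t) }
capped = backoff_seconds(20)
```

Capping the delay keeps a repeatedly failing Strand from being postponed indefinitely while still reducing load during an outage.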

Configuration

Key environment variables (from config.rb):
CLOVER_DATABASE_URL=postgres:///clover_development?user=clover
CLOVER_SESSION_SECRET=<secret>
CLOVER_COLUMN_ENCRYPTION_KEY=<key>
RACK_ENV=development  # or 'production'
HETZNER_SSH_PRIVATE_KEY=<key>  # For cloudifying hosts
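
A small sketch of reading these variables with fail-fast behavior for required values. Which variables are mandatory and the `development` default are assumptions; `load_config` is a hypothetical helper, and the project's own config.rb may work differently.

```ruby
# Hypothetical loader: required keys raise KeyError when missing,
# optional keys fall back to a default or nil.
def load_config(env = ENV)
  {
    database_url: env.fetch("CLOVER_DATABASE_URL"),
    session_secret: env.fetch("CLOVER_SESSION_SECRET"),
    column_encryption_key: env.fetch("CLOVER_COLUMN_ENCRYPTION_KEY"),
    rack_env: env.fetch("RACK_ENV", "development"),
    hetzner_ssh_private_key: env["HETZNER_SSH_PRIVATE_KEY"]
  }
end

# Passing a plain Hash instead of ENV keeps the example side-effect free.
config = load_config(
  "CLOVER_DATABASE_URL" => "postgres:///clover_development?user=clover",
  "CLOVER_SESSION_SECRET" => "example-secret",
  "CLOVER_COLUMN_ENCRYPTION_KEY" => "example-key"
)
```

Failing at startup on a missing secret is usually easier to diagnose than a nil surfacing later inside a request.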

Development Workflow

Running Tests

# Run all tests
bundle exec rspec

# Run specific test
bundle exec rspec ./spec/model/strand_spec.rb:10

# With coverage
COVERAGE=1 bundle exec rspec

Debugging Strands

Use bin/pry to inspect Strands interactively:
# Find a Strand
st = Strand["strand-ubid"]

# Check its state
st.prog    # => "Vm::HostNexus"
st.label   # => "wait"
st.stack   # => [{"storage_volumes" => [...]}]

# Load and inspect the program
prg = st.load
prg.vm_host  # => VmHost instance

# Trigger a semaphore
st.subject.incr_reboot

Next Steps

Architecture Overview

Return to the high-level architecture documentation

Projects

Learn how Projects organize resources for multi-tenancy

Development Setup

Set up your environment to contribute to the control plane

Strand Tests

View Strand test examples on GitHub
