
Introduction

Rancher is a complete container management platform built for organizations that deploy containers in production. It provides a centralized authentication and access control system for managing multiple Kubernetes clusters from a single interface.

High-Level Architecture

Rancher follows a hub-and-spoke architecture pattern with these core concepts:

Management Cluster

The central Rancher server that runs in a Kubernetes cluster and manages all downstream clusters.

Downstream Clusters

Kubernetes clusters managed by Rancher, which can be imported, created, or hosted.

Core Architecture Components

Rancher Server

The Rancher server is the central management hub that consists of:
  • API Server: Multi-versioned API system (Norman v3 and Steve v1)
  • Authentication System: Pluggable authentication with support for multiple providers
  • Controller Manager: Reconciliation loops for managing cluster state
  • UI Server: Web-based management interface
  • Extension API Server: Kubernetes API aggregation for imperative operations
The main server process is defined in main.go:49, where the CLI application is initialized with the short description “Complete container management platform”.

Cluster Management Model

Upstream vs Downstream

Upstream (local) cluster:
  • Runs the Rancher server components
  • Stores cluster configurations and state
  • Manages authentication and RBAC policies
  • Can optionally manage workloads when configured as the “local” cluster
  • Namespace: cattle-system

Downstream clusters:
  • Kubernetes clusters managed by Rancher
  • Run the Rancher agent for communication
  • Can be imported, provisioned, or hosted
  • Run independent Kubernetes API servers

Communication Patterns

Tunnel Server Architecture

Rancher uses a WebSocket-based tunnel system for secure cluster communication. The tunnel server is implemented in pkg/tunnelserver/ and uses the remotedialer library for bidirectional communication.
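The core idea behind the remotedialer tunnel is that the connection is initiated by the agent, yet the server can still push requests back through it. The sketch below illustrates this reverse-tunnel pattern with plain TCP sockets; it is a conceptual toy, not Rancher's WebSocket protocol.

```python
# Reverse-tunnel sketch: the agent dials OUT to the server, and the
# server then sends a "request" back over that agent-initiated
# connection. Illustrative only; Rancher uses WebSockets + remotedialer.
import socket
import threading

def run_server(listener: socket.socket, request: bytes, result: list):
    # Accept the agent's outbound connection, then push a request
    # down the tunnel and wait for the agent's reply.
    conn, _ = listener.accept()
    with conn:
        conn.sendall(request)
        result.append(conn.recv(1024))

def run_agent(addr):
    # The agent initiates the connection (outbound only), then
    # services whatever the server sends over the tunnel.
    with socket.create_connection(addr) as sock:
        req = sock.recv(1024)
        sock.sendall(b"response-to:" + req)

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
addr = listener.getsockname()

result = []
t = threading.Thread(target=run_server, args=(listener, b"GET /healthz", result))
t.start()
run_agent(addr)
t.join()
listener.close()
print(result[0].decode())  # the server got a reply without ever dialing in
```

Note that the server never opens a connection toward the agent; that is why downstream clusters need no inbound firewall rules.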

Agent-Server Communication

Key aspects of the communication model:
  1. Outbound Connections Only: Agents initiate connections to the Rancher server via WebSocket
  2. No Inbound Firewall Rules: Downstream clusters don’t need to expose ports
  3. TLS Encryption: All communication is encrypted using TLS
  4. Token Authentication: Service account tokens for authentication
  5. Peer Management: Multi-replica support with peer coordination
# Agent connects to Rancher at
wss://rancher.example.com/v3/connect

# Authentication via service account token
# TLS validation can be configured as:
# - strict: Validate using provided CA
# - system-store: Use system CA bundle
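The connect endpoint and the two TLS validation modes above can be sketched as a small helper. The function names are illustrative, not Rancher's actual code; only the `/v3/connect` path and the mode names come from the text.

```python
# Illustrative helper: build the agent's connect URL and pick a TLS
# validation strategy ("strict" vs "system-store") as described above.
import ssl

def connect_url(server_host: str) -> str:
    # Agents open a WebSocket to the server's /v3/connect endpoint.
    return f"wss://{server_host}/v3/connect"

def tls_context(mode: str, ca_pem=None) -> ssl.SSLContext:
    if mode == "system-store":
        # Trust the CAs in the operating system's trust store.
        return ssl.create_default_context()
    if mode == "strict":
        # Trust only the CA certificate handed to the agent.
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        ctx.load_verify_locations(cadata=ca_pem)
        return ctx
    raise ValueError(f"unknown TLS mode: {mode}")

print(connect_url("rancher.example.com"))  # wss://rancher.example.com/v3/connect
```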

Multi-Cluster Manager (MCM)

The Multi-Cluster Manager is responsible for:
  • Cluster Registration: Managing cluster lifecycle and registration tokens
  • Proxy Routing: Proxying requests to downstream cluster APIs
  • Resource Aggregation: Collecting metrics and status from all clusters
  • RBAC Enforcement: Applying management-level access controls
MCM can be enabled/disabled via the MCM feature flag. When disabled, Rancher operates as an agent-only deployment.
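The proxy-routing responsibility can be sketched as a path-based dispatch: requests under a cluster-scoped path are forwarded through that cluster's tunnel, everything else is handled by the management plane. The `/k8s/clusters/<id>/` path convention and the routing-table shape here are illustrative.

```python
# Sketch of MCM-style proxy routing: cluster-scoped paths go through
# the matching downstream tunnel; other paths are handled locally.
def route(path: str, tunnels: dict) -> str:
    parts = path.split("/")
    # e.g. /k8s/clusters/c-abc123/api/v1/pods
    if len(parts) > 3 and parts[1] == "k8s" and parts[2] == "clusters":
        cluster_id = parts[3]
        if cluster_id not in tunnels:
            return "502 no tunnel for " + cluster_id
        return "proxy via " + tunnels[cluster_id]
    return "handle locally"

tunnels = {"c-abc123": "tunnel-1"}
print(route("/k8s/clusters/c-abc123/api/v1/pods", tunnels))  # proxy via tunnel-1
print(route("/v3/clusters", tunnels))                        # handle locally
```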

High Availability Architecture

Server Replica Management

Rancher supports multiple replicas for high availability:
values.yaml:189
replicas: 3
priorityClassName: rancher-critical

Leader Election

  • Controllers use Kubernetes leader election
  • Only one replica runs reconciliation loops
  • Other replicas serve API requests
  • Peer coordination via endpoints monitoring
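The division of labour above can be shown with a toy in-process model: every replica serves API traffic, but only the lease holder runs the reconciliation controllers. Real Rancher uses Kubernetes lease-based leader election; this simulation only illustrates the behaviour.

```python
# Toy leader-election model: first replica to acquire the "lease"
# runs controllers; all replicas serve API requests regardless.
class Lease:
    def __init__(self):
        self.holder = None

    def try_acquire(self, replica: str) -> bool:
        # First replica to ask gets the lease; later callers back off.
        if self.holder is None:
            self.holder = replica
        return self.holder == replica

def start_replica(name: str, lease: Lease) -> dict:
    is_leader = lease.try_acquire(name)
    return {
        "name": name,
        "serves_api": True,             # every replica serves API traffic
        "runs_controllers": is_leader,  # only the leader reconciles
    }

lease = Lease()
replicas = [start_replica(f"rancher-{i}", lease) for i in range(3)]
print([r["runs_controllers"] for r in replicas])  # [True, False, False]
```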

Load Distribution

  1. Client Request: Requests arrive at the Rancher service endpoint
  2. Load Balancer: The Kubernetes service distributes traffic to healthy replicas
  3. Authentication: Each replica can authenticate and authorize requests
  4. Proxy or Process: Requests are either processed locally or proxied to downstream clusters
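The four steps above can be sketched as a single request pipeline. The token table, replica selection, and cluster-path check are all illustrative stand-ins for Rancher's real authentication and routing machinery.

```python
# Sketch of the load-distribution pipeline: pick a replica, authenticate,
# then either process locally or proxy to a downstream cluster.
import random

REPLICAS = ["rancher-0", "rancher-1", "rancher-2"]
VALID_TOKENS = {"token-abc": "admin"}  # hypothetical token store

def handle(path: str, token: str) -> str:
    replica = random.choice(REPLICAS)      # 2. service picks a healthy replica
    user = VALID_TOKENS.get(token)
    if user is None:                       # 3. each replica can authenticate
        return "401 unauthorized"
    if path.startswith("/k8s/clusters/"):  # 4. proxy to a downstream cluster
        return f"{replica}: proxied for {user}"
    return f"{replica}: processed locally for {user}"

print(handle("/v3/settings", "token-abc"))
print(handle("/v3/settings", "bad-token"))  # 401 unauthorized
```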

Data Storage Architecture

Kubernetes API as Database

Rancher uses Kubernetes CRDs for persistent storage:
  • Cluster Definitions: management.cattle.io/v3 API group
  • User Configurations: RBAC rules, tokens, auth configs
  • Settings: Global and per-cluster settings
  • Catalog Data: Helm chart repositories and applications
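Concretely, "Kubernetes API as database" means Rancher state lives in custom resources. A minimal Cluster object in the management.cattle.io/v3 API group might look like the dictionary below; the spec fields shown are illustrative, not the full schema.

```python
# Minimal representation of a Rancher Cluster custom resource stored
# in the management cluster (spec fields are illustrative).
cluster = {
    "apiVersion": "management.cattle.io/v3",
    "kind": "Cluster",
    "metadata": {"name": "c-abc123"},
    "spec": {"displayName": "prod-east"},
}
print(cluster["apiVersion"])  # management.cattle.io/v3
```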

SQL Cache (Optional)

For improved UI performance:
Environment Variables
# Enable SQL caching for Steve API
UI_SQL_CACHE: true
SQL_CACHE_GC_INTERVAL: "1h"
SQL_CACHE_GC_KEEP_COUNT: "1000"
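The variables above can be parsed as shown below, assuming the GC interval uses a Go-style duration string such as "1h" (an assumption; the variable names match the doc, the parsing code is a sketch).

```python
# Illustrative parsing of the SQL-cache tuning variables above,
# assuming Go-style duration syntax for the GC interval.
import re

env = {
    "UI_SQL_CACHE": "true",
    "SQL_CACHE_GC_INTERVAL": "1h",
    "SQL_CACHE_GC_KEEP_COUNT": "1000",
}

def parse_duration_seconds(value: str) -> int:
    # Supports the simple forms used here: e.g. "90s", "15m", "1h".
    match = re.fullmatch(r"(\d+)([smh])", value)
    if not match:
        raise ValueError(f"unsupported duration: {value}")
    n, unit = int(match.group(1)), match.group(2)
    return n * {"s": 1, "m": 60, "h": 3600}[unit]

cache_enabled = env["UI_SQL_CACHE"] == "true"
gc_interval = parse_duration_seconds(env["SQL_CACHE_GC_INTERVAL"])
gc_keep = int(env["SQL_CACHE_GC_KEEP_COUNT"])
print(cache_enabled, gc_interval, gc_keep)  # True 3600 1000
```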

Request Flow

Management API Request

Management API requests are authenticated and processed by the Rancher server itself, reading and writing state in the management cluster.

Downstream Cluster API Request

Requests targeting a downstream cluster's Kubernetes API are authenticated by Rancher and then proxied through the WebSocket tunnel to that cluster's API server.

Deployment Modes

Single-node development deployment:
docker run -d --restart=unless-stopped \
  -p 80:80 -p 443:443 \
  --privileged \
  rancher/rancher:latest
  • Embedded Kubernetes mode
  • Automatic service/endpoint creation
  • Suitable for testing only

Networking Requirements

Rancher Server

Port  Protocol  Purpose
80    HTTP      Redirect to HTTPS
443   HTTPS     API and UI access
444   HTTPS     Internal aggregation API (optional)

Downstream Clusters

Downstream clusters only need outbound connectivity to the Rancher server. No inbound ports need to be opened.
  • Outbound HTTPS (443): For agent-server communication
  • Optional: Direct kubectl access to cluster API

Key Subsystems

  • Authentication: Multi-provider authentication system with SAML, OIDC, LDAP, and local auth
  • Controllers: Reconciliation loops managing cluster lifecycle, node drivers, and Fleet
  • API Layers: Norman (v3) and Steve (v1) API systems with different paradigms
  • Provisioning: Cluster provisioning via RKE2, K3s, and hosted Kubernetes providers
  • Components: Deep dive into server components
  • Security: Security architecture and RBAC
  • API Reference: API documentation
