High Availability Overview
Gitaly Cluster is an active-active cluster configuration that provides high availability for Git repository storage. It ensures resilient Git operations by replicating repository data across multiple Gitaly nodes.

Architecture
The high-level design uses a reverse proxy approach to distribute requests across multiple storage nodes:

Key Components
Praefect acts as a transparent front end to all Gitaly nodes:
- Routes gRPC calls to the correct storage shard
- Ensures write operations are performed transactionally when needed
- Coordinates replication across multiple Gitaly nodes
- Handles failover when primary nodes become unavailable
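The routing and failover behavior described above can be sketched in Python. This is an illustrative model only: Praefect itself is a Go service, and every name below (`VirtualStorage`, `route`, `mark_unhealthy`) is hypothetical, not part of Praefect's real API.

```python
class VirtualStorage:
    """Toy model: routes writes to the primary and reads to any healthy node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)     # node names; the first healthy one is primary
        self.healthy = set(nodes)

    @property
    def primary(self):
        # The first healthy node acts as primary; failover promotes the next one.
        for node in self.nodes:
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")

    def route(self, rpc_is_write):
        # Writes must go to the primary; reads may be served by any healthy node.
        if rpc_is_write:
            return self.primary
        return sorted(self.healthy)[0]

    def mark_unhealthy(self, node):
        self.healthy.discard(node)


storage = VirtualStorage(["gitaly-1", "gitaly-2", "gitaly-3"])
assert storage.route(rpc_is_write=True) == "gitaly-1"
storage.mark_unhealthy("gitaly-1")                     # simulate primary failure
assert storage.route(rpc_is_write=True) == "gitaly-2"  # automatic failover
```

The key design point the sketch illustrates is that clients never change: they keep talking to the same virtual storage, and failover is handled entirely inside the proxy layer.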
Gitaly nodes perform the actual repository storage and Git execution:
- Store repository data on local disk
- Execute Git commands and RPC operations
- Operate independently without knowledge of the cluster topology
Praefect tracks cluster state in a PostgreSQL database, including:
- Primary node assignments for each repository
- Replication job queue
- Node health information
- Repository metadata
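The kinds of state listed above can be illustrated with a couple of simplified records. This is a conceptual model for the reader, not Praefect's actual PostgreSQL schema; all field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class RepositoryRecord:
    # Per-repository metadata: which node is authoritative right now.
    relative_path: str
    primary: str
    secondaries: list = field(default_factory=list)

@dataclass
class ReplicationJob:
    # One queued job to bring a secondary up to date with the primary.
    repository: str
    source: str
    target: str
    state: str = "queued"   # queued -> in_progress -> completed/failed

repo = RepositoryRecord("group/project.git", "gitaly-1", ["gitaly-2", "gitaly-3"])
# A write on the primary enqueues one replication job per secondary:
jobs = [ReplicationJob(repo.relative_path, repo.primary, t) for t in repo.secondaries]
assert len(jobs) == 2 and jobs[0].state == "queued"
```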
The minimum supported PostgreSQL version is 9.6, consistent with the rest of GitLab.
Terminology
Shard: A logical partition of storage for repositories. Each shard requires multiple Gitaly nodes (at least three for optimal availability) to maintain high availability.

Virtual Storage: A cluster of Gitaly nodes that appears as a single storage to clients. Praefect routes requests to the appropriate nodes within a virtual storage.

Primary Node: The authoritative Gitaly node for a repository. Write operations are directed to the primary, which then coordinates replication to secondary nodes.

Secondary Nodes: Replica nodes that maintain copies of repository data. They serve read requests and can be promoted to primary during failover.

Replication: The process of synchronizing repository changes from the primary node to secondary nodes to maintain consistency across the cluster.

Consistency Models
Gitaly Cluster supports two consistency models:

Strong Consistency
With strong consistency, all Gitaly nodes must agree before changes are committed:
- Uses Git’s reference-transaction hook to coordinate writes
- All nodes vote on each reference update
- Changes are only committed if quorum is reached
- Provides immediate consistency guarantees
- Default mode for transaction-aware RPCs
Strong consistency requires Git version 2.28.0 or newer.
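The voting rule above can be sketched as a simple majority check. This is an illustrative model of the quorum logic, not Gitaly's implementation; in reality the coordination happens through Git's reference-transaction hook, and each vote carries a hash of the proposed reference update.

```python
from collections import Counter

def quorum_reached(votes, total_nodes):
    """A write commits only if a strict majority voted for the same update hash."""
    winner, count = Counter(votes).most_common(1)[0]
    return count > total_nodes // 2

# Three nodes each vote with the hash of the proposed ref change:
assert quorum_reached(["abc123", "abc123", "abc123"], 3) is True
# One node disagrees; two of three still form a quorum, and the
# lagging node is brought up to date by replication afterwards:
assert quorum_reached(["abc123", "abc123", "fff000"], 3) is True
# If no majority agrees (e.g. only two of three nodes respond and
# they disagree), the transaction is aborted:
assert quorum_reached(["abc123", "fff000"], 3) is False
```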
Eventual Consistency
With eventual consistency, writes complete on the primary and replicate asynchronously:
- Primary node accepts the write immediately
- Replication jobs are scheduled to update secondaries
- Secondaries may lag behind primary temporarily
- Used for non-transactional operations
- Similar to how Geo replication works
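The asynchronous flow above can be sketched with a replication queue that a background worker drains. All names here are hypothetical; the point is only to show why secondaries can lag temporarily and then catch up.

```python
from collections import deque

class EventualReplicator:
    """Toy model: the primary acknowledges writes immediately,
    and secondaries are updated later by queued replication jobs."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self.queue = deque()
        self.refs = {node: {} for node in [primary] + secondaries}

    def write(self, repo, ref_update):
        # The write completes as soon as the primary has applied it...
        self.refs[self.primary][repo] = ref_update
        # ...and one replication job per secondary is scheduled.
        for node in self.secondaries:
            self.queue.append((repo, node, ref_update))

    def process_one_job(self):
        repo, node, ref_update = self.queue.popleft()
        self.refs[node][repo] = ref_update


r = EventualReplicator("gitaly-1", ["gitaly-2"])
r.write("group/project.git", "refs/heads/main -> abc123")
# Immediately after the write, the secondary lags behind the primary:
assert "group/project.git" in r.refs["gitaly-1"]
assert "group/project.git" not in r.refs["gitaly-2"]
r.process_one_job()  # replication catches up
assert r.refs["gitaly-2"] == r.refs["gitaly-1"]
```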
Benefits of High Availability
Fault Tolerance: Gitaly Cluster continues operating even if individual nodes fail. Automatic failover promotes a healthy secondary to primary when needed.

Horizontal Scaling: Distribute read load across multiple nodes. Add more nodes to a virtual storage to increase capacity.

Data Redundancy: Multiple copies of each repository protect against data loss from hardware failures.

Zero Downtime: Perform maintenance on individual nodes without interrupting service.

Comparison to Geo
While both Gitaly Cluster and Geo involve replication, they serve different purposes:

| Feature | Gitaly Cluster | Geo |
|---|---|---|
| Primary Goal | High availability | Disaster recovery |
| Consistency | Strong or eventual | Eventual only |
| Failover | Automatic | Manual coordination |
| Scope | Single datacenter | Multiple datacenters |
| Data Replicated | Git repositories only | All GitLab data |
| Latency Impact | Low (same datacenter) | Higher (geographic distance) |
Gitaly Cluster handles failure of individual Gitaly nodes, while Geo handles failure of entire datacenters.
Next Steps
- Configure Praefect: Set up Praefect and configure virtual storage
- Replication: Learn how replication works in Gitaly Cluster
- Failover: Configure automatic failover and recovery