Why scalability matters
For realtime platforms, scalability is not a nice-to-have feature but a fundamental requirement, as applications can experience unpredictable growth and traffic spikes.
Consequences of poor scalability
When systems cannot scale effectively, users experience:
- Degraded service
- Increased latency
- Connection failures
- Message loss during periods of high demand
Ably’s scalability dimensions
Ably scales on several key dimensions:
- Channels: The maximum number of channels that can be used simultaneously by a single application can be scaled horizontally with no technical limit
- Connections: The maximum number of connections that can exist simultaneously for a single application can be scaled horizontally without limitation
- Message throughput: The total volume of messages Ably can process for a single application at any given moment is scaled horizontally, meaning there is no limit on the maximum possible aggregate rate
Vertical vs. horizontal scalability
There are fundamentally two approaches to scaling systems.
Vertical scalability (scaling up)
Vertical scalability means tackling larger problems by using larger components:
- Deploying server instances with more CPU cores
- Adding more memory to existing servers
- Using larger storage devices
This approach has inherent limitations:
- Physical constraints on how powerful a single machine can become
- Costs increase non-linearly with capacity
- Single points of failure become more critical
- Downtime is often required for upgrades
Horizontal scalability (scaling out)
Horizontal scalability means solving larger problems by having more components instead of larger ones:
- Adding more server instances to a distributed system
- Distributing load across a larger number of machines
- Partitioning data and workloads across multiple resources
This approach offers several advantages:
- Virtually unlimited scaling potential
- Greater resilience through redundancy
- More cost-effective scaling at large scales
- Ability to scale both up and down with demand (elasticity)
- Resources can be added incrementally as needed
The need for elasticity
Modern applications require elasticity — the ability to scale both up and down in response to changing demand:
- Applications experience fluctuations in usage
- Traffic patterns follow time zones and regional events
- Cost optimization demands that resources match current needs
- Successful applications can experience exponential growth that is impossible to predict accurately
For realtime platforms, elasticity is particularly demanding because of:
- Persistent connections
- Message fanout
- Low latency requirements
- Global synchronization
Challenges of horizontally scaling resources
Effective horizontal scaling involves several significant challenges that must be addressed in the design of a distributed system.
Resource coordination
It’s not enough just to have unlimited resources — you have to direct requests to those resources effectively:
- Resources need coordinated access to shared dependencies
- Requests must be distributed efficiently across available resources
- The system must maintain consistent behavior as resources are added or removed
Stateful interactions
Most realtime systems involve stateful interactions where the replicated resources cannot operate independently:
- Users expect consistent experiences across sessions
- Messages need to be delivered in the correct order
- Subscribers to the same channel need to see the same content
State maintenance
A stateful system like Ably must maintain information across requests or messages, including:
- Which clients are connected
- Which channels are active
- Which clients are subscribed to which channels
- Message ordering and history
- Presence information
High-scale fanout
This occurs when a single message needs to be delivered to a very large number of recipients:
- A sporting event might have millions of viewers receiving the same score updates
- A financial application might distribute price changes to thousands of traders
- A chat application might deliver a message to everyone in a popular channel
Consistent hashing for workload distribution
Ably uses consistent hashing as the foundation of its horizontal scaling approach. Consistent hashing solves a key problem in distributed systems: how to distribute work evenly across a changing set of resources while minimizing redistribution when resources are added or removed.
Traditional hashing limitations
In a traditional hashing approach, work might be distributed using a formula such as server = hash(item) mod N, where N is the number of servers. This approach has a serious weakness:
- When the number of servers changes, the output of the calculation changes for most items
- This results in most items being moved to new servers
- Massive redistribution causes service disruption
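The scale of the redistribution is easy to demonstrate. The following sketch (illustrative names, not Ably's code) uses a deterministic hash with modulo placement; going from 4 to 5 servers remaps roughly four out of five items, because an item stays put only when its hash is congruent modulo both server counts.

```python
import hashlib

def stable_hash(key: str) -> int:
    # Deterministic hash so results are reproducible across runs.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def modulo_placement(items, num_servers):
    """Traditional placement: hash(item) mod number-of-servers."""
    return {item: stable_hash(item) % num_servers for item in items}

items = [f"channel-{i}" for i in range(10_000)]
before = modulo_placement(items, 4)
after = modulo_placement(items, 5)  # one server added

moved = sum(1 for item in items if before[item] != after[item])
print(f"{moved / len(items):.0%} of items changed servers")  # roughly 80%
```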
How consistent hashing works
Consistent hashing addresses this issue by arranging both servers and work items on a conceptual “ring”.
Assign to nearest server
Each work item is assigned to the nearest server clockwise around the ring.
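A minimal ring can be sketched as follows; `HashRing`, `stable_hash`, and the server names are assumptions for illustration, not Ably's implementation. The key property to observe: when a server is added, the only items that move are those the new server takes over, and nothing shuffles between existing servers.

```python
import bisect
import hashlib

def stable_hash(key: str) -> int:
    # Deterministic hash so placement is reproducible.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring: one point per server, clockwise lookup."""

    def __init__(self, servers):
        self._points = sorted((stable_hash(s), s) for s in servers)

    def add(self, server):
        bisect.insort(self._points, (stable_hash(server), server))

    def server_for(self, item: str) -> str:
        hashes = [h for h, _ in self._points]
        # First point clockwise from the item's position, wrapping around.
        i = bisect.bisect_right(hashes, stable_hash(item)) % len(self._points)
        return self._points[i][1]

ring = HashRing(["server-a", "server-b", "server-c"])
items = [f"channel-{i}" for i in range(5_000)]
before = {it: ring.server_for(it) for it in items}

ring.add("server-d")
after = {it: ring.server_for(it) for it in items}

moved = [it for it in items if before[it] != after[it]]
# Every moved item lands on the new server.
print(f"{len(moved) / len(items):.0%} of items moved, all to server-d:",
      all(after[it] == "server-d" for it in moved))
```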
Ably’s implementation
At Ably, consistent hashing enables efficient distribution of channels across the available processing resources:
- Each channel is placed on a specific server instance based on its position on the hash ring
- When the system scales — either to add capacity or to respond to failures — only a small fraction of channels need to be moved to different servers
- This approach minimizes disruption during scaling events and ensures that the system can maintain performance even as it grows
Multiple hashes for even distribution
To address potential uneven distribution, especially when the number of servers is small, Ably assigns multiple hash positions to each possible placement location (server process):
- Each placement location is represented by multiple points on the hash ring
- The number of points can be adjusted based on server capacity
- This statistically creates a more even distribution of work
For example, when an eleventh server joins a cluster of ten:
- Each existing server will give up approximately 1/11th of its load to the new server
- This results in a balanced distribution
Multiple positions also smooth out high-traffic items:
- Busy items are distributed more evenly across the available servers
- The law of large numbers helps ensure that no single placement location gets an unfair share of high-traffic items
- The system becomes more statistically predictable as scale increases
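The evening-out effect of multiple points per server (often called "virtual nodes") can be seen in a small simulation; server names and point counts here are illustrative, not Ably's actual parameters. With a single point per server the busiest server can carry far more than the quietest; with many points per server the loads converge.

```python
import hashlib
from collections import Counter

def stable_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(servers, points_per_server):
    # Each server claims several pseudo-random positions on the ring.
    return sorted(
        (stable_hash(f"{server}#{i}"), server)
        for server in servers
        for i in range(points_per_server)
    )

def owner(ring, item):
    h = stable_hash(item)
    for point, server in ring:  # linear scan is fine for a sketch
        if point >= h:
            return server
    return ring[0][1]  # wrapped past the largest point

items = [f"channel-{i}" for i in range(20_000)]
servers = ["server-a", "server-b", "server-c", "server-d"]
for points in (1, 100):
    ring = build_ring(servers, points)
    load = Counter(owner(ring, item) for item in items)
    ratio = max(load.values()) / min(load.values())
    print(f"{points:3d} point(s) per server -> busiest/quietest ratio {ratio:.2f}")
```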
Progressive hashing for graceful scaling
Ably extends consistent hashing with “progressive hashing” to make scaling operations more gradual and controlled.
The problem: even with consistent hashing, adding or removing a server causes an immediate redistribution of the affected work items, which can lead to:
- Thundering herd problems
- Connection spikes
- Processing delays
- Resource pressure
Scaling up with progressive hashing
When a new server joins the cluster:
Gradual hash announcement
The server doesn’t immediately take on its full share of work. Instead, it gradually announces additional hash positions over time.
Progressive load absorption
The new server might start by claiming just 10% of its eventual hash positions, then increase to 20%, 30%, and so on.
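A rough simulation of this staged announcement, using assumed point counts and stage fractions rather than Ably's actual parameters: the new server's share of items grows with each batch of announced positions, and because the existing positions never move, items only ever migrate toward the new server.

```python
import hashlib

def stable_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def owner(ring, item):
    """Nearest point clockwise from the item; ring is sorted (hash, server)."""
    h = stable_hash(item)
    for point, server in ring:
        if point >= h:
            return server
    return ring[0][1]  # wrap around

# Existing capacity holds 300 ring points; the joiner will add 100 in stages.
existing = [(stable_hash(f"old#{i}"), "old") for i in range(300)]
new_points = [(stable_hash(f"new#{i}"), "new") for i in range(100)]
items = [f"channel-{i}" for i in range(5_000)]

shares = []
for stage in (0.1, 0.3, 0.6, 1.0):
    announced = new_points[: int(len(new_points) * stage)]
    ring = sorted(existing + announced)
    share = sum(owner(ring, it) == "new" for it in items) / len(items)
    shares.append(share)
    print(f"{stage:.0%} of positions announced -> {share:.1%} of load")
```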
Scaling down with progressive hashing
The same approach works in reverse when a server is scheduled for termination:
Gradual relinquishment
The server gradually relinquishes its hash positions before actual termination.
How Ably achieves scalability
Ably’s architecture is built from the ground up to enable horizontal scalability across all dimensions. This is achieved through several key design principles that work together to create a seamlessly scalable platform.
Multi-layered architecture
Ably uses a multi-layered architecture organized into independently scalable layers:
- Frontend layer: Handles REST requests and realtime connections (WebSocket and Comet)
- Core layer: Performs central message processing for channels
Channel scalability
Channels are the core building block of Ably’s service. Ably achieves horizontal scalability for channels through consistent hashing:
- Each compute instance within the core layer has a set of pseudo-randomly generated hashes
- Hashing determines the location of any given channel
- As a cluster scales, channels relocate to maintain an even load distribution
- Any number of channels can exist as long as sufficient compute capacity is available
Connection scalability
Connection processing is stateless, meaning connections can be freely routed to any frontend server without impacting functionality:
- A load balancer distributes work and decides where to terminate each connection
- Routing combines simple random allocation with prioritization based on instantaneous load factors
- The system performs background shedding to force the relocation of connections and keep load balanced
- As long as sufficient capacity exists and routing maintains a balanced load, the service can absorb an unlimited number of connections
Handling high-scale fanout
The main challenge for connection scaling is high-scale fanout — when a large number of connections are subscribed to common channels. Ably addresses this through a tiered fanout architecture.
Two-tier fanout
When a message is published to a channel with many subscribers:
First tier: Channel to frontends
The channel processor forwards the message to all frontend servers that have subscribers for that channel.
Second tier: Frontends to connections
Each frontend server then delivers the message to its locally attached subscriber connections.
Regional tier
At the regional tier, a channel also disseminates processed messages to corresponding channels in other regions where the channel is active. This ensures global distribution with optimized routing.
Subscription mapping
The channel processor maintains a map of which frontend servers have subscribers for each channel:
- When a new subscription is created, the frontend server notifies the channel processor
- The channel processor updates its subscription map
- Messages are only sent to frontend servers that have active subscribers
- This optimizes network usage and processing resources
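A subscription map of this shape might look like the following sketch; `ChannelProcessor` and its method names are hypothetical illustrations, not Ably's API. Publishing produces one copy of the message per interested frontend, rather than one per subscriber.

```python
from collections import defaultdict

class ChannelProcessor:
    """Toy subscription map: channel -> frontends with active subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(set)

    def subscribe(self, channel: str, frontend: str) -> None:
        self._subscribers[channel].add(frontend)

    def unsubscribe(self, channel: str, frontend: str) -> None:
        self._subscribers[channel].discard(frontend)
        if not self._subscribers[channel]:
            del self._subscribers[channel]  # channel no longer needs fanout

    def publish(self, channel: str, message: str) -> dict:
        """First-tier fanout: one copy per frontend that has subscribers."""
        return {fe: message for fe in self._subscribers.get(channel, ())}

cp = ChannelProcessor()
cp.subscribe("scores", "frontend-1")
cp.subscribe("scores", "frontend-2")
print(sorted(cp.publish("scores", "2-1")))  # ['frontend-1', 'frontend-2']
```

Each frontend then fans the message out again to its own attached connections, which keeps the channel processor's work proportional to the number of frontends rather than the number of subscribers.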
Message throughput scalability
Ably achieves message throughput scalability through multiple complementary approaches:
- Distributed processing: Messages are processed by the core instance responsible for the channel, distributing the load across the cluster
- Efficient routing: The system routes messages directly to interested parties without unnecessary network hops
- Optimized protocols: Binary protocols and efficient message encoding minimize overhead
- Connection optimizations: Features like delta compression reduce bandwidth requirements for large messages
Monitoring and auto-scaling
Maintaining effective horizontal scalability requires continuous monitoring and automated scaling.
Monitoring metrics
Ably’s platform monitors various metrics:
- CPU and memory utilization
- Message rates
- Channel and connection counts
- Resource headroom
Automated scaling triggers
When monitoring determines that load is approaching the current capacity:
- Scaling is triggered automatically to add more resources
- For stateful roles, progressive hashing introduces new capacity gradually
- This minimizes disruption to the existing workload
Scaling down
When the system detects excess capacity:
- It can scale down by gradually removing resources
- This optimizes cost efficiency without impacting performance
Regional variations
The auto-scaling systems also account for regional variations in load:
- Different regions may experience peak loads at different times due to time zone differences and regional events
- By scaling each region independently based on its current load, Ably ensures efficient resource utilization across the global platform
Load testing
Regular load testing helps validate the system’s scalability properties:
- Ensures that the distribution mechanisms work as expected at scale
- Identifies potential bottlenecks before they affect real traffic
- Tests how well the system redistributes work after failures
- Measures how quickly the system can scale up and down
Practical limits and considerations
While Ably’s architecture is designed for horizontal scalability, practical considerations do exist that developers should understand when architecting applications on the platform.
Channel considerations
When working with channels, several factors should be considered:
- While there’s no hard limit on the number of channels, each active channel consumes memory and CPU resources
- Very high message rates on a single channel may encounter throughput limitations, as all processing for one channel occurs on a single core instance
- Applications should distribute high-volume message traffic across multiple channels when possible
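One common way to distribute a high-volume topic, sketched here as an assumption rather than an Ably feature (`NUM_SHARDS` and the naming scheme are invented for illustration), is to shard it across several channels by key. The same key always maps to the same shard, so per-key ordering is preserved while aggregate load spreads across multiple channel instances.

```python
import hashlib

NUM_SHARDS = 8  # assumed shard count; tune to expected per-channel throughput

def shard_channel(base: str, key: str) -> str:
    """Deterministically spread one logical topic over NUM_SHARDS channels."""
    shard = int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_SHARDS
    return f"{base}:{shard}"

# Updates for the same key always land on the same shard channel.
print(shard_channel("prices", "AAPL") == shard_channel("prices", "AAPL"))  # True
```

The trade-off: subscribers who want the whole topic must attach to all shards, and total ordering across the topic is given up in exchange for throughput.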
Connection considerations
Connection factors to keep in mind:
- Each connection consumes memory and processing resources on frontend instances
- Very high message rates on a single connection may encounter throughput limitations due to network constraints and protocol overhead
- For publishing at sustained high rates, applications may need to distribute work across multiple connections or use the REST API
Message considerations
Message rate and size impact overall system performance:
- Default message size limits (typically 64KB) protect against excessive memory pressure and network load
- Very large messages impact processing cost and transit latency, especially in high-volume scenarios
- Features like delta compression help manage bandwidth for large messages with minor changes
Benefits of Ably’s scalable architecture
Ably’s horizontally scalable architecture provides several key benefits that directly impact application development and user experience.
No scale ceiling
The most fundamental benefit is the removal of technical limitations on growth:
Unlimited channels
No limit on the number of channels your application can use.
Unlimited connections
No limit on the number of concurrent connections.
Unlimited throughput
No limit on the aggregate message throughput.
Seamless growth
Applications can scale from prototype to global adoption without fundamental architecture changes.
Automatic elasticity
The platform handles scaling automatically:
- Resources are provisioned on demand as load changes
- Scaling occurs independently across different dimensions based on actual usage patterns
- Applications benefit from elasticity without any additional configuration or management
- No need for capacity planning and over-provisioning
Developer focus
Engineering teams can concentrate on building features that deliver business value:
- No need to design complex scaling architectures
- No requirement to manage infrastructure
- No operational overhead of monitoring and scaling systems
- Accelerated time-to-market
- Teams can invest their time in innovation rather than operations
Cost efficiency
Elastic scaling provides cost benefits:
- Pay only for the resources you use
- Automatic scaling down during periods of low demand
- No need to over-provision for peak capacity
Next steps
Performance
Learn how Ably maintains low latency at scale
Fault tolerance
Understand how Ably maintains reliability while scaling
Edge network
Explore Ably’s global edge network infrastructure
Limits
Review rate limits and quotas
