Stateless Architecture
Lemline’s stateless design enables seamless horizontal scaling:
- No Sticky Sessions: Any runner can process any workflow message
- No Shared State: Workflow state is carried in messages, not in memory
- No Coordination: Runners operate independently without leader election
- Database-Backed Durability: Only durable operations touch the database
How It Works
Message-Driven Load Distribution
Load distribution happens naturally through your message broker:
- Workflow messages arrive in broker topics/queues
- Broker distributes messages across consumer group members
- Each runner processes messages independently
- Runners write to shared database only when needed (waits, retries, checkpoints)
Dual-Channel Processing
Lemline uses two channels for optimal performance:
- Commands Channel (commands-in/out): High-throughput stateless execution
- Events Channel (events-out): Durable workflow operations requiring DB guarantees
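For illustration only, the sketch below shows how the two channels might be wired to Kafka through the SmallRye connector commonly used in Quarkus applications; the property layout, topic names, and consumer group are assumptions to adapt to your deployment. Every runner sharing the same consumer group is what lets the broker spread command messages across instances.

```yaml
# application.yaml (illustrative sketch - channel wiring and names are assumptions)
mp:
  messaging:
    incoming:
      commands-in:
        connector: smallrye-kafka          # high-throughput stateless channel
        topic: lemline-commands            # placeholder topic name
        "group.id": lemline-runners        # shared group => broker balances partitions
    outgoing:
      commands-out:
        connector: smallrye-kafka
        topic: lemline-commands
      events-out:
        connector: smallrye-kafka          # durable channel for DB-backed operations
        topic: lemline-events              # placeholder topic name
```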
Scaling Strategies
Manual Scaling
Add more runners by starting additional instances with identical configuration.
Docker
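A minimal sketch with Docker Compose; the service and image names are placeholders, and the same effect can be achieved with `docker compose up -d --scale lemline-runner=5`:

```yaml
# docker-compose.yaml (sketch - image name is a placeholder)
services:
  lemline-runner:
    image: lemline/lemline-runner:latest
    deploy:
      replicas: 5    # five identical stateless runners in one consumer group
```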
Kubernetes Horizontal Pod Autoscaler
Automatically scale based on metrics such as CPU utilization.
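A standard HorizontalPodAutoscaler manifest scaling on CPU utilization; the Deployment name lemline-runner is an assumption:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lemline-runner
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lemline-runner        # placeholder Deployment name
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```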
Custom Metrics Autoscaling
Scale based on Kafka consumer lag or queue depth. This requires a metrics adapter such as Prometheus Adapter or KEDA.
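With KEDA, a ScaledObject can scale runners on Kafka consumer lag directly; the broker address, topic, consumer group, and Deployment name below are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: lemline-runner
spec:
  scaleTargetRef:
    name: lemline-runner          # Deployment to scale (placeholder)
  minReplicaCount: 3
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: lemline-runners
        topic: lemline-commands
        lagThreshold: "100"       # target lag per replica before scaling out
```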
Kafka-Specific Scaling
Consumer Group Balancing
Kafka distributes partitions across consumers in a group (see the topic sketch after this list):
- Partitions: Number of topic partitions determines max parallelism
- Consumers: More consumers than partitions results in idle consumers
- Rebalancing: Adding/removing consumers triggers partition rebalancing
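If you manage topics declaratively with the Strimzi operator (an assumption; any topic management tool works), the partition count can be pinned to your maximum planned runner count:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: lemline-commands            # placeholder topic name
  labels:
    strimzi.io/cluster: my-kafka    # placeholder cluster name
spec:
  partitions: 24    # upper bound on runners consuming this topic in parallel
  replicas: 3
```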
Consumer Lag Monitoring
Monitor consumer lag to determine when to scale:
- Lag < 100: Optimal (current capacity sufficient)
- Lag 100-1000: Consider scaling up
- Lag > 1000: Scale up immediately
- Lag = 0: Consider scaling down
RabbitMQ-Specific Scaling
Queue Prefetch Configuration
Control how many messages each runner prefetches (a configuration sketch follows this list):
- Lower prefetch (1-5): Better load distribution, more overhead
- Higher prefetch (10-50): Better throughput, less even distribution
- Start with 10 and adjust based on message processing time
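A hedged sketch of a prefetch setting; the exact attribute depends on the RabbitMQ connector in use — here assumed to be the SmallRye RabbitMQ connector's max-outstanding-messages, which caps unacknowledged messages per runner:

```yaml
# application.yaml (sketch - connector attribute and queue name are assumptions)
mp:
  messaging:
    incoming:
      commands-in:
        connector: smallrye-rabbitmq
        "queue.name": lemline-commands
        max-outstanding-messages: 10    # effective prefetch; start at 10 and tune
```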
Queue Depth Monitoring
Monitor queue depth to trigger scaling:
- Messages/Consumer < 10: Optimal
- Messages/Consumer 10-100: Consider scaling up
- Messages/Consumer > 100: Scale up immediately
PGMQ-Specific Scaling
Polling Configuration
PGMQ uses PostgreSQL for messaging with polling-based consumption (a pool-sizing sketch follows this list):
- More runners increase database load (polling queries)
- Use connection pooling to manage database connections
- Monitor PostgreSQL CPU and connection count
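A sketch of per-runner pool sizing, assuming the runner's datasource is configured through standard Quarkus properties (key names may differ in your setup):

```yaml
# application.yaml (sketch - assumes the Quarkus JDBC datasource)
quarkus:
  datasource:
    jdbc:
      min-size: 2
      max-size: 16    # per-runner ceiling; multiply by runner count when sizing PostgreSQL
```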
Database Connection Pooling
Size per-runner connection pools so that the total across all runners stays within PostgreSQL's max_connections setting.
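As a rough, illustrative sizing exercise (all numbers are assumptions):

```yaml
# not a real configuration file - just the arithmetic
runners: 10
pool_size_per_runner: 16
runner_connections: 160        # 10 x 16
headroom: 40                   # monitoring, migrations, ad-hoc admin sessions
required_max_connections: 200  # PostgreSQL max_connections should be at least this
```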
Scaling Limits
Message Broker Limits
| Broker | Scaling Factor | Notes |
|---|---|---|
| Kafka | Up to partitions | Max runners = topic partitions |
| RabbitMQ | Unlimited | Limited by queue throughput |
| PGMQ | Database-limited | Limited by PostgreSQL connections and CPU |
Database Limits
- Connection Pool Exhaustion: Monitor active connections
- Lock Contention: Outbox queries use FOR UPDATE SKIP LOCKED
- I/O Throughput: High workflow count may saturate database I/O
Practical Limits
Small-to-medium workflows:
- 10-50 runners: Typical for most deployments
- 50-100 runners: High-throughput production systems
- 100+ runners: Enterprise-scale deployments
Large workflows:
- Serialization/deserialization overhead increases with workflow size
- Consider breaking large workflows into smaller sub-workflows
Performance Tuning
JVM Tuning
Optimize JVM settings for scaled deployments.
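A sketch of container-level JVM settings; the environment variable name (JAVA_OPTS_APPEND, as used by the default Quarkus base images) and the flag values are starting points to adjust, not recommendations:

```yaml
# Kubernetes container snippet (sketch)
env:
  - name: JAVA_OPTS_APPEND
    value: "-Xms256m -Xmx512m -XX:+UseG1GC -XX:MaxGCPauseMillis=100"
resources:
  requests:
    memory: "768Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
```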
Native Image Tuning
Native images have lower memory overhead.
Database Query Optimization
Outbox queries (FOR UPDATE SKIP LOCKED) rely on indexes on:
- execute_at (waits, retries)
- workflow_instance_id (all tables)
- status (where applicable)
Monitoring and Metrics
Key Metrics to Monitor
Throughput:
- Workflows started per second
- Tasks executed per second
- Messages processed per second
Latency:
- Workflow execution time (p50, p95, p99)
- Task execution time
- Database query latency
Resource Utilization:
- CPU utilization per runner
- Memory usage per runner
- Database connections
- Message broker consumer lag
Errors:
- Failed workflows
- Retried tasks
- Infrastructure errors
Prometheus Metrics
Lemline exposes Micrometer metrics compatible with Prometheus.
Alerting Rules
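A sketch of a Prometheus alerting rule on consumer lag, assuming lag is exported by something like kafka-exporter; the metric and group names are placeholders:

```yaml
groups:
  - name: lemline-scaling
    rules:
      - alert: LemlineConsumerLagHigh
        expr: sum(kafka_consumergroup_lag{consumergroup="lemline-runners"}) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Consumer lag above 1000 for 5 minutes - consider adding runners"
```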
Testing Scalability
Load Testing
Stress Testing
Test beyond expected capacity:
- Burst traffic: Sudden spike in workflow submissions
- Sustained load: Continuous high throughput for hours
- Failure scenarios: Kill runners mid-processing to verify resilience
Best Practices
Start with 3+ runners
Always run at least 3 runners for high availability, even in development.
Monitor consumer lag
Consumer lag is the best indicator of when to scale. Set up alerts at lag > 1000.
Scale gradually
Double the number of runners at each step (3 → 6 → 12) rather than making large jumps. Monitor the impact before scaling further.
Use autoscaling in production
Kubernetes HPA or cloud autoscaling groups handle traffic variations automatically.
Provision adequate database resources
As you scale runners, ensure database CPU, memory, and connections scale proportionally.
Partition Kafka topics appropriately
Create enough partitions to support your max expected runner count (e.g., 24-48 partitions).
Test failure scenarios
Regularly test runner failures during load to verify graceful degradation.
Troubleshooting Scaling Issues
Throughput not increasing with runners
Possible causes:
- Database bottleneck (CPU, I/O, connections)
- Message broker partition limit (Kafka)
- Network bandwidth saturation
Solutions:
- Scale the database vertically or use read replicas
- Increase Kafka topic partitions
- Optimize workflow definitions (reduce state size)
High CPU usage on runners
Possible causes:
- Complex workflows with large state serialization
- Inefficient task processors
- JVM garbage collection overhead
Solutions:
- Use native images for lower CPU overhead
- Profile with JVM profilers (JFR, async-profiler)
- Optimize workflow definitions
Database connection exhaustion
Symptoms: runners fail to obtain database connections and PostgreSQL reports that max_connections has been reached.
Solutions: reduce the per-runner pool size, raise PostgreSQL's max_connections, or scale the database.
Next Steps
- Kubernetes Deployment: Deploy with autoscaling on Kubernetes
- Monitoring: Set up monitoring and alerting
- Configuration: Optimize configuration for scale
- Performance Tuning: Advanced performance optimization