Overview
Scalekit implements rate limiting to ensure fair usage and maintain API availability for all customers. Rate limits protect the API infrastructure from abuse and ensure consistent performance.

Rate Limit Policy
Rate limits are applied per environment and are calculated based on:
- API endpoint: Different endpoints may have different rate limits
- Authentication credentials: Limits are tracked per client_id
- Time window: Limits reset after a specific time period
Rate limits may vary based on your Scalekit plan. Contact support for enterprise rate limit requirements.
Rate Limit Headers
API responses include rate limit information in the response headers:
- Maximum number of requests allowed in the current time window
- Number of requests remaining in the current time window
- Unix timestamp when the rate limit window resets
Example Response Headers
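The exact header names are not listed above; the sketch below uses the conventional `X-RateLimit-*` names, so confirm the actual names against the Scalekit API reference:

```http
HTTP/1.1 200 OK
Content-Type: application/json
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1717430400
```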
Rate Limit Exceeded
When you exceed the rate limit, the API returns a 429 Too Many Requests error. The error response indicates the number of seconds to wait before retrying the request.
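For illustration, a 429 response carrying the wait time in the standard `Retry-After` HTTP header might look like the following; the body is a placeholder, not Scalekit's exact error format:

```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30

{
  "error": "rate limit exceeded"
}
```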
Best Practices
Monitor Rate Limit Headers
Always check rate limit headers in API responses to avoid hitting limits.
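As a sketch (assuming the conventional `X-RateLimit-*` header names, which may differ from Scalekit's actual names), a small helper can compute how long to pause once the remaining quota hits zero:

```python
import time

# Conventional header names; confirm the actual names in the API reference.
REMAINING = "X-RateLimit-Remaining"
RESET = "X-RateLimit-Reset"

def seconds_until_reset(headers, now=None):
    """Return how long to wait if the quota is exhausted, else 0."""
    now = time.time() if now is None else now
    remaining = int(headers.get(REMAINING, 1))
    if remaining > 0:
        return 0.0
    # The reset header is a Unix timestamp; never wait a negative amount.
    return max(0.0, float(headers.get(RESET, now)) - now)

# Example: 0 requests left, window resets 30 seconds from "now".
headers = {REMAINING: "0", RESET: "1000030"}
print(seconds_until_reset(headers, now=1000000.0))  # → 30.0
```

Calling this after every response and sleeping for the returned duration keeps a client just under the limit instead of tripping 429s.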
Implement Exponential Backoff
When you encounter rate limits, use exponential backoff to retry requests.
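A minimal backoff loop, written against a generic `send` callable so it is not tied to any particular HTTP client:

```python
import random
import time

def request_with_backoff(send, max_retries=5):
    """Retry `send()` on 429 responses with exponential backoff and jitter.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute (e.g. an HTTP call wrapped in a lambda).
    """
    for attempt in range(max_retries):
        response = send()
        if response.status_code != 429:
            return response
        # Wait 1s, 2s, 4s, ... plus jitter so parallel clients desynchronize.
        time.sleep(2 ** attempt + random.uniform(0, 1))
    raise RuntimeError("rate limited after %d retries" % max_retries)
```

The random jitter matters: without it, many clients that were throttled at the same moment retry at the same moment and get throttled again.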
Batch Requests
When possible, use batch operations or pagination to reduce the number of API calls.
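For example, draining a paginated list endpoint with full pages keeps the call count at `ceil(total / page_size)`; `list_page` below is a hypothetical stand-in for any paginated list call, not a specific Scalekit method:

```python
def fetch_all(list_page, page_size=100):
    """Drain a paginated listing with as few calls as possible.

    `list_page(offset, limit)` is a placeholder for a paginated list
    call; it returns a list of up to `limit` items.
    """
    items, offset = [], 0
    while True:
        page = list_page(offset, page_size)
        items.extend(page)
        if len(page) < page_size:  # a short page means we reached the end
            return items
        offset += page_size

# Example against an in-memory "API" of 250 records: 3 calls, not 250.
data = list(range(250))
print(len(fetch_all(lambda off, lim: data[off:off + lim])))  # → 250
```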
Cache Responses
Cache API responses when appropriate to reduce redundant requests.
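A small time-to-live cache is often enough; this sketch is in-process and not tied to any particular endpoint:

```python
import time

class TTLCache:
    """Tiny response cache: avoid re-requesting data that rarely changes."""
    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._store = {}  # key -> (expiry, value)

    def get_or_fetch(self, key, fetch):
        now = time.time()
        hit = self._store.get(key)
        if hit and hit[0] > now:
            return hit[1]          # fresh cached copy: no API call
        value = fetch()            # cache miss: one real API call
        self._store[key] = (now + self.ttl, value)
        return value

calls = []
cache = TTLCache(ttl=60)
fetch = lambda: calls.append(1) or {"id": "org_1"}
cache.get_or_fetch("org_1", fetch)
cache.get_or_fetch("org_1", fetch)
print(len(calls))  # → 1  (second lookup served from cache)
```

Pick the TTL per resource: configuration data can tolerate minutes of staleness, while per-request authorization data usually cannot be cached at all.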
Rate Limit Tiers
Different API endpoints may have different rate limits based on their resource intensity:
| Endpoint Category | Typical Limit | Notes |
|---|---|---|
| Read operations | Higher limits | GET requests for listing and retrieving resources |
| Write operations | Moderate limits | POST, PUT, PATCH requests for creating/updating resources |
| Delete operations | Lower limits | DELETE requests for removing resources |
| Authentication | Special limits | Token generation and validation endpoints |
Contact Scalekit support if you need higher rate limits for your use case.
Handling Rate Limits in Production
Queue-Based Architecture
For high-volume applications, implement a queue-based system:
- Add API requests to a queue
- Process requests at a controlled rate
- Monitor rate limit headers
- Adjust processing speed based on remaining quota
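The steps above can be sketched with a worker thread draining a `queue.Queue` at a fixed rate:

```python
import queue
import threading
import time

def worker(q, send, rate_per_sec):
    """Drain queued requests at a fixed rate instead of bursting."""
    interval = 1.0 / rate_per_sec
    while True:
        job = q.get()
        if job is None:          # sentinel: stop the worker
            q.task_done()
            return
        send(job)
        q.task_done()
        time.sleep(interval)     # simple pacing; tune from rate limit headers

# Example: 5 queued "requests" processed at 50/sec by one worker thread.
q = queue.Queue()
sent = []
t = threading.Thread(target=worker, args=(q, sent.append, 50.0))
t.start()
for i in range(5):
    q.put(i)
q.put(None)
t.join()
print(sent)  # → [0, 1, 2, 3, 4]
```

In a real system `send` would issue the API call and the pacing interval would be adjusted from the remaining-quota headers rather than hard-coded.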
Distributed Rate Limiting
If your application runs on multiple servers, coordinate rate limiting across instances:
- Use a shared cache (Redis, Memcached) to track API usage
- Distribute quota across application instances
- Implement circuit breakers to prevent cascading failures
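A fixed-window counter illustrates the shared-cache idea. Here the store is a plain dict so the sketch stays self-contained; in production it would be a shared cache (for example, Redis `INCR` plus `EXPIRE` on the window key) so all instances see the same counts:

```python
import time

class FixedWindowLimiter:
    """Fixed-window request counter keyed by client and time window."""
    def __init__(self, store, limit, window=1.0):
        self.store, self.limit, self.window = store, limit, window

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        bucket = f"{key}:{int(now // self.window)}"  # one counter per window
        count = self.store.get(bucket, 0) + 1
        self.store[bucket] = count
        return count <= self.limit

limiter = FixedWindowLimiter({}, limit=3, window=60)
print([limiter.allow("client_a", now=120.0) for _ in range(5)])
# → [True, True, True, False, False]
```

When `allow` returns `False`, the instance should locally queue or drop the request rather than send it, which is what keeps the fleet as a whole under the shared quota.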
Monitoring and Alerts
Set up monitoring to track rate limit usage:
- Log rate limit headers from API responses
- Alert when remaining quota drops below a threshold
- Track 429 errors in your application logs
- Monitor retry patterns and backoff behavior
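A logging hook along these lines (again assuming the conventional `X-RateLimit-*` header names; adjust to the actual names) can emit an alert when remaining quota falls below a threshold:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ratelimit")

ALERT_THRESHOLD = 0.1  # alert when under 10% of quota remains

def record_rate_limit(headers):
    """Log quota usage from response headers; return True if an alert fired."""
    try:
        limit = int(headers["X-RateLimit-Limit"])
        remaining = int(headers["X-RateLimit-Remaining"])
    except (KeyError, ValueError):
        return False             # headers absent or malformed: nothing to log
    log.info("rate limit: %d/%d remaining", remaining, limit)
    if remaining < limit * ALERT_THRESHOLD:
        log.warning("rate limit alert: only %d requests left", remaining)
        return True
    return False

print(record_rate_limit({"X-RateLimit-Limit": "100",
                         "X-RateLimit-Remaining": "5"}))  # → True
```

Wiring the warning into your alerting pipeline (rather than just the log) gives you notice before clients start seeing 429s.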
Next Steps
- API Overview: Learn about API structure and versioning
- Authentication: Set up API authentication