LLM Gateway Core implements distributed rate limiting using the token bucket algorithm with Redis Lua scripts. This ensures atomic operations and prevents race conditions in multi-instance deployments.
```lua
local state = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(state[1]) or capacity
local last_refill = tonumber(state[2]) or now
```
Retrieves the current token count and last refill timestamp. Defaults to full capacity for new keys.
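The refill-and-consume arithmetic the rest of the Lua script performs can be sketched in plain Python (a single-process sketch for illustration only; the `TokenBucket` class and its fields are hypothetical, while the real gateway runs this logic atomically inside Redis):

```python
import time

class TokenBucket:
    """Single-process sketch of the token bucket the Lua script runs atomically in Redis."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # new buckets start full, matching the Lua defaults
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=2, refill_rate=1.0)
print(bucket.allow())  # True -- bucket starts full
print(bucket.allow())  # True
print(bucket.allow())  # False -- empty until it refills
```

Doing the same arithmetic in a Lua script is what makes the read-modify-write sequence atomic across gateway instances; a plain GET/SET pair from Python would race.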
Rate limiting is enforced via FastAPI dependency in app/api/v1/chat.py:
```python
from fastapi import APIRouter, Depends, Request, HTTPException

from app.core.rate_limiter import RedisRateLimiter
from app.core.metrics import RATE_LIMIT_ALLOWED, RATE_LIMIT_BLOCKED
from app.core.config import settings

router = APIRouter()

# Initialize the rate limiter (Redis-backed)
rate_limiter = RedisRateLimiter(
    capacity=settings.RATE_LIMITER_CAPACITY,
    refill_rate=settings.RATE_LIMITER_REFILL_RATE,
)

def get_client_key(request: Request) -> str:
    """Extracts a unique key for the client (API Key or IP)."""
    return request.headers.get("X-API-Key") or request.client.host

async def rate_limit_dependency(request: Request):
    """
    FastAPI dependency that validates the API key, enforces rate
    limiting, and records metrics.
    """
    api_key = request.headers.get("X-API-Key")
    valid_keys = [k.strip() for k in settings.API_KEYS.split(",") if k.strip()]
    if api_key not in valid_keys:
        raise HTTPException(status_code=401, detail="Invalid or missing API Key")

    key = api_key or request.client.host
    if not rate_limiter.allow(key):
        RATE_LIMIT_BLOCKED.inc()
        raise HTTPException(
            status_code=429,
            detail="Too many requests. Please wait before trying again.",
        )
    RATE_LIMIT_ALLOWED.inc()

@router.post("", response_model=ChatResponse, dependencies=[Depends(rate_limit_dependency)])
async def chat(request: ChatRequest):
    """
    Entry point for all chat completions. Processes the chat request
    and returns a chat response.
    """
    return await chat_service.chat(request)
```
```python
def get_client_key(request: Request) -> str:
    """Extracts a unique key for the client (API Key or IP)."""
    return request.headers.get("X-API-Key") or request.client.host
```
Priority:

1. API Key: if provided, rate limit per API key.
2. IP Address: fall back to IP-based limiting.
This allows:

- Different rate limits for different API key tiers
- IP-based protection against unauthenticated abuse
- Flexible quota management
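Tiered limits can be wired in by resolving bucket parameters from the client key before calling the limiter. A minimal sketch, assuming a hypothetical `TIER_LIMITS` table and `KEY_TIERS` mapping (neither is part of the gateway's actual config):

```python
# Hypothetical per-tier bucket parameters (illustration only).
TIER_LIMITS = {
    "free": {"capacity": 10, "refill_rate": 0.5},
    "pro":  {"capacity": 100, "refill_rate": 10.0},
}

# Hypothetical mapping from API key to tier.
KEY_TIERS = {"key-abc": "pro"}

def limits_for(api_key):
    """Pick bucket parameters for a client; unknown or missing keys get the lowest tier."""
    tier = KEY_TIERS.get(api_key, "free")
    return TIER_LIMITS[tier]

print(limits_for("key-abc"))  # {'capacity': 100, 'refill_rate': 10.0}
print(limits_for(None))       # {'capacity': 10, 'refill_rate': 0.5}
```

The resolved parameters would then be passed to the limiter (or used to select among pre-built limiter instances) in place of the single global `RATE_LIMITER_CAPACITY`/`RATE_LIMITER_REFILL_RATE` settings.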
The key is prefixed with `ratelimit:` in Redis to namespace it and avoid collisions with cache keys.
```python
except redis.exceptions.NoScriptError:
    # Reload the script if it was flushed from Redis
    self._script_hash = self.client.script_load(self._lua_script)
    return self.allow(key)
```
Automatically reloads the Lua script if Redis was restarted or flushed.
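The reload-and-retry flow can be exercised without a live server using a stub client. In this sketch, `StubRedis` and `NoScriptError` are stand-ins for redis-py's client and `redis.exceptions.NoScriptError`; the `RateLimiter` here is a simplified illustration, not the gateway's `RedisRateLimiter`:

```python
class NoScriptError(Exception):
    """Stand-in for redis.exceptions.NoScriptError."""

class StubRedis:
    """Minimal stand-in for a Redis client; forgets scripts like a flushed server."""
    def __init__(self):
        self.scripts = {}

    def script_load(self, script):
        sha = f"sha-{hash(script) & 0xFFFF:04x}"
        self.scripts[sha] = script
        return sha

    def evalsha(self, sha, *args):
        if sha not in self.scripts:
            raise NoScriptError()
        return 1  # pretend the script allowed the request

    def script_flush(self):
        self.scripts.clear()

class RateLimiter:
    def __init__(self, client, lua_script):
        self.client = client
        self._lua_script = lua_script
        self._script_hash = client.script_load(lua_script)

    def allow(self, key):
        try:
            return bool(self.client.evalsha(self._script_hash, 1, key))
        except NoScriptError:
            # Reload the script if Redis was restarted or SCRIPT FLUSH'd
            self._script_hash = self.client.script_load(self._lua_script)
            return self.allow(key)

client = StubRedis()
limiter = RateLimiter(client, "-- token bucket script --")
client.script_flush()                   # simulate a Redis restart/flush
print(limiter.allow("ratelimit:demo"))  # True -- script reloaded transparently
```

This is the standard `EVALSHA`-with-fallback pattern: the SHA-addressed call avoids resending the script body on every request, and the `NoScriptError` handler pays the `SCRIPT LOAD` cost only once after a flush.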