The Screen Answerer API implements multiple layers of rate limiting to prevent abuse, manage server resources, and avoid exhausting your Google Gemini API quota.
Rate limit layers
The API enforces three distinct rate limiting mechanisms:
1. Global IP-based rate limit
Limit: 100 requests per 15 minutes per IP address
Scope: All endpoints
Implementation: Express rate limiter middleware (server.js:47-50)
```javascript
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100 // limit each IP to 100 requests per windowMs
});
```
When exceeded:
```json
{
  "error": "Too many requests from this IP, please try again later"
}
```
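The fixed-window semantics can be illustrated with a small stand-alone counter. This is a sketch for illustration only; `createIpLimiter` and its shape are hypothetical, not taken from server.js:

```javascript
// Fixed-window counter with the same 100-requests-per-15-minutes
// semantics as the Express limiter above. Illustrative only.
function createIpLimiter(windowMs = 15 * 60 * 1000, max = 100) {
  const windows = new Map(); // ip -> { start, count }
  return function allow(ip, now = Date.now()) {
    const w = windows.get(ip);
    if (!w || now - w.start >= windowMs) {
      // New window for this IP: reset the count
      windows.set(ip, { start: now, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= max;
  };
}
```

With a small window of 2 requests, the third call inside the window is rejected, and each IP gets its own window.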
2. Per-client request throttling
Limit: 5-second minimum interval between requests
Scope: /monitor_screen endpoint (server.js:91-102)
Purpose: Prevents rapid-fire requests that waste API quota on duplicate screens
```javascript
const RATE_LIMIT_WINDOW = 5000; // 5 seconds

function isRateLimited(clientId) {
  const now = Date.now();
  const lastCallTime = apiCallTimestamps.get(clientId) || 0;
  if (now - lastCallTime < RATE_LIMIT_WINDOW) {
    return true; // Rate limited
  }
  apiCallTimestamps.set(clientId, now);
  return false;
}
```
When exceeded:
```json
{
  "error": "Rate limit exceeded",
  "message": "Please wait before sending another request"
}
```
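To see the throttle's behavior in isolation, here is the same logic with an injectable clock so calls can be simulated at specific times. This is a demonstration sketch; server.js uses `Date.now()` directly:

```javascript
const apiCallTimestamps = new Map();
const RATE_LIMIT_WINDOW = 5000; // 5 seconds

// Same logic as the server snippet, with `now` injectable for demonstration.
function isRateLimited(clientId, now = Date.now()) {
  const lastCallTime = apiCallTimestamps.get(clientId) || 0;
  if (now - lastCallTime < RATE_LIMIT_WINDOW) {
    return true; // within the 5-second window: rejected
  }
  apiCallTimestamps.set(clientId, now);
  return false;
}
```

Note that a rejected call does not update the timestamp, so the window is measured from the last *accepted* request.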
3. Internal API quota management
Limit: 50 API calls per minute to Gemini
Scope: All AI processing
Reset: Every 60 seconds (server.js:110-112)
```javascript
let apiCallCounter = 0;
const API_CALL_QUOTA_LIMIT = 50;
const API_CALL_RESET_INTERVAL = 60 * 1000; // Reset every minute

setInterval(() => {
  apiCallCounter = 0;
}, API_CALL_RESET_INTERVAL);
```
When approaching limit:
```json
{
  "error": "Failed to process question",
  "message": "API quota limit approaching, please try again later"
}
```
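The quota check and counter helpers referenced later by the server's retry wrapper (`isApproachingQuotaLimit`, `incrementApiCallCounter`) might look like the sketch below. The exact implementation in server.js may differ, and the safety margin value here is an assumption:

```javascript
let apiCallCounter = 0;
const API_CALL_QUOTA_LIMIT = 50;

// Increment once per Gemini call.
function incrementApiCallCounter() {
  apiCallCounter += 1;
}

// Leave some headroom so concurrent requests don't blow past the quota.
// The margin of 5 is an assumption, not taken from server.js.
function isApproachingQuotaLimit(margin = 5) {
  return apiCallCounter >= API_CALL_QUOTA_LIMIT - margin;
}
```

Combined with the `setInterval` reset above, this yields a per-minute budget that refuses new Gemini calls shortly before the hard limit.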
The current implementation does not expose rate limit information in response headers. Track your request timing client-side to avoid hitting limits.
Handling rate limits
Client-side throttling
Implement request throttling to stay within limits:
JavaScript Throttle Function
```javascript
class RateLimitedClient {
  constructor(apiKey, minInterval = 5000) {
    this.apiKey = apiKey;
    this.minInterval = minInterval;
    this.lastRequest = 0;
  }

  async processQuestion(question) {
    // Enforce minimum interval
    const now = Date.now();
    const timeSinceLastRequest = now - this.lastRequest;
    if (timeSinceLastRequest < this.minInterval) {
      const waitTime = this.minInterval - timeSinceLastRequest;
      await new Promise(resolve => setTimeout(resolve, waitTime));
    }
    this.lastRequest = Date.now();

    const response = await fetch('http://localhost:3000/process_question', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-API-Key': this.apiKey
      },
      body: JSON.stringify({ question })
    });
    return await response.json();
  }
}

// Usage
const client = new RateLimitedClient('YOUR_API_KEY', 5000);
await client.processQuestion('What is 2+2?');
```
Retry with exponential backoff
Handle 429 errors with automatic retries:
```javascript
async function processWithRetry(question, apiKey, maxRetries = 3) {
  let retries = 0;
  let delay = 1000; // Start with 1 second

  while (retries < maxRetries) {
    try {
      const response = await fetch('http://localhost:3000/process_question', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-API-Key': apiKey
        },
        body: JSON.stringify({ question })
      });

      if (response.status === 429) {
        // Rate limited - retry with backoff
        retries++;
        if (retries >= maxRetries) {
          throw new Error('Rate limit exceeded - max retries reached');
        }
        console.log(`Rate limited. Retrying in ${delay}ms...`);
        await new Promise(resolve => setTimeout(resolve, delay));
        delay *= 2; // Exponential backoff
        continue;
      }
      return await response.json();
    } catch (error) {
      if (retries >= maxRetries) throw error;
      retries++;
      await new Promise(resolve => setTimeout(resolve, delay));
      delay *= 2;
    }
  }
}
```
Python Retry with Backoff
```python
import time
import requests
from typing import Dict, Any

def process_with_retry(
    question: str,
    api_key: str,
    max_retries: int = 3
) -> Dict[str, Any]:
    retries = 0
    delay = 1  # Start with 1 second

    while retries < max_retries:
        try:
            response = requests.post(
                'http://localhost:3000/process_question',
                headers={'X-API-Key': api_key},
                json={'question': question}
            )
            if response.status_code == 429:
                retries += 1
                if retries >= max_retries:
                    raise Exception('Rate limit exceeded - max retries reached')
                print(f'Rate limited. Retrying in {delay}s...')
                time.sleep(delay)
                delay *= 2  # Exponential backoff
                continue
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException:
            retries += 1
            if retries >= max_retries:
                raise
            time.sleep(delay)
            delay *= 2

    raise Exception('Max retries exceeded')
```
Request queuing
For high-volume applications, implement a queue:
```javascript
class RequestQueue {
  constructor(apiKey, requestsPerMinute = 50) {
    this.apiKey = apiKey;
    this.queue = [];
    this.processing = false;
    this.interval = 60000 / requestsPerMinute; // ms per request
  }

  async enqueue(question) {
    return new Promise((resolve, reject) => {
      this.queue.push({ question, resolve, reject });
      this.processQueue();
    });
  }

  async processQueue() {
    if (this.processing || this.queue.length === 0) return;
    this.processing = true;

    while (this.queue.length > 0) {
      const { question, resolve, reject } = this.queue.shift();
      try {
        const response = await fetch('http://localhost:3000/process_question', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'X-API-Key': this.apiKey
          },
          body: JSON.stringify({ question })
        });
        const data = await response.json();
        resolve(data);
      } catch (error) {
        reject(error);
      }
      // Wait before next request
      if (this.queue.length > 0) {
        await new Promise(resolve => setTimeout(resolve, this.interval));
      }
    }
    this.processing = false;
  }
}

// Usage
const queue = new RequestQueue('YOUR_API_KEY', 45); // 45 req/min for safety
await queue.enqueue('What is 2+2?');
```
Server-side retry logic
The API includes automatic retry logic for Gemini API calls (server.js:135-170):
```javascript
const MAX_RETRIES = 3;
const INITIAL_RETRY_DELAY = 1000; // 1 second

async function callGeminiAPI(apiCallFn, maxRetries = MAX_RETRIES) {
  let retries = 0;
  let delay = INITIAL_RETRY_DELAY;

  while (true) {
    try {
      // Check quota
      if (isApproachingQuotaLimit()) {
        throw new Error('API quota limit approaching');
      }
      incrementApiCallCounter();
      return await apiCallFn();
    } catch (error) {
      // Retry on rate limit/quota errors
      if (retries >= maxRetries ||
          (!error.message.includes('429') &&
           !error.message.includes('quota') &&
           !error.message.includes('Resource has been exhausted'))) {
        throw error;
      }
      console.log(`Retrying in ${delay}ms...`);
      await new Promise(resolve => setTimeout(resolve, delay));
      // Exponential backoff with jitter
      delay = Math.min(delay * 2, 10000) * (0.8 + Math.random() * 0.4);
      retries++;
    }
  }
}
```
The server automatically retries transient errors, so most rate limit issues are handled transparently.
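The capped, jittered backoff schedule above can be computed in isolation. This is a sketch with the random source injected so the sequence is reproducible; `backoffSchedule` is a hypothetical helper, not part of server.js:

```javascript
// Reproduce the delay sequence used in callGeminiAPI: double each time,
// cap at 10 s, then apply +/-20% jitter via the injected random source.
function backoffSchedule(retries, initial = 1000, cap = 10000, rand = Math.random) {
  const delays = [];
  let delay = initial;
  for (let i = 0; i < retries; i++) {
    delays.push(delay);
    delay = Math.min(delay * 2, cap) * (0.8 + rand() * 0.4);
  }
  return delays;
}
```

With `rand` fixed at 0.5 the jitter factor is exactly 1.0, giving the plain 1 s, 2 s, 4 s doubling; with a real random source each step varies by up to 20% in either direction, which helps avoid synchronized retry bursts.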
Monitoring quota usage
Track your Google Gemini API usage:
1. Visit Google AI Studio
2. Navigate to your API key settings
3. Monitor request counts and quota limits
Set up alerts in Google Cloud Console to notify you when approaching quota limits.
Best practices
Use appropriate intervals: For screen monitoring, use 5-second intervals to match the rate limit window.
Implement client-side throttling: Don't rely solely on server-side rate limits; throttle requests in your client as well.
Handle 429 gracefully: Always implement retry logic with exponential backoff for rate limit errors.
Choose the right model: Use gemini-2.0-flash-lite to reduce API quota consumption.
Testing rate limits
Test your rate limit handling:
```bash
# Send 10 requests rapidly to trigger rate limiting
for i in {1..10}; do
  curl -X POST http://localhost:3000/process_question \
    -H "Content-Type: application/json" \
    -H "X-API-Key: YOUR_API_KEY" \
    -d '{"question": "Test?"}' &
done
wait
```

Expect some of these requests to return 429 responses, since they arrive within the same 5-second throttle window.