The Screen Answerer API implements multiple layers of rate limiting to prevent abuse, manage server resources, and avoid exhausting your Google Gemini API quota.

Rate limit layers

The API enforces three distinct rate limiting mechanisms:

1. Global IP-based rate limit

Limit: 100 requests per 15 minutes per IP address
Scope: All endpoints
Implementation: Express rate limiter middleware (server.js:47-50)
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100 // limit each IP to 100 requests per windowMs
});
When exceeded:
{
  "error": "Too many requests from this IP, please try again later"
}
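If you operate the server yourself, express-rate-limit can also expose standard RateLimit-* response headers. This is a configuration sketch, not part of the current implementation, and assumes express-rate-limit v6 or later:

```javascript
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100,
  standardHeaders: true, // send RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset
  legacyHeaders: false   // omit the deprecated X-RateLimit-* headers
});
```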

2. Per-client request throttling

Limit: 5-second minimum interval between requests
Scope: /monitor_screen endpoint (server.js:91-102)
Purpose: Prevents rapid-fire requests that waste API quota on duplicate screens
const RATE_LIMIT_WINDOW = 5000; // 5 seconds

function isRateLimited(clientId) {
  const now = Date.now();
  const lastCallTime = apiCallTimestamps.get(clientId) || 0;
  
  if (now - lastCallTime < RATE_LIMIT_WINDOW) {
    return true; // Rate limited
  }
  
  apiCallTimestamps.set(clientId, now);
  return false;
}
When exceeded:
{
  "error": "Rate limit exceeded",
  "message": "Please wait before sending another request"
}

3. Internal API quota management

Limit: 50 API calls per minute to Gemini
Scope: All AI processing
Reset: Every 60 seconds (server.js:110-112)
let apiCallCounter = 0;
const API_CALL_QUOTA_LIMIT = 50;
const API_CALL_RESET_INTERVAL = 60 * 1000; // Reset every minute

setInterval(() => {
  apiCallCounter = 0;
}, API_CALL_RESET_INTERVAL);
When approaching limit:
{
  "error": "Failed to process question",
  "message": "API quota limit approaching, please try again later"
}
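The server-side retry wrapper later on this page calls isApproachingQuotaLimit() and incrementApiCallCounter(), whose bodies are not excerpted here. A plausible sketch, where the 5-call safety margin is an assumption rather than the actual server.js value:

```javascript
// Sketch of the quota helpers referenced by the server's retry wrapper.
// The counters mirror the excerpt above; the safety margin is assumed.
let apiCallCounter = 0;
const API_CALL_QUOTA_LIMIT = 50;
const QUOTA_SAFETY_MARGIN = 5; // assumed, not confirmed from server.js

function incrementApiCallCounter() {
  apiCallCounter++;
}

function isApproachingQuotaLimit() {
  // Refuse new calls slightly before the hard limit so in-flight
  // work can finish without tripping the Gemini quota.
  return apiCallCounter >= API_CALL_QUOTA_LIMIT - QUOTA_SAFETY_MARGIN;
}
```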

Rate limit headers

The current implementation does not expose rate limit information in response headers. Track your request timing client-side to avoid hitting limits.
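Because no headers are returned, the client must do its own bookkeeping. A minimal sketch (not part of the API) that estimates the budget remaining against the 100-requests-per-15-minutes global limit:

```javascript
// Sketch: client-side bookkeeping against the global IP limit,
// since the server returns no rate-limit headers.
class RateLimitTracker {
  constructor(limit = 100, windowMs = 15 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // Call once per request you send.
  record() {
    this.timestamps.push(Date.now());
  }

  // Estimated requests left in the current window.
  remaining() {
    const cutoff = Date.now() - this.windowMs;
    // Drop timestamps that have aged out of the window.
    this.timestamps = this.timestamps.filter(t => t > cutoff);
    return Math.max(0, this.limit - this.timestamps.length);
  }
}
```

Call record() before each fetch and pause when remaining() nears zero.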

Handling rate limits

Client-side throttling

Implement request throttling to stay within limits:
JavaScript Throttle Function
class RateLimitedClient {
  constructor(apiKey, minInterval = 5000) {
    this.apiKey = apiKey;
    this.minInterval = minInterval;
    this.lastRequest = 0;
  }
  
  async processQuestion(question) {
    // Enforce minimum interval
    const now = Date.now();
    const timeSinceLastRequest = now - this.lastRequest;
    
    if (timeSinceLastRequest < this.minInterval) {
      const waitTime = this.minInterval - timeSinceLastRequest;
      await new Promise(resolve => setTimeout(resolve, waitTime));
    }
    
    this.lastRequest = Date.now();
    
    const response = await fetch('http://localhost:3000/process_question', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-API-Key': this.apiKey
      },
      body: JSON.stringify({ question })
    });
    
    return await response.json();
  }
}

// Usage
const client = new RateLimitedClient('YOUR_API_KEY', 5000);
await client.processQuestion('What is 2+2?');

Retry with exponential backoff

Handle 429 errors with automatic retries:
JavaScript Retry Logic
async function processWithRetry(question, apiKey, maxRetries = 3) {
  let retries = 0;
  let delay = 1000; // Start with 1 second
  
  while (retries < maxRetries) {
    try {
      const response = await fetch('http://localhost:3000/process_question', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-API-Key': apiKey
        },
        body: JSON.stringify({ question })
      });
      
      if (response.status === 429) {
        // Rate limited - retry with backoff
        retries++;
        if (retries >= maxRetries) {
          throw new Error('Rate limit exceeded - max retries reached');
        }
        
        console.log(`Rate limited. Retrying in ${delay}ms...`);
        await new Promise(resolve => setTimeout(resolve, delay));
        delay *= 2; // Exponential backoff
        continue;
      }
      
      return await response.json();
    } catch (error) {
      retries++;
      if (retries >= maxRetries) throw error;
      await new Promise(resolve => setTimeout(resolve, delay));
      delay *= 2;
    }
  }
  
  throw new Error('Max retries exceeded');
}
Python Retry with Backoff
import time
import requests
from typing import Dict, Any

def process_with_retry(
    question: str,
    api_key: str,
    max_retries: int = 3
) -> Dict[str, Any]:
    retries = 0
    delay = 1  # Start with 1 second
    
    while retries < max_retries:
        try:
            response = requests.post(
                'http://localhost:3000/process_question',
                headers={'X-API-Key': api_key},
                json={'question': question}
            )
            
            if response.status_code == 429:
                retries += 1
                if retries >= max_retries:
                    raise Exception('Rate limit exceeded - max retries reached')
                
                print(f'Rate limited. Retrying in {delay}s...')
                time.sleep(delay)
                delay *= 2  # Exponential backoff
                continue
            
            response.raise_for_status()
            return response.json()
            
        except requests.exceptions.RequestException as e:
            retries += 1
            if retries >= max_retries:
                raise e
            time.sleep(delay)
            delay *= 2
    
    raise Exception('Max retries exceeded')

Request queuing

For high-volume applications, implement a queue:
JavaScript Queue System
class RequestQueue {
  constructor(apiKey, requestsPerMinute = 50) {
    this.apiKey = apiKey;
    this.queue = [];
    this.processing = false;
    this.interval = 60000 / requestsPerMinute; // ms per request
  }
  
  async enqueue(question) {
    return new Promise((resolve, reject) => {
      this.queue.push({ question, resolve, reject });
      this.processQueue();
    });
  }
  
  async processQueue() {
    if (this.processing || this.queue.length === 0) return;
    
    this.processing = true;
    
    while (this.queue.length > 0) {
      const { question, resolve, reject } = this.queue.shift();
      
      try {
        const response = await fetch('http://localhost:3000/process_question', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'X-API-Key': this.apiKey
          },
          body: JSON.stringify({ question })
        });
        
        const data = await response.json();
        resolve(data);
      } catch (error) {
        reject(error);
      }
      
      // Wait before next request
      if (this.queue.length > 0) {
        await new Promise(resolve => setTimeout(resolve, this.interval));
      }
    }
    
    this.processing = false;
  }
}

// Usage
const queue = new RequestQueue('YOUR_API_KEY', 45); // 45 req/min for safety
await queue.enqueue('What is 2+2?');

Server-side retry logic

The API includes automatic retry logic for Gemini API calls (server.js:135-170):
const MAX_RETRIES = 3;
const INITIAL_RETRY_DELAY = 1000; // 1 second

async function callGeminiAPI(apiCallFn, maxRetries = MAX_RETRIES) {
  let retries = 0;
  let delay = INITIAL_RETRY_DELAY;
  
  while (true) {
    try {
      // Check quota
      if (isApproachingQuotaLimit()) {
        throw new Error('API quota limit approaching');
      }
      
      incrementApiCallCounter();
      return await apiCallFn();
    } catch (error) {
      // Retry on rate limit/quota errors
      if (retries >= maxRetries || 
          (!error.message.includes('429') && 
           !error.message.includes('quota') && 
           !error.message.includes('Resource has been exhausted'))) {
        throw error;
      }
      
      console.log(`Retrying in ${delay}ms...`);
      await new Promise(resolve => setTimeout(resolve, delay));
      
      // Exponential backoff with jitter
      delay = Math.min(delay * 2, 10000) * (0.8 + Math.random() * 0.4);
      retries++;
    }
  }
}
The server automatically retries transient errors, so most rate limit issues are handled transparently.

Monitoring quota usage

Track your Google Gemini API usage:
  1. Visit Google AI Studio
  2. Navigate to your API key settings
  3. Monitor request counts and quota limits
Set up alerts in Google Cloud Console to notify you when approaching quota limits.

Best practices

Use appropriate intervals

For screen monitoring, use intervals of at least 5 seconds to satisfy the per-client throttle window

Implement client-side throttling

Don't rely solely on server-side rate limits; throttle requests in your client as well

Handle 429 gracefully

Always implement retry logic with exponential backoff for rate limit errors

Choose the right model

Use gemini-2.0-flash-lite to reduce API quota consumption
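One subtlety worth checking when picking a polling interval: the three limits interact. A quick back-of-the-envelope calculation using the numbers documented above shows that sustained single-client polling over a full window is actually bounded by the global IP limit, not the 5-second throttle:

```javascript
// Pick a client polling interval that respects all three limits at once.
// All numbers come from the limits documented above.
const PER_CLIENT_MIN_MS = 5000;           // 5 s per-client throttle
const GLOBAL_MS = (15 * 60 * 1000) / 100; // 100 req / 15 min per IP => 9000 ms
const QUOTA_MS = 60000 / 50;              // 50 Gemini calls / min => 1200 ms

const safeIntervalMs = Math.max(PER_CLIENT_MIN_MS, GLOBAL_MS, QUOTA_MS);
console.log(safeIntervalMs); // 9000
```

For short bursts the 5-second throttle dominates, but for always-on monitoring, spacing requests about 9 seconds apart keeps all three limits satisfied.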

Testing rate limits

Test your rate limit handling:
Rapid Request Test
# Send 10 requests rapidly to trigger rate limiting
for i in {1..10}; do
  curl -X POST http://localhost:3000/process_question \
    -H "Content-Type: application/json" \
    -H "X-API-Key: YOUR_API_KEY" \
    -d '{"question": "Test?"}' &
done
wait
Expect some of these concurrent requests to return 429 responses as they land inside the 5-second rate limit window.
