Batch Prediction
Batch prediction allows you to send large numbers of multimodal requests to Gemini asynchronously. Instead of returning immediate responses, results are written to Cloud Storage or BigQuery when processing completes.
Why Batch Prediction?
Cost Effective
50% lower cost compared to online predictions
High Volume
Process thousands of requests in a single job
No Rate Limits
Bypass per-minute quota restrictions
When to Use Batch Prediction
✅ Good Use Cases:
- Processing large datasets (1000+ items)
- Offline analysis and evaluation
- Bulk content classification or summarization
- Dataset labeling and annotation
- Periodic batch jobs (nightly, weekly)
- Cost-sensitive workloads
❌ Poor Use Cases:
- Real-time applications
- Interactive user experiences
- Low-latency requirements
- Small request volumes (fewer than 100 items)
Supported Models
Batch prediction is available for:
- gemini-3.1-pro-preview
- gemini-3-flash-preview
- gemini-2.5-pro
- gemini-2.5-flash
- gemini-2.0-flash
Quick Start
Installation
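The snippets in this guide work against the Vertex AI REST API, so the only local requirements are credentials and an HTTP client. A sketch (the package choice is an assumption, not a requirement of the API):

```shell
# Assumes you will call the REST API directly; swap in the SDK of your choice.
pip install --upgrade google-auth requests
```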
Setup
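A minimal configuration sketch; the project ID, region, and model name are placeholders to replace with your own:

```python
import os

# Placeholders; set GOOGLE_CLOUD_PROJECT or edit these directly.
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "my-project")
LOCATION = "us-central1"  # batch input files must live in this region
MODEL = "gemini-2.0-flash"

# Base URL for the Vertex AI batchPredictionJobs REST resource.
API_ENDPOINT = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT_ID}/locations/{LOCATION}"
)
print(API_ENDPOINT)
```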
Cloud Storage Workflow
Step 1: Prepare Input Data
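Each input line is one self-contained GenerateContent-style request wrapped under a request key. A sketch of generating the file (the prompts are illustrative):

```python
import json
import pathlib
import tempfile

# Illustrative prompts; in practice these come from your dataset.
prompts = ["Summarize this article in one sentence.", "Classify the sentiment of this review."]

# Each JSONL line wraps one request body under a "request" key.
lines = [
    json.dumps({
        "request": {
            "contents": [{"role": "user", "parts": [{"text": p}]}],
        }
    })
    for p in prompts
]

path = pathlib.Path(tempfile.gettempdir()) / "batch_requests.jsonl"
path.write_text("\n".join(lines) + "\n")
print(f"wrote {len(lines)} requests to {path}")
```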
Create a JSONL file, batch_requests.jsonl, with one request per line.
Step 2: Upload to Cloud Storage
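Assuming a bucket named my-bucket in us-central1 (the bucket name is hypothetical):

```shell
# The input file must land in a us-central1 bucket.
gcloud storage cp batch_requests.jsonl gs://my-bucket/input/
```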
Step 3: Submit Batch Job
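A sketch of the job resource you POST to the batchPredictionJobs endpoint. Field names follow the Vertex AI REST resource; the project, bucket, and display name are placeholders:

```python
import json

PROJECT_ID = "my-project"  # placeholder
LOCATION = "us-central1"
MODEL = "gemini-2.0-flash"

# Request body for POST .../batchPredictionJobs.
job = {
    "displayName": "batch-demo",
    "model": f"publishers/google/models/{MODEL}",
    "inputConfig": {
        "instancesFormat": "jsonl",
        "gcsSource": {"uris": ["gs://my-bucket/input/batch_requests.jsonl"]},
    },
    "outputConfig": {
        "predictionsFormat": "jsonl",
        "gcsDestination": {"outputUriPrefix": "gs://my-bucket/output/"},
    },
}
print(json.dumps(job, indent=2))
# POST this body to:
#   https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/
#     locations/{LOCATION}/batchPredictionJobs
# with an OAuth bearer token (e.g. from `gcloud auth print-access-token`).
```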
Step 4: Monitor Job Status
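Jobs move through states such as JOB_STATE_PENDING, JOB_STATE_RUNNING, and JOB_STATE_SUCCEEDED. A polling sketch with the state-fetching function injected, so it works with any client:

```python
import time

TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}

def wait_for_job(get_state, poll_seconds=30, sleep=time.sleep):
    """Poll get_state() until the job reaches a terminal Vertex AI state."""
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)

# Demo with a stubbed job that finishes on the third poll:
states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
print(wait_for_job(lambda: next(states), sleep=lambda _: None))  # JOB_STATE_SUCCEEDED
```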
Step 5: Retrieve Results
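Output lines pair each request with its response. A parsing sketch (the sample line is illustrative):

```python
import json

# One illustrative output line; real lines come from the predictions JSONL
# files written under the job's output directory.
sample = json.dumps({
    "request": {"contents": [{"role": "user", "parts": [{"text": "Hi"}]}]},
    "response": {
        "candidates": [
            {"content": {"role": "model", "parts": [{"text": "Hello!"}]}}
        ]
    },
})

def extract_text(line: str):
    """Return the first candidate's text, or None for failed requests."""
    record = json.loads(line)
    response = record.get("response")
    if not response:
        return None  # failed requests carry an error status instead
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(p.get("text", "") for p in parts)

print(extract_text(sample))  # Hello!
```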
Multimodal Batch Requests
Images
Videos
PDFs
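Images, videos, and PDFs all use the same fileData part, differing only in mimeType and the Cloud Storage URI. A sketch with hypothetical gs:// URIs:

```python
import json

def media_request(prompt: str, uri: str, mime_type: str) -> str:
    """One JSONL line pairing a text prompt with a Cloud Storage file."""
    return json.dumps({
        "request": {
            "contents": [{
                "role": "user",
                "parts": [
                    {"fileData": {"mimeType": mime_type, "fileUri": uri}},
                    {"text": prompt},
                ],
            }]
        }
    })

# Hypothetical URIs for each modality.
print(media_request("Describe this image.", "gs://my-bucket/img.jpg", "image/jpeg"))
print(media_request("Summarize this video.", "gs://my-bucket/clip.mp4", "video/mp4"))
print(media_request("Extract the key points.", "gs://my-bucket/doc.pdf", "application/pdf"))
```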
BigQuery Workflow
Step 1: Create Input Table
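The input table needs a request column holding the JSON request body. A DDL/DML sketch, with hypothetical dataset and table names, shown here as SQL strings:

```python
# BigQuery DDL/DML sketch; dataset and table names are hypothetical.
create_sql = """
CREATE TABLE IF NOT EXISTS my_dataset.batch_input (
  request JSON
);
"""

insert_sql = """
INSERT INTO my_dataset.batch_input (request)
VALUES (JSON '{"contents":[{"role":"user","parts":[{"text":"Hello"}]}]}');
"""

print(create_sql.strip())
print(insert_sql.strip())
```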
Step 2: Submit Batch Job
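The job resource mirrors the Cloud Storage variant, swapping in a BigQuery source and destination; the bq:// URIs are placeholders:

```python
import json

# Request body for POST .../batchPredictionJobs with BigQuery input/output.
job = {
    "displayName": "batch-bq-demo",
    "model": "publishers/google/models/gemini-2.0-flash",
    "inputConfig": {
        "instancesFormat": "bigquery",
        "bigquerySource": {"inputUri": "bq://my-project.my_dataset.batch_input"},
    },
    "outputConfig": {
        "predictionsFormat": "bigquery",
        "bigqueryDestination": {"outputUri": "bq://my-project.my_dataset.batch_output"},
    },
}
print(json.dumps(job, indent=2))
```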
Step 3: Query Results
Advanced Input Formatting
System Instructions
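A system instruction rides along inside each request line; the instruction text here is illustrative:

```python
import json

line = json.dumps({
    "request": {
        "systemInstruction": {
            "parts": [{"text": "You are a terse classifier. Answer with one word."}]
        },
        "contents": [
            {"role": "user", "parts": [{"text": "The movie was wonderful."}]}
        ],
    }
})
print(line)
```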
Safety Settings
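Safety settings are also per-request; categories and thresholds follow the standard Gemini enums:

```python
import json

line = json.dumps({
    "request": {
        "contents": [{"role": "user", "parts": [{"text": "Moderate this comment."}]}],
        "safetySettings": [
            {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
        ],
    }
})
print(line)
```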
Multiple Models
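A sketch in which each line carries its own generationConfig (the values are illustrative):

```python
import json

# Two different generation configs applied to the same prompt.
configs = [
    {"temperature": 0.0, "maxOutputTokens": 64},
    {"temperature": 0.9, "maxOutputTokens": 1024},
]

lines = [
    json.dumps({
        "request": {
            "contents": [{"role": "user", "parts": [{"text": "Write a tagline."}]}],
            "generationConfig": cfg,
        }
    })
    for cfg in configs
]
print("\n".join(lines))
```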
Mix different generation configs per request.
List and Manage Jobs
List All Jobs
Get Job Details
Cancel a Job
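The three operations above map to three REST calls on the batchPredictionJobs resource; the project and job ID below are placeholders:

```python
# Placeholders: substitute your own project, region, and job ID.
BASE = ("https://us-central1-aiplatform.googleapis.com/v1/"
        "projects/my-project/locations/us-central1")
job_id = "1234567890"

list_url = f"{BASE}/batchPredictionJobs"                    # GET: list all jobs
get_url = f"{BASE}/batchPredictionJobs/{job_id}"            # GET: one job's details
cancel_url = f"{BASE}/batchPredictionJobs/{job_id}:cancel"  # POST: cancel the job

print(list_url)
print(get_url)
print(cancel_url)
```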
Response Structure
Each line of the output JSONL pairs the original request with its response, or with an error status if the request failed.
Error Handling
Request-Level Errors
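A sketch that partitions output lines into successes and failures; the exact shape of the error/status field can vary, so treat the check as illustrative:

```python
import json

# Two illustrative output lines: one success, one failure.
raw = [
    json.dumps({
        "request": {},
        "response": {"candidates": [{"content": {"parts": [{"text": "ok"}]}}]},
    }),
    json.dumps({"request": {}, "status": "INTERNAL: model error"}),
]

succeeded, failed = [], []
for line in raw:
    record = json.loads(line)
    if "response" in record and "status" not in record:
        succeeded.append(record)
    else:
        failed.append(record)

print(len(succeeded), len(failed))  # 1 1
```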
Check the status field of each line in the output to detect per-request failures.
Job-Level Errors
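If the job itself fails, the error is reported on the job resource rather than in the output files; a sketch with an illustrative job snapshot:

```python
# Illustrative snapshot of a failed job resource.
job = {
    "name": "projects/my-project/locations/us-central1/batchPredictionJobs/123",
    "state": "JOB_STATE_FAILED",
    "error": {"code": 3, "message": "Input file not found"},
}

if job["state"] == "JOB_STATE_FAILED":
    err = job.get("error", {})
    print(f"job failed: code={err.get('code')} message={err.get('message')}")
```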
Cost Optimization
Calculate Costs
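A back-of-envelope estimator. The per-token prices below are placeholders, not published rates; only the 50% batch discount comes from this guide:

```python
# Hypothetical per-million-token prices; substitute current rates for your model.
INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens (online, illustrative)
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens (online, illustrative)
BATCH_DISCOUNT = 0.5       # batch prediction is billed at 50% of online

def batch_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated batch cost in USD for the given token totals."""
    online = (input_tokens / 1e6) * INPUT_PRICE_PER_M \
           + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M
    return online * BATCH_DISCOUNT

# 10,000 requests averaging 1,000 input and 200 output tokens each:
print(f"${batch_cost(10_000 * 1_000, 10_000 * 200):.2f}")  # $1.35
```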
Best Practices
Batch Size
Optimal batch size: 100-10,000 requests per file
File Location
Keep input files in us-central1 for best performance
Monitoring
Monitor job progress via console or API polling
Retries
Implement retry logic for failed individual requests
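Since failed requests appear in the output with their error status, a retry file can be built directly from the output; a sketch with illustrative line shapes:

```python
import json

# Illustrative output lines; failures keep their original request payload.
output_lines = [
    json.dumps({"request": {"contents": [{"parts": [{"text": "a"}]}]},
                "response": {"candidates": []}}),
    json.dumps({"request": {"contents": [{"parts": [{"text": "b"}]}]},
                "status": "UNAVAILABLE"}),
]

# Re-emit only the failed requests as a fresh input file for a follow-up job.
retry_lines = [
    json.dumps({"request": json.loads(line)["request"]})
    for line in output_lines
    if "response" not in json.loads(line)
]
print(len(retry_lines))  # 1
```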
Input File Guidelines
- Format: JSONL (JSON Lines) with one request per line
- Size: Up to 10,000 requests per file
- Location: Must be in the us-central1 region
- Naming: Use wildcard patterns like gs://bucket/*.jsonl to match multiple files
- Permissions: Service account needs storage.objects.get access
Output Considerations
- Results maintain input order
- Failed requests included with error status
- Output files written to timestamped subdirectories
- Use BigQuery for easier querying of large result sets
Processing Results at Scale
Parallel Processing
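Large jobs shard their output across several files; a sketch that parses shards concurrently with a thread pool (the shard contents here are synthetic):

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Synthetic shards; real shards are the predictions-*.jsonl output files.
shards = [
    [json.dumps({"response": {"candidates": [
        {"content": {"parts": [{"text": f"r{i}-{j}"}]}}]}})
     for j in range(3)]
    for i in range(4)
]

def parse_shard(lines):
    """Extract the first candidate's text from every line in one shard."""
    out = []
    for line in lines:
        candidate = json.loads(line)["response"]["candidates"][0]
        out.append(candidate["content"]["parts"][0]["text"])
    return out

with ThreadPoolExecutor(max_workers=4) as pool:
    results = [text
               for shard_texts in pool.map(parse_shard, shards)
               for text in shard_texts]

print(len(results))  # 12
```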
Export to Database
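A sketch exporting parsed results to SQLite; swap in your real database and schema:

```python
import sqlite3

# (prompt, model_text) pairs, as parsed from the output JSONL.
rows = [
    ("Summarize A", "Summary of A"),
    ("Summarize B", "Summary of B"),
]

conn = sqlite3.connect(":memory:")  # swap in your real database
conn.execute("CREATE TABLE predictions (prompt TEXT, response TEXT)")
conn.executemany("INSERT INTO predictions VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM predictions").fetchone()[0]
print(count)  # 2
```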
Next Steps
Context Caching
Cache repeated content in batch jobs
Multimodal
Process images and videos in batch
Function Calling
Use function calling in batch requests
Grounding
Ground batch predictions in data sources