Getting Started with Gemini
This guide walks you through the fundamentals of using Gemini models for text generation on Google Cloud.
Prerequisites
Google Cloud Project
You need an active Google Cloud project with billing enabled. Create a project if you don’t have one.
Authentication
Set up authentication for your environment using Application Default Credentials or a service account.
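For local development, Application Default Credentials are typically set up with the gcloud CLI (assuming it is installed):

```shell
# Log in and store Application Default Credentials locally
gcloud auth application-default login

# Point gcloud at your project
gcloud config set project your-project-id
```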
Installation
Install the Google Gen AI SDK for Python:
```shell
pip install --upgrade google-genai
```
Basic Setup
Import Libraries
```python
import os

from google import genai
from google.genai.types import GenerateContentConfig
from IPython.display import Markdown, display
```
Authenticate (Colab Only)
If you’re using Google Colab, authenticate your session:
```python
import sys

if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
```
Initialize the Client
```python
PROJECT_ID = "your-project-id"
LOCATION = "global"  # or "us-central1", "europe-west1", etc.

client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
)
```
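For scripts, it is common to read the project and location from the environment rather than hard-coding them; `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` are the conventional variable names:

```python
import os

# Fall back to placeholders when the variables are unset
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-project-id")
LOCATION = os.environ.get("GOOGLE_CLOUD_LOCATION", "global")

print(PROJECT_ID, LOCATION)
```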
Your First Request
Send a simple text generation request:
```python
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Explain quantum computing in simple terms.",
)

print(response.text)
```
The response.text property returns the generated text. For more details about the response structure, access response.candidates[0].
Choosing a Model
Select the appropriate model for your use case:
Gemini 3.1 Pro

```python
MODEL_ID = "gemini-3.1-pro-preview"
```

Best for:
Complex reasoning tasks
Advanced code generation
Agentic workflows
Maximum quality

Gemini 3 Flash

```python
MODEL_ID = "gemini-3-flash-preview"
```

Best for:
Fast responses
High-volume applications
Real-time interactions
Cost optimization

Gemini 2.5 Pro

```python
MODEL_ID = "gemini-2.5-pro"
```

Best for:
Production workloads
Balanced performance
General-purpose tasks
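If you switch models per request, a small lookup keyed by use case keeps the choice in one place. The mapping below simply restates the guidance above (a hypothetical helper, not part of the SDK):

```python
# Map a coarse use-case label to the model IDs discussed above
MODELS = {
    "max_quality": "gemini-3.1-pro-preview",  # complex reasoning, agentic workflows
    "low_latency": "gemini-3-flash-preview",  # high volume, real-time, cost
    "general": "gemini-2.5-pro",              # balanced production workloads
}

MODEL_ID = MODELS["max_quality"]
print(MODEL_ID)
```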
Configuration Parameters
Temperature
Controls randomness in the output (0.0 to 2.0):
```python
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Write a creative story about a robot.",
    config=GenerateContentConfig(
        temperature=1.5,  # Higher = more creative
    ),
)
```
For Gemini 3 models, we strongly recommend keeping temperature=1.0 as the reasoning capabilities are optimized for this default value.
Top-P (Nucleus Sampling)
Controls diversity by sampling only from the smallest set of tokens whose cumulative probability exceeds top_p:

```python
config = GenerateContentConfig(
    temperature=1.0,
    top_p=0.95,  # Sample from tokens covering the top 95% of probability mass
)
```
Max Output Tokens
Limit the length of generated responses:
```python
config = GenerateContentConfig(
    max_output_tokens=2048,  # Maximum tokens in the response
)
```
Complete Example
```python
from google.genai.types import GenerateContentConfig, ThinkingConfig, ThinkingLevel

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Explain the theory of relativity.",
    config=GenerateContentConfig(
        temperature=1.0,
        top_p=0.95,
        max_output_tokens=8000,
        thinking_config=ThinkingConfig(
            thinking_level=ThinkingLevel.MEDIUM,
        ),
    ),
)

print(response.text)
```
Thinking Levels
Gemini 3.1 Pro supports configurable reasoning depth:
Low

```python
from google.genai.types import ThinkingConfig, ThinkingLevel

config = GenerateContentConfig(
    thinking_config=ThinkingConfig(
        thinking_level=ThinkingLevel.LOW,
    ),
)
```

Token budget: 1-1,000 tokens
Best for: Simple queries, chat, fast responses
Latency: Minimal

Medium

```python
config = GenerateContentConfig(
    thinking_config=ThinkingConfig(
        thinking_level=ThinkingLevel.MEDIUM,
    ),
)
```

Token budget: 1,001-16,384 tokens
Best for: Moderate complexity, balanced performance
Latency: Medium

High

```python
config = GenerateContentConfig(
    thinking_config=ThinkingConfig(
        thinking_level=ThinkingLevel.HIGH,
    ),
)
```

Token budget: 16,384-32,768 tokens (default)
Best for: Complex reasoning, coding challenges
Latency: Higher, but more thorough
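The budgets above can be kept in one place as data. This sketch just restates the figures listed in this section (a hypothetical helper, not part of the SDK):

```python
# Thinking-level token budgets as listed in this guide
THINKING_BUDGETS = {
    "LOW": (1, 1_000),
    "MEDIUM": (1_001, 16_384),
    "HIGH": (16_384, 32_768),  # default level
}

def budget_for(level: str) -> tuple[int, int]:
    """Return the (min, max) thinking-token budget for a level name."""
    return THINKING_BUDGETS[level.upper()]

print(budget_for("medium"))
```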
Streaming Responses
Stream responses token-by-token for better user experience:
```python
for chunk in client.models.generate_content_stream(
    model="gemini-3.1-pro-preview",
    contents="Write a short story about space exploration.",
    config=GenerateContentConfig(
        thinking_config=ThinkingConfig(
            thinking_level=ThinkingLevel.LOW,
        ),
    ),
):
    print(chunk.text, end="")
```
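If you need the full text after streaming, accumulate chunks while printing. A sketch using a stub chunk type in place of the SDK's streamed responses, so it runs without an API call:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str  # stub for a streamed response chunk

def collect_stream(chunks):
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        if chunk.text:  # chunks may carry empty text
            print(chunk.text, end="")
            parts.append(chunk.text)
    return "".join(parts)

full = collect_stream([Chunk("Hello, "), Chunk("world!")])
```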
System Instructions
Provide persistent instructions that apply to all requests:
system_instruction = """
You are a helpful AI assistant specializing in Python programming.
Always provide code examples with comments.
Explain complex concepts in simple terms.
"""
response = client.models.generate_content(
model = "gemini-3.1-pro-preview" ,
contents = "How do I read a CSV file?" ,
config = GenerateContentConfig(
system_instruction = system_instruction
)
)
Multi-Turn Conversations
Create chat sessions to maintain conversation context:
```python
chat = client.chats.create(
    model="gemini-3.1-pro-preview",
    config=GenerateContentConfig(
        temperature=1.0,
    ),
)

# First message
response = chat.send_message("What is a binary search tree?")
print("Assistant:", response.text)

# Follow-up message (maintains context)
response = chat.send_message("Can you show me an implementation?")
print("Assistant:", response.text)

# View conversation history
for message in chat.get_history():
    print(f"{message.role}: {message.parts[0].text[:100]}...")
```
Error Handling
Handle common errors gracefully:
```python
from google.genai import errors

try:
    response = client.models.generate_content(
        model="gemini-3.1-pro-preview",
        contents="Your prompt here",
    )
    print(response.text)
except errors.ClientError as e:
    # 4xx errors: invalid arguments, quota exhaustion (429), permissions
    print(f"Client error {e.code}: {e.message}")
except errors.ServerError as e:
    # 5xx errors: transient server-side failures
    print(f"Server error {e.code}: {e.message}")
except Exception as e:
    print(f"Unexpected error: {e}")
```
Response Structure
Understand the response object:
```python
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Hello!",
)

# Access the text
print(response.text)

# Access detailed candidate information
candidate = response.candidates[0]
print(f"Finish reason: {candidate.finish_reason}")
print(f"Safety ratings: {candidate.safety_ratings}")

# Access usage metadata
print(f"Input tokens: {response.usage_metadata.prompt_token_count}")
print(f"Output tokens: {response.usage_metadata.candidates_token_count}")
print(f"Total tokens: {response.usage_metadata.total_token_count}")
```
Safety Settings
Configure content filtering:
```python
from google.genai.types import (
    SafetySetting,
    HarmCategory,
    HarmBlockThreshold,
    FinishReason,
)

safety_settings = [
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    ),
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    ),
]

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Your content here",
    config=GenerateContentConfig(
        safety_settings=safety_settings,
    ),
)

# Check whether the response was blocked
if response.candidates[0].finish_reason == FinishReason.SAFETY:
    print("Response was blocked due to safety filters")
    for rating in response.candidates[0].safety_ratings:
        print(f"{rating.category}: {rating.probability}")
```
Best Practices
Use Clear Prompts: Be specific and provide context for better results.
Handle Errors: Implement proper error handling for production apps.
Monitor Usage: Track token consumption to manage costs.
Set Limits: Use max_output_tokens to control response length.
Next Steps
Multimodal: Learn to process images, video, and audio.
Function Calling: Connect Gemini to external tools and APIs.
Grounding: Ground responses in real-time data.
Context Caching: Optimize costs with context caching.