Getting Started with Gemini
This guide walks you through the fundamentals of using Gemini models for text generation on Google Cloud.
Prerequisites
Google Cloud Project
You need an active Google Cloud project with billing enabled. Create a project if you don’t have one.
Authentication
Set up authentication for your environment using Application Default Credentials or a service account.
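For local development, Application Default Credentials are typically set up with the gcloud CLI (assuming it is installed):

```shell
# Log in and store Application Default Credentials locally
gcloud auth application-default login

# Point gcloud at your project
gcloud config set project your-project-id
```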
Installation
Install the Google Gen AI SDK for Python:
```shell
pip install --upgrade google-genai
```
Basic Setup
Import Libraries
```python
import os

from google import genai
from google.genai.types import GenerateContentConfig
from IPython.display import Markdown, display
```
Authenticate (Colab Only)
If you’re using Google Colab, authenticate your session:
```python
import sys

if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
```
Initialize the Client
```python
PROJECT_ID = "your-project-id"
LOCATION = "global"  # or "us-central1", "europe-west1", etc.

client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
)
```
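For scripts, it is common to read the project and location from the environment rather than hard-coding them; `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` are the conventional variable names:

```python
import os

# Fall back to placeholders when the variables are unset
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-project-id")
LOCATION = os.environ.get("GOOGLE_CLOUD_LOCATION", "global")

print(PROJECT_ID, LOCATION)
```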
Your First Request
Send a simple text generation request:
```python
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Explain quantum computing in simple terms.",
)

print(response.text)
```
The response.text property returns the generated text. For more details about the response structure, access response.candidates[0].
Choosing a Model
Select the appropriate model for your use case:
Gemini 3.1 Pro

```python
MODEL_ID = "gemini-3.1-pro-preview"
```

Best for:
Complex reasoning tasks
Advanced code generation
Agentic workflows
Maximum quality

Gemini 3 Flash

```python
MODEL_ID = "gemini-3-flash-preview"
```

Best for:
Fast responses
High-volume applications
Real-time interactions
Cost optimization

Gemini 2.5 Pro

```python
MODEL_ID = "gemini-2.5-pro"
```

Best for:
Production workloads
Balanced performance
General-purpose tasks
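If you switch models per request, a small lookup keyed by use case keeps the choice in one place. The mapping below simply restates the guidance above (a hypothetical helper, not part of the SDK):

```python
# Map a coarse use-case label to the model IDs discussed above
MODELS = {
    "max_quality": "gemini-3.1-pro-preview",  # complex reasoning, agentic workflows
    "low_latency": "gemini-3-flash-preview",  # high volume, real-time, cost
    "general": "gemini-2.5-pro",              # balanced production workloads
}

MODEL_ID = MODELS["max_quality"]
print(MODEL_ID)
```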
Configuration Parameters
Temperature
Controls randomness in the output (0.0 to 2.0):
```python
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Write a creative story about a robot.",
    config=GenerateContentConfig(
        temperature=1.5,  # Higher = more creative
    ),
)
```
For Gemini 3 models, we strongly recommend keeping temperature=1.0 as the reasoning capabilities are optimized for this default value.
Top-P (Nucleus Sampling)
Controls diversity by sampling only from the smallest set of tokens whose cumulative probability exceeds top_p:

```python
config = GenerateContentConfig(
    temperature=1.0,
    top_p=0.95,  # Sample from tokens covering the top 95% of probability mass
)
```
Max Output Tokens
Limit the length of generated responses:
```python
config = GenerateContentConfig(
    max_output_tokens=2048,  # Maximum tokens in the response
)
```
Complete Example
```python
from google.genai.types import GenerateContentConfig, ThinkingConfig, ThinkingLevel

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Explain the theory of relativity.",
    config=GenerateContentConfig(
        temperature=1.0,
        top_p=0.95,
        max_output_tokens=8000,
        thinking_config=ThinkingConfig(
            thinking_level=ThinkingLevel.MEDIUM,
        ),
    ),
)

print(response.text)
```
Thinking Levels
Gemini 3.1 Pro supports configurable reasoning depth:
Low

```python
from google.genai.types import ThinkingConfig, ThinkingLevel

config = GenerateContentConfig(
    thinking_config=ThinkingConfig(
        thinking_level=ThinkingLevel.LOW,
    ),
)
```

Token budget: 1-1,000 tokens
Best for: Simple queries, chat, fast responses
Latency: Minimal

Medium

```python
config = GenerateContentConfig(
    thinking_config=ThinkingConfig(
        thinking_level=ThinkingLevel.MEDIUM,
    ),
)
```

Token budget: 1,001-16,384 tokens
Best for: Moderate complexity, balanced performance
Latency: Medium

High

```python
config = GenerateContentConfig(
    thinking_config=ThinkingConfig(
        thinking_level=ThinkingLevel.HIGH,
    ),
)
```

Token budget: 16,384-32,768 tokens (default)
Best for: Complex reasoning, coding challenges
Latency: Higher, but more thorough
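The budgets above can be kept in one place as data. This sketch just restates the figures listed in this section (a hypothetical helper, not part of the SDK):

```python
# Thinking-level token budgets as listed in this guide
THINKING_BUDGETS = {
    "LOW": (1, 1_000),
    "MEDIUM": (1_001, 16_384),
    "HIGH": (16_384, 32_768),  # default level
}

def budget_for(level: str) -> tuple[int, int]:
    """Return the (min, max) thinking-token budget for a level name."""
    return THINKING_BUDGETS[level.upper()]

print(budget_for("medium"))
```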
Streaming Responses
Stream responses token-by-token for better user experience:
```python
for chunk in client.models.generate_content_stream(
    model="gemini-3.1-pro-preview",
    contents="Write a short story about space exploration.",
    config=GenerateContentConfig(
        thinking_config=ThinkingConfig(
            thinking_level=ThinkingLevel.LOW,
        ),
    ),
):
    print(chunk.text, end="")
```
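If you need the full text after streaming, accumulate chunks while printing. A sketch using a stub chunk type in place of the SDK's streamed responses, so it runs without an API call:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str  # stub for a streamed response chunk

def collect_stream(chunks):
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        if chunk.text:  # chunks may carry empty text
            print(chunk.text, end="")
            parts.append(chunk.text)
    return "".join(parts)

full = collect_stream([Chunk("Hello, "), Chunk("world!")])
```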
System Instructions
Provide persistent instructions that apply to all requests:
system_instruction = """
You are a helpful AI assistant specializing in Python programming.
Always provide code examples with comments.
Explain complex concepts in simple terms.
"""
response = client.models.generate_content(
model = "gemini-3.1-pro-preview" ,
contents = "How do I read a CSV file?" ,
config = GenerateContentConfig(
system_instruction = system_instruction
)
)
Multi-Turn Conversations
Create chat sessions to maintain conversation context:
```python
chat = client.chats.create(
    model="gemini-3.1-pro-preview",
    config=GenerateContentConfig(
        temperature=1.0,
    ),
)

# First message
response = chat.send_message("What is a binary search tree?")
print("Assistant:", response.text)

# Follow-up message (maintains context)
response = chat.send_message("Can you show me an implementation?")
print("Assistant:", response.text)

# View conversation history
for message in chat.get_history():
    print(f"{message.role}: {message.parts[0].text[:100]}...")
```
Error Handling
Handle common errors gracefully:
```python
from google.genai import errors

try:
    response = client.models.generate_content(
        model="gemini-3.1-pro-preview",
        contents="Your prompt here",
    )
    print(response.text)
except errors.ClientError as e:
    # 4xx errors: invalid arguments, quota exhaustion (429), permissions
    print(f"Client error {e.code}: {e.message}")
except errors.ServerError as e:
    # 5xx errors: transient server-side failures
    print(f"Server error {e.code}: {e.message}")
except Exception as e:
    print(f"Unexpected error: {e}")
```
Response Structure
Understand the response object:
```python
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Hello!",
)

# Access the text
print(response.text)

# Access detailed candidate information
candidate = response.candidates[0]
print(f"Finish reason: {candidate.finish_reason}")
print(f"Safety ratings: {candidate.safety_ratings}")

# Access usage metadata
print(f"Input tokens: {response.usage_metadata.prompt_token_count}")
print(f"Output tokens: {response.usage_metadata.candidates_token_count}")
print(f"Total tokens: {response.usage_metadata.total_token_count}")
```
Safety Settings
Configure content filtering:
```python
from google.genai.types import (
    SafetySetting,
    HarmCategory,
    HarmBlockThreshold,
    FinishReason,
)

safety_settings = [
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    ),
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    ),
]

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Your content here",
    config=GenerateContentConfig(
        safety_settings=safety_settings,
    ),
)

# Check whether the response was blocked
if response.candidates[0].finish_reason == FinishReason.SAFETY:
    print("Response was blocked due to safety filters")
    for rating in response.candidates[0].safety_ratings:
        print(f"{rating.category}: {rating.probability}")
```
Best Practices
Use Clear Prompts: Be specific and provide context for better results.
Handle Errors: Implement proper error handling for production apps.
Monitor Usage: Track token consumption to manage costs.
Set Limits: Use max_output_tokens to control response length.
Next Steps
Multimodal: Learn to process images, video, and audio.
Function Calling: Connect Gemini to external tools and APIs.
Grounding: Ground responses in real-time data.
Context Caching: Optimize costs with context caching.