
Overview

Gemini for Data Analytics (Conversational Analytics API) enables natural language interactions with your data. Ask questions in plain English and get SQL queries, visualizations, and insights from BigQuery and Looker data sources. Key capabilities:
  • Natural language to SQL: “Show me sales by region” → automatic SQL generation
  • Multi-datasource: Query BigQuery tables, Looker explores, and Looker Studio simultaneously
  • Automatic visualization: Generate charts and graphs from query results
  • Conversation memory: Follow-up questions maintain context
  • Clarification prompts: Agent asks for clarification when needed
Gemini for Data Analytics is currently in Pre-GA. Provide feedback to [email protected].

Architecture

Quick Start

1. Enable APIs

Required APIs for your Google Cloud project:
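At a minimum, the Conversational Analytics API itself must be enabled. A sketch using gcloud; the service names below are assumed from the client-library naming and should be verified in the Cloud console:

```shell
# Enable the Conversational Analytics API, plus BigQuery if you will
# query BigQuery tables (service names assumed; confirm in the console).
gcloud services enable geminidataanalytics.googleapis.com
gcloud services enable bigquery.googleapis.com
```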
2. Install Client Library

pip install google-cloud-geminidataanalytics
3. Authenticate

from google.colab import auth
auth.authenticate_user()
Or for non-Colab environments:
gcloud auth application-default login
4. Create Your First Agent

from google.cloud import geminidataanalytics_v1beta as gda

PROJECT_ID = "your-project-id"  # replace with your Google Cloud project ID

client = gda.DataAgentServiceClient()

# Configure BigQuery datasource
bq_reference = gda.BigQueryTableReference(
    project_id="bigquery-public-data",
    dataset_id="faa",
    table_id="us_airports",
)

# Create data agent
agent = client.create_data_agent(
    parent=f"projects/{PROJECT_ID}/locations/global",
    data_agent_id="airports_agent",
    data_agent=gda.DataAgent(
        data_analytics_agent=gda.DataAnalyticsAgent(
            published_context=gda.Context(
                system_instruction="You are an aviation data analyst",
                datasource_references=gda.DatasourceReferences(
                    bq=gda.BigQueryTableReferences(
                        table_references=[bq_reference]
                    )
                ),
            )
        )
    ),
)
5. Ask Questions

chat_client = gda.DataChatServiceClient()

# Create conversation
conversation = chat_client.create_conversation(
    parent=f"projects/{PROJECT_ID}/locations/global",
    conversation=gda.Conversation(
        agents=[agent.name]
    ),
)

# Ask a question
response = chat_client.generate_message(
    conversation=conversation.name,
    messages=[gda.Message(
        user_message=gda.UserMessage(
            text="How many airports are in California?"
        )
    )],
)

# Process response
for msg in response.messages:
    if msg.system_message.data:
        # SQL was generated and executed
        print("Generated SQL:", msg.system_message.data.generated_sql)
        print("Results:", msg.system_message.data.result.data)

Data Sources

BigQuery Tables

bq_reference = gda.BigQueryTableReference(
    project_id="bigquery-public-data",
    dataset_id="faa",
    table_id="us_airports",
)

datasource_refs = gda.DatasourceReferences(
    bq=gda.BigQueryTableReferences(
        table_references=[bq_reference]
    )
)

Looker Explores

# Configure Looker datasource
looker_ref = gda.LookerExploreReference(
    looker_instance_uri="https://my-company.looker.com",
    lookml_model="sales",
    explore="orders",
)

# Authentication with client credentials
credentials = gda.Credentials(
    oauth=gda.OAuthCredentials(
        secret=gda.OAuthCredentials.SecretBased(
            client_id="your-client-id",
            client_secret="your-client-secret",
        )
    )
)

datasource_refs = gda.DatasourceReferences(
    looker=gda.LookerExploreReferences(
        explore_references=[looker_ref]
    )
)

Looker Studio

studio_ref = gda.StudioDatasourceReference(
    datasource_id="your-studio-datasource-id"
)

datasource_refs = gda.DatasourceReferences(
    studio=gda.StudioDatasourceReferences(
        studio_references=[studio_ref]
    )
)

System Instructions

Provide context to improve answer quality:

Basic Instruction

system_instruction = "You are a sales analyst for an e-commerce company. Always include percentages and comparisons to previous periods."

context = gda.Context(
    system_instruction=system_instruction,
    datasource_references=datasource_refs,
)

Advanced YAML Template

For complex schemas, use YAML format:
system_instruction = """
- system_instruction: >
    You are an expert sales analyst for Acme Corp e-commerce store.
    
- tables:
    - table:
        - name: acme.sales.orders
        - description: Orders from the e-commerce store
        - synonyms: [sales, transactions]
        - tags: [sale, order, revenue]
        - fields:
            - field:
                - name: order_id
                - description: Unique identifier for each order
            - field:
                - name: customer_id
                - description: Unique customer identifier
            - field:
                - name: status
                - description: Order status
                - sample_values: [complete, shipped, returned, pending]
            - field:
                - name: total_amount
                - description: Total order value in USD
                - aggregations: [sum, avg, max, min]
            - field:
                - name: created_at
                - description: Order creation timestamp
        - measures:
            - measure:
                - name: revenue
                - description: Total revenue (sum of total_amount)
                - exp: SUM(total_amount)
        - golden_queries:
            - golden_query:
                - natural_language_query: What were total sales last month?
                - sql_query: >
                    SELECT SUM(total_amount) as revenue
                    FROM acme.sales.orders
                    WHERE created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH)
                    AND status = 'complete'
                    
- glossaries:
    - glossary:
        - term: Revenue
        - description: Total value of completed orders
        - synonyms: [sales, income, earnings]
    - glossary:
        - term: Conversion Rate
        - description: Percentage of visitors who make a purchase
        - synonyms: [CVR, conversion]
        
- additional_descriptions:
    - text: All monetary values are in USD unless specified otherwise.
    - text: Fiscal year starts in April.
"""

Conversation Patterns

Stateful Conversations

Create conversations that maintain context:
chat_client = gda.DataChatServiceClient()

# Create conversation
conversation = chat_client.create_conversation(
    parent=f"projects/{PROJECT_ID}/locations/global",
    conversation_id="sales_analysis_1",
    conversation=gda.Conversation(
        agents=[agent.name]
    ),
)

# First question
response1 = chat_client.generate_message(
    conversation=conversation.name,
    messages=[gda.Message(
        user_message=gda.UserMessage(
            text="Show me sales by region for Q4 2024"
        )
    )],
)

# Follow-up question (context maintained)
response2 = chat_client.generate_message(
    conversation=conversation.name,
    messages=[gda.Message(
        user_message=gda.UserMessage(
            text="Now compare that to Q4 2023"  # "that" refers to previous query
        )
    )],
)

# Another follow-up
response3 = chat_client.generate_message(
    conversation=conversation.name,
    messages=[gda.Message(
        user_message=gda.UserMessage(
            text="Show it as a bar chart"  # "it" refers to the comparison
        )
    )],
)

Handling Different Response Types

def handle_response(response):
    for msg in response.messages:
        system_msg = msg.system_message
        
        # Text response
        if system_msg.text:
            # text.parts is a list of strings; join before printing
            print("Text:", "".join(system_msg.text.parts))
        
        # Schema information
        elif system_msg.schema:
            print("Schema resolved:")
            for datasource in system_msg.schema.result.datasources:
                print(f"  Table: {datasource.bigquery_table_reference.table_id}")
        
        # Data query results
        elif system_msg.data:
            if system_msg.data.generated_sql:
                print("SQL:", system_msg.data.generated_sql)
            
            if system_msg.data.result:
                # Convert to pandas DataFrame
                import pandas as pd
                
                fields = [f.name for f in system_msg.data.result.schema.fields]
                rows = [{f: row[f] for f in fields} for row in system_msg.data.result.data]
                df = pd.DataFrame(rows)
                print(df)
        
        # Chart visualization
        elif system_msg.chart:
            import altair as alt
            import json
            
            # Render Vega-Lite chart
            vega_config = system_msg.chart.result.vega_config
            chart = alt.Chart.from_dict(dict(vega_config))
            chart.display()
        
        # Clarification needed
        elif system_msg.clarification:
            print("Please clarify:")
            for question in system_msg.clarification.questions:
                print(f"  {question.question}")
                for option in question.options:
                    print(f"    - {option}")

handle_response(response)
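The DataFrame conversion inside handle_response can be pulled out into a small, dependency-free helper. The function name and sample rows here are ours, for illustration only:

```python
def rows_to_records(field_names, rows):
    """Normalize result rows (mappings) into plain dicts, keeping only
    the declared schema fields and preserving their order."""
    return [{name: row[name] for name in field_names} for row in rows]


# Dummy data shaped like system_msg.data.result:
fields = ["state", "airport_count"]
rows = [
    {"state": "CA", "airport_count": 3, "extra": "ignored"},
    {"state": "TX", "airport_count": 5, "extra": "ignored"},
]

records = rows_to_records(fields, rows)
# records is ready to pass to pandas.DataFrame(records)
```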

Streaming Responses

stream = chat_client.generate_message_stream(
    conversation=conversation.name,
    messages=[gda.Message(
        user_message=gda.UserMessage(
            text="Analyze sales trends over the past year"
        )
    )],
)

for chunk in stream:
    for msg in chunk.messages:
        if msg.system_message.text:
            # text.parts is a list of strings; join before printing
            print("".join(msg.system_message.text.parts), end="", flush=True)

Inline Context (Stateless)

For one-off queries without creating agents:
chat_client = gda.DataChatServiceClient()

# Define context inline
inline_context = gda.Context(
    system_instruction="You are a data analyst",
    datasource_references=datasource_refs,
)

# Generate response without creating agent or conversation
response = chat_client.generate_message(
    inline_context=inline_context,
    messages=[gda.Message(
        user_message=gda.UserMessage(
            text="How many airports are in Texas?"
        )
    )],
)

Advanced Options

Python Code Execution

Enable advanced analysis with Python:
context = gda.Context(
    system_instruction="You are a data scientist",
    datasource_references=datasource_refs,
    options=gda.ConversationOptions(
        analysis=gda.AnalysisOptions(
            python=gda.AnalysisOptions.Python(
                enabled=True  # Enable Python for complex calculations
            )
        )
    ),
)

# Agent can now use Python for:
# - Statistical analysis
# - Data transformations
# - Complex calculations
# - Custom visualizations

Custom Visualization Settings

response = chat_client.generate_message(
    conversation=conversation.name,
    messages=[gda.Message(
        user_message=gda.UserMessage(
            text="Create a line chart of monthly sales with a trend line"
        )
    )],
    visualization_config=gda.VisualizationConfig(
        chart_type="line",
        show_legend=True,
        theme="dark",
    ),
)

Real-World Examples

Sales Analytics Agent

from google.cloud import geminidataanalytics_v1beta as gda

client = gda.DataAgentServiceClient()
chat_client = gda.DataChatServiceClient()

# Multi-table BigQuery setup
orders_ref = gda.BigQueryTableReference(
    project_id="my-project",
    dataset_id="sales",
    table_id="orders",
)

customers_ref = gda.BigQueryTableReference(
    project_id="my-project",
    dataset_id="sales",
    table_id="customers",
)

# System instruction with business context
system_instruction = """You are a sales analyst for Acme Corp.

When analyzing sales:
- Always segment by customer tier (enterprise, mid-market, smb)
- Compare YoY and QoQ trends
- Flag anomalies (>20% deviation from average)
- Include profit margins, not just revenue

Fiscal year starts in April."""

# Create agent
agent = client.create_data_agent(
    parent=f"projects/{PROJECT_ID}/locations/global",
    data_agent_id="sales_analyst",
    data_agent=gda.DataAgent(
        data_analytics_agent=gda.DataAnalyticsAgent(
            published_context=gda.Context(
                system_instruction=system_instruction,
                datasource_references=gda.DatasourceReferences(
                    bq=gda.BigQueryTableReferences(
                        table_references=[orders_ref, customers_ref]
                    )
                ),
            )
        )
    ),
)
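The anomaly rule in the instruction above (">20% deviation from average") is concrete enough to express in code. A reference sketch of the rule itself, independent of the agent:

```python
def flag_anomalies(values, threshold=0.20):
    """Return indices of values whose deviation from the series mean
    exceeds `threshold` as a fraction of the mean."""
    mean = sum(values) / len(values)
    if mean == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs(v - mean) / abs(mean) > threshold]


monthly_revenue = [100, 104, 98, 150, 101]  # dummy series
print(flag_anomalies(monthly_revenue))  # index 3 deviates by ~36%
```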

Multi-Datasource Analysis

# Combine BigQuery and Looker
bq_ref = gda.BigQueryTableReference(
    project_id="my-project",
    dataset_id="warehouse",
    table_id="raw_events",
)

looker_ref = gda.LookerExploreReference(
    looker_instance_uri="https://company.looker.com",
    lookml_model="business_intelligence",
    explore="sales_metrics",
)

datasource_refs = gda.DatasourceReferences(
    bq=gda.BigQueryTableReferences(table_references=[bq_ref]),
    looker=gda.LookerExploreReferences(explore_references=[looker_ref]),
)

agent = client.create_data_agent(
    parent=f"projects/{PROJECT_ID}/locations/global",
    data_agent=gda.DataAgent(
        data_analytics_agent=gda.DataAnalyticsAgent(
            published_context=gda.Context(
                system_instruction="""You have access to both raw event data (BigQuery)
                and business metrics (Looker). Use raw data for detailed analysis
                and Looker for standard business metrics.""",
                datasource_references=datasource_refs,
            )
        )
    ),
)

# Query spans both sources
response = chat_client.generate_message(
    inline_context=gda.Context(
        datasource_references=datasource_refs,
    ),
    messages=[gda.Message(
        user_message=gda.UserMessage(
            text="Compare funnel conversion rates from raw events with the aggregated metrics in Looker"
        )
    )],
)

Agent Management

# List agents
for agent in client.list_data_agents(
    parent=f"projects/{PROJECT_ID}/locations/global"
):
    print(f"Agent: {agent.name}")
    print(f"  Created: {agent.create_time}")

Conversation Management

# List conversations
for conv in chat_client.list_conversations(
    parent=f"projects/{PROJECT_ID}/locations/global"
):
    print(f"Conversation: {conv.name}")

# Get conversation history
conv = chat_client.get_conversation(
    name="projects/my-project/locations/global/conversations/conv_123"
)

for msg in conv.messages:
    if msg.user_message:
        print(f"User: {msg.user_message.text}")
    elif msg.system_message:
        print(f"Agent: {''.join(msg.system_message.text.parts)}")

# Delete conversation
chat_client.delete_conversation(
    name="projects/my-project/locations/global/conversations/conv_123"
)

Best Practices

1. Provide Business Context

The more context you provide, the better the SQL generation:
  • Define business terms in glossary
  • Provide example queries for common patterns
  • Explain relationships between tables
  • Document sample values for enum fields
2. Use Example Queries

Golden queries dramatically improve accuracy:
example_queries = [
    gda.ExampleQuery(
        natural_language_question="Show completed orders",
        sql_query="SELECT * FROM orders WHERE status = 'complete'",
    ),
]
# Supply these alongside the agent's published context so common
# questions map to known-good SQL.
3. Leverage Conversations

Use stateful conversations for multi-turn analysis:
  • Create conversation per analysis session
  • Use follow-up questions to refine
  • Delete conversations when done
4. Handle Clarifications

When the agent asks for clarification, provide specific options:
if system_msg.clarification:
    # Present options to user
    # Send clarification response
    response = chat_client.generate_message(
        conversation=conv.name,
        messages=[gda.Message(
            user_message=gda.UserMessage(
                text="Use the orders table"
            )
        )],
    )
5. Monitor API Usage

Track usage through Cloud Monitoring:
  • Request counts
  • Latency metrics
  • Error rates
  • Token consumption

Limitations

  • Pre-GA product: Features and APIs may change
  • SQL complexity: Very complex joins may require refinement
  • Looker private instances: Require Service Directory setup
  • Rate limits: Subject to standard Vertex AI quotas

Resources

  • API Documentation: Complete API reference
  • HTTP API Example: REST API notebook
  • SDK Example: Python SDK notebook
  • Provide Feedback: Share your experience

Next Steps

  • Agent Engine: Deploy production agents
  • BigQuery: Learn about BigQuery
  • Looker: Looker documentation
  • Vertex AI: Explore Vertex AI
