Overview
Vertex AI Agent Engine is a fully managed platform for deploying AI agents at scale. It handles infrastructure, scaling, and operational complexity so you can focus on building agent logic.
Key features:
- Automatic scaling from zero to millions of requests
- Built-in Memory Bank for persistent agent memory
- Support for ADK, LangGraph, and custom frameworks
- Terraform deployment automation
- Integrated monitoring and logging
Agent Engine includes a free Express Mode for 90 days with no billing account required—perfect for learning and prototyping.
Deployment Methods
Agent Engine supports three deployment approaches:
Express Mode (Free & Fast)
- No billing account for 90 days
- Simple API key authentication
- Deploy with the ADK CLI
- Perfect for learning

Agent Object (Interactive Development)
- Create agents in notebooks
- Direct deployment from code
- Ideal for experimentation
- Requires a Cloud Storage bucket

Inline Source (Production CI/CD)
- Deploy from source files
- Version control friendly
- Terraform compatible
- No Cloud Storage needed
Express Mode Deployment
The fastest way to deploy your first agent:
Create Agent
adk create my_agent --api_key=YOUR_API_KEY
This creates a directory with:
- agent.py - Agent definition
- requirements.txt - Dependencies
- .adk/config.json - Configuration
Deploy
adk deploy agent_engine my_agent
Deployment takes 5-10 minutes. You’ll receive a resource name like: projects/123.../locations/us-central1/reasoningEngines/456...
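Downstream code (and Terraform imports) often need the project, location, and engine ID separately. A minimal parser for the resource-name layout shown above (the sample name below is illustrative, not a real resource):

```python
def parse_engine_name(resource_name: str) -> dict:
    """Split a reasoning-engine resource name into its components."""
    parts = resource_name.split("/")
    # Expected layout: projects/{project}/locations/{location}/reasoningEngines/{id}
    return {
        "project": parts[1],
        "location": parts[3],
        "engine_id": parts[5],
    }

name = "projects/123456/locations/us-central1/reasoningEngines/987654"
print(parse_engine_name(name)["location"])  # → us-central1
```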
Query Your Agent
import vertexai

client = vertexai.Client(api_key=api_key)
agent = client.agent_engines.get(name=agent_resource_name)

# Run inside an async context (e.g. a notebook cell or an async function)
async for item in agent.async_stream_query(
    message="What are the latest AI announcements from Google?",
    user_id="demo_user",
):
    if "content" in item and item["content"]:
        for part in item["content"]["parts"]:
            if "text" in part:
                print(part["text"], end="", flush=True)
Agent Object Deployment
Deploy agents created in notebooks or Python scripts:
The workflow has three steps: create the agent, deploy it to Agent Engine, then use the deployed agent.
from google.adk.agents import LlmAgent
from google.adk.tools import google_search
from vertexai import agent_engines

# Create agent in memory
agent = LlmAgent(
    name="search_agent",
    model="gemini-2.5-flash",
    description="A production agent that can search the web",
    instruction="Use Google Search for fresh information. Cite sources.",
    tools=[google_search],
)

# Wrap in AdkApp for deployment
adk_app = agent_engines.AdkApp(
    agent=agent,
    enable_tracing=True,
)
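The wrapped app is then deployed with `client.agent_engines.create`, as in the multi-agent example later on this page. A hedged sketch of the deployment config (key names such as `requirements` are assumptions; check them against your SDK version):

```python
# Hypothetical deployment settings for the AdkApp above; the exact set of
# supported config keys depends on the google-cloud-aiplatform version.
deployment_config = {
    "display_name": "Search Agent",
    "requirements": ["google-cloud-aiplatform[adk,agent_engines]"],
}

# With a vertexai.Client as shown in the Express Mode example:
# remote = client.agent_engines.create(agent=adk_app, config=deployment_config)
print(sorted(deployment_config))  # → ['display_name', 'requirements']
```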
Inline Source Deployment
Deploy agents from source files for CI/CD pipelines:
Project Structure
my_agent/
├── agent_package/
│ ├── __init__.py
│ └── agent.py # Agent definition
├── deployment/
│ └── deploy.py # AdkApp wrapper
└── requirements.txt
Define Agent
from google.adk.agents import LlmAgent
from google.adk.tools import google_search

root_agent = LlmAgent(
    name="academic_research",
    model="gemini-2.5-flash",
    description="Answer academic research questions",
    instruction="Search for scholarly information",
    tools=[google_search],
)
Create Deployment Wrapper
from vertexai import agent_engines
from agent_package.agent import root_agent

adk_app = agent_engines.AdkApp(
    agent=root_agent,
    enable_tracing=True,
)
Deploy
import vertexai

client = vertexai.Client(
    project="your-project",
    location="us-central1",
)

agent = client.agent_engines.create(
    config={
        "display_name": "Academic Research Agent",
        "source_packages": ["agent_package", "deployment", "requirements.txt"],
        "entrypoint_module": "deployment.deploy",
        "entrypoint_object": "adk_app",
        "class_methods": [
            {
                "name": "async_stream_query",
                "api_mode": "async_stream",
                "description": "Stream responses",
            },
            {
                "name": "async_create_session",
                "api_mode": "async",
                "description": "Create session",
            },
        ],
    },
)
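Conceptually, the `entrypoint_module` / `entrypoint_object` pair tells Agent Engine which module to import from the uploaded source packages and which attribute of it to serve. A rough stand-in for that resolution step (illustration only, not the service's actual code):

```python
import importlib
import pathlib
import sys
import tempfile

# Simulate an uploaded source package with a throwaway module.
pkg_dir = pathlib.Path(tempfile.mkdtemp())
(pkg_dir / "deploy_demo.py").write_text("adk_app = 'the AdkApp object'\n")
sys.path.insert(0, str(pkg_dir))

# What the service conceptually does with entrypoint_module / entrypoint_object:
module = importlib.import_module("deploy_demo")       # entrypoint_module
entrypoint = getattr(module, "adk_app")               # entrypoint_object
print(entrypoint)  # → the AdkApp object
```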
Memory Bank Integration
Memory Bank provides persistent, context-aware memory for agents:
What is Memory Bank?
Memory Bank is a managed service that gives agents the ability to:
- Remember user preferences and context across sessions
- Consolidate information like a human brain during sleep
- Retrieve relevant memories for personalized interactions
- Scale to millions of users with automatic memory management
Memory Bank uses Gemini models to automatically extract, consolidate, and retrieve memories from conversations—no manual memory management required.
Creating an Agent with Memory Bank
The workflow has four steps: set up Memory Bank, store a conversation, generate memories, and retrieve memories. The setup:
import vertexai
from vertexai import types

# Configuration aliases
MemoryBankConfig = types.ReasoningEngineContextSpecMemoryBankConfig
SimilaritySearchConfig = types.ReasoningEngineContextSpecMemoryBankConfigSimilaritySearchConfig
GenerationConfig = types.ReasoningEngineContextSpecMemoryBankConfigGenerationConfig

client = vertexai.Client(project=PROJECT_ID, location=LOCATION)

# Create Memory Bank configuration
memory_config = MemoryBankConfig(
    similarity_search_config=SimilaritySearchConfig(
        embedding_model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/text-embedding-005"
    ),
    generation_config=GenerationConfig(
        model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/gemini-2.5-flash"
    ),
)

# Create Agent Engine with Memory Bank
agent_engine = client.agent_engines.create(
    config={"context_spec": {"memory_bank_config": memory_config}}
)
Memory Retrieval Methods
Scope-Based: get all memories for a user. Use when:
- Building user profiles
- Displaying preference dashboards
- The user has a small number of memories

results = memories.retrieve(
    scope={"user_id": user_id}
)

Similarity Search: get memories relevant to a specific question. Use when:
- Answering targeted questions
- The user has many memories
- You need fast, focused responses

results = memories.retrieve(
    scope={"user_id": user_id},
    similarity_search_params={
        "search_query": "dietary needs?",
        "top_k": 3,
    },
)
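The two retrieval modes can be pictured with a toy in-memory stand-in. The keyword-overlap scoring below is only a placeholder for the real embedding-based similarity search, and all names and data are hypothetical:

```python
# Toy stand-in for Memory Bank retrieval (illustration only).
memories = [
    {"scope": {"user_id": "demo_user"}, "fact": "Prefers vegetarian food"},
    {"scope": {"user_id": "demo_user"}, "fact": "Allergic to peanuts"},
    {"scope": {"user_id": "other_user"}, "fact": "Lives in Berlin"},
]

def retrieve(scope, search_query=None, top_k=None):
    # Scope-based: everything whose scope matches.
    hits = [m for m in memories if m["scope"] == scope]
    if search_query is None:
        return hits
    # "Similarity" search: rank by word overlap (real service uses embeddings).
    terms = set(search_query.lower().split())
    scored = sorted(
        hits,
        key=lambda m: len(terms & set(m["fact"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

print(len(retrieve({"user_id": "demo_user"})))  # → 2 (all of the user's memories)
print(retrieve({"user_id": "demo_user"}, "allergic to what?", 1)[0]["fact"])
# → Allergic to peanuts
```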
Terraform Deployment
Automate infrastructure and agent deployment:
module "agent_engine_project" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 14.0"

  name            = "agent-engine-demo"
  billing_account = var.billing_account
  org_id          = var.org_id

  activate_apis = [
    "aiplatform.googleapis.com",
    "cloudaicompanion.googleapis.com",
  ]
}

resource "google_vertex_ai_agent_engine" "demo_agent" {
  project      = module.agent_engine_project.project_id
  location     = "us-central1"
  display_name = "Demo Agent"

  reasoning_engine {
    source_code {
      inline_source {
        source_packages   = ["agent_package"]
        entrypoint_module = "agent_package.agent"
        entrypoint_object = "root_agent"
      }
    }
  }
}
Multi-Agent Systems
Deploy orchestrated multi-agent architectures:
from google.adk.agents import LlmAgent
from vertexai import agent_engines

# Create specialized agents
research_agent = LlmAgent(
    name="researcher",
    model="claude-4-sonnet@20250514",
    instruction="Search and synthesize research",
)
writing_agent = LlmAgent(
    name="writer",
    model="gemini-2.5-flash",
    instruction="Create polished content",
)

# Orchestrator that delegates to the specialists
root_agent = LlmAgent(
    name="orchestrator",
    model="gemini-2.5-flash",
    instruction="Route tasks to specialized agents",
    sub_agents=[research_agent, writing_agent],
)

# Deploy (client and deployment_config as in the earlier examples)
adk_app = agent_engines.AdkApp(agent=root_agent)
remote = client.agent_engines.create(agent=adk_app, config=deployment_config)
Monitoring and Observability
Cloud Logging: all agent requests are automatically logged to Cloud Logging.

gcloud logging read \
  "resource.type=vertex_ai_agent_engine"

Cloud Trace: enable tracing for performance monitoring.

adk_app = agent_engines.AdkApp(
    agent=agent,
    enable_tracing=True,
)
Custom Metrics: export custom metrics to Cloud Monitoring. Track:
- Request latency
- Token usage
- Error rates
- Memory operations

Audit Logs: compliance-ready audit trails covering:
- Who deployed what
- Configuration changes
- Access patterns
Best Practices
- Use Express Mode for learning: start with Express Mode to understand Agent Engine without billing setup.
- Version control your agents: use Inline Source deployment for production to maintain agent code in Git.
- Implement sessions: always create sessions for stateful conversations with Memory Bank.
- Monitor token usage: track Gemini token consumption through Cloud Monitoring for cost optimization.
- Use similarity search: for users with extensive history, use similarity search instead of retrieving all memories.
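The last practice can be automated with a small helper (hypothetical; the threshold and `top_k` are arbitrary choices) that builds retrieval arguments matching the two modes shown earlier:

```python
def memory_retrieval_params(user_id, query, memory_count, threshold=50):
    """Pick scope-based retrieval for light users, similarity search for heavy ones.

    Hypothetical helper: returns kwargs shaped like the memories.retrieve
    examples above. The threshold of 50 memories is an arbitrary cutoff.
    """
    params = {"scope": {"user_id": user_id}}
    if memory_count > threshold:
        params["similarity_search_params"] = {"search_query": query, "top_k": 3}
    return params

print("similarity_search_params" in memory_retrieval_params("u1", "diet?", 200))
# → True (heavy user: fall back to similarity search)
print("similarity_search_params" in memory_retrieval_params("u1", "diet?", 10))
# → False (light user: plain scope-based retrieval)
```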
Next Steps
- ADK Documentation: learn how to build agents with the Agent Development Kit
- Memory Bank Guide: deep dive into Memory Bank capabilities
- Terraform Examples: infrastructure-as-code templates
- Multi-Agent Patterns: build collaborative agent systems