Overview
Vertex AI Agent Engine is a fully managed platform for deploying AI agents at scale. It handles infrastructure, scaling, and operational complexity so you can focus on building agent logic.
Key features:
- Automatic scaling from zero to millions of requests
- Built-in Memory Bank for persistent agent memory
- Support for ADK, LangGraph, and custom frameworks
- Terraform deployment automation
- Integrated monitoring and logging
Agent Engine includes a free Express Mode for 90 days with no billing account required—perfect for learning and prototyping.
Deployment Methods
Agent Engine supports three deployment approaches:
Express Mode (Free & Fast)
- No billing account for 90 days
- Simple API key authentication
- Deploy with the ADK CLI
- Perfect for learning

Agent Object (Interactive Development)
- Create agents in notebooks
- Direct deployment from code
- Ideal for experimentation
- Requires a Cloud Storage bucket

Inline Source (Production CI/CD)
- Deploy from source files
- Version control friendly
- Terraform compatible
- No Cloud Storage needed
Express Mode Deployment
The fastest way to deploy your first agent:
Create Agent
adk create my_agent --api_key=YOUR_API_KEY
This creates a directory with:
- agent.py - Agent definition
- requirements.txt - Dependencies
- .adk/config.json - Configuration
Deploy
adk deploy agent_engine my_agent
Deployment takes 5-10 minutes. You’ll receive a resource name like: projects/123.../locations/us-central1/reasoningEngines/456...
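Downstream code (and Terraform imports) often need the project, location, and engine ID separately. A minimal parser for the resource-name layout shown above (the sample name below is illustrative, not a real resource):

```python
def parse_engine_name(resource_name: str) -> dict:
    """Split a reasoning-engine resource name into its components."""
    parts = resource_name.split("/")
    # Expected layout: projects/{project}/locations/{location}/reasoningEngines/{id}
    return {
        "project": parts[1],
        "location": parts[3],
        "engine_id": parts[5],
    }

name = "projects/123456/locations/us-central1/reasoningEngines/987654"
print(parse_engine_name(name)["location"])  # → us-central1
```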
Query Your Agent
import vertexai

client = vertexai.Client(api_key=api_key)
agent = client.agent_engines.get(name=agent_resource_name)

# Run inside an async context (e.g. a notebook cell or an async function)
async for item in agent.async_stream_query(
    message="What are the latest AI announcements from Google?",
    user_id="demo_user",
):
    if "content" in item and item["content"]:
        for part in item["content"]["parts"]:
            if "text" in part:
                print(part["text"], end="", flush=True)
Agent Object Deployment
Deploy agents created in notebooks or Python scripts:
The workflow has three steps: create the agent, deploy it to Agent Engine, then use the deployed agent.
from google.adk.agents import LlmAgent
from google.adk.tools import google_search
from vertexai import agent_engines

# Create agent in memory
agent = LlmAgent(
    name="search_agent",
    model="gemini-2.5-flash",
    description="A production agent that can search the web",
    instruction="Use Google Search for fresh information. Cite sources.",
    tools=[google_search],
)

# Wrap in AdkApp for deployment
adk_app = agent_engines.AdkApp(
    agent=agent,
    enable_tracing=True,
)
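The wrapped app is then deployed with `client.agent_engines.create`, as in the multi-agent example later on this page. A hedged sketch of the deployment config (key names such as `requirements` are assumptions; check them against your SDK version):

```python
# Hypothetical deployment settings for the AdkApp above; the exact set of
# supported config keys depends on the google-cloud-aiplatform version.
deployment_config = {
    "display_name": "Search Agent",
    "requirements": ["google-cloud-aiplatform[adk,agent_engines]"],
}

# With a vertexai.Client as shown in the Express Mode example:
# remote = client.agent_engines.create(agent=adk_app, config=deployment_config)
print(sorted(deployment_config))  # → ['display_name', 'requirements']
```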
Inline Source Deployment
Deploy agents from source files for CI/CD pipelines:
Project Structure
my_agent/
├── agent_package/
│ ├── __init__.py
│ └── agent.py # Agent definition
├── deployment/
│ └── deploy.py # AdkApp wrapper
└── requirements.txt
Define Agent
from google.adk.agents import LlmAgent
from google.adk.tools import google_search

root_agent = LlmAgent(
    name="academic_research",
    model="gemini-2.5-flash",
    description="Answer academic research questions",
    instruction="Search for scholarly information",
    tools=[google_search],
)
Create Deployment Wrapper
from vertexai import agent_engines
from agent_package.agent import root_agent

adk_app = agent_engines.AdkApp(
    agent=root_agent,
    enable_tracing=True,
)
Deploy
import vertexai

client = vertexai.Client(
    project="your-project",
    location="us-central1",
)

agent = client.agent_engines.create(
    config={
        "display_name": "Academic Research Agent",
        "source_packages": ["agent_package", "deployment", "requirements.txt"],
        "entrypoint_module": "deployment.deploy",
        "entrypoint_object": "adk_app",
        "class_methods": [
            {
                "name": "async_stream_query",
                "api_mode": "async_stream",
                "description": "Stream responses",
            },
            {
                "name": "async_create_session",
                "api_mode": "async",
                "description": "Create session",
            },
        ],
    },
)
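Conceptually, the `entrypoint_module` / `entrypoint_object` pair tells Agent Engine which module to import from the uploaded source packages and which attribute of it to serve. A rough stand-in for that resolution step (illustration only, not the service's actual code):

```python
import importlib
import pathlib
import sys
import tempfile

# Simulate an uploaded source package with a throwaway module.
pkg_dir = pathlib.Path(tempfile.mkdtemp())
(pkg_dir / "deploy_demo.py").write_text("adk_app = 'the AdkApp object'\n")
sys.path.insert(0, str(pkg_dir))

# What the service conceptually does with entrypoint_module / entrypoint_object:
module = importlib.import_module("deploy_demo")       # entrypoint_module
entrypoint = getattr(module, "adk_app")               # entrypoint_object
print(entrypoint)  # → the AdkApp object
```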
Memory Bank Integration
Memory Bank provides persistent, context-aware memory for agents:
What is Memory Bank?
Memory Bank is a managed service that gives agents the ability to:
- Remember user preferences and context across sessions
- Consolidate information like a human brain during sleep
- Retrieve relevant memories for personalized interactions
- Scale to millions of users with automatic memory management
Memory Bank uses Gemini models to automatically extract, consolidate, and retrieve memories from conversations—no manual memory management required.
Creating an Agent with Memory Bank
The workflow has four steps: set up Memory Bank, store a conversation, generate memories, and retrieve memories. The setup:
import vertexai
from vertexai import types

# Configuration aliases
MemoryBankConfig = types.ReasoningEngineContextSpecMemoryBankConfig
SimilaritySearchConfig = types.ReasoningEngineContextSpecMemoryBankConfigSimilaritySearchConfig
GenerationConfig = types.ReasoningEngineContextSpecMemoryBankConfigGenerationConfig

client = vertexai.Client(project=PROJECT_ID, location=LOCATION)

# Create Memory Bank configuration
memory_config = MemoryBankConfig(
    similarity_search_config=SimilaritySearchConfig(
        embedding_model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/text-embedding-005"
    ),
    generation_config=GenerationConfig(
        model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/gemini-2.5-flash"
    ),
)

# Create Agent Engine with Memory Bank
agent_engine = client.agent_engines.create(
    config={"context_spec": {"memory_bank_config": memory_config}}
)
Memory Retrieval Methods
Scope-Based: get all memories for a user. Use when:
- Building user profiles
- Displaying preference dashboards
- The user has a small number of memories

results = memories.retrieve(
    scope={"user_id": user_id}
)

Similarity Search: get memories relevant to a specific question. Use when:
- Answering targeted questions
- The user has many memories
- You need fast, focused responses

results = memories.retrieve(
    scope={"user_id": user_id},
    similarity_search_params={
        "search_query": "dietary needs?",
        "top_k": 3,
    },
)
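The two retrieval modes can be pictured with a toy in-memory stand-in. The keyword-overlap scoring below is only a placeholder for the real embedding-based similarity search, and all names and data are hypothetical:

```python
# Toy stand-in for Memory Bank retrieval (illustration only).
memories = [
    {"scope": {"user_id": "demo_user"}, "fact": "Prefers vegetarian food"},
    {"scope": {"user_id": "demo_user"}, "fact": "Allergic to peanuts"},
    {"scope": {"user_id": "other_user"}, "fact": "Lives in Berlin"},
]

def retrieve(scope, search_query=None, top_k=None):
    # Scope-based: everything whose scope matches.
    hits = [m for m in memories if m["scope"] == scope]
    if search_query is None:
        return hits
    # "Similarity" search: rank by word overlap (real service uses embeddings).
    terms = set(search_query.lower().split())
    scored = sorted(
        hits,
        key=lambda m: len(terms & set(m["fact"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

print(len(retrieve({"user_id": "demo_user"})))  # → 2 (all of the user's memories)
print(retrieve({"user_id": "demo_user"}, "allergic to what?", 1)[0]["fact"])
# → Allergic to peanuts
```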
Terraform Deployment
Automate infrastructure and agent deployment:
module "agent_engine_project" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 14.0"

  name            = "agent-engine-demo"
  billing_account = var.billing_account
  org_id          = var.org_id

  activate_apis = [
    "aiplatform.googleapis.com",
    "cloudaicompanion.googleapis.com",
  ]
}

resource "google_vertex_ai_agent_engine" "demo_agent" {
  project      = module.agent_engine_project.project_id
  location     = "us-central1"
  display_name = "Demo Agent"

  reasoning_engine {
    source_code {
      inline_source {
        source_packages   = ["agent_package"]
        entrypoint_module = "agent_package.agent"
        entrypoint_object = "root_agent"
      }
    }
  }
}
Multi-Agent Systems
Deploy orchestrated multi-agent architectures:
from google.adk.agents import LlmAgent
from vertexai import agent_engines

# Create specialized agents
research_agent = LlmAgent(
    name="researcher",
    model="claude-4-sonnet@20250514",
    instruction="Search and synthesize research",
)
writing_agent = LlmAgent(
    name="writer",
    model="gemini-2.5-flash",
    instruction="Create polished content",
)

# Orchestrator that delegates to the specialists
root_agent = LlmAgent(
    name="orchestrator",
    model="gemini-2.5-flash",
    instruction="Route tasks to specialized agents",
    sub_agents=[research_agent, writing_agent],
)

# Deploy (client and deployment_config as in the earlier examples)
adk_app = agent_engines.AdkApp(agent=root_agent)
remote = client.agent_engines.create(agent=adk_app, config=deployment_config)
Monitoring and Observability
Cloud Logging: all agent requests are automatically logged to Cloud Logging.

gcloud logging read \
  "resource.type=vertex_ai_agent_engine"

Cloud Trace: enable tracing for performance monitoring.

adk_app = agent_engines.AdkApp(
    agent=agent,
    enable_tracing=True,
)
Custom Metrics: export custom metrics to Cloud Monitoring. Track:
- Request latency
- Token usage
- Error rates
- Memory operations

Audit Logs: compliance-ready audit trails covering:
- Who deployed what
- Configuration changes
- Access patterns
Best Practices
- Use Express Mode for learning: start with Express Mode to understand Agent Engine without billing setup.
- Version control your agents: use Inline Source deployment for production to maintain agent code in Git.
- Implement sessions: always create sessions for stateful conversations with Memory Bank.
- Monitor token usage: track Gemini token consumption through Cloud Monitoring for cost optimization.
- Use similarity search: for users with extensive history, use similarity search instead of retrieving all memories.
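The last practice can be automated with a small helper (hypothetical; the threshold and `top_k` are arbitrary choices) that builds retrieval arguments matching the two modes shown earlier:

```python
def memory_retrieval_params(user_id, query, memory_count, threshold=50):
    """Pick scope-based retrieval for light users, similarity search for heavy ones.

    Hypothetical helper: returns kwargs shaped like the memories.retrieve
    examples above. The threshold of 50 memories is an arbitrary cutoff.
    """
    params = {"scope": {"user_id": user_id}}
    if memory_count > threshold:
        params["similarity_search_params"] = {"search_query": query, "top_k": 3}
    return params

print("similarity_search_params" in memory_retrieval_params("u1", "diet?", 200))
# → True (heavy user: fall back to similarity search)
print("similarity_search_params" in memory_retrieval_params("u1", "diet?", 10))
# → False (light user: plain scope-based retrieval)
```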
Next Steps
- ADK Documentation: learn how to build agents with the Agent Development Kit
- Memory Bank Guide: deep dive into Memory Bank capabilities
- Terraform Examples: infrastructure-as-code templates
- Multi-Agent Patterns: build collaborative agent systems