Overview
Vespa Cloud is a fully managed service for deploying and operating Vespa applications. It provides automated infrastructure management, continuous deployment, monitoring, and scaling without the operational overhead of self-hosting.
Getting Started
Prerequisites
Create Vespa Cloud Account
Install Vespa CLI
# macOS
brew install vespa-cli
# Linux
curl -fsSL https://cli.vespa.ai/install.sh | bash
Create Your First Application
# Create application directory
mkdir my-app && cd my-app
# Create minimal application package
mkdir -p schemas
cat > services.xml << 'EOF'
<?xml version="1.0" encoding="utf-8" ?>
<services version="1.0">
    <container id="default" version="1.0">
        <search/>
        <document-api/>
    </container>
    <content id="music" version="1.0">
        <redundancy>2</redundancy>
        <documents>
            <document type="music" mode="index"/>
        </documents>
        <nodes count="2">
            <resources vcpu="2" memory="8Gb" disk="50Gb"/>
        </nodes>
    </content>
</services>
EOF
cat > schemas/music.sd << 'EOF'
schema music {
    document music {
        field title type string {
            indexing: index | summary
        }
    }
}
EOF
Deploy to Cloud
# Configure cloud target
vespa config set target cloud
vespa config set application my-tenant.my-app.default
# Log in and create a data-plane certificate for the application
vespa auth login
vespa auth cert
# Deploy application
vespa deploy --wait 300
Application Structure
deployment.xml
For Vespa Cloud, add a deployment.xml file next to services.xml to define the deployment pipeline:
<?xml version="1.0" encoding="utf-8"?>
<deployment version="1.0" major-version="8">
    <!-- Automated testing environment -->
    <test />
    <!-- Staging environment for validation -->
    <staging />
    <!-- Production deployment -->
    <prod>
        <region active="true">aws-us-east-1c</region>
        <region active="true">aws-eu-west-1a</region>
    </prod>
</deployment>
Multi-Region Deployment
Deploy across multiple geographic regions:
<deployment version="1.0">
    <test />
    <staging />
    <prod>
        <!-- Primary region -->
        <region active="true">aws-us-east-1c</region>
        <!-- Secondary regions for global reach -->
        <region active="true">aws-eu-west-1a</region>
        <region active="true">aws-ap-southeast-1a</region>
    </prod>
</deployment>
Multi-region deployments provide low-latency access globally and improve availability.
Deployment Pipeline
Automated Testing
Vespa Cloud automatically runs tests during deployment:
Test Environment
Application deployed to isolated test environment
System Tests
Automated system tests run against test deployment
Staging
Application deployed to staging with production-like setup
Staging Tests
Automated staging tests validate behavior
Production
Application rolled out to production regions
Progressive Rollout
Control deployment rollout with parallel and serial steps:
<deployment version="1.0">
    <test />
    <staging />
    <prod>
        <!-- Deploy to first region -->
        <region active="true">aws-us-east-1c</region>
        <!-- Wait before continuing -->
        <delay hours="2" />
        <!-- Deploy remaining regions in parallel -->
        <parallel>
            <region active="true">aws-eu-west-1a</region>
            <region active="true">aws-ap-southeast-1a</region>
        </parallel>
    </prod>
</deployment>
Deployment Blocking
Prevent deployments during specific time windows:
<deployment version="1.0">
    <!-- Block deployments during business hours -->
    <block-change
        revision="true"
        version="false"
        days="mon-fri"
        hours="9-17"
        time-zone="America/New_York" />
    <test />
    <staging />
    <prod>
        <region active="true">aws-us-east-1c</region>
    </prod>
</deployment>
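The `revision` and `version` attributes select what is blocked: the example above holds back new application revisions during business hours while still allowing Vespa platform upgrades. A sketch of the opposite policy, assuming you want to hold platform upgrades over the weekend while your own changes keep rolling out (the days, hours, and time zone are illustrative values):

```xml
<deployment version="1.0">
    <!-- Illustrative: block Vespa version upgrades all weekend,
         but allow new application revisions to deploy -->
    <block-change
        revision="false"
        version="true"
        days="sat-sun"
        hours="0-23"
        time-zone="UTC" />
    <test />
    <staging />
    <prod>
        <region active="true">aws-us-east-1c</region>
    </prod>
</deployment>
```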
Resource Specification
Node Resources
Specify compute resources for each cluster:
<services version="1.0">
    <container id="query" version="1.0">
        <search />
        <!-- Query cluster with autoscaling -->
        <nodes count="[2,8]">
            <resources vcpu="4" memory="16Gb" disk="100Gb" disk-speed="fast" />
        </nodes>
    </container>
    <content id="documents" version="1.0">
        <redundancy>2</redundancy>
        <documents>
            <document type="doc" mode="index" />
        </documents>
        <!-- Content cluster with fixed size -->
        <nodes count="6" groups="3">
            <resources vcpu="8" memory="32Gb" disk="500Gb" disk-speed="fast" storage-type="local" />
        </nodes>
    </content>
</services>
Autoscaling
Enable autoscaling with range notation:
<container id="query" version="1.0">
    <search />
    <!-- Autoscale between 3 and 10 nodes -->
    <nodes count="[3,10]">
        <resources vcpu="4" memory="16Gb" disk="100Gb" />
    </nodes>
</container>
Vespa Cloud automatically adjusts the node count within this range based on traffic patterns and resource utilization such as CPU usage.
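Resource attributes can also be given as ranges, so that node size as well as node count can vary. A minimal sketch, assuming the same query cluster as above; the bounds are illustrative, not recommendations:

```xml
<container id="query" version="1.0">
    <search />
    <!-- Illustrative bounds: both node count and per-node vcpu/memory may vary -->
    <nodes count="[3,10]">
        <resources vcpu="[2,8]" memory="[8Gb,32Gb]" disk="100Gb" />
    </nodes>
</container>
```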
Resource Attributes
Available Resource Attributes
vcpu: virtual CPU cores (e.g., 4, 8, 16)
memory: RAM allocation (e.g., 8Gb, 32Gb, 64Gb)
disk: storage size (e.g., 100Gb, 500Gb, 1800Gb)
disk-speed: fast (SSD) or any
storage-type: local (instance storage) or remote (network storage)
Environment-Specific Configuration
Multiple Instances
Define multiple application instances:
<deployment version="1.0">
    <!-- Development instance -->
    <instance id="dev">
        <test />
        <prod>
            <region active="true">aws-us-east-1c</region>
        </prod>
    </instance>
    <!-- Production instance -->
    <instance id="prod">
        <test />
        <staging />
        <prod>
            <region active="true">aws-us-east-1c</region>
            <region active="true">aws-eu-west-1a</region>
        </prod>
    </instance>
</deployment>
Instance-Specific Services
Configure services per instance:
<services version="1.0" xmlns:deploy="vespa">
    <container id="query" version="1.0">
        <search />
        <!-- Dev instance: 1 small node -->
        <nodes count="1" deploy:instance="dev">
            <resources vcpu="2" memory="8Gb" disk="50Gb" />
        </nodes>
        <!-- Prod instance: autoscaling cluster -->
        <nodes count="[3,10]" deploy:instance="prod">
            <resources vcpu="8" memory="32Gb" disk="100Gb" disk-speed="fast" />
        </nodes>
    </container>
</services>
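Variants can also be keyed on the environment rather than the instance, for example so that manual deployments to the `dev` environment get one small node while all other environments use the autoscaling cluster. A sketch, assuming the `deploy:environment` attribute from the same `deploy` namespace, with an unqualified element acting as the default; values are illustrative:

```xml
<services version="1.0" xmlns:deploy="vespa">
    <container id="query" version="1.0">
        <search />
        <!-- Override used only when deploying to the dev environment -->
        <nodes count="1" deploy:environment="dev">
            <resources vcpu="2" memory="8Gb" disk="50Gb" />
        </nodes>
        <!-- Default for all other environments -->
        <nodes count="[3,10]">
            <resources vcpu="8" memory="32Gb" disk="100Gb" disk-speed="fast" />
        </nodes>
    </container>
</services>
```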
Deployment Methods
Using Vespa CLI
The recommended deployment method:
# Deploy current directory
vespa deploy --wait 300
# Deploy specific application package
vespa deploy my-app/ --wait 300
# Deploy to specific instance
vespa config set application my-tenant.my-app.prod
vespa deploy --wait 300
Using Maven Plugin
Integrate with Maven builds:
<project>
    <properties>
        <!-- Example version; use the Vespa version your application targets -->
        <vespa.version>8.123.45</vespa.version>
    </properties>
    <build>
        <plugins>
            <plugin>
                <groupId>com.yahoo.vespa</groupId>
                <artifactId>vespa-maven-plugin</artifactId>
                <version>${vespa.version}</version>
                <configuration>
                    <tenant>my-tenant</tenant>
                    <application>my-app</application>
                    <instance>default</instance>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Deploy with Maven:
# Compile and deploy
mvn package vespa:deploy
# Submit deployment job
mvn vespa:submit
Using GitHub Actions
Automate deployments with CI/CD:
.github/workflows/deploy.yml
name: Deploy to Vespa Cloud
on:
  push:
    branches: [ main ]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to Vespa Cloud
        uses: vespa-engine/vespa-github-actions/deploy@v1
        with:
          tenant: my-tenant
          application: my-app
          instance: prod
          api-key: ${{ secrets.VESPA_API_KEY }}
          wait: 'true'
API Keys and Authentication
Creating API Keys
Generate API keys for programmatic access:
# Create API key via Vespa Cloud Console
# Or use CLI
vespa auth api-key
Using API Keys
Authenticate using API keys:
# Set API key in environment
export VESPA_CLI_API_KEY="your-api-key"
# Deploy with API key
vespa deploy --wait 300
Never commit API keys to version control. Use environment variables or secrets management.
Monitoring and Observability
Built-in Monitoring
Vespa Cloud provides built-in monitoring of:
Query latency and throughput
Document feed rates and errors
Resource utilization (CPU, memory, disk)
Application-specific metrics
Access metrics via Vespa Cloud Console or API.
Custom Metrics
Expose custom metrics from your application:
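import com.yahoo.jdisc.Metric;

public class MyComponent {

    private final Metric metric;

    // The container injects a Metric instance when it constructs the component
    public MyComponent(Metric metric) {
        this.metric = metric;
    }

    public void processRequest() {
        // Add 1 to the metric; a null context means no dimensions are attached
        metric.add("my_custom_metric", 1, null);
    }
}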
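The component only runs once the container instantiates it, so it also has to be declared in services.xml. A minimal sketch, assuming the class is placed in a package such as `com.example` and built into a bundle named `my-app` (both hypothetical names):

```xml
<services version="1.0">
    <container id="default" version="1.0">
        <search />
        <!-- Hypothetical class and bundle names; the container
             supplies the Metric argument through constructor injection -->
        <component id="com.example.MyComponent" bundle="my-app" />
    </container>
</services>
```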
Logs
Access application logs:
# Tail logs
vespa log --follow
# Filter by log level
vespa log --level warning
# Show logs from the last hour
vespa log 1h
Security
mTLS Authentication
Vespa Cloud uses mutual TLS for secure communication:
# The CLI presents the data-plane certificate created by 'vespa auth cert'
vespa query 'select * from sources * where title contains "hello"'
Data Plane Access
Configure data plane authentication:
<services version="1.0">
    <container id="default" version="1.0">
        <search />
        <document-api />
        <!-- Enable mTLS for data plane -->
        <clients>
            <client id="my-client">
                <certificate file="client-cert.pem" />
            </client>
        </clients>
    </container>
</services>
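Access can also be split per client. A sketch, assuming one certificate for a read-only query client and another for a feeding client; the `permissions` values and the certificate paths under `security/` are assumptions to verify against the Vespa Cloud security documentation:

```xml
<clients>
    <!-- Assumed: query client may read but not write -->
    <client id="query-client" permissions="read">
        <certificate file="security/query-client.pem" />
    </client>
    <!-- Assumed: feeding client may write documents -->
    <client id="feed-client" permissions="write">
        <certificate file="security/feed-client.pem" />
    </client>
</clients>
```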
Cost Optimization
Right-Size Resources
Monitor resource usage and adjust vcpu/memory allocations accordingly
Use Autoscaling
Enable autoscaling for variable workloads to avoid over-provisioning
Optimize Redundancy
Use redundancy="2" for development, redundancy="3" only when necessary
Leverage Searchable Copies
Set searchable-copies lower than redundancy to reduce indexing costs
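Both settings live in the content cluster definition in services.xml. A sketch with illustrative sizes, reusing the `documents` cluster name from the resource example; `searchable-copies` is set under `engine/proton`:

```xml
<content id="documents" version="1.0">
    <redundancy>2</redundancy>
    <documents>
        <document type="doc" mode="index" />
    </documents>
    <engine>
        <proton>
            <!-- One of the two replicas is indexed and searchable;
                 the other is stored and made searchable if a node fails -->
            <searchable-copies>1</searchable-copies>
        </proton>
    </engine>
    <nodes count="4">
        <resources vcpu="8" memory="32Gb" disk="500Gb" />
    </nodes>
</content>
```

The trade-off is recovery time: when a node holding the searchable replica fails, the remaining replica typically has to be indexed before it can serve queries for those documents.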
Migration from Self-Hosted
Preparing Application Package
Update your application for Vespa Cloud:
Add deployment.xml
Create deployment specification for cloud environments
Update Resource Specifications
Replace host-based configuration with resource specifications
Remove hosts.xml
Vespa Cloud manages hosts automatically
Test Locally
Validate changes with local deployment
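For the resource-specification step, each cluster's host references are replaced by a node count and per-node resources. A before/after sketch for one content cluster; the host aliases, distribution keys, and resource values are illustrative:

```xml
<!-- Self-hosted: nodes reference host aliases defined in hosts.xml -->
<nodes>
    <node hostalias="node1" distribution-key="0" />
    <node hostalias="node2" distribution-key="1" />
</nodes>

<!-- Vespa Cloud: declare the node count and the resources each node needs -->
<nodes count="2">
    <resources vcpu="4" memory="16Gb" disk="100Gb" />
</nodes>
```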
Data Migration
Migrate data to Vespa Cloud:
# Export data from self-hosted
vespa visit --target local > documents.jsonl
# Import to Vespa Cloud
vespa feed documents.jsonl --target cloud
Best Practices
Start with Test Environment
Always deploy to test/staging before production
Use Version Control
Keep application packages in Git for traceability
Implement CI/CD
Automate deployments with GitHub Actions or similar tools
Monitor Deployments
Watch metrics during and after deployments to catch issues early
Plan for Global Distribution
Deploy to multiple regions for low latency and high availability
Troubleshooting
Deployment Failures
Check deployment status:
Application Not Responding
Verify endpoints:
High Latency
Analyze query performance:
vespa query 'select * from sources * where true' \
    'trace.level=5' \
    --timeout 5