This guide covers common issues you may encounter with Agones and how to resolve them.

GameServer Issues

GameServer stuck in Creating or Starting state

Symptoms: GameServer never reaches the Ready state and remains in Creating or Starting.
Common causes:
# Check if Pod is pending
kubectl get pods -l agones.dev/gameserver=<gameserver-name>

# Describe the Pod to see scheduling errors
kubectl describe pod <pod-name>
Look for:
  • Insufficient CPU/memory resources
  • Node selector/affinity constraints not met
  • Taints that prevent scheduling
  • PersistentVolume provisioning delays
Solution:
  • Add more nodes to cluster
  • Adjust resource requests/limits
  • Review node selectors and taints
# Check Pod events
kubectl describe pod <pod-name> | grep -A 5 Events
Look for:
  • ImagePullBackOff or ErrImagePull errors
  • Authentication failures to registry
  • Invalid image name or tag
Solution:
  • Verify image name and tag are correct
  • Ensure imagePullSecrets are configured
  • Check registry credentials and permissions
# Check game server container logs
kubectl logs <pod-name> -c <game-server-container>

# Check SDK sidecar logs
kubectl logs <pod-name> -c agones-gameserver-sidecar
Look for:
  • Errors connecting to SDK server
  • SDK Ready() call never made
  • Crashes before reaching Ready state
Solution:
  • Verify SDK integration in game server code
  • Check for crashes or exceptions during startup
  • Enable SDK server debug logging (see below)
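The three checks above can be folded into a rough first-pass triage. The following is a minimal sketch, not part of Agones: `classify_pod_issue` is a hypothetical helper that reads `kubectl describe pod` output on stdin and matches the standard Kubernetes strings `FailedScheduling`, `ImagePullBackOff`, `ErrImagePull`, and `CrashLoopBackOff`. Anything else is assumed to need a look at the game server and SDK sidecar logs.

```shell
#!/bin/sh
# classify_pod_issue: hypothetical helper. Reads `kubectl describe pod`
# output on stdin and prints a coarse guess at which cause applies.
# Usage: kubectl describe pod POD_NAME | classify_pod_issue
classify_pod_issue() {
  desc=$(cat)
  case "$desc" in
    *ImagePullBackOff*|*ErrImagePull*) echo "image-pull" ;;
    *FailedScheduling*)                echo "scheduling" ;;
    *CrashLoopBackOff*)                echo "crash" ;;
    *)                                 echo "check-sdk-logs" ;;
  esac
}
```

The result is only a hint about where to look first; the Pod events and container logs remain the source of truth.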

GameServer immediately goes to Unhealthy

Symptoms: GameServer reaches Ready but quickly transitions to Unhealthy
1. Check health check configuration

kubectl get gameserver <name> -o yaml | grep -A 10 health
Verify:
  • periodSeconds is reasonable (default: 5s)
  • failureThreshold is not too low (default: 3)
  • initialDelaySeconds allows time for startup
2. Check for container crashes

kubectl describe pod <pod-name>
Look at:
  • Restart count on game server container
  • Last termination reason and exit code
  • Recent events
3. Review health check logs

kubectl logs <pod-name> -c agones-gameserver-sidecar | grep health
Common issues:
  • Health endpoint returning non-200 status
  • Health endpoint not responding in time
  • TCP health check connection refused
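To judge whether the health settings leave enough headroom, it can help to estimate how long a GameServer that never reports healthy would survive. The sketch below assumes the window is roughly `initialDelaySeconds + periodSeconds * failureThreshold`; this is an approximation for reasoning about the settings, not a statement of Agones's exact timing, and `health_window_seconds` is a hypothetical helper name.

```shell
#!/bin/sh
# health_window_seconds INITIAL_DELAY PERIOD FAILURE_THRESHOLD
# Rough estimate (an assumption, not Agones's exact timing) of how many
# seconds pass before a server that never pings is marked Unhealthy.
health_window_seconds() {
  initial_delay=$1
  period=$2
  threshold=$3
  echo $(( initial_delay + period * threshold ))
}

# periodSeconds 5 and failureThreshold 3 as listed above, with an
# assumed initialDelaySeconds of 5:
health_window_seconds 5 5 3   # prints 20
```

If your game server takes longer than this estimate to start sending health pings, raising `initialDelaySeconds` is usually the first thing to try.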

GameServer allocation fails

Symptoms: Allocation requests return errors or cannot find available GameServers
# Check for ready GameServers
kubectl get gs -l agones.dev/fleet=<fleet-name>

# View allocation metrics
kubectl port-forward -n agones-system svc/agones-controller 8080:8080
curl http://localhost:8080/metrics | grep allocation
Common causes:
  • No GameServers in Ready state
  • Allocation selectors too restrictive
  • Multi-cluster allocation policy misconfigured
  • Allocation endpoint authentication issues
Solutions:
# Increase Fleet replicas
apiVersion: agones.dev/v1
kind: Fleet
metadata:
  name: my-fleet
spec:
  replicas: 10  # Increase this

# Or add FleetAutoscaler
apiVersion: autoscaling.agones.dev/v1
kind: FleetAutoscaler
metadata:
  name: my-fleet-autoscaler
spec:
  fleetName: my-fleet
  policy:
    type: Buffer
    buffer:
      bufferSize: 5
      minReplicas: 5
      maxReplicas: 20
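
Before changing replicas, it is worth confirming how many GameServers are actually Ready. A minimal sketch, assuming the Fleet label `agones.dev/fleet` used above and the `status.state` field of each GameServer; `count_ready_gameservers` is a hypothetical helper name.

```shell
#!/bin/sh
# count_ready_gameservers FLEET_NAME
# Prints how many GameServers in the Fleet report status.state == Ready.
count_ready_gameservers() {
  kubectl get gs -l "agones.dev/fleet=$1" \
    -o jsonpath='{range .items[*]}{.status.state}{"\n"}{end}' |
    grep -c '^Ready$' || true   # grep -c exits non-zero when the count is 0
}
```

A result of 0 points at the first cause in the list (no Ready GameServers); a non-zero result suggests looking at the allocation selectors instead.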

Fleet and Autoscaling Issues

Fleet not scaling

# Check FleetAutoscaler status
kubectl get fleetautoscaler <name> -o yaml

# Check for errors in Agones controller
kubectl logs -n agones-system -l app=agones,component=controller --tail=100 | grep -i error
Verify:
  • status.ableToScale is true
  • status.scalingLimited shows whether scaling is capped at minReplicas or maxReplicas
  • Buffer policy is reasonable for your workload
Common issues:
# Buffer size too small - autoscaler at min
buffer:
  bufferSize: 1  # Too small
  minReplicas: 5

# Solution: increase buffer
buffer:
  bufferSize: 3  # Better
  minReplicas: 5
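
To see why a small buffer can leave the autoscaler sitting at its minimum, the Buffer policy can be modelled with simple arithmetic. This sketch assumes an absolute (non-percentage) `bufferSize` and that the target is the number of Allocated GameServers plus the buffer, clamped to `minReplicas` and `maxReplicas`; `buffer_target` is a hypothetical helper for reasoning about settings, not an Agones command.

```shell
#!/bin/sh
# buffer_target ALLOCATED BUFFER_SIZE MIN_REPLICAS MAX_REPLICAS
# Assumed model: target = allocated + bufferSize, clamped to [min, max].
buffer_target() {
  target=$(( $1 + $2 ))
  if [ "$target" -lt "$3" ]; then target=$3; fi
  if [ "$target" -gt "$4" ]; then target=$4; fi
  echo "$target"
}

buffer_target 0 1 5 20    # prints 5: nothing allocated, held at minReplicas
buffer_target 12 3 5 20   # prints 15: 12 allocated plus a buffer of 3
```

Under this model, the Fleet only grows past `minReplicas` once allocated plus buffer exceeds it, which is why status.scalingLimited can be true even when nothing is wrong.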

Fleet rollout stuck

# Check rollout status
kubectl get fleet <name> -o yaml | grep -A 10 status

# Check GameServerSets
kubectl get gameserverset -l agones.dev/fleet=<fleet-name>
Common causes:
  • New GameServerSet GameServers not becoming Ready
  • RollingUpdate strategy with maxUnavailable too low
  • Resource constraints preventing new GameServers
Solution:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 25%  # Allow faster rollout
    maxSurge: 25%

Networking Issues

Cannot connect to GameServer from outside cluster

1. Verify port allocation

kubectl get gs <name> -o yaml | grep -A 5 ports
Check:
  • hostPort is populated (for portPolicy Dynamic or Passthrough)
  • Port is in the configured range (default: 7000-8000)
  • Address is the node's external IP (not an internal one)
2. Check node firewall rules

For cloud providers:
# GKE
gcloud compute firewall-rules list | grep agones

# AWS
aws ec2 describe-security-groups --filters Name=group-name,Values=*agones*

# Azure
az network nsg rule list --nsg-name <nsg-name> --resource-group <rg>
Ensure firewall allows:
  • Game port range (7000-8000 UDP by default)
  • From your client IP ranges
3. Test connectivity

# Get GameServer connection info
kubectl get gs <name> -o jsonpath='{.status.address}:{.status.ports[0].port}'

# Test UDP connectivity
nc -u <address> <port>

# Or TCP
nc -z <address> <port>
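
The port-range check from step 1 is easy to script once you have the port value. A minimal sketch, assuming the default 7000-8000 range mentioned above; `port_in_range` is a hypothetical helper, and the optional second and third arguments override the range if your install uses a different one.

```shell
#!/bin/sh
# port_in_range PORT [MIN] [MAX]
# Succeeds (exit 0) when PORT lies inside the range, defaulting to the
# 7000-8000 range stated above.
port_in_range() {
  port=$1
  min=${2:-7000}
  max=${3:-8000}
  [ "$port" -ge "$min" ] && [ "$port" -le "$max" ]
}

if port_in_range 7654; then
  echo "7654 is inside the default range"
fi
```

A port outside the range usually means the firewall rules and the Agones port range were configured with different values.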

Port conflicts between GameServers

Symptoms: GameServers report port binding errors
kubectl logs <pod-name> -c <game-server-container>
# Look for: "bind: address already in use"
Solution: Use Dynamic port allocation instead of Static:
ports:
- name: default
  portPolicy: Dynamic  # Let Agones assign ports
  containerPort: 7654
  protocol: UDP

Controller and System Issues

Agones controller not running

# Check controller status
kubectl get pods -n agones-system -l app=agones,component=controller

# Check for crash loops
kubectl describe pod -n agones-system <controller-pod>

# View logs
kubectl logs -n agones-system <controller-pod> --tail=100
Common issues:
  • Insufficient RBAC permissions
  • API server connectivity problems
  • Resource limits too low
  • Invalid feature gate configuration

High controller memory usage

# Check resource usage
kubectl top pod -n agones-system -l app=agones,component=controller

# Check controller metrics
kubectl port-forward -n agones-system svc/agones-controller 8080:8080
curl http://localhost:8080/metrics | grep go_memstats
Solutions:
  • Increase controller memory limits
  • Reduce number of namespaces watched
  • Check for memory leaks in logs

Feature flag not working

# Check controller configuration
kubectl logs -n agones-system <controller-pod> --tail=100 | grep -i feature
Look for the log line whose featureGates field shows the current configuration:
{"featureGates":"PlayerTracking=true&CountsAndLists=false","message":"starting gameServer operator..."}
To enable feature flags, join the gates with & (the same format the log line uses; a comma would be split by helm --set into separate values):
helm upgrade agones agones/agones \
  --set agones.featureGates="PlayerTracking=true&CountsAndLists=true" \
  --namespace agones-system
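
To check a single gate without reading the whole log line, the featureGates string can be split on `&`. A minimal sketch, assuming the `Name=true&Other=false` format shown in the log line above; `gate_enabled` is a hypothetical helper name.

```shell
#!/bin/sh
# gate_enabled GATES NAME
# Succeeds (exit 0) when NAME=true appears in an &-separated gate string.
gate_enabled() {
  printf '%s\n' "$1" | tr '&' '\n' | grep -qx "$2=true"
}

gates='PlayerTracking=true&CountsAndLists=false'
if gate_enabled "$gates" PlayerTracking; then
  echo "PlayerTracking is enabled"
fi
```

A gate that is absent from the string is treated as not enabled here, which may differ from its actual default in your Agones version.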

Installation and Upgrade Issues

Helm install fails with RBAC errors

Symptoms: “forbidden: User X cannot create resource Y”
# For GKE, create cluster admin binding
kubectl create clusterrolebinding cluster-admin-binding \
  --clusterrole cluster-admin \
  --user $(gcloud config get-value account)

Namespace stuck in Terminating

Symptoms: agones-system namespace won’t delete
1. Get namespace definition

kubectl get namespace agones-system -o json > agones-ns.json
2. Remove finalizers

Edit agones-ns.json and empty the spec.finalizers array so that it reads as follows (JSON does not allow comments, so do not add any to the file):
"spec": {
    "finalizers": []
}
3. Update namespace via proxy

# Start proxy in one terminal
kubectl proxy

# In another terminal
curl -k -H "Content-Type: application/json" -X PUT \
  --data-binary @agones-ns.json \
  http://127.0.0.1:8001/api/v1/namespaces/agones-system/finalize

GameServers won’t delete

Symptoms: kubectl delete gs hangs, GameServers stuck in Terminating
# Remove finalizers from all GameServers
kubectl get gameserver -o name | xargs -I{} \
  kubectl patch {} --type=merge -p '{"metadata": {"finalizers": []}}'

# Force delete
kubectl delete gs --all --force --grace-period=0
Warning: Removing finalizers bypasses Agones cleanup logic. Only do this if the Agones controller is not running or has been uninstalled.
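
The condition in the warning above can be checked before any finalizers are touched. A minimal sketch, assuming the controller Pods carry the `app=agones,component=controller` labels used elsewhere in this guide; `controller_running` and `safe_to_strip_finalizers` are hypothetical helper names.

```shell
#!/bin/sh
# controller_running: succeeds when at least one controller Pod is Running.
# A failing kubectl call (for example, a missing namespace) counts as
# "not running", so confirm cluster access separately.
controller_running() {
  kubectl get pods -n agones-system -l app=agones,component=controller \
    -o jsonpath='{.items[*].status.phase}' 2>/dev/null | grep -qw Running
}

# safe_to_strip_finalizers: refuses while the controller is still running.
safe_to_strip_finalizers() {
  if controller_running; then
    echo "controller is running; let Agones clean up instead" >&2
    return 1
  fi
  return 0
}
```

Calling `safe_to_strip_finalizers` first and running the patch command only when it succeeds keeps the forced cleanup to the case the warning describes.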

Debugging Tools

Enable debug logging

apiVersion: agones.dev/v1
kind: GameServer
metadata:
  name: debug-gameserver
spec:
  sdkServer:
    logLevel: Debug  # Info (default), Debug
  ...

Useful kubectl commands

# Watch GameServer state changes
kubectl get gs -w

# Get events for a GameServer
kubectl get events --field-selector involvedObject.name=<gameserver-name>

# Get all events in namespace
kubectl get events --sort-by='.lastTimestamp'

# Describe all GameServers in fleet
kubectl get gs -l agones.dev/fleet=<fleet-name> -o name | \
  xargs -n1 kubectl describe

# Check controller configuration
kubectl get deployment -n agones-system agones-controller -o yaml | \
  grep -A 20 env:

Test with local SDK server

Run your game server locally against the SDK server:
# Download SDK server
wget https://github.com/googleforgames/agones/releases/download/v1.x.x/agonessdk-server-1.x.x.zip
unzip agonessdk-server-1.x.x.zip

# Run SDK server
./sdk-server.linux.amd64 --local

# In another terminal, run your game server
./my-game-server
This isolates SDK integration issues from Kubernetes/Agones deployment issues.
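With the local SDK server running, its HTTP interface can be exercised by hand to confirm it is reachable before involving your game server. This sketch assumes the SDK server's HTTP API listens on port 9358 by default (overridable here through `AGONES_SDK_HTTP_PORT`) and exposes `/ready` and `/gameserver`; verify both against the documentation for your Agones version. `sdk_url`, `sdk_ready`, and `sdk_gameserver` are hypothetical helper names.

```shell
#!/bin/sh
# sdk_url PATH: builds a URL for the local SDK server's HTTP API.
# Port 9358 is an assumed default; set AGONES_SDK_HTTP_PORT to override.
sdk_url() {
  echo "http://localhost:${AGONES_SDK_HTTP_PORT:-9358}$1"
}

# sdk_ready: sends the same Ready signal the game server SDK would send.
sdk_ready() {
  curl -s -d '{}' -H 'Content-Type: application/json' -X POST "$(sdk_url /ready)"
}

# sdk_gameserver: fetches the GameServer object the local SDK server holds.
sdk_gameserver() {
  curl -s "$(sdk_url /gameserver)"
}
```

If `sdk_ready` succeeds here but your game server never becomes Ready, the problem is more likely in the game server's SDK integration than in the SDK server itself.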

Multi-Cluster Allocation Issues

Remote allocation failing

# Check allocation policies
kubectl get gameserverallocationpolicy --all-namespaces

# Describe policy
kubectl describe gameserverallocationpolicy <name>

# Check secrets
kubectl get secret <allocation-secret> -o yaml
Common issues:
  • Certificate mismatch or expiration
  • Incorrect allocation endpoints
  • Network connectivity between clusters
  • Firewall blocking allocation port (443)
Test allocation endpoint:
# From cluster A, test connectivity to cluster B allocator
curl -k https://<cluster-b-allocator>:443/healthz

Performance Issues

High allocation latency

# Check allocation metrics
kubectl port-forward -n agones-system svc/agones-controller 8080:8080
curl http://localhost:8080/metrics | grep allocation_duration
Solutions:
  • Increase ready GameServer buffer
  • Use faster storage for etcd
  • Reduce allocation contention with batching
  • Check API server health

Slow GameServer startup

# Measure time in each state
kubectl get events --sort-by='.lastTimestamp' | \
  grep -E "<gameserver-name>|PortAllocation|Creating|Scheduled|Ready"
Optimization strategies:
  • Use smaller container images
  • Pre-pull images on nodes
  • Reduce application startup time
  • Adjust health check initialDelaySeconds

Getting Help

  • Agones Slack: Join the community Slack workspace
  • GitHub Issues: Report bugs or request features
  • Monitoring: Set up monitoring to catch issues early
  • Best Practices: Follow production deployment guidelines

Collecting Diagnostic Information

When reporting issues, collect:
#!/bin/bash
# Save diagnostic info

DIR="agones-diagnostics-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$DIR"

# Agones version
kubectl get deployment -n agones-system agones-controller \
  -o jsonpath='{.spec.template.spec.containers[0].image}' > "$DIR/version.txt"

# Controller logs
kubectl logs -n agones-system -l app=agones,component=controller \
  --tail=1000 > "$DIR/controller.log"

# All GameServers
kubectl get gs --all-namespaces -o yaml > "$DIR/gameservers.yaml"

# All Fleets
kubectl get fleet --all-namespaces -o yaml > "$DIR/fleets.yaml"

# Events
kubectl get events --all-namespaces --sort-by='.lastTimestamp' \
  > "$DIR/events.txt"

# Metrics (requires the controller port-forward to localhost:8080 to be running)
curl -s http://localhost:8080/metrics > "$DIR/metrics.txt" \
  || echo "Metrics not collected: nothing is listening on localhost:8080" >&2

tar czf "$DIR.tar.gz" "$DIR"
echo "Diagnostics saved to $DIR.tar.gz"
