Operator components
The operator consists of several key components:
Manager
The main process that orchestrates all controllers and webhooks. It handles leader election, health checks, and metrics serving.
Controllers
Independent reconciliation loops for each custom resource type. Each controller watches its resource and maintains desired state.
Webhooks
Admission webhooks that validate and mutate custom resources before they are persisted to etcd.
Renderer
Template engine that generates Kubernetes manifests from custom resource specifications.
Controller architecture
Each custom resource has a dedicated controller that implements the reconciliation loop.
Available controllers
The operator includes these controllers:
- NIMServiceReconciler - Manages NIM inference services (cmd/main.go:201)
- NIMCacheReconciler - Handles model caching jobs (cmd/main.go:192)
- NIMPipelineReconciler - Orchestrates NIM service pipelines (cmd/main.go:213)
- NemoCustomizerReconciler - Manages model customization services (cmd/main.go:278)
- NemoGuardrailReconciler - Controls guardrail services (cmd/main.go:230)
- NemoEvaluatorReconciler - Handles evaluation services (cmd/main.go:242)
- NemoDatastoreReconciler - Manages datastore services (cmd/main.go:266)
- NemoEntitystoreReconciler - Controls entitystore services (cmd/main.go:254)
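In a kubebuilder-based operator, each of the reconcilers above is wired into the manager in cmd/main.go. The following is a sketch of the conventional controller-runtime registration pattern, not the operator's exact code:

```go
// Register one reconciler with the manager; repeated for each controller.
if err := (&controller.NIMServiceReconciler{
	Client: mgr.GetClient(),
	Scheme: mgr.GetScheme(),
}).SetupWithManager(mgr); err != nil {
	setupLog.Error(err, "unable to create controller", "controller", "NIMService")
	os.Exit(1)
}
```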
Reconciliation loop
The reconciliation loop is the core of the operator's functionality.
Platform abstraction
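As a rough illustration of the fetch/compare/apply pattern every controller follows, here is a minimal self-contained sketch. The `NIMService` struct and `cluster` map are simplified stand-ins for the real API types and the Kubernetes API server, not the operator's actual code:

```go
package main

import "fmt"

// NIMService is a hypothetical, trimmed-down stand-in for the real custom resource.
type NIMService struct {
	Name     string
	Replicas int
}

// cluster is a toy stand-in for the API server: name -> observed replica count.
type cluster map[string]int

// reconcile drives observed state toward desired state, which is the core
// idea behind each controller's loop.
func reconcile(c cluster, desired NIMService) (changed bool) {
	observed, exists := c[desired.Name]
	if !exists || observed != desired.Replicas {
		c[desired.Name] = desired.Replicas // create or update to match the spec
		return true
	}
	return false // already converged; nothing to do
}

func main() {
	c := cluster{}
	svc := NIMService{Name: "llama-3-8b", Replicas: 2}
	fmt.Println(reconcile(c, svc)) // true: first pass creates the workload
	fmt.Println(reconcile(c, svc)) // false: second pass is a no-op
}
```

Real reconcilers also requeue on errors and update status conditions; this sketch keeps only the convergence step.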
The operator supports multiple inference platforms through an abstraction layer:
- Standalone - Direct Kubernetes deployments with:
  - Standard Deployment or LeaderWorkerSet resources
  - Native Kubernetes Services and Ingress
  - Direct GPU resource management
- KServe
The platform is selected with the inferencePlatform field in NIMService (defaults to "standalone"); the standalone implementation lives under internal/controller/platform/standalone/.
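One way to picture this abstraction layer is an interface with one implementation per platform. The `Platform` interface and its method names below are illustrative assumptions, not the operator's actual Go API:

```go
package main

import "fmt"

// Platform abstracts how inference workloads are deployed.
type Platform interface {
	Name() string
	Deploy(service string) string
}

// Standalone deploys directly as native Kubernetes workload resources.
type Standalone struct{}

func (Standalone) Name() string                 { return "standalone" }
func (Standalone) Deploy(service string) string { return "Deployment/" + service }

// KServe delegates deployment to KServe resources instead.
type KServe struct{}

func (KServe) Name() string                 { return "kserve" }
func (KServe) Deploy(service string) string { return "InferenceService/" + service }

// platformFor selects an implementation from the spec's inferencePlatform
// field, defaulting to standalone as the docs describe.
func platformFor(inferencePlatform string) Platform {
	if inferencePlatform == "kserve" {
		return KServe{}
	}
	return Standalone{}
}

func main() {
	p := platformFor("") // field unset -> default
	fmt.Println(p.Name(), p.Deploy("my-nim"))
}
```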
Resource management
The operator manages these types of Kubernetes resources:
Core workload resources
- Deployment - Standard single-node NIM deployments
- LeaderWorkerSet - Multi-node deployments with MPI coordination
- StatefulSet - Stateful services requiring stable identities
- Job - One-time tasks like model caching
Networking resources
- Service - ClusterIP, LoadBalancer, or NodePort services
- Ingress - HTTP/HTTPS ingress rules
- HTTPRoute/GRPCRoute - Gateway API routes
Storage resources
- PersistentVolumeClaim - Model storage and caching
- ConfigMap - Configuration data
- Secret - Sensitive credentials and keys
RBAC resources
- ServiceAccount - Service identity
- Role/RoleBinding - Permissions for service accounts
- SecurityContextConstraints - OpenShift security policies
Scaling and monitoring
- HorizontalPodAutoscaler - Automatic scaling based on metrics
- ServiceMonitor - Prometheus metrics collection
Webhook validation
Admission webhooks validate custom resources before they are created or updated:
- Required field validation
- Cross-field validation rules (e.g., replicas vs autoscaling)
- Immutability constraints (e.g., DRA resources)
- Default value injection
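These rules can be sketched as a plain validation function. The `Spec` fields below are simplified assumptions about the real API types, chosen to mirror the rules listed above:

```go
package main

import (
	"errors"
	"fmt"
)

// Spec is a hypothetical, trimmed-down NIMService spec for illustration.
type Spec struct {
	Image       string
	Replicas    *int // nil means "not set by the user"
	Autoscaling bool
}

// validate mirrors the webhook checks: required fields, a cross-field rule,
// and default value injection.
func validate(s *Spec) error {
	if s.Image == "" { // required-field validation
		return errors.New("spec.image is required")
	}
	if s.Autoscaling && s.Replicas != nil { // cross-field rule: replicas vs autoscaling
		return errors.New("spec.replicas must not be set when autoscaling is enabled")
	}
	if !s.Autoscaling && s.Replicas == nil { // default value injection
		one := 1
		s.Replicas = &one
	}
	return nil
}

func main() {
	s := &Spec{Image: "nvcr.io/nim/example:1.0"}
	err := validate(s)
	fmt.Println(err, *s.Replicas) // <nil> 1
}
```

In the real operator these checks run in mutating/validating admission webhooks before the object reaches etcd.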
Conditions and status
The operator uses conditions to communicate resource state.
High availability
The operator supports high availability through:
- Leader election - Only one active manager instance reconciles resources
- Lease-based coordination - Using Kubernetes lease resources
- Fast failover - New leader elected quickly when current leader fails
Leader election is enabled in the manager configuration with LeaderElection: true and LeaderElectionID: "a0715c6e.nvidia.com".
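Those values map onto controller-runtime manager options, roughly:

```go
// Sketch of the manager setup implied by the settings above (controller-runtime).
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
	LeaderElection:   true,
	LeaderElectionID: "a0715c6e.nvidia.com",
})
```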
Metrics and observability
The operator exposes metrics on port 8080 (configurable). Metrics include controller queue depth, reconciliation duration, and error rates for monitoring operator health.
- /healthz - Liveness probe
- /readyz - Readiness probe
Filtered caching
To reduce memory usage, the operator uses filtered caching for resources.
Next steps
Custom resources
Explore all available custom resource definitions
Deployment guide
Learn how to deploy the operator