What is the operator pattern?
Kubernetes operators are software extensions that use custom resources to manage applications and their components. They follow the controller pattern:

- Observe - Watch for changes to custom resources
- Analyze - Compare current state with desired state
- Act - Take actions to reconcile differences
NIM and NeMo microservices
The operator manages two primary families of microservices:

NVIDIA Inference Microservices (NIMs)
NIMs are optimized containers for serving AI models with:

- Pre-built inference engines (TensorRT-LLM, vLLM)
- Optimized model profiles for specific GPU configurations
- OpenAI-compatible APIs
- Built-in model caching and optimization
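
As a sketch of how these properties surface in practice, a minimal NIMService manifest might look like the following. This is illustrative, not authoritative: the image repository, tag, and secret names are placeholders, and the exact field names should be verified against the NIMService CRD shipped with your operator version.

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct  # illustrative NIM container
    tag: "1.0.3"
    pullSecrets:
      - ngc-secret                 # assumed image pull secret name
  authSecret: ngc-api-secret       # assumed NGC API key secret name
  storage:
    nimCache:
      name: meta-llama3-8b-instruct  # reuse a pre-populated NIMCache
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000                   # OpenAI-compatible API endpoint
```

Once the service reports Ready, clients can reach the OpenAI-compatible API (for example, `/v1/chat/completions`) on the exposed port.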
NVIDIA NeMo Microservices
NeMo microservices provide additional AI capabilities:

- Customizer - Fine-tune and customize foundation models
- Guardrails - Apply safety controls and content filtering
- Evaluator - Evaluate model performance and quality
- Datastore - Manage datasets and model artifacts
- Entitystore - Store and retrieve entity information
Operator responsibilities
The NIM Operator handles:

Resource lifecycle
Create, update, and delete Kubernetes resources (Deployments, Services, ConfigMaps, etc.) based on custom resource specifications
Model caching
Automate the download and caching of AI models from NVIDIA NGC or other sources to persistent storage
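
Model caching is driven by the NIMCache custom resource. A minimal sketch, assuming an NGC model source and a ReadWriteMany-capable storage class (secret names, storage class, and size are illustrative; confirm field names against the NIMCache CRD):

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.3  # NIM image used to pull model profiles
      pullSecret: ngc-secret       # assumed image pull secret name
      authSecret: ngc-api-secret   # assumed NGC API key secret name
  storage:
    pvc:
      create: true
      storageClass: nfs-client     # assumed RWX-capable storage class
      size: 50Gi
      volumeAccessMode: ReadWriteMany
```

A NIMService can then reference this cache by name, so model downloads happen once per cluster rather than once per replica.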
Platform integration
Support multiple inference platforms including standalone deployments and KServe integration
Scaling and monitoring
Configure autoscaling, health checks, and metrics collection for inference services
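
For example, autoscaling and metrics collection can be enabled directly on a NIMService spec, roughly as below. This is a fragment rather than a complete manifest; the hpa block mirrors the standard Kubernetes HorizontalPodAutoscaler spec, and the serviceMonitor option assumes the Prometheus Operator CRDs are installed. Verify the exact field names against the installed CRD.

```yaml
spec:
  scale:
    enabled: true
    hpa:
      minReplicas: 1
      maxReplicas: 4
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80
  metrics:
    enabled: true
    serviceMonitor:        # requires Prometheus Operator CRDs
      interval: 30s
```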
Storage management
Handle persistent volumes, NIMCache volumes, and storage configurations for models
Network exposure
Set up Services, Ingress, and Gateway routes for accessing inference endpoints
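
As an illustration, external access can be configured through the expose block of a NIMService, which embeds a standard Kubernetes Ingress spec. The hostname and ingress class below are placeholders, and the exact nesting should be checked against your operator version's CRD:

```yaml
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        rules:
          - host: nim.example.com   # placeholder hostname
            http:
              paths:
                - path: /
                  pathType: Prefix
```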
Key benefits
Simplified deployment
Deploy complex AI inference stacks with simple YAML manifests. The operator handles all the underlying Kubernetes resources automatically.
Consistent operations
Apply the same operational patterns across different types of AI services. The operator ensures consistency in how services are deployed, scaled, and monitored.
Platform flexibility
Choose between standalone deployments or KServe integration. Switch platforms by changing a single field in the custom resource.
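
In recent operator releases, that switch is a single field on the NIMService spec. The field name below is taken from the operator's KServe support and should be confirmed against your installed CRD version:

```yaml
spec:
  inferencePlatform: kserve   # "standalone" (default) or "kserve"
```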
Production-ready defaults
Get sensible defaults for health checks, resource limits, security contexts, and other production requirements.
Multi-node support
Deploy models requiring multiple GPUs across nodes using LeaderWorkerSet for tensor and pipeline parallelism.
How it works
When you create a NIMService or NeMo microservice custom resource:

- The operator’s controller detects the new resource
- It validates the specification and sets default values
- It creates necessary Kubernetes resources (Deployments, Services, etc.)
- It monitors the deployment status and updates the custom resource status
- It continuously watches for changes and reconciles state
The operator uses a declarative approach - you describe the desired state in custom resources, and the operator ensures the cluster matches that state.
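
Concretely, after you apply a manifest the operator records reconciliation progress in the resource's status subresource, which you can inspect with `kubectl get nimservice <name> -o yaml`. A hedged example of what that status might contain (the state value and condition fields are illustrative):

```yaml
status:
  state: Ready
  conditions:
    - type: Ready
      status: "True"
      reason: Ready
      message: deployment is ready
```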
Next steps
Architecture
Learn about the operator’s internal architecture and components
Custom resources
Explore all available custom resource definitions (CRDs)