The NVIDIA NIM Operator is a Kubernetes operator that automates the deployment and management of NVIDIA Inference Microservices (NIMs) and NVIDIA NeMo microservices. It extends Kubernetes with custom resources and controllers that handle the full lifecycle of AI inference workloads.

What is the operator pattern?

Kubernetes operators are software extensions that use custom resources to manage applications and their components. They follow the controller pattern:
  1. Observe - Watch for changes to custom resources
  2. Analyze - Compare current state with desired state
  3. Act - Take actions to reconcile differences
The NIM Operator implements this pattern to automate complex deployment and management tasks for AI inference services.

NIM and NeMo microservices

The operator manages two primary families of microservices:

NVIDIA Inference Microservices (NIMs)

NIMs are optimized containers for serving AI models with:
  • Pre-built inference engines (TensorRT-LLM, vLLM)
  • Optimized model profiles for specific GPU configurations
  • OpenAI-compatible APIs
  • Built-in model caching and optimization
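As a sketch of how these optimizations surface in practice, a model-source fragment of a caching resource might pin an inference engine and a GPU-specific profile. The field names below are illustrative assumptions, not the authoritative schema; consult the operator's CRD reference:

```yaml
# Illustrative fragment only: field names are assumptions.
spec:
  source:
    ngc:
      model:
        engine: tensorrt_llm      # pre-built inference engine (TensorRT-LLM)
        precision: fp16           # assumed knob selecting an optimized profile
        gpus:
          - product: "a100"       # profile matched to a specific GPU configuration
```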

NVIDIA NeMo Microservices

NeMo microservices provide additional AI capabilities:
  • Customizer - Fine-tune and customize foundation models
  • Guardrails - Apply safety controls and content filtering
  • Evaluator - Evaluate model performance and quality
  • Datastore - Manage datasets and model artifacts
  • Entitystore - Store and retrieve entity information
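Each NeMo microservice is typically driven by its own custom resource. A hedged sketch, where the kind, image path, and port are assumptions based on the operator's naming pattern rather than the exact schema:

```yaml
apiVersion: apps.nvidia.com/v1alpha1      # assumed API group/version
kind: NemoGuardrails                      # one CR kind per NeMo microservice
metadata:
  name: guardrails-sample
spec:
  image:
    repository: nvcr.io/nvidia/nemo-microservices/guardrails  # illustrative path
    tag: "latest"
  expose:
    service:
      port: 7331                          # illustrative port
```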

Operator responsibilities

The NIM Operator handles:

Resource lifecycle

Create, update, and delete Kubernetes resources (Deployments, Services, ConfigMaps, etc.) based on custom resource specifications

Model caching

Automate the download and caching of AI models from NVIDIA NGC or other sources to persistent storage
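A NIMCache custom resource is the typical vehicle for this. The sketch below assumes NGC as the model source and a dynamically provisioned PVC; secret names are placeholders and exact fields may differ by operator version:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0  # NIM image that pulls the model
      pullSecret: ngc-secret          # image pull secret (assumed name)
      authSecret: ngc-api-secret      # NGC API key secret (assumed name)
  storage:
    pvc:
      create: true                    # let the operator provision the volume
      storageClass: standard          # cluster-specific
      size: 50Gi
```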

Platform integration

Support multiple inference platforms, including standalone deployments and KServe integration

Scaling and monitoring

Configure autoscaling, health checks, and metrics collection for inference services
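Autoscaling is usually configured inline on the service resource. A hedged fragment assuming an HPA-style `scale` block (field names may vary by operator version):

```yaml
# Illustrative NIMService fragment; the `scale` field names are assumptions.
spec:
  scale:
    enabled: true
    hpa:
      minReplicas: 1
      maxReplicas: 4
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80   # scale out above 80% average CPU
```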

Storage management

Handle persistent volumes, NIMCache volumes, and storage configurations for models

Network exposure

Set up Services, Ingress, and Gateway routes for accessing inference endpoints
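Exposure is typically declared on the service resource as well. The fragment below is illustrative; the exact `expose` schema may differ by operator version, and the operator is assumed to wire the Ingress backend to the generated Service:

```yaml
# Illustrative fragment; field names are assumptions.
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000                      # OpenAI-compatible HTTP endpoint
    ingress:
      enabled: true
      spec:
        ingressClassName: nginx       # cluster-specific
        rules:
          - host: nim.example.com     # placeholder hostname
```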

Key benefits

  • Deploy complex AI inference stacks with simple YAML manifests; the operator handles the underlying Kubernetes resources automatically.
  • Apply the same operational patterns across different types of AI services, with consistent deployment, scaling, and monitoring.
  • Choose between standalone deployments or KServe integration, switching platforms by changing a single field in the custom resource.
  • Get sensible defaults for health checks, resource limits, security contexts, and other production requirements.
  • Deploy models requiring multiple GPUs across nodes using LeaderWorkerSet for tensor and pipeline parallelism.
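The single-field platform switch mentioned above might look like the following; the `inferencePlatform` field name is an assumption and should be checked against the NIMService CRD reference:

```yaml
# Illustrative: moving the same NIMService from standalone to KServe.
spec:
  inferencePlatform: kserve   # assumed field; "standalone" would use plain Deployments
```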

How it works

When you create a NIMService or NeMo microservice custom resource:
  1. The operator’s controller detects the new resource
  2. It validates the specification and sets default values
  3. It creates necessary Kubernetes resources (Deployments, Services, etc.)
  4. It monitors the deployment status and updates the custom resource status
  5. It continuously watches for changes and reconciles state
The operator uses a declarative approach: you describe the desired state in custom resources, and the operator ensures the cluster matches that state.
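Putting the pieces together, a minimal NIMService might look like the sketch below, which reuses a pre-populated NIMCache for model storage. Resource names and secret names are illustrative, and some field details may differ by operator version:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.0"
    pullSecrets:
      - ngc-secret                      # assumed secret name
  authSecret: ngc-api-secret            # assumed secret name
  storage:
    nimCache:
      name: meta-llama3-8b-instruct     # reuse a pre-populated NIMCache
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
```

Applying a manifest like this with `kubectl apply -f` triggers the reconcile loop described above: the operator creates the Deployment, Service, and related resources, then reports progress in the custom resource's `status` field.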

Next steps

Architecture

Learn about the operator’s internal architecture and components

Custom resources

Explore all available custom resource definitions (CRDs)
