Deployment of AI-Ready Kubernetes
Important
To perform the validation with AI-Ready Kubernetes, you must meet one of the following prerequisites:
- Have an AI Factory ready to be validated; or,
- Configure an AI Factory by following one of these options:
Kubernetes provides robust orchestration for deploying AI workloads at scale, managing isolation between cluster workloads and GPU resources for AI inference tasks. The NVIDIA GPU Operator provisions the NVIDIA drivers and libraries required to make GPU resources available to containers. Kubernetes supports multitenancy through isolated namespaces, where access for different teams or users is managed with Role-Based Access Control (RBAC) and network policies. As an administrator, you can also enforce limits on GPU usage or other resources consumed per namespace, ensuring fair resource allocation, as shown in the example below.
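For example, here is a minimal sketch of a per-namespace GPU quota (the team-a namespace and the one-GPU limit are illustrative assumptions, not part of this setup):

# Hypothetical namespace for one team
kubectl create namespace team-a

# Cap the total number of NVIDIA GPUs that pods in this namespace may request
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a
spec:
  hard:
    requests.nvidia.com/gpu: "1"
EOF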
Additionally, running Kubernetes clusters on top of OpenNebula-provisioned virtual machines (VMs) provides several advantages: hardware-level isolation and secure multi-tenancy at the physical layer, an additional layer of resource isolation for performance-sensitive workloads, support for multi-cloud architectures, flexible lifecycle management, and resource efficiency, among others. Use CAPONE, the OpenNebula Cluster API provider for Kubernetes, to run K8s clusters over OpenNebula-provisioned infrastructure. This allows you to provision Kubernetes workload clusters in a declarative manner.
In this guide you will learn how to combine these components to provision a secure, robust, and scalable solution for AI workloads on top of the NVIDIA Dynamo framework, powered by the OpenNebula cloud platform.
Important
In this guide, we assume a single-node OpenNebula deployment (i.e., a single node that works as both the OpenNebula frontend and the hypervisor host).
Architecture
The diagram below depicts the top-level architecture of the NVIDIA Dynamo framework setup in an OpenNebula deployment. The OpenNebula frontend host, which contains two NVIDIA L40S GPU PCI cards, also acts as the host server for the K8s cluster VMs. For this reference setup, the VMs share a simple bridged network.
To deploy the GPU-enabled Kubernetes workload cluster, first deploy a VM with the OpenNebula “CAPI Service” marketplace appliance, which contains a lightweight Kubernetes management cluster based on K3s and includes a Rancher and CAPONE controller deployment. This instance is used to provision the GPU-enabled Kubernetes workload cluster nodes in a declarative way.
The GPU-enabled Kubernetes workload cluster consists of three VMs: one control plane node and two worker nodes. Each worker node uses one of the two NVIDIA L40S GPU PCI cards. Deploy the NVIDIA GPU Operator in this cluster to automatically configure the nodes and make the GPUs available to their workloads, and the NVIDIA Dynamo framework operator to deploy inference graphs in a declarative manner.
Infrastructure Deployment and Configuration
Here is a brief glossary of the components described in this section:
- Management Cluster: a lightweight Kubernetes cluster that contains the Cluster API provider components for creating workload Kubernetes clusters declaratively, based on CRDs and YAML manifests/Helm charts (see the sketch after this list).
- Workload Cluster: a Kubernetes cluster created through the Cluster API that runs the actual workloads.
- CAPI Service Appliance: an OpenNebula Cluster API VM appliance that contains a lightweight-Kubernetes-based (K3s) management cluster prebuilt with the OpenNebula Cluster API Provider (CAPONE) and other Cluster API providers, as well as a Rancher instance. This appliance is ready for deploying and managing workload clusters without any previous setup.
- CAPONE: the OpenNebula Cluster API Provider, a Kubernetes Cluster API infrastructure provider that communicates with the OpenNebula frontend API to provision the infrastructure required to run workload clusters on OpenNebula VMs.
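Once the management cluster is deployed and you have its kubeconfig (retrieved in Step 2 below), you can list the Cluster API CRDs that these provider components install; a quick sketch:

# Cluster API and CAPONE resource definitions present in the management cluster
kubectl --kubeconfig ./kubeconfig_management.yaml get crds | grep cluster.x-k8s.io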
Infrastructure Provisioning
This setup requires OpenNebula running on a GPU-ready environment, i.e., an OpenNebula host configured with PCI passthrough to expose the GPU cards to the node workloads. You can verify the host GPU devices as shown below.
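As a quick check (a sketch; the host ID 0 is an assumption for this single-node setup), confirm that the NVIDIA GPUs are visible for passthrough:

# PCI devices discovered by OpenNebula on the host (look for the NVIDIA L40S entries)
onehost show 0

# On the host itself: vendor/device IDs (10de:26b9 for the L40S),
# used later in the worker template PCI attribute
lspci -nn | grep -i nvidia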
Kubernetes GPU Workload Cluster Provisioning with CAPONE
Deploying the Management Cluster
First, deploy the Kubernetes management cluster. This cluster uses the CAPONE controller to automatically provision the VMs required to host the Kubernetes GPU workload cluster.
Deploy your Kubernetes management cluster through the CAPI Service in OpenNebula.
Step 1: Deploying the CAPI Appliance
Login as the oneadmin user on the frontend instance and download the CAPI Service appliance from the marketplace:

onemarketapp export "Service Capi" service_Capi -d <datastore_id>

Where datastore_id is your image datastore identifier. In this example, use the default one, with ID 1:

onemarketapp export "Service Capi" service_Capi -d 1

Update the service_Capi template by adding the scheduling requirements needed to deploy it on the desired host. In this case, we enable the host-passthrough CPU model and add a NIC attached to the default admin_net network, but you can change it to any network of your choice. The chosen network must be the same one used later for the workload cluster vRouter ingress traffic.

onetemplate update service_Capi
---
NIC=[ NETWORK="admin_net" ]
CPU_MODEL=[ MODEL="host-passthrough" ]

Instantiate the service_Capi appliance:

onetemplate instantiate service_Capi --name capi

The CLI will ask you to input some values. Press “Enter” for each prompt to keep the default values.
The CAPI appliance takes a few minutes to reach the “Ready” status; you can check it as shown below. Once the appliance is available, proceed to deploy the workload Kubernetes cluster by following the next steps.
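One way to check the appliance status from the frontend (a sketch; it assumes the appliance reports readiness through OneGate by setting READY=YES in the VM user template):

# The capi VM should be RUNNING and report READY=YES once contextualization finishes
onevm show capi | grep -iE 'state|ready'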
Step 2: Deploy the Workload Cluster
Login as the root user on the frontend instance and install the helm 3 and kubectl tools:

Helm:

curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

kubectl:

curl -fsSL "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" | \
  install -o root -g root -m 0755 -D /dev/fd/0 /usr/local/bin/kubectl
Login as the oneadmin user again on the frontend, gather the following information from your setup, and export these environment variables:

export ONE_VNET=admin_net
export ONE_FRONTEND_IP=$(onevnet show $ONE_VNET -j | jq .VNET.TEMPLATE.GATEWAY | tr -d '"')
export CAPI_VM_IP=$(onevm show capi -j | jq '.VM.TEMPLATE.NIC[0].IP' | tr -d '"')
export ONEADMIN_PASSWORD=$(cat /var/lib/one/.one/one_auth | awk -F: '{print $2}')

where the $ONE_VNET variable contains the virtual network where the CAPI instance has been deployed and where the workload cluster is going to be deployed (in our case, admin_net), $ONE_FRONTEND_IP is the IP used for accessing the OpenNebula frontend XML-RPC API (in our case, the admin_net virtual network gateway IP, which is bridged with the OpenNebula frontend host), $CAPI_VM_IP is the CAPI appliance IP address, and $ONEADMIN_PASSWORD is the password defined for the oneadmin user.

Check that the environment variables have been properly set:

echo $ONE_VNET
echo $ONE_FRONTEND_IP
echo $CAPI_VM_IP
echo $ONEADMIN_PASSWORD

You should see an output like this (note that the IP addresses and the password will vary depending on the environment):

admin_net
192.168.100.1
192.168.100.50
p455w0rd

Retrieve the management cluster kubeconfig by connecting to the CAPI appliance with the frontend oneadmin user SSH key:
ssh root@$CAPI_VM_IP cat /etc/rancher/k3s/k3s.yaml | \
  sed "s|server: https://127.0.0.1:6443|server: https://$CAPI_VM_IP:6443|" > \
  kubeconfig_management.yaml

Check that you can access the management cluster with the following command:

kubectl --kubeconfig ./kubeconfig_management.yaml get nodes

You should see an output similar to this:

NAME   STATUS   ROLES                  AGE     VERSION
capi   Ready    control-plane,master   9m20s   v1.31.4+k3s1
Create a values.yaml file to parameterize the workload cluster helm chart. The values.yaml file contains some default value overrides, such as the images and templates applied to the workload nodes:

cat > values.yaml << EOF
ONE_XMLRPC: "http://$ONE_FRONTEND_IP:2633/RPC2"
ONE_AUTH: "oneadmin:$ONEADMIN_PASSWORD"
ROUTER_TEMPLATE_NAME: "{{ .Release.Name }}-router"
MASTER_TEMPLATE_NAME: "{{ .Release.Name }}-master"
WORKER_TEMPLATE_NAME: "{{ .Release.Name }}-worker"
CLUSTER_IMAGES:
  - imageName: "{{ .Release.Name }}-router"
    imageContent: |
      PATH = "https://d24fmfybwxpuhu.cloudfront.net/service_VRouter-7.0.0-0-20250528.qcow2"
      DEV_PREFIX = "vd"
  - imageName: "{{ .Release.Name }}-node"
    imageContent: |
      PATH = "https://d24fmfybwxpuhu.cloudfront.net/ubuntu2404-7.0.0-0-20250528.qcow2"
      DEV_PREFIX = "vd"
CLUSTER_TEMPLATES:
  - templateName: "{{ .Release.Name }}-router"
    templateContent: |
      CONTEXT = [
        NETWORK = "YES",
        ONEAPP_VNF_DNS_ENABLED = "YES",
        ONEAPP_VNF_DNS_NAMESERVERS = "1.1.1.1,8.8.8.8",
        ONEAPP_VNF_DNS_USE_ROOTSERVERS = "NO",
        ONEAPP_VNF_NAT4_ENABLED = "YES",
        ONEAPP_VNF_NAT4_INTERFACES_OUT = "eth0",
        ONEAPP_VNF_ROUTER4_ENABLED = "YES",
        SSH_PUBLIC_KEY = "\$USER[SSH_PUBLIC_KEY]",
        TOKEN = "YES" ]
      CPU = "1"
      CPU_MODEL=[ MODEL="host-passthrough" ]
      DISK = [ IMAGE = "{{ .Release.Name }}-router" ]
      GRAPHICS = [ LISTEN = "0.0.0.0", TYPE = "vnc" ]
      LXD_SECURITY_PRIVILEGED = "true"
      MEMORY = "512"
      NIC_DEFAULT = [ MODEL = "virtio" ]
      SCHED_REQUIREMENTS="HYPERVISOR=kvm"
      OS = [ ARCH = "x86_64", FIRMWARE_SECURE = "YES" ]
      VCPU = "2"
      VROUTER = "YES"
  - templateName: "{{ .Release.Name }}-master"
    templateContent: |
      CONTEXT = [
        BACKEND = "YES",
        NETWORK = "YES",
        GROW_FS = "/",
        SET_HOSTNAME = "\$NAME",
        SSH_PUBLIC_KEY = "\$USER[SSH_PUBLIC_KEY]",
        TOKEN = "YES" ]
      CPU = "12"
      CPU_MODEL=[ MODEL="host-passthrough" ]
      DISK = [ IMAGE = "{{ .Release.Name }}-node", SIZE = "41920" ]
      FEATURES=[ ACPI="yes", APIC="yes", PAE="yes" ]
      GRAPHICS = [ LISTEN = "0.0.0.0", TYPE = "vnc" ]
      HYPERVISOR = "kvm"
      LXD_SECURITY_PRIVILEGED = "true"
      MEMORY = "32768"
      NIC=[ NETWORK="$ONE_VNET" ]
      OS = [ ARCH = "x86_64", FIRMWARE="UEFI", MACHINE="pc-q35-noble" ]
      SCHED_REQUIREMENTS = "HYPERVISOR=kvm"
      VCPU = "16"
  - templateName: "{{ .Release.Name }}-worker"
    templateContent: |
      CONTEXT = [
        BACKEND = "YES",
        NETWORK = "YES",
        GROW_FS = "/",
        SET_HOSTNAME = "\$NAME",
        SSH_PUBLIC_KEY = "\$USER[SSH_PUBLIC_KEY]",
        TOKEN = "YES" ]
      CPU = "12"
      CPU_MODEL=[ MODEL="host-passthrough" ]
      DISK = [ IMAGE = "{{ .Release.Name }}-node", SIZE = "204800" ]
      FEATURES=[ ACPI="yes", APIC="yes", PAE="yes" ]
      GRAPHICS = [ LISTEN = "0.0.0.0", TYPE = "vnc" ]
      HYPERVISOR = "kvm"
      LXD_SECURITY_PRIVILEGED = "true"
      MEMORY = "32768"
      NIC=[ NETWORK="$ONE_VNET" ]
      OS=[ ARCH="x86_64", FIRMWARE="UEFI", MACHINE="pc-q35-noble" ]
      PCI=[ CLASS="0302", DEVICE="26b9", VENDOR="10de" ]
      SCHED_REQUIREMENTS="HYPERVISOR=kvm"
      VCPU = "16"
CONTROL_PLANE_MACHINE_COUNT: 1
WORKER_MACHINE_COUNT: 2
PUBLIC_NETWORK_NAME: $ONE_VNET
PRIVATE_NETWORK_NAME: $ONE_VNET
EOF
The number of GPU devices mounted in each worker node depends on the definition of the PCI attribute in the worker node template (note that the attributes of this map may change depending on the GPU model). In this example, we deploy 2 worker nodes with a single NVIDIA GPU card attached to each one. If you only have a single GPU card available, change the value of WORKER_MACHINE_COUNT to 1. For more information, see the NVIDIA GPU Passthrough section. A sketch of a multi-GPU worker definition is shown below.
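For instance, attaching two GPUs to each worker node would just repeat the PCI attribute in the worker templateContent of values.yaml (a sketch; the class/device/vendor values are those of the L40S cards used in this guide and must match your own GPUs):

PCI=[ CLASS="0302", DEVICE="26b9", VENDOR="10de" ]
PCI=[ CLASS="0302", DEVICE="26b9", VENDOR="10de" ]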
Once the values.yaml file is available, you can proceed to deploy the workload cluster with Helm. First, add the CAPONE helm chart repository:

helm repo add capone https://opennebula.github.io/cluster-api-provider-opennebula/charts/ \
  && helm repo update

Verify the available charts in that repo:

helm search repo capone

The result should list the capone-kadm and capone-rke2 charts:

NAME                 CHART VERSION   APP VERSION   DESCRIPTION
capone/capone-kadm   0.1.7           v0.1.7        OpenNebula/Kubeadm CAPI Templates
capone/capone-rke2   0.1.7           v0.1.7        OpenNebula/RKE2 CAPI Templates

Finally, proceed to install the workload cluster template in the CAPI appliance from the capone-rke2 chart:
helm install --kubeconfig ./kubeconfig_management.yaml \ k8s-gpu-test capone/capone-rke2 --values ./values.yaml
The CAPONE and RKE2 Cluster API controllers in the management cluster will deploy the workload cluster on the scheduled host. These controllers run as the following pods in the management cluster; check their logs to review the progress:
NAMESPACE NAME READY STATUS
capone-system capone-controller-manager-64db4f6867-49ppm 1/1 Running
rke2-bootstrap-system rke2-bootstrap-controller-manager-676f89558c-85nxg 1/1 Running
rke2-control-plane-system rke2-control-plane-controller-manager-76fb8568cd-5txwc 1/1 Running
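For example, to follow the CAPONE controller logs (a sketch; the deployment name is inferred from the pod listing above and may vary in your environment):

kubectl --kubeconfig ./kubeconfig_management.yaml \
  -n capone-system logs -f deployment/capone-controller-manager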
Check the virtual machines to see whether the cluster has been deployed:
onevm list
When the deployment finishes you will see the following list of VMs: one vRouter instance with the control plane Kubernetes API as its backend (with the vr- prefix), one control plane node, and two worker nodes (with md-0 in their names).
ID USER GROUP NAME STAT CPU MEM HOST
448 oneadmin oneadmin k8s-gpu-test-md-0-m5tzv-drg4k runn 16 32G vgpu2
447 oneadmin oneadmin k8s-gpu-test-md-0-m5tzv-btvn4 runn 16 32G vgpu2
446 oneadmin oneadmin k8s-gpu-test-qb8dl runn 16 32G vgpu2
445 oneadmin oneadmin vr-k8s-gpu-test-cp-0 runn 2 512M vgpu2
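You can also check the provisioning state from the Cluster API side on the management cluster (a sketch; the object names derive from the Helm release name, k8s-gpu-test in this example):

# The cluster and its machines should eventually report Provisioned/Running
kubectl --kubeconfig ./kubeconfig_management.yaml \
  get clusters.cluster.x-k8s.io,machines.cluster.x-k8s.io -A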
Connecting to the Workload Cluster Kubernetes API Locally
To establish a local connection to the workload cluster Kubernetes API, export the following environment variables: $VROUTER_IP will contain the public IP address of the vRouter instance, and $CONTROL_PLANE_IP will contain the IP address of the workload cluster control plane instance. Note that the VM names change on each deployment, so set the names of the vRouter and control plane instances accordingly in the following code block:
export VROUTER_VM_NAME=<changeme>
export CONTROL_PLANE_VM_NAME=<changeme>
export VROUTER_IP=$(onevm show $VROUTER_VM_NAME -j | jq '.VM.TEMPLATE.NIC[0].VROUTER_IP' | tr -d '"')
export CONTROL_PLANE_IP=$(onevm show $CONTROL_PLANE_VM_NAME -j | jq '.VM.TEMPLATE.NIC[0].IP' | tr -d '"')
For instance, corresponding to the previous example:
export VROUTER_VM_NAME=vr-k8s-gpu-test-cp-0
export CONTROL_PLANE_VM_NAME=k8s-gpu-test-qb8dl
export VROUTER_IP=$(onevm show $VROUTER_VM_NAME -j | jq '.VM.TEMPLATE.NIC[0].VROUTER_IP' | tr -d '"')
export CONTROL_PLANE_IP=$(onevm show $CONTROL_PLANE_VM_NAME -j | jq '.VM.TEMPLATE.NIC[0].IP' | tr -d '"')
Echo the declared variables to check that they have been set properly:
echo $VROUTER_IP
echo $CONTROL_PLANE_IP
You should see an output like this (the IP addresses will vary depending on the environment):
192.168.100.51
192.168.100.54
To access the Kubernetes API from your localhost, use the kubeconfig file located at /etc/rancher/rke2/rke2.yaml on the workload cluster control plane node.
ssh -J root@$VROUTER_IP root@$CONTROL_PLANE_IP cat /etc/rancher/rke2/rke2.yaml | \
sed "s|server: https://127.0.0.1:6443|server: https://$VROUTER_IP:6443|"> kubeconfig_workload.yaml
Connect to the workload cluster through kubectl, checking, for instance, the workload cluster nodes:
kubectl --kubeconfig=kubeconfig_workload.yaml get nodes
This command should output the workload cluster nodes:
NAME STATUS ROLES AGE VERSION
k8s-gpu-test-md-0-m5tzv-btvn4 Ready <none> 8d v1.31.4+rke2r1
k8s-gpu-test-md-0-m5tzv-drg4k Ready <none> 8d v1.31.4+rke2r1
k8s-gpu-test-qb8dl Ready control-plane,etcd,master 8d v1.31.4+rke2r1
If you want to avoid setting the --kubeconfig flag each time you use kubectl, export an environment variable referencing the kubeconfig file. This makes kubectl use that kubeconfig by default in the shell session you are working in:
export KUBECONFIG="$PWD/kubeconfig_workload.yaml"
NVIDIA GPU Operator Installation
The NVIDIA GPU Operator automates the steps required to run NVIDIA GPU workloads in Kubernetes using the device plugin framework. The managed components include:
- NVIDIA CUDA drivers
- Kubernetes device plugin for GPUs
- NVIDIA Container runtime
- DCGM based monitoring
- Others
The procedure to install the NVIDIA GPU Operator is as follows:
Add the NVIDIA helm repo:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
  && helm repo update

Export the KUBECONFIG variable:

export KUBECONFIG="$PWD/kubeconfig_workload.yaml"

Deploy the gpu-operator on the workload cluster:

# create gpu-operator namespace
kubectl create ns gpu-operator

# enforce privileged pods on that namespace
kubectl label --overwrite ns gpu-operator pod-security.kubernetes.io/enforce=privileged

# install the helm chart
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=v25.3.2 \
  --set "toolkit.env[0].name=CONTAINERD_CONFIG" \
  --set "toolkit.env[0].value=/var/lib/rancher/rke2/agent/etc/containerd/config.toml" \
  --set "toolkit.env[1].name=CONTAINERD_SOCKET" \
  --set "toolkit.env[1].value=/run/k3s/containerd/containerd.sock" \
  --set "toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS" \
  --set "toolkit.env[2].value=nvidia" \
  --set "toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT" \
  --set-string "toolkit.env[3].value=true"

Note that the CONTAINERD_CONFIG and CONTAINERD_SOCKET paths are set for the RKE2 Kubernetes distribution, since it uses different paths than the default ones.

After a successful Helm installation, you will see this output:
NAME: gpu-operator-1753949578
LAST DEPLOYED: Thu Jul 31 10:13:01 2025
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None

Check that the deployed pods for the gpu-operator are up and running:

kubectl get pods -n gpu-operator

You should see the following pods running in the gpu-operator namespace:

NAME                                                              READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-f9fs6                                       1/1     Running     0          8d
gpu-feature-discovery-q7h26                                       1/1     Running     0          8d
gpu-operator-1753950388-node-feature-discovery-gc-854b7fb8jr989   1/1     Running     0          8d
gpu-operator-1753950388-node-feature-discovery-master-955d78rh4   1/1     Running     0          8d
gpu-operator-1753950388-node-feature-discovery-worker-4tvlv       1/1     Running     0          8d
gpu-operator-1753950388-node-feature-discovery-worker-d7sxz       1/1     Running     0          8d
gpu-operator-1753950388-node-feature-discovery-worker-zb5gj       1/1     Running     0          8d
gpu-operator-84f4699cc4-kdcjn                                     1/1     Running     0          8d
nvidia-container-toolkit-daemonset-d5fwk                          1/1     Running     0          8d
nvidia-container-toolkit-daemonset-k54w7                          1/1     Running     0          8d
nvidia-cuda-validator-gmnrq                                       0/1     Completed   0          8d
nvidia-cuda-validator-vl94m                                       0/1     Completed   0          8d
nvidia-dcgm-exporter-g5j2n                                        1/1     Running     0          8d
nvidia-dcgm-exporter-g6x99                                        1/1     Running     0          8d
nvidia-device-plugin-daemonset-rhj8z                              1/1     Running     0          8d
nvidia-device-plugin-daemonset-v5bcq                              1/1     Running     0          8d
nvidia-operator-validator-ctg6s                                   1/1     Running     0          8d
nvidia-operator-validator-z52q7                                   1/1     Running     0          8d
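You can also verify that the NVIDIA runtime class has been registered in the workload cluster (a sketch; the GPU Operator is expected to create a RuntimeClass named nvidia):

# The nvidia handler is wired into containerd by the nvidia-container-toolkit daemonset
kubectl get runtimeclass nvidia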
Check that the Kubernetes node GPU autodiscovery works as expected by inspecting the workload nodes, for instance the nvidia.com/* labels and the allocatable GPU capacity:

kubectl describe node <kubernetes_workload_node_name>

For instance, in the example above:

kubectl describe node k8s-gpu-test-md-0-m5tzv-drg4k

These are the labels and capacity you should see:

[...]
nvidia.com/gpu.memory=46068
nvidia.com/gpu.mode=compute
nvidia.com/gpu.present=true
nvidia.com/gpu.product=NVIDIA-L40S
nvidia.com/gpu.replicas=1
[...]
Capacity:
  cpu:                16
  ephemeral-storage:  202051056Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32855784Ki
  nvidia.com/gpu:     1
  pods:               110
Allocatable:
  cpu:                16
  ephemeral-storage:  196555267123
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32855784Ki
  nvidia.com/gpu:     1
  pods:               110

Finally, to use the PCI GPUs in a specific pod, add the spec.runtimeClassName: nvidia parameter to the pod/deployment manifest and set nvidia.com/gpu: 1 as a requested resource, as in the sketch below.
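For instance, a minimal test pod (a sketch; the gpu-test name and the CUDA image tag are assumptions, use any CUDA-enabled image available in your registry):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  runtimeClassName: nvidia        # use the NVIDIA container runtime installed by the GPU Operator
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed CUDA base image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1       # request one GPU from the device plugin
EOF

# The pod logs should show the L40S GPU reported by nvidia-smi
kubectl logs gpu-test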
Tip
After provisioning your AI Factory with AI-Ready Kubernetes, you may continue with additional validation procedures built on top of K8s, such as Deployment of NVIDIA Dynamo and Deployment of NVIDIA KAI Scheduler.