NVIDIA GPU Passthrough
This guide provides detailed instructions for configuring PCI passthrough for high-performance NVIDIA® GPUs on x86_64 hypervisors. The procedures have been validated with NVIDIA H100 GPUs but can be adapted to other similar high-performance NVIDIA GPUs. PCI passthrough gives a Virtual Machine exclusive access to the GPU, which is the recommended setup for AI/ML workloads. It builds upon the concepts explained in the general PCI Passthrough guide and is not applicable to ARM-based systems.
Note
As an alternative to PCI passthrough, OpenNebula also supports NVIDIA vGPU, which allows a single physical GPU to be partitioned and shared among multiple VMs. While PCI passthrough is the recommended approach for demanding AI/ML workloads, especially model training, vGPU can be a suitable option for less intensive tasks such as model inference or development environments. For more information, see the NVIDIA vGPU guide.
Hypervisor Configuration
The following configurations must be performed on the KVM hypervisors that will host the GPUs.
Enabling IOMMU
The IOMMU (Input-Output Memory Management Unit) is a hardware feature that allows the hypervisor to isolate I/O devices and is a prerequisite for PCI passthrough. To enable it, you need to add kernel parameters to the hypervisor’s boot configuration.
- Edit the GRUB configuration file at /etc/default/grub and add the appropriate parameter to the GRUB_CMDLINE_LINUX_DEFAULT line:
  - For Intel CPUs: intel_iommu=on iommu=pt
  - For AMD CPUs: amd_iommu=on iommu=pt
Example for an Intel-based hypervisor:
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on iommu=pt"
- After saving the file, update the GRUB configuration and reboot the hypervisor:
# update-grub
# reboot
- After rebooting, verify that IOMMU is enabled by checking the kernel messages:
# dmesg | grep -i iommu
Alternatively, to confirm that IOMMU is active, check that the /sys/kernel/iommu_groups/ directory exists and is populated with subdirectories. An empty directory may indicate that IOMMU is not correctly enabled in the kernel or BIOS.
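As a quick cross-check, the following minimal shell sketch lists every IOMMU group together with the devices it contains; it relies only on the standard sysfs layout:
# List each IOMMU group and the PCI devices assigned to it
for group in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group $(basename "$group"):"
    ls "$group/devices"
done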
VFIO Driver Binding
Once IOMMU is enabled, the GPU must be unbound from the default host driver (e.g., nouveau) and bound to the vfio-pci driver, which is designed for PCI passthrough.
- Install the driverctl utility:
# apt install driverctl
- Ensure the vfio-pci module is loaded on boot:
# echo "vfio-pci" | sudo tee /etc/modules-load.d/vfio-pci.conf
# modprobe vfio-pci
- Identify the GPU’s PCI address:
# lspci -D | grep -i nvidia
0000:e1:00.0 3D controller: NVIDIA Corporation GH100 [H100 PCIe] (rev a1)
- Set the driver override: use the PCI address from the previous step to set an override for the device to use the vfio-pci driver.
# driverctl set-override 0000:e1:00.0 vfio-pci
- Verify the driver binding: check that the GPU is now using the vfio-pci driver.
# lspci -Dnns e1:00.0 -k
Kernel driver in use: vfio-pci
VFIO Device Ownership
For OpenNebula to manage the GPU, the VFIO device files in /dev/vfio/ must be owned by the root:kvm user and group. This is achieved by creating a udev rule.
- Identify the IOMMU group for your GPU using its PCI address:
# find /sys/kernel/iommu_groups/ -type l | grep e1:00.0
/sys/kernel/iommu_groups/85/devices/0000:e1:00.0
In this example, the IOMMU group is 85.
- Create a udev rule: create the file /etc/udev/rules.d/99-vfio.rules with the following content:
SUBSYSTEM=="vfio", GROUP="kvm", MODE="0666"
- Reload the udev rules:
# udevadm control --reload
# udevadm trigger
- Verify ownership: check the ownership of the device file corresponding to your GPU’s IOMMU group.
# ls -la /dev/vfio/
crw-rw-rw- 1 root kvm 509, 0 Oct 16 10:00 85
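The same verification can be scripted. This sketch resolves the IOMMU group directly from the GPU's PCI address via sysfs and then checks the matching VFIO device file; adjust the address to your system:
# Derive the IOMMU group number from the PCI address and inspect /dev/vfio
ADDR=0000:e1:00.0
GROUP=$(basename "$(readlink /sys/bus/pci/devices/$ADDR/iommu_group)")
ls -la /dev/vfio/$GROUP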
OpenNebula Configuration
Monitoring PCI Devices
To make the GPUs available in OpenNebula, configure the PCI probe on the front-end node to monitor NVIDIA devices.
- Edit the PCI probe configuration file at /var/lib/one/remotes/etc/im/kvm-probes.d/pci.conf and add a filter for NVIDIA devices:
:filter: '10de:*'
- Synchronize the hosts from the Front-end to apply the new configuration:
# su - oneadmin
$ onehost sync -f
After a few moments, you can check if the GPU is being monitored correctly by showing the host information (onehost show <HOST_ID>). The GPU should appear in the PCI DEVICES section.
By default, the host system monitoring probe may take up to 10 minutes to detect new PCI devices. To run the host monitoring probe immediately, you can force it with:
$ onehost forceupdate <HOST_ID>
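As a quick check from the Front-end, the host information can be filtered for the PCI DEVICES section; the number of context lines below is only an example:
$ onehost show <HOST_ID> | grep -A 10 "PCI DEVICES"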
VM Template Best Practices for AI Workloads
To achieve proper GPU operation inside the VM, including optimal performance for AI/ML workloads, configure the VM Template with the following settings:
Firmware (UEFI)
Modern high-performance GPUs like the H100 benefit from Resizable BAR, which requires UEFI booting. OpenNebula will automatically select the appropriate UEFI firmware file. If you need to use a specific OVMF file, you can provide its full path in the FIRMWARE attribute instead of "UEFI".
- Attribute: FIRMWARE
- Section: OS
- Value: "UEFI"
OS = [
ARCH = "x86_64",
FIRMWARE = "UEFI"
]
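For reference, pointing FIRMWARE at a specific OVMF image instead of "UEFI" would look like the snippet below; the path is only illustrative and depends on where the hypervisor's OVMF package installs its firmware files:
OS = [
ARCH = "x86_64",
FIRMWARE = "/usr/share/OVMF/OVMF_CODE.fd"
]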
Machine Type (q35)
The q35 machine type provides a modern PCIe-based chipset, improving compatibility and performance for PCIe devices like GPUs. Additionally, if the q35 machine type is used, OpenNebula replicates the hypervisor PCI topology into the guest VM. This gives AI frameworks visibility into the PCI layout, allowing them to optimize data transfers and reduce latency.
- Attribute: MACHINE
- Section: OS
- Value: "q35"
OS = [
...,
MACHINE = "q35"
]
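Once a VM is running with this machine type, the PCIe hierarchy exposed to the guest can be inspected from inside the VM, for example with:
$ lspci -tv
The tree view shows how devices, including the passed-through GPU, are attached to the emulated PCIe root ports.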
CPU Model (Host Passthrough)
Exposing the host CPU's exact model and feature set to the VM results in better hardware support and performance. Access to the full set of CPU features is a hard requirement for modern high-performance GPUs like the NVIDIA H100, because their drivers rely on advanced CPU instruction sets to initialize and function correctly.
- Attribute: CPU_MODEL
- Value: MODEL="host-passthrough"
CPU_MODEL = [
MODEL = "host-passthrough"
]
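To confirm inside the guest that the host CPU model and its feature flags are exposed, a check such as the following can be used; the model name reported should match the physical CPU of the hypervisor:
$ lscpu | grep -E "Model name|Flags"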
NUMA and CPU Pinning
CPU pinning is crucial for performance as it dedicates physical CPU cores to the VM, preventing them from being shared with other VMs. When a PIN_POLICY is defined, the OpenNebula scheduler automatically places the VM on a NUMA node that is physically close to the requested GPU and pins the vCPUs to the cores of that node, minimizing latency. For a more in-depth explanation of this topic, refer to the Virtual Topology and CPU Pinning guide.
For VM pinning to work, you must also enable the pinning policy on the Host. You can do this by editing the Host’s template and setting PIN_POLICY="PINNED", or by selecting the PINNED policy in the Host’s NUMA tab in Sunstone.
- Attribute: TOPOLOGY
- Values: PIN_POLICY, CORES, SOCKETS
TOPOLOGY = [
PIN_POLICY = "CORE",
CORES = "16",
SOCKETS = "1"
]
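As a command-line alternative to Sunstone for the host-side setting, the Host template can be edited directly:
$ onehost update <HOST_ID>
In the editor that opens, add the following line and save the template:
PIN_POLICY = "PINNED"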
PCI Device Passthrough
Finally, assign the GPU to the VM using the PCI attribute. You can select a specific GPU by its address. Once the VM is running, verify that the guest OS recognizes the device by executing the lspci command inside the VM.
- Attribute: PCI
- Value: SHORT_ADDRESS="<pci_address>"
PCI = [
SHORT_ADDRESS = "e1:00.0"
]
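Once the VM is running, a quick check from inside the guest confirms that the GPU has been passed through correctly; the device should be listed as an NVIDIA 3D controller:
$ lspci | grep -i nvidia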
Sample VM Template
Here is a sample VM Template that incorporates all the recommended attributes for an AI workload VM with an NVIDIA H100 GPU. The template is provided as a reference; adjust MEMORY, VCPU, IMAGE_ID, and other parameters to meet your specific requirements.
CONTEXT=[
NETWORK="YES",
SSH_PUBLIC_KEY="$USER[SSH_PUBLIC_KEY]" ]
CPU="1"
CPU_MODEL=[
MODEL="host-passthrough" ]
DISK=[
IMAGE_ID="163" ]
GRAPHICS=[
LISTEN="0.0.0.0",
TYPE="vnc" ]
HYPERVISOR="kvm"
LOGO="images/logos/ubuntu.png"
LXD_SECURITY_PRIVILEGED="true"
MEMORY="768"
OS=[
ARCH="x86_64",
FIRMWARE="UEFI",
MACHINE="q35" ]
PCI=[
SHORT_ADDRESS="e1:00.0" ]
SCHED_REQUIREMENTS="HYPERVISOR=kvm"
TOPOLOGY=[
CORES="4",
PIN_POLICY="CORE",
SOCKETS="1",
THREADS="1" ]
VCPU="4"