NVIDIA vGPU support

This section describes how to configure the hypervisor in order to use NVIDIA vGPU features.

BIOS

You need to check that the following settings are enabled in your BIOS configuration:

  • Enable SR-IOV

  • Enable IOMMU

Note that the specific menu options where you activate these features depend on the motherboard manufacturer.
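After changing the BIOS settings, it can be worth confirming from the running system that the kernel actually sees the IOMMU. The following is a minimal sketch, assuming a Linux host that exposes IOMMU groups under /sys/kernel/iommu_groups; the helper name count_iommu_groups is illustrative and not part of any tool mentioned in this guide.

```shell
# Count the IOMMU groups exposed by the kernel.
# Assumption: groups appear as numbered directories under the given root
# (default: /sys/kernel/iommu_groups). A result of 0 usually means the
# IOMMU is disabled in the BIOS or not enabled on the kernel command line.
count_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    n=0
    for group in "$root"/*; do
        # The glob stays unexpanded when nothing matches, so test each entry.
        [ -d "$group" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}

echo "IOMMU groups found: $(count_iommu_groups)"
```

If the count is 0 even though the BIOS options are enabled, the distribution may additionally require a kernel parameter (for example intel_iommu=on on Intel platforms); check your distribution's documentation.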

NVIDIA Drivers

The NVIDIA drivers are proprietary, so you will probably need to download them separately. Please check the documentation for your Linux distribution. Once you have installed the drivers and rebooted your server, you should be able to access the GPU information as follows:

lsmod | grep vfio
nvidia_vgpu_vfio       57344  0
nvidia-smi
Wed Feb  9 12:36:07 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:41:00.0 Off |                    0 |
|  0%   52C    P8    26W / 150W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Enable the NVIDIA vGPU

Warning

The following steps assume that your graphics card supports SR-IOV. If it does not, please refer to the official NVIDIA documentation to activate vGPU.

Finding the PCI Address

lspci | grep NVIDIA
41:00.0 3D controller: NVIDIA Corporation Device 2236 (rev a1)

In this example the address is 41:00.0. Then, we need to convert this to transformed-bdf format by replacing the colon and period with underscores, in our case: 41_00_0. Now, we can obtain the PCI name, and the full information about the NVIDIA GPU (e.g. max number of virtual functions):

virsh nodedev-list --cap pci | grep 41_00_0
pci_0000_41_00_0
virsh nodedev-dumpxml pci_0000_41_00_0
<device>
    <name>pci_0000_41_00_0</name>
    <path>/sys/devices/pci0000:40/0000:40:03.1/0000:41:00.0</path>
    <parent>pci_0000_40_03_1</parent>
    <driver>
        <name>nvidia</name>
    </driver>
    <capability type='pci'>
        <class>0x030200</class>
        <domain>0</domain>
        <bus>65</bus>
        <slot>0</slot>
        <function>0</function>
        <product id='0x2236'/>
        <vendor id='0x10de'>NVIDIA Corporation</vendor>
        <capability type='virt_functions' maxCount='32'/>
        <iommuGroup number='44'>
        <address domain='0x0000' bus='0x40' slot='0x03' function='0x1'/>
        <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
        <address domain='0x0000' bus='0x40' slot='0x03' function='0x0'/>
        </iommuGroup>
        <pci-express>
        <link validity='cap' port='0' speed='16' width='16'/>
        <link validity='sta' speed='2.5' width='16'/>
        </pci-express>
    </capability>
</device>
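The conversion from the lspci address to the name used by virsh can also be scripted. This is a minimal sketch, assuming the address comes from lspci (which usually omits the 0000 domain) and that the node device name follows the pci_<domain>_<bus>_<slot>_<function> pattern shown above; the helper name to_nodedev_name is hypothetical.

```shell
# Convert a PCI address (bus:slot.function, optionally prefixed by a domain)
# into the node device name format shown by `virsh nodedev-list`.
to_nodedev_name() {
    addr="$1"
    # Add the default domain 0000 when the address has no domain part.
    case "$addr" in
        *:*:*) ;;
        *) addr="0000:$addr" ;;
    esac
    # Replace ':' and '.' with '_' and add the pci_ prefix.
    printf 'pci_%s\n' "$(printf '%s' "$addr" | tr ':.' '__')"
}

to_nodedev_name 41:00.0
```

For the address in this example, the helper prints pci_0000_41_00_0, which can then be passed to virsh nodedev-dumpxml.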

Enabling Virtual Functions

Important

You may need to perform this operation every time you reboot your server.

# /usr/lib/nvidia/sriov-manage -e slot:bus:domain.function
/usr/lib/nvidia/sriov-manage -e 00:41:0000.0
Enabling VFs on 00:41:0000.0

If you get an error while doing this operation, please double check that all the BIOS steps have been correctly performed. If everything goes well, you should get something similar to this:

ls -l /sys/bus/pci/devices/0000:41:00.0/ | grep virtfn
lrwxrwxrwx 1 root root           0 Feb  9 10:37 virtfn0 -> ../0000:41:00.4
lrwxrwxrwx 1 root root           0 Feb  9 10:37 virtfn1 -> ../0000:41:00.5
lrwxrwxrwx 1 root root           0 Feb  9 10:37 virtfn10 -> ../0000:41:01.6
...
lrwxrwxrwx 1 root root           0 Feb  9 10:37 virtfn30 -> ../0000:41:04.2
lrwxrwxrwx 1 root root           0 Feb  9 10:37 virtfn31 -> ../0000:41:04.3
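Instead of reading the listing by eye, the virtfn links can be counted. The following is a minimal sketch, assuming the sysfs layout shown above (one virtfnN symlink per enabled virtual function); the helper name count_vfs is illustrative, and the device path is the one from this example.

```shell
# Count the virtfnN links under a PCI device's sysfs directory.
count_vfs() {
    dev_dir="$1"
    n=0
    for link in "$dev_dir"/virtfn*; do
        # The glob stays unexpanded when nothing matches, so test each entry.
        [ -e "$link" ] || [ -L "$link" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}

# Path taken from the example GPU at 0000:41:00.0; adjust for your device.
count_vfs /sys/bus/pci/devices/0000:41:00.0
```

If all virtual functions were enabled, the result would be expected to match the maxCount value reported by virsh nodedev-dumpxml (32 for the GPU in this example). A result of 0 suggests that the virtual functions were not enabled, for instance after a reboot.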

Configuring QEMU

Finally, add the following udev rule:

echo 'SUBSYSTEM=="vfio", GROUP="kvm", MODE="0666"' > /etc/udev/rules.d/opennebula-vfio.rules

# Reload udev rules:
udevadm control --reload-rules && udevadm trigger
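To confirm that the rule took effect, the group and mode of the VFIO device nodes can be inspected. This is a minimal sketch, assuming GNU stat (as shipped with common Linux distributions) and that the VFIO nodes live under /dev/vfio; the helper name show_vfio_perms is hypothetical.

```shell
# Print "<group> <octal mode> <path>" for each entry in the VFIO device
# directory (default: /dev/vfio). Prints nothing if the directory is
# missing or empty.
show_vfio_perms() {
    dir="${1:-/dev/vfio}"
    for node in "$dir"/*; do
        [ -e "$node" ] || continue
        stat -c '%G %a %n' "$node"
    done
}

show_vfio_perms
```

With the rule above applied, the nodes matched by the vfio subsystem would be expected to show the kvm group and mode 666. Nodes may only appear once a device is bound to VFIO, so an empty result does not necessarily indicate a problem.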

Note

For full details, refer to the official NVIDIA vGPU documentation.

Using the vGPU

Once everything is set up, you can follow these steps.