KVM Driver

Requirements

The Hosts will need a CPU with Intel VT or AMD-V features in order to support virtualization. KVM's Preparing to use KVM guide will help you verify whether your hardware supports KVM.
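A quick, generic way to check for these CPU flags on a Linux Host (not OpenNebula-specific) is:

# vmx = Intel VT, svm = AMD-V; a non-zero count means the CPU supports virtualization
grep -Ec '(vmx|svm)' /proc/cpuinfo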

KVM will be installed and configured after following the KVM Host Installation section.

Considerations & Limitations

Try to use virtio whenever possible, both for networks and disks. Using emulated hardware for networks and disks will impact performance and will not expose all the available functionality. For instance, if you don't use virtio for the disk drivers, only a small number of devices can be connected to the controller, so there is a limit on how many disks you can attach, and attaching a disk while the VM is running (live disk-attach) will not work.

When updating the VM configuration live using one.vm.updateconf, all of the VM configuration will be updated in the VM instance template, but only CONTEXT and BACKUP_CONFIG take effect immediately. The rest of the configuration will not take effect until the next VM reboot, because it changes the VM's virtual hardware.

The full list of configuration attributes is:

OS        = ["ARCH", "MACHINE", "KERNEL", "INITRD", "BOOTLOADER", "BOOT", "KERNEL_CMD", "ROOT", "SD_DISK_BUS", "UUID", "FIRMWARE"]
FEATURES  = ["ACPI", "PAE", "APIC", "LOCALTIME", "HYPERV", "GUEST_AGENT", "VIRTIO_SCSI_QUEUES", "VIRTIO_BLK_QUEUES", "IOTHREADS"]
INPUT     = ["TYPE", "BUS"]
GRAPHICS  = ["TYPE", "LISTEN", "PASSWD", "KEYMAP", "COMMAND" ]
VIDEO     = ["TYPE", "IOMMU", "ATS", "VRAM", "RESOLUTION"]
RAW       = ["DATA", "DATA_VMX", "TYPE", "VALIDATE"]
CPU_MODEL = ["MODEL"]
BACKUP_CONFIG = ["FS_FREEZE", "KEEP_LAST", "BACKUP_VOLATILE", "MODE", "INCREMENT_MODE"]
CONTEXT (any value except ETH*; variable substitution will be made)
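For example, a live context update from the CLI might look like this (a minimal sketch; the VM ID 42 and the key variable are illustrative, and the file-based form of onevm updateconf is assumed):

# Only CONTEXT (and BACKUP_CONFIG) changes take effect without a reboot
cat > conf.txt <<'EOF'
CONTEXT = [ SSH_PUBLIC_KEY = "$USER[SSH_PUBLIC_KEY]" ]
EOF
onevm updateconf 42 conf.txt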

Configuration

KVM Configuration

The OpenNebula packages will configure KVM automatically, so you don't need to take any extra steps.

OpenNebula

The KVM driver is enabled by default in OpenNebula /etc/one/oned.conf on your Front-end host with reasonable defaults. Read the oned Configuration to understand these configuration parameters and Virtual Machine Drivers Reference to know how to customize and extend the drivers.

Driver Defaults

There are some attributes required for KVM to boot a VM. You can set a suitable default for them so all the VMs get the required values. These attributes are set in /etc/one/vmm_exec/vmm_exec_kvm.conf. The following can be set for KVM:

  • EMULATOR: path to the kvm executable.

  • OS: attributes KERNEL, INITRD, BOOT, ROOT, KERNEL_CMD, MACHINE, ARCH and SD_DISK_BUS.

  • VCPU

  • FEATURES: attributes ACPI, PAE, APIC, HYPERV, GUEST_AGENT, VIRTIO_SCSI_QUEUES, VIRTIO_BLK_QUEUES, IOTHREADS.

  • CPU_MODEL: attribute MODEL.

  • DISK: attributes DRIVER, CACHE, IO, DISCARD, TOTAL_BYTES_SEC, TOTAL_IOPS_SEC, READ_BYTES_SEC, WRITE_BYTES_SEC, READ_IOPS_SEC, WRITE_IOPS_SEC, SIZE_IOPS_SEC.

  • NIC: attributes FILTER, MODEL.

  • GRAPHICS: attributes TYPE, LISTEN, PASSWD, KEYMAP, RANDOM_PASSWD. The VM instance must have at least an empty GRAPHICS = [] section to read these default attributes from the config file and to generate a cluster-unique PORT attribute.

  • VIDEO: attributes: TYPE, IOMMU, ATS, VRAM, RESOLUTION.

  • RAW: to add libvirt attributes to the domain XML file.

  • HYPERV_OPTIONS: to enable hyperv extensions.

  • SPICE_OPTIONS: to add default devices for SPICE.

  • OVMF_UEFIS: to add allowed file paths for Open Virtual Machine Firmware.

  • Q35_ROOT_PORTS: to modify the number of PCI devices that can be attached in q35 VMs (defaults to 16).

Warning

These values are only used during VM creation; for other actions, like NIC or disk attach/detach, the default values must be set in /var/lib/one/remotes/etc/vmm/kvm/kvmrc. For more info check the Files and Parameters section.

For example (check the actual state in the configuration file on your Front-end):

OS       = [ ARCH = "x86_64" ]
FEATURES = [ PAE = "no", ACPI = "yes", APIC = "no", HYPERV = "no", GUEST_AGENT = "no", VIRTIO_SCSI_QUEUES="auto" ]
DISK     = [ DRIVER = "raw" , CACHE = "none"]
HYPERV_OPTIONS="<relaxed state='on'/><vapic state='on'/><spinlocks state='on' retries='4096'/>"
SPICE_OPTIONS="
    <video>
        <model type='vga' heads='1'/>
    </video>
    <sound model='ich6' />
    <channel type='spicevmc'>
        <target type='virtio' name='com.redhat.spice.0'/>
    </channel>
    <redirdev bus='usb' type='spicevmc'/>
    <redirdev bus='usb' type='spicevmc'/>
    <redirdev bus='usb' type='spicevmc'/>"

Note

These values can be overridden in the Cluster, Host, and VM Template.

Since OpenNebula 6.0 you should no longer need to modify the EMULATOR variable to point to the kvm executable; instead, EMULATOR now points to the symlink /usr/bin/qemu-kvm-one, which should link to the correct KVM binary for the given OS on a Host.

Live-Migration for Other Cache settings

If you are using disks with a cache setting different from none, you may have problems with live migration depending on the libvirt version. You can enable the migration by adding the --unsafe parameter to the virsh command. The file to change is /var/lib/one/remotes/etc/vmm/kvm/kvmrc. Uncomment the following line, and execute onehost sync --force afterwards:

MIGRATE_OPTIONS=--unsafe

Configure the Timeouts (Optional)

Optionally, you can set a timeout for the VM Shutdown operation. This feature is useful when a VM gets stuck in Shutdown (or simply does not notice the shutdown command). By default, after the timeout the VM will return to the Running state, but it can also be configured so that the VM is destroyed after the grace time. This is configured in /var/lib/one/remotes/etc/vmm/kvm/kvmrc:

# Seconds to wait after shutdown until timeout
export SHUTDOWN_TIMEOUT=180

# Uncomment this line to force VM cancellation after shutdown timeout
export FORCE_DESTROY=yes

Working with cgroups (Optional)

Optionally, you can set up cgroups to control resources on your Hosts. By default, KVM VMs will be placed in the machine.slice; the resources assigned to this slice can be adjusted for each hypervisor. The libvirt cgroups documentation describes all the cases and the way cgroups are managed by libvirt/KVM.

OpenNebula will compute the shares attribute of the Libvirt domain using the CPU parameter and the base share value, which depends on the cgroups version of the hypervisor. For example, a VM with CPU=2 will get a cgroup value of cpu.shares = 2048 (or cpu.weight = 200 for cgroups version 2), twice the default value. Note that if you have a mix of cgroups version 1 and 2 hosts, you may get inconsistent resource distribution if you live-migrate a VM across different versions.
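In other words, assuming base share values of 1024 for cgroups v1 and 100 for cgroups v2 (consistent with the example above), the computation is:

# cgroups v1: cpu.shares = CPU * 1024    (CPU=2 -> 2048)
# cgroups v2: cpu.weight = CPU * 100     (CPU=2 -> 200)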

Memory Cleanup (Optional)

Memory allocated by caches or memory fragmentation may cause a VM to fail to deploy, even if at first sight there is enough memory on the Host. To avoid such failures and provide the best memory placement for the VMs, it's possible to trigger memory cleanup and compaction before the VM starts and/or after the VM stops (by default, enabled only on stop). The feature is configured in /var/lib/one/remotes/etc/vmm/kvm/kvmrc on the Front-end:

# Compact memory before running the VM
#CLEANUP_MEMORY_ON_START=yes

# Compact memory after VM stops
CLEANUP_MEMORY_ON_STOP=yes

Covered VM actions: deploy, migrate, poweroff, recover, release, resize, resume, save, suspend, and shutdown.
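On Linux, this kind of cleanup can be illustrated with the kernel's standard knobs (a sketch of the underlying mechanism only; the driver's exact commands may differ):

# Flush dirty pages, drop caches, and compact fragmented memory (run as root)
sync
echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory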

Usage

KVM Specific Attributes

The following are template attributes specific to KVM. Please refer to the template reference documentation for a complete list of the attributes supported to define a VM.

DISK

  • TYPE: This attribute defines the type of media to be exposed to the VM; possible values are disk (default) or cdrom. This attribute corresponds to the media option of the -drive argument of the kvm command.

  • DRIVER: specifies the format of the disk image; possible values are raw, qcow2… This attribute corresponds to the format option of the -drive argument of the kvm command.

  • CACHE: specifies the optional cache mechanism; possible values are default, none, writethrough and writeback.

  • IO: sets IO policy; possible values are threads and native.

  • IOTHREAD: thread id used by this disk. It can only be used with virtio disk controllers and if IOTHREADS > 0.

  • DISCARD: controls what to do with trim commands; the options are ignore or unmap. It can only be used with virtio-scsi.

  • IO throttling support: you can limit TOTAL/READ/WRITE throughput or IOPS, and burst control for these IO operations can be set for each disk. See the reference guide for the attribute names and their purpose; a combined example is shown below.
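A hypothetical DISK section combining several of these attributes (the image name and values are illustrative):

DISK = [
    IMAGE  = "ubuntu-qcow2",
    DRIVER = "qcow2",
    CACHE  = "none",
    IO     = "native",
    TOTAL_IOPS_SEC = "500"
]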

NIC

  • TARGET: name for the tun device created for the VM. It corresponds to the ifname option of the ‘-net’ argument of the kvm command.

  • SCRIPT: name of a shell script to be executed after creating the tun device for the VM. It corresponds to the script option of the ‘-net’ argument of the kvm command.

  • QoS: different kinds of controls can be defined over the network traffic of the interface:

    • INBOUND_AVG_BW

    • INBOUND_PEAK_BW

    • INBOUND_PEAK_KB

    • OUTBOUND_AVG_BW

    • OUTBOUND_PEAK_BW

    • OUTBOUND_PEAK_KB

  • MODEL: ethernet hardware to emulate. You can get the list of available models with this command:

kvm -net nic,model=? -nographic /dev/null

  • FILTER to define a network filtering rule for the interface. Libvirt includes some predefined rules (e.g. clean-traffic) that can be used. Check the Libvirt documentation for more information; you can also list the rules in your system with:

virsh -c qemu:///system nwfilter-list

  • VIRTIO_QUEUES to define how many queues will be used for the communication between CPUs and Network drivers. This attribute is only available with MODEL="virtio".
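A hypothetical NIC section using several of these attributes (the network name and values are illustrative):

NIC = [
    NETWORK        = "private",
    MODEL          = "virtio",
    VIRTIO_QUEUES  = "4",
    FILTER         = "clean-traffic",
    INBOUND_AVG_BW = "1000"
]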

Graphics

If properly configured, libvirt and KVM can work with SPICE (check the SPICE documentation for more information). To select it, just add the following to the GRAPHICS attribute:

  • TYPE = SPICE

Enabling SPICE will also make the driver inject a specific configuration for these machines. The configuration can be changed in the driver configuration file, variable SPICE_OPTIONS.
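For example, a minimal GRAPHICS section enabling SPICE (the LISTEN address is illustrative):

GRAPHICS = [
    TYPE   = "SPICE",
    LISTEN = "0.0.0.0"
]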

Video

If configured, libvirt will attach a video device to the virtual machine with the specified attributes. Available attributes are:

  • TYPE: Defines the device type. Can be none, vga, cirrus, or virtio. Using virtio is required for the IOMMU and ATS options.

  • IOMMU: Enables the device to use emulated IOMMU. Requires virtio type.

  • ATS: Enables the device to use Address Translation Service. Requires virtio type.

  • VRAM: Defines the amount of VRAM to allocate to the video device, in kB.

  • RESOLUTION: Defines the preferred resolution of the video device. Should be two numbers separated by an x. Example: 1920x1080
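Putting these together, a hypothetical VIDEO section might look like:

VIDEO = [
    TYPE       = "virtio",
    IOMMU      = "yes",
    ATS        = "yes",
    VRAM       = "16384",
    RESOLUTION = "1920x1080"
]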

Virtio

Virtio is the framework for IO virtualization in KVM. You will need a Linux kernel with the virtio drivers for the guest. Check the KVM documentation for more info.

If you want to use the virtio drivers add the following attributes to your devices:

  • DISK, add the attribute DEV_PREFIX="vd"

  • NIC, add the attribute MODEL="virtio"
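For instance (a minimal sketch; the image and network names are illustrative):

DISK = [ IMAGE = "ubuntu-qcow2", DEV_PREFIX = "vd" ]
NIC  = [ NETWORK = "private", MODEL = "virtio" ]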

For disks you can also use the SCSI bus (sd), which will use the virtio-scsi controller. This controller also offers high performance, as it does not emulate real hardware, and adds support for TRIM commands to free disk space when the disk has the attribute DISCARD="unmap". If needed, you can change the number of vCPU queues this way:

FEATURES = [
    VIRTIO_SCSI_QUEUES = "auto"
]

Furthermore, you have the option to activate multi-queue support within the virtio-blk driver, enabling simultaneous management of distinct queues by various vCPUs. The auto keyword automatically sets the number of queues to the number of vCPUs. When fine-tuning this configuration you may need to consider the queue depth of the underlying hardware. Additionally, this feature can also be configured per DISK:

FEATURES = [
    VIRTIO_BLK_QUEUES = "auto"
]

Firmware

The OS/FIRMWARE attribute can be defined to load a specific firmware interface for virtual machines. The allowed values are:

  • BIOS: use Basic Input/Output System (BIOS).

  • <UEFI_PATH>: one of the valid paths to a Unified Extensible Firmware Interface (UEFI) blob defined in OVMF_UEFIS (see Driver Defaults).

The OS/FIRMWARE_SECURE attribute can be used to configure Secure Boot. If this attribute is not defined, no Secure Boot is used by default. The allowed values are:

  • true: use Secure Boot.

  • false: do not use Secure Boot.

Warning

If Secure Boot is enabled, the attribute OS/MACHINE must be set to q35.
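For example, a hypothetical OS section enabling UEFI with Secure Boot (the OVMF path is illustrative; it must be one of the paths allowed in OVMF_UEFIS and varies per distribution):

OS = [
    FIRMWARE        = "/usr/share/OVMF/OVMF_CODE.secboot.fd",
    FIRMWARE_SECURE = "true",
    MACHINE         = "q35"
]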

Additional Attributes

The RAW attribute allows end-users to pass custom libvirt/KVM attributes not yet supported by OpenNebula. Basically, everything placed here will be written literally into the KVM deployment file (use libvirt XML format and semantics). You can selectively disable validation of the RAW data by adding VALIDATE="no" to the RAW section. By default, the data will be checked against the libvirt schema.

RAW = [
  TYPE = "kvm",
  VALIDATE = "yes",
  DATA = "<devices><serial type=\"pty\"><source path=\"/dev/pts/5\"/><target port=\"0\"/></serial><console type=\"pty\" tty=\"/dev/pts/5\"><source path=\"/dev/pts/5\"/><target port=\"0\"/></console></devices>" ]

Libvirt Metadata

The following OpenNebula information is added to the metadata section of the Libvirt domain. The specific attributes are listed below:

  • system_datastore

  • name

  • uname

  • uid

  • gname

  • gid

  • opennebula_version

  • stime

  • deployment_time

They correspond to their OpenNebula equivalents in the XML representation of the VM. opennebula_version and deployment_time are the OpenNebula version used during the deployment and the deployment time in epoch format, respectively.

Also, the VM name is included in the libvirt XML title field, so if the --title option is used when listing the libvirt domains, the VM name will be shown along with the domain name.
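For example, on a Host:

virsh -c qemu:///system list --title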

Live Resize VCPU and Memory

If you need to resize the capacity of the VM in RUNNING state, you have to set-up some extra attributes to the VM template. These attributes must be set before the VM is started.

  • VCPU_MAX (optional): maximum number of VCPUs which can be hotplugged.

  • MEMORY_RESIZE_MODE (optional): selects the resize mechanism:

    • HOTPLUG (default): internally this uses virsh attach-device to add more memory. To remove memory, you have to remove the exact amount which was previously added; prefer removing memory offline.

    • BALLOONING: internally this uses virsh setmem to add more memory. The new memory size is only a recommendation for the VM; the actual memory usage may be different. The target VM displays MEMORY_MAX as available memory.

  • MEMORY_MAX (optional): maximum memory allocated for the VM.

  • MEMORY_SLOTS (optional): valid only for HOTPLUG. How many memory slots can be used to add memory, i.e., how many times memory can be added.

Note

Live memory resize needs QEMU version 2.4 or later. Live VCPU resize needs QEMU version 2.7 or later.

MEMORY_RESIZE_MODE

BALLOONING is about dynamically adjusting the amount of RAM allocated to VMs. It allows KVM to reclaim unused memory from one VM and allocate it to another VM that needs it more, without shutting down or pausing the VMs. The parameter sets up a balloon driver within the VM that communicates with the host. When the host needs to reclaim memory, the driver inflates, reserving some of the VM’s unused memory for the host. When the VM needs additional memory, the driver deflates, releasing reserved memory back to the VM.

From the VM’s standpoint, it seems like the available memory is decreasing or increasing. The OS inside the VM will think it’s using more memory when the balloon inflates and think it’s using less when the balloon deflates. This can go back and forth many times during the VM’s lifecycle, always ensuring that each VM has as much memory as it needs, up to MEMORY_MAX, but no more than that.

In HOTPLUG mode the Guest OS will perceive a new virtual RAM stick being plugged into the virtual motherboard. The downside of this mode is that, in order to reduce memory, you need to remove the exact amount of memory that was added before, which emulates removing the RAM stick. By default it is limited to 16 RAM stick devices (i.e., you can increase memory via hotplug 16 times).
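For example, a hypothetical template snippet enabling live resize, followed by a resize call (the VM ID and values are illustrative):

VCPU       = "2"
VCPU_MAX   = "8"
MEMORY     = "2048"
MEMORY_MAX = "16384"
MEMORY_RESIZE_MODE = "BALLOONING"

Then, once the VM is running:

onevm resize 42 --vcpu 4 --memory 4096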

Disk/NIC Hotplugging

KVM supports hotplugging to the virtio and the SCSI buses. For disks, the bus the disk will be attached to is inferred from the DEV_PREFIX attribute of the disk template.

  • vd: virtio

  • sd: SCSI (default)

  • hd: IDE

Note

Hotplugging is not supported for CD-ROM and floppy.

If TARGET is passed instead of DEV_PREFIX the same rules apply (what happens behind the scenes is that OpenNebula generates a TARGET based on the DEV_PREFIX if no TARGET is provided).

The defaults for newly attached disks and NICs are in /var/lib/one/remotes/etc/vmm/kvm/kvmrc. The relevant parameters are prefixed with DEFAULT_ATTACH_ and explained in the Files and Parameters section below.

For Disks and NICs, if the guest OS is a Linux flavor, the guest needs to be explicitly told to rescan the PCI bus. This can be done by issuing the following command as root:

echo 1 > /sys/bus/pci/rescan
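For example, hot-attaching a disk and a NIC from the CLI (a sketch; the VM ID, image, and network names are illustrative):

onevm disk-attach 42 --image ubuntu-qcow2 --target vdb
onevm nic-attach 42 --network private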

Enabling QEMU Guest Agent

The QEMU Guest Agent enables communication with the guest OS to perform certain actions. The agent uses a virtio serial connection to send and receive commands. One of the interesting capabilities is that it allows you to freeze the filesystem before taking a snapshot, so the snapshot won't contain half-written data. Filesystem freeze will only be used with the CEPH and qcow2 storage drivers.

The agent package needed in the Guest OS is available in most distributions; it's usually called qemu-guest-agent.

The communication channel with guest agent is enabled in the domain XML when the GUEST_AGENT feature is selected in the VM Template.
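For example, a minimal Template snippet enabling the agent channel:

FEATURES = [
    GUEST_AGENT = "yes"
]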

Importing VMs

VMs running on KVM hypervisors that were not launched through OpenNebula can be imported into OpenNebula. It is important to highlight that, besides the limitations explained in the Host guide, the "Poweroff" operation is not available for these imported VMs in KVM.

Tuning & Extending

Multiple Actions per Host

By default, the VMM driver is configured to allow more than one action to be executed per Host. Make sure the -p parameter is included in the driver's ARGUMENTS. This is done in /etc/one/oned.conf in the VM_MAD configuration section:

VM_MAD = [
    NAME       = "kvm",
    EXECUTABLE = "one_vmm_exec",
    ARGUMENTS  = "-t 15 -r 0 kvm -p",
    DEFAULT    = "vmm_exec/vmm_exec_kvm.conf",
    TYPE       = "kvm" ]

Restart the main OpenNebula service if changes were made to the mentioned file:

sudo systemctl restart opennebula

The scheduler configuration should be changed to let it deploy more than one VM per Host. The file is located at /etc/one/sched.conf and the value to change is MAX_HOST. For example, to let the scheduler submit 10 VMs per Host use this line:

MAX_HOST = 10

Restart the scheduler service for this change to take effect:

sudo systemctl restart opennebula-scheduler

Files and Parameters

The driver consists of the following files:

  • /usr/lib/one/mads/one_vmm_exec : generic VMM driver.

  • /var/lib/one/remotes/vmm/kvm : commands executed to perform actions.

And the following driver configuration files:

  • /etc/one/vmm_exec/vmm_exec_kvm.conf : This file contains default values for KVM domain definitions (in other words, OpenNebula templates). It is generally a good idea to configure here defaults for the KVM-specific attributes, that is, attributes mandatory in the KVM driver that are not mandatory for other hypervisors. Non-mandatory attributes specific to KVM are also recommended to have a default. Changes to this file require the OpenNebula service to be restarted.

  • /var/lib/one/remotes/etc/vmm/kvm/kvmrc : This file holds instructions to be executed before the actual driver load to perform specific tasks or to pass environmental variables to the driver. The syntax used for the former is plain shell script that will be evaluated before the driver execution. For the latter, the syntax is the familiar:

ENVIRONMENT_VARIABLE=VALUE

The parameters that can be changed here are as follows:

  • LIBVIRT_URI: connection string to libvirtd.

  • QEMU_PROTOCOL: protocol used for live migrations.

  • SHUTDOWN_TIMEOUT: seconds to wait after shutdown until timeout.

  • VIRSH_RETRIES: number of virsh command retries when required. Currently used in detach-interface and restore.

  • SYNC_TIME: trigger VM time synchronization from RTC on resume and after migration. The QEMU guest agent must be running. Valid values: no or yes (default).

  • FORCE_DESTROY: force VM cancellation after shutdown timeout.

  • CANCEL_NO_ACPI: force VMs without ACPI enabled to be destroyed on shutdown.

  • MIGRATE_OPTIONS: set options for the virsh migrate command.

  • CLEANUP_MEMORY_ON_START: compact memory before running the VM. Values yes or no (default).

  • CLEANUP_MEMORY_ON_STOP: compact memory after the VM stops. Values yes (default) or no.

  • DEFAULT_ATTACH_CACHE: default cache type for newly attached disks. It will be used if the attached disk does not have a specific cache method set (can be set using templates when attaching a disk).

  • DEFAULT_ATTACH_DISCARD: default discard option for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_IO: default I/O policy for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_TOTAL_BYTES_SEC: default total bytes/s I/O throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_TOTAL_BYTES_SEC_MAX: default maximum total bytes/s I/O throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_TOTAL_BYTES_SEC_MAX_LENGTH: default maximum length for total bytes/s I/O throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_READ_BYTES_SEC: default read bytes/s I/O throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_READ_BYTES_SEC_MAX: default maximum read bytes/s I/O throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_READ_BYTES_SEC_MAX_LENGTH: default maximum length for read bytes/s I/O throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_WRITE_BYTES_SEC: default write bytes/s I/O throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_WRITE_BYTES_SEC_MAX: default maximum write bytes/s I/O throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_WRITE_BYTES_SEC_MAX_LENGTH: default maximum length for write bytes/s I/O throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_TOTAL_IOPS_SEC: default total IOPS throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_TOTAL_IOPS_SEC_MAX: default maximum total IOPS throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_TOTAL_IOPS_SEC_MAX_LENGTH: default maximum length for total IOPS throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_READ_IOPS_SEC: default read IOPS throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_READ_IOPS_SEC_MAX: default maximum read IOPS throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_READ_IOPS_SEC_MAX_LENGTH: default maximum length for read IOPS throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_WRITE_IOPS_SEC: default write IOPS throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_WRITE_IOPS_SEC_MAX: default maximum write IOPS throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_WRITE_IOPS_SEC_MAX_LENGTH: default maximum length for write IOPS throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_SIZE_IOPS_SEC: default size of IOPS throttling for newly attached disks, if the attribute is missing in the template.

  • DEFAULT_ATTACH_NIC_MODEL: default NIC model for newly attached NICs, if the attribute is missing in the template.

  • DEFAULT_ATTACH_NIC_FILTER: default NIC libvirt filter for newly attached NICs, if the attribute is missing in the template.

See the Virtual Machine drivers reference for more information.

Troubleshooting

Image Magic Is Incorrect

When trying to restore a VM from a suspended state, this error is returned:

libvirtd[1021]: operation failed: image magic is incorrect

It can be fixed by applying the following KVM kernel module options on the Host:

options kvm_intel nested=0
options kvm_intel emulate_invalid_guest_state=0
options kvm ignore_msrs=1
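A common way to apply these options persistently (a generic Linux approach; the file name is arbitrary, and no VMs should be running when reloading the modules):

# Add the module options (as root)
cat >> /etc/modprobe.d/kvm.conf <<'EOF'
options kvm_intel nested=0
options kvm_intel emulate_invalid_guest_state=0
options kvm ignore_msrs=1
EOF

# Reload the modules so the options take effect
modprobe -r kvm_intel kvm && modprobe kvm_intel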