Monitoring Driver

The Monitoring Drivers (or IM drivers) collect Host and Virtual Machine monitoring data by executing a monitoring agent in the Hosts. The agent periodically executes probes to collect data and periodically sends it to the Front-end.

This guide describes the internals of the monitoring system. It is also a starting point for creating a new IM driver from scratch.

Message structure

The structure of a monitoring message is:

MESSAGE_TYPE ID RESULT TIMESTAMP PAYLOAD

Name          Description
MESSAGE_TYPE  SYSTEM_HOST, MONITOR_HOST, BEACON_HOST, MONITOR_VM or STATE_VM
ID            ID of the Host that generates the message
RESULT        Result of the action; possible values are SUCCESS or FAILURE
TIMESTAMP     Timestamp of the message, as Unix epoch time
PAYLOAD       Message data; depends on MESSAGE_TYPE
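
To make the five-field layout concrete, here is a minimal Python sketch that splits one message line into its fields. It assumes the fields are separated by whitespace, that the Host ID is numeric, and that everything after the fourth field is the payload (which may be empty, as for BEACON_HOST). The names MonitorMessage and parse_message are hypothetical and are not part of OpenNebula.

```python
from typing import NamedTuple

# Values taken from the field descriptions of the message structure.
MESSAGE_TYPES = {"SYSTEM_HOST", "MONITOR_HOST", "BEACON_HOST", "MONITOR_VM", "STATE_VM"}
RESULTS = {"SUCCESS", "FAILURE"}


class MonitorMessage(NamedTuple):
    message_type: str
    host_id: int      # assumption: Host IDs are integers
    result: str
    timestamp: int    # Unix epoch time
    payload: str      # left as an opaque string; its content depends on message_type


def parse_message(line: str) -> MonitorMessage:
    """Split 'MESSAGE_TYPE ID RESULT TIMESTAMP PAYLOAD' into its fields."""
    # Split on the first four runs of whitespace so spaces inside the payload survive.
    parts = line.strip().split(None, 4)
    if len(parts) < 4:
        raise ValueError("expected at least 4 fields, got %d" % len(parts))
    message_type, host_id, result, timestamp = parts[:4]
    payload = parts[4] if len(parts) == 5 else ""
    if message_type not in MESSAGE_TYPES:
        raise ValueError("unknown message type: " + message_type)
    if result not in RESULTS:
        raise ValueError("unknown result: " + result)
    return MonitorMessage(message_type, int(host_id), result, int(timestamp), payload)
```

The payload is deliberately not interpreted here, because its format depends on MESSAGE_TYPE.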

Description of message types:

  • SYSTEM_HOST - General information about the Host that does not change often (e.g., total memory, disk capacity, datastores, PCI devices, NUMA nodes…)
  • MONITOR_HOST - Host monitoring information: used memory, used CPU, network traffic…
  • BEACON_HOST - Notification message indicating that the agent is still alive
  • MONITOR_VM - VM monitoring information: used memory, used CPU, disk I/O…
  • STATE_VM - VM state: running, poweroff…

For the hypervisors supported out of the box, each message is composed from the output of the probes in a specific directory:

  • SYSTEM_HOST - im/<hypervisor>-probes.d/host/system
  • MONITOR_HOST - im/<hypervisor>-probes.d/host/monitor
  • BEACON_HOST - im/<hypervisor>-probes.d/host/beacon
  • MONITOR_VM - im/<hypervisor>-probes.d/vm/monitor
  • STATE_VM - im/<hypervisor>-probes.d/vm/status

Each IM probe is composed of one or several scripts that write information to stdout in this form:

KEY1="value"
KEY2="another value with spaces"
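
As an illustration of this output format, the following Python sketch renders a dictionary as KEY="value" lines. It is not taken from the shipped probes; format_probe_output is a hypothetical helper, and because the escaping rules for embedded double quotes are not described here, the sketch simply rejects such values.

```python
def format_probe_output(values: dict) -> str:
    """Render key/value pairs as KEY="value" lines, one per line."""
    lines = []
    for key, value in values.items():
        text = str(value)
        # Assumption: values never contain double quotes or newlines.
        if '"' in text or "\n" in text:
            raise ValueError("unsupported character in value for " + key)
        lines.append('%s="%s"' % (key, text))
    return "\n".join(lines)


# A probe script would write the result to stdout.
print(format_probe_output({"KEY1": "value", "KEY2": "another value with spaces"}))
```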

Basic Monitoring Scripts

Mandatory values for each category are described below:

SYSTEM_HOST Message

Key          Description
HYPERVISOR   Name of the hypervisor of the Host; useful for selecting the Hosts with a specific technology.
TOTALCPU     Number of CPUs multiplied by 100. For example, a 16-core machine has a value of 1600.
CPUSPEED     Speed of the CPUs, in MHz.
TOTALMEMORY  Maximum memory that can be used for VMs. It is advisable to subtract the memory used by the hypervisor.

MONITOR_HOST Message

Key         Description
USEDMEMORY  Memory used, in kilobytes.
FREEMEMORY  Memory available for VMs at that moment, in kilobytes.
FREECPU     Percentage of idle CPU multiplied by the number of cores. For example, if 50% of the CPU is idle on a 4-core machine, the value is 200.
USEDCPU     Percentage of used CPU multiplied by the number of cores.
NETRX       Bytes received from the network.
NETTX       Bytes sent to the network.
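
The CPU values use a scale of 100 units per core. The short Python sketch below reproduces the two worked examples (1600 for 16 cores, 200 for 50% idle on 4 cores). The function cpu_metrics is hypothetical, and it assumes that the used percentage is simply 100 minus the idle percentage.

```python
def cpu_metrics(cores: int, idle_percent: float) -> dict:
    """Compute TOTALCPU, FREECPU and USEDCPU on the 100-units-per-core scale."""
    if not 0 <= idle_percent <= 100:
        raise ValueError("idle_percent must be between 0 and 100")
    return {
        "TOTALCPU": cores * 100,
        "FREECPU": idle_percent * cores,
        # Assumption: whatever is not idle counts as used.
        "USEDCPU": (100 - idle_percent) * cores,
    }
```

With this definition, FREECPU and USEDCPU always add up to TOTALCPU.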

BEACON_HOST Message

No data

MONITOR_VM Message

The format of the MONITOR_VM message is:

VM = [ ID="0",
       UUID="6c1e1565-50f4-43b6-ba71-0fe46477d2ec",
       MONITOR="Q1BVPSIxLjAxIgpNRU1PUlk9IjE0MDgxNiIKTkVUUlg9IjAiCk5FVFRYPSIwIgpESVNLUkRCWVRFUz0iNDQxNjU0NDQiCkRJU0tXUkJZVEVTPSIxMjY2Njg4IgpESVNLUkRJT1BTPSIxMjg5IgpESVNLV1JJT1BTPSI4ODEiCg=="]
VM = [ ID="1",
       ... ]

Key      Description
ID       ID of the VM in OpenNebula.
UUID     Unique ID; must be unique across all Hosts.
MONITOR  Base64-encoded monitoring information, which includes the following data:

Key          Description
TIMESTAMP    Timestamp of the measurement.
CPU          Percentage of one CPU consumed (two fully consumed CPUs give 200).
MEMORY       Memory consumption, in kilobytes.
DISKRDBYTES  Number of bytes read from disk.
DISKRDIOPS   Number of I/O read operations.
DISKWRBYTES  Number of bytes written to disk.
DISKWRIOPS   Number of I/O write operations.
NETRX        Bytes received from the network.
NETTX        Bytes sent to the network.
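
To see what the MONITOR attribute carries, it can be decoded with any Base64 tool. The Python sketch below decodes the sample value from the message above and parses the resulting KEY="value" lines into a dictionary. The helper name decode_monitor is hypothetical, and the parsing assumes one KEY="value" pair per line.

```python
import base64

# MONITOR value copied from the sample MONITOR_VM message.
SAMPLE_MONITOR = (
    "Q1BVPSIxLjAxIgpNRU1PUlk9IjE0MDgxNiIKTkVUUlg9IjAiCk5FVFRYPSIwIgpESVNLUkRC"
    "WVRFUz0iNDQxNjU0NDQiCkRJU0tXUkJZVEVTPSIxMjY2Njg4IgpESVNLUkRJT1BTPSIxMjg5"
    "IgpESVNLV1JJT1BTPSI4ODEiCg=="
)


def decode_monitor(encoded: str) -> dict:
    """Decode a Base64 MONITOR value into a {KEY: value} dictionary of strings."""
    text = base64.b64decode(encoded).decode("utf-8")
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        values[key.strip()] = value.strip().strip('"')
    return values


for key, value in decode_monitor(SAMPLE_MONITOR).items():
    print(key, value)
```

Values are kept as strings; converting them to numbers is left to the caller.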

STATE_VM Message

The format of the STATE_VM message is:

VM=[
  ID=115,
  DEPLOY_ID=one-115,
  UUID="6c1e1565-50f4-43b6-ba71-0fe46477d2ec",
  STATE="RUNNING" ]
VM=[
  ID=116,
  DEPLOY_ID=one-116,
  UUID="1a3f2513-50f4-43b6-ba71-0fe46477d2ec",
  STATE="POWEROFF" ]

Key        Description
ID         ID of the VM in OpenNebula.
DEPLOY_ID  ID of the VM in the hypervisor; usually unique within the Host.
UUID       Unique ID; must be unique across all Hosts.
STATE      State of the VM (running, poweroff, …).
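
A probe that already knows the state of each VM could build this message along the following lines. This is only a sketch that mirrors the layout of the sample above; VmState and format_state_vm are hypothetical names, and how the states are obtained from the hypervisor is out of scope.

```python
from typing import Iterable, NamedTuple


class VmState(NamedTuple):
    vm_id: int       # ID of the VM in OpenNebula
    deploy_id: str   # ID of the VM in the hypervisor
    uuid: str
    state: str       # e.g. "RUNNING" or "POWEROFF"


def format_state_vm(vms: Iterable[VmState]) -> str:
    """Render one VM=[ ... ] block per VM, in the layout of the sample message."""
    blocks = []
    for vm in vms:
        blocks.append(
            "VM=[\n"
            f"  ID={vm.vm_id},\n"
            f"  DEPLOY_ID={vm.deploy_id},\n"
            f'  UUID="{vm.uuid}",\n'
            f'  STATE="{vm.state}" ]'
        )
    return "\n".join(blocks)


print(format_state_vm([
    VmState(115, "one-115", "6c1e1565-50f4-43b6-ba71-0fe46477d2ec", "RUNNING"),
]))
```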

System Datastore Information

Monitoring probes are also responsible for collecting the size of each datastore and its available space. The datastore information is included in the SYSTEM_HOST message.

DS_LOCATION_USED_MB=1
DS_LOCATION_TOTAL_MB=12639
DS_LOCATION_FREE_MB=10459
DS = [
  ID = 0,
  USED_MB = 1,
  TOTAL_MB = 12639,
  FREE_MB = 10459
]
DS = [
  ID = 1,
  USED_MB = 1,
  TOTAL_MB = 12639,
  FREE_MB = 10459
]
DS = [
  ID = 2,
  USED_MB = 1,
  TOTAL_MB = 12639,
  FREE_MB = 10459
]

These are the meanings of the values:

Variable              Description
DS_LOCATION_USED_MB   Used space in megabytes in the DATASTORE LOCATION
DS_LOCATION_TOTAL_MB  Total space in megabytes in the DATASTORE LOCATION
DS_LOCATION_FREE_MB   Free space in megabytes in the DATASTORE LOCATION
ID                    ID of the datastore; this is the same as the name of the directory
USED_MB               Used space in megabytes for that datastore
TOTAL_MB              Total space in megabytes for that datastore
FREE_MB               Free space in megabytes for that datastore

The DATASTORE LOCATION is the path where the datastores are mounted. By default it is /var/lib/one/datastores, but the actual path is passed as the second parameter of the script call.
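
As a rough illustration of how such values could be gathered, the Python sketch below reports filesystem usage for the DATASTORE LOCATION and for every numerically named subdirectory, which it treats as a datastore ID. It is not the shipped probe. The function names are hypothetical, a megabyte is assumed to be 1024 * 1024 bytes, and the figures are those of the filesystem holding each directory rather than the size of the directory contents.

```python
import os
import shutil

DEFAULT_LOCATION = "/var/lib/one/datastores"
MB = 1024 * 1024  # assumption: binary megabytes


def location_from_argv(argv: list) -> str:
    """Return the second script parameter if present, else the default path."""
    return argv[2] if len(argv) > 2 else DEFAULT_LOCATION


def usage_mb(path: str) -> tuple:
    """Return (used, total, free) in MB for the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    return usage.used // MB, usage.total // MB, usage.free // MB


def datastore_report(location: str) -> str:
    used, total, free = usage_mb(location)
    lines = [
        f"DS_LOCATION_USED_MB={used}",
        f"DS_LOCATION_TOTAL_MB={total}",
        f"DS_LOCATION_FREE_MB={free}",
    ]
    # Only numerically named directories are treated as datastores.
    ids = sorted(
        int(name) for name in os.listdir(location)
        if name.isdigit() and os.path.isdir(os.path.join(location, name))
    )
    for ds_id in ids:
        used, total, free = usage_mb(os.path.join(location, str(ds_id)))
        lines += [
            "DS = [",
            f"  ID = {ds_id},",
            f"  USED_MB = {used},",
            f"  TOTAL_MB = {total},",
            f"  FREE_MB = {free}",
            "]",
        ]
    return "\n".join(lines)
```

A script using this sketch would call datastore_report(location_from_argv(sys.argv)) after importing sys, and print the result to stdout.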

Creating a New IM Driver

Choosing the Execution Engine

OpenNebula provides two IM probe execution engines: one_im_sh and one_im_ssh. one_im_sh is used to execute probes in the Front-end; one_im_ssh is used when probes need to be run remotely on the Hosts, which is the case for KVM.

Populating the Probes

Both one_im_sh and one_im_ssh require an argument that indicates the directory containing the probes; the suffix ".d" is appended to this argument. You also need to create:

  • The /var/lib/one/remotes/im/<im_name>.d directory, containing only two files, the same ones that are provided by default inside kvm.d: collectd-client_control.sh and collectd-client.rb.
  • The /var/lib/one/remotes/im/<im_name>-probes.d directory, which is where the probes themselves should be placed.

Enabling the Driver

A new IM_MAD section should be added to monitord.conf.

Example:

IM_MAD = [
      name       = "ganglia",
      executable = "one_im_sh",
      arguments  = "ganglia" ]