A new (configurable) cgroups layout for libvirt with QEMU, KVM & LXC

Posted: May 13th, 2013 | Filed under: Fedora, libvirt, OpenStack, Virt Tools

Several years ago I wrote a bit about libvirt and cgroups in Fedora 12. Since that time, much has changed, and we’ve learnt a lot about the use of cgroups, not all of it good.

Perhaps the biggest change has been the arrival of systemd, which has brought cgroups to the attention of a much wider audience. One of the biggest positive impacts of systemd on cgroups has been a formalization of how to integrate with cgroups as an application developer. Libvirt of course follows these cgroups guidelines, has had input into their definition & continues to work with the systemd community to improve them.

One of the things we’ve learnt the hard way is that the kernel implementation of control groups is not without cost, and the way applications use cgroups can have a direct impact on the performance of the system. The kernel developers have done a great deal of work to improve the performance and scalability of cgroups but there will always be a cost to their usage which application developers need to be aware of. In broad terms, the performance impact is related to the number of cgroups directories created and particularly to their depth.

To cut a long story short, it became clear that the directory hierarchy layout libvirt used with cgroups was seriously sub-optimal, or even outright harmful. Thus in libvirt 1.0.5, we introduced some radical changes to the layout created.

Historically libvirt would create a cgroup directory for each virtual machine or container, at a path $LOCATION-OF-LIBVIRTD/libvirt/$DRIVER-NAME/$VMNAME. For example, if libvirtd was placed in /system/libvirtd.service, then a QEMU guest named “web1” would live at /system/libvirtd.service/libvirt/qemu/web1. That’s 5 levels deep already, which is not good.

As of libvirt 1.0.5, libvirt will create a cgroup directory for each virtual machine or container, at a path /machine/$VMNAME.libvirt-$DRIVER-NAME. First, notice how this is now completely disassociated from the location of libvirtd itself. This allows the administrator greater flexibility in controlling resources for virtual machines independently of system services. Second, notice that the directory hierarchy is only 2 levels deep by default, so a QEMU guest named “web1” would live at /machine/web1.libvirt-qemu.
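To make the difference in depth concrete, here is a minimal Python sketch that builds both paths from the templates described above. The function names are purely illustrative and are not part of any libvirt API.

```python
def legacy_cgroup_path(libvirtd_location: str, driver: str, vm_name: str) -> str:
    """Historical layout: nested beneath wherever libvirtd itself was placed."""
    return f"{libvirtd_location.rstrip('/')}/libvirt/{driver}/{vm_name}"


def default_cgroup_path(driver: str, vm_name: str) -> str:
    """Layout as of libvirt 1.0.5: a single directory under /machine."""
    return f"/machine/{vm_name}.libvirt-{driver}"


def depth(path: str) -> int:
    """Count the directory levels in a cgroup path."""
    return len([part for part in path.split("/") if part])


old = legacy_cgroup_path("/system/libvirtd.service", "qemu", "web1")
new = default_cgroup_path("qemu", "web1")
print(old, depth(old))
print(new, depth(new))
```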

The final important change is that the cgroup location of a virtual machine / container can now be configured on a per-guest basis in the XML configuration, to override the default of /machine. So if the guest config says

  <resource>
    <partition>/virtualmachines/production</partition>
  </resource>

then libvirt will create the guest cgroup directory /virtualmachines.partition/production.partition/web1.libvirt-qemu. Notice that there will always be a .partition suffix on these user defined directories. Only the default top level directories /machine, /system and /user will be without a suffix. The suffix ensures that user defined directories can never clash with anything the kernel will create. The systemd PaxControlGroups will be updated with this & a few escaping rules soon.
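The mapping from a partition path to a cgroup directory can be sketched in Python as follows. This is only an illustration of the suffix rule described here, not libvirt's actual code: the function name is made up, it assumes that only a first path component of machine, system or user is exempt from the suffix, and it ignores the escaping rules entirely.

```python
# Top level directories that, per the text above, carry no ".partition" suffix.
DEFAULT_TOP_LEVEL = {"machine", "system", "user"}


def partition_cgroup_dir(partition: str, vm_name: str, driver: str) -> str:
    """Translate a <partition> value into the guest's cgroup directory.

    Assumption: only the first path component may be one of the default
    top level names; every other component gets the ".partition" suffix.
    """
    parts = [p for p in partition.split("/") if p]
    dirs = []
    for index, name in enumerate(parts):
        if index == 0 and name in DEFAULT_TOP_LEVEL:
            dirs.append(name)
        else:
            dirs.append(f"{name}.partition")
    dirs.append(f"{vm_name}.libvirt-{driver}")
    return "/" + "/".join(dirs)


print(partition_cgroup_dir("/virtualmachines/production", "web1", "qemu"))
print(partition_cgroup_dir("/machine", "web1", "qemu"))
```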

There is still more we intend to do with cgroups in libvirt, in particular adding APIs for creating & managing these partitions for grouping VMs, so you don’t need to go to a tool outside libvirt to create the directories.

One final thing, libvirt now has a bit of documentation about its cgroups usage which will serve as the base for future documentation in this area.

Troubleshooting libvirt with the KVM and LXC drivers

Posted: October 3rd, 2011 | Filed under: Fedora, libvirt, Virt Tools

In “fantasy island” the libvirt and KVM/LXC code is absolutely perfect and always does exactly what you want it to do. Back in the real world, however, there may be annoying bugs in libvirt, KVM/LXC, the kernel and countless other parts of the OS that conspire to cause you great pain and suffering. This blog post contains a very quick introduction to debugging/troubleshooting libvirt problems, particularly focusing on the KVM and LXC drivers.

libvirt logging capabilities

The libvirt code is full of logging statements which can be instrumental in understanding where a problem might lie.

Configuring libvirtd logging

Current releases of libvirt will log problems occurring in libvirtd at level WARNING/ERROR to a dedicated log file /var/log/libvirt/libvirtd.log, while older releases would send them to syslog, typically ending up in /var/log/messages. The libvirtd configuration file has two parameters that can be used to increase the amount of logging information printed.

log_filters="...filter string..."
log_outputs="...destination config..."

The logging documentation describes these in some detail. If you just want to quickly get started though, it suffices to understand that filter strings are simply doing substring matches against libvirt source filenames. So to enable all debug information from ‘src/util/event.c’ (the libvirt event loop) you would set

log_filters="1:event"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

If you wanted to enable logging for everything in ‘src/util’, except for ‘src/util/event.c’ you would set

log_filters="3:event 1:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
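The way these filter strings select a level for a given source file can be modelled with a few lines of Python. This is a sketch of the semantics rather than libvirt's implementation: it assumes the first matching filter wins (which is what makes the ‘3:event 1:util’ ordering work), that levels run from 1 (debug) to 4 (error), and that unmatched files fall back to level 3 (warning). The function names are illustrative.

```python
DEFAULT_LEVEL = 3  # assumed fallback: WARNING


def parse_filters(filter_string):
    """Split a string such as "3:event 1:util" into (level, match) pairs."""
    filters = []
    for item in filter_string.split():
        level, _, match = item.partition(":")
        filters.append((int(level), match))
    return filters


def effective_level(filter_string, source_file, default=DEFAULT_LEVEL):
    """Return the level of the first filter whose text occurs in the filename."""
    for level, match in parse_filters(filter_string):
        if match in source_file:
            return level
    return default


for name in ("src/util/event.c", "src/util/command.c", "src/qemu/qemu_driver.c"):
    print(name, effective_level("3:event 1:util", name))
```

Swapping the order to "1:util 3:event" would, under the same first-match assumption, put ‘src/util/event.c’ back at debug level, so the more specific filter belongs first.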

Configuring libvirt client logging

On the client side of libvirt there is no configuration file to put log settings in, so instead there are a couple of environment variables. These take exactly the same type of strings as the libvirtd configuration file:

LIBVIRT_LOG_FILTERS="...filter string..."
LIBVIRT_LOG_OUTPUTS="...destination config..."
export LIBVIRT_LOG_FILTERS LIBVIRT_LOG_OUTPUTS

One thing to be aware of is that with the KVM and LXC drivers in libvirt, very little code is ever run on the libvirt client. The only interesting pieces are the RPC code, event loop and main API entrypoints. To enable debugging of the RPC code you might use

LIBVIRT_LOG_FILTERS="1:rpc" LIBVIRT_LOG_OUTPUTS="1:stderr" virsh list
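When the client is launched from a script rather than a shell, the same two variables can be set on the child process environment. The Python sketch below shows one way to do that; the helper names are hypothetical, and the function that actually runs virsh is defined but not called, since it needs virsh to be installed.

```python
import os
import subprocess


def client_log_env(filters, outputs, base_env=None):
    """Return a copy of the environment with the libvirt client log settings added."""
    env = dict(os.environ if base_env is None else base_env)
    env["LIBVIRT_LOG_FILTERS"] = filters
    env["LIBVIRT_LOG_OUTPUTS"] = outputs
    return env


def run_virsh_list_with_rpc_debug():
    """Equivalent of the shell one-liner above; requires virsh on the PATH."""
    env = client_log_env("1:rpc", "1:stderr")
    return subprocess.run(["virsh", "list"], env=env, check=False)


print(client_log_env("1:rpc", "1:stderr", base_env={})["LIBVIRT_LOG_FILTERS"])
```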

Useful log filter settings for KVM and LXC

The following are some useful log filter values for the KVM and LXC drivers:

All libvirt public APIs invoked
    1:libvirt
All external commands run by libvirt
    1:command
Cgroups management
    1:cgroup
All QEMU driver code
    1:qemu
QEMU text monitor commands
    1:qemu_monitor_text
QEMU JSON/QMP monitor commands
    1:qemu_monitor_json
All LXC driver code
    1:lxc
All lock management code
    1:locking
All security manager code
    1:security
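Several of these values can be combined into one space-separated filter string. As a small convenience, the Python sketch below assembles such a string from a lookup table; the topic labels used as dictionary keys are invented for this example, only the filter text after the colon comes from the list above.

```python
# Hypothetical topic labels mapped to the filter text listed above.
USEFUL_FILTERS = {
    "public APIs": "libvirt",
    "external commands": "command",
    "cgroups": "cgroup",
    "QEMU driver": "qemu",
    "QEMU text monitor": "qemu_monitor_text",
    "QEMU JSON monitor": "qemu_monitor_json",
    "LXC driver": "lxc",
    "locking": "locking",
    "security": "security",
}


def build_log_filters(topics, level=1):
    """Join the filters for the chosen topics into a single filter string."""
    return " ".join(f"{level}:{USEFUL_FILTERS[topic]}" for topic in topics)


print(f'log_filters="{build_log_filters(["QEMU driver", "cgroups"])}"')
```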

QEMU driver logfiles

Every QEMU process run by libvirt has a dedicated log file /var/log/libvirt/qemu/$VMNAME.log which captures any data that QEMU writes to stderr/stdout. It also contains timestamps written by libvirtd whenever the QEMU process is started, and exits. Finally, prior to starting a guest, libvirt will write out the full set of environment variables and command line arguments it intends to launch QEMU with.

If you are running libvirtd with elevated log settings, there is also the possibility that some of the logging output will end up in the per-VM logfile, instead of the location set by the log_outputs configuration parameter. This is because a little bit of libvirt code will run in the child process between the time it is forked and QEMU is exec()d.

LXC driver logfiles

Every LXC container run by libvirt has a dedicated log file /var/log/libvirt/lxc/$VMNAME.log which captures any data that the LXC controller process writes to stderr/stdout. As with QEMU it will also contain the command line args libvirt uses, though these are much less interesting in the LXC case. The LXC logfile is mostly useful for debugging the initial container bootstrap process.

Troubleshooting SELinux / sVirt

On a RHEL or Fedora host, the out of the box configuration will run all guests under confined SELinux contexts. One common problem that may affect developers running libvirtd straight from the source tree is that libvirtd itself will run under the wrong context, which in turn prevents guests from running correctly. This can be addressed in two ways, first by manually labelling the libvirtd binary after each rebuild

chcon system_u:object_r:virtd_exec_t:s0 $SRCTREE/daemon/.libs/lt-libvirtd

Or by specifying a label when executing libvirtd

runcon system_u:system_r:virtd_t:s0-s0:c0.c1023 $SRCTREE/daemon/libvirtd

Another problem might be with libvirt not correctly labelling some device needed by the QEMU process. The best way to see what’s going on here is to enable libvirtd logging with a filter of “1:security_selinux”, which will print out a message for every single file path that libvirtd labels. Then look at the log to see that everything expected is present:

14:36:57.223: 14351: debug : SELinuxGenSecurityLabel:284 : model=selinux label=system_u:system_r:svirt_t:s0:c669,c903 imagelabel=system_u:object_r:svirt_image_t:s0:c669,c903 baselabel=(null)
14:36:57.350: 14351: info : SELinuxSetFilecon:402 : Setting SELinux context on '/var/lib/libvirt/images/f16x86_64.img' to 'system_u:object_r:svirt_image_t:s0:c669,c903'
14:36:57.350: 14351: info : SELinuxSetFilecon:402 : Setting SELinux context on '/home/berrange/boot.iso' to 'system_u:object_r:virt_content_t:s0'
14:36:57.551: 14351: debug : SELinuxSetSecurityDaemonSocketLabel:1129 : Setting VM f16x86_64 socket context unconfined_u:unconfined_r:unconfined_t:s0:c669,c903
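Checking that “everything expected is present” gets tedious for guests with many disks, so it can help to script it. The Python sketch below pulls the path and label out of each ‘Setting SELinux context’ message and reports any expected path that was never labelled. It assumes the message wording shown in the log excerpt above, which may differ between libvirt versions; the helper names and the /dev/funky0 path are made up for the example.

```python
import re

# Matches the SELinuxSetFilecon message format seen in the log excerpt above.
SETFILECON = re.compile(
    r"Setting SELinux context on '(?P<path>[^']+)' to '(?P<label>[^']+)'"
)

SAMPLE_LOG = """\
14:36:57.350: 14351: info : SELinuxSetFilecon:402 : Setting SELinux context on '/var/lib/libvirt/images/f16x86_64.img' to 'system_u:object_r:svirt_image_t:s0:c669,c903'
14:36:57.350: 14351: info : SELinuxSetFilecon:402 : Setting SELinux context on '/home/berrange/boot.iso' to 'system_u:object_r:virt_content_t:s0'
"""


def labelled_paths(log_text):
    """Map each labelled file path to the SELinux context it was given."""
    return {m.group("path"): m.group("label") for m in SETFILECON.finditer(log_text)}


def missing_labels(log_text, expected_paths):
    """Return the expected paths that never appear in a labelling message."""
    seen = labelled_paths(log_text)
    return [path for path in expected_paths if path not in seen]


print(missing_labels(SAMPLE_LOG, ["/home/berrange/boot.iso", "/dev/funky0"]))
```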

If a guest is failing to start, then there are two ways to double check if it really is SELinux related. SELinux can be put into permissive mode on the virtualization host

setenforce 0

Or the sVirt driver can be disabled in libvirt entirely

# vi /etc/libvirt/qemu.conf
...set security_driver="none"...
# service libvirtd restart

Troubleshooting cgroups

When libvirt runs guests on modern Linux systems, cgroups will be used to control aspects of the guests’ execution. If any cgroups are mounted on the host when libvirtd starts up, it will create a basic hierarchy

$MOUNT_POINT
 |
 +- libvirt
     |
     +- qemu
     +- lxc

When starting a KVM or LXC guest, further directories will be created, one per guest, so that after a while the tree will look like

$MOUNT_POINT
 |
 +- libvirt
     |
     +- qemu
     |    |
     |    +- VMNAME1
     |    +- VMNAME2
     |    +- VMNAME3
     |    +- ...
     |    ...
     +- lxc
          |
          +- VMNAME1
          +- VMNAME2
          +- VMNAME3
          +- ...
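To see which guests currently have a directory in this tree, the hierarchy can simply be walked. The Python sketch below does that for a given mount point; the function name is illustrative, and the demonstration builds a throwaway directory tree rather than touching a real cgroup mount.

```python
import os
import tempfile


def guests_by_driver(mount_point, drivers=("qemu", "lxc")):
    """List the per-guest directories under $MOUNT_POINT/libvirt/$DRIVER."""
    result = {}
    for driver in drivers:
        driver_dir = os.path.join(mount_point, "libvirt", driver)
        if not os.path.isdir(driver_dir):
            result[driver] = []
            continue
        result[driver] = sorted(
            entry
            for entry in os.listdir(driver_dir)
            if os.path.isdir(os.path.join(driver_dir, entry))
        )
    return result


# Demonstration against a temporary directory standing in for a cgroup mount.
with tempfile.TemporaryDirectory() as mount_point:
    for rel in ("libvirt/qemu/VMNAME1", "libvirt/qemu/VMNAME2", "libvirt/lxc/VMNAME1"):
        os.makedirs(os.path.join(mount_point, rel))
    demo = guests_by_driver(mount_point)
print(demo)
```

On a real host the argument would be the mount point of one of the cgroup controllers, wherever the host OS happens to mount it.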

Assuming the host administrator has not changed the policy in the top level cgroups, there should be no functional change to operation of the guests with this default setup. There are possible exceptions though if you are trying something unusual. For example, the ‘devices’ cgroups controller will be used to set up a whitelist of block / character devices that QEMU is allowed to access. So if you have modified QEMU to access a funky new device, libvirt will likely block this via the cgroups device ACL. Due to various kernel bugs, some of the cgroups controllers have also had a detrimental performance impact on both QEMU guests and the host OS as a whole.

libvirt will never try to mount any cgroups itself, so the quickest way to stop libvirt using cgroups is to stop the host OS from mounting them. This is not always desirable though, so there is a configuration parameter in /etc/libvirt/qemu.conf which can be used to restrict what cgroups libvirt will use.
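Since libvirt only uses whatever the host has already mounted, a quick way to see what it has to work with is to look for cgroup entries in /proc/mounts. The Python sketch below parses text in that format; it assumes the filesystem type is reported as ‘cgroup’, and the sample input is invented for illustration.

```python
SAMPLE_MOUNTS = """\
tmpfs /run tmpfs rw,nosuid,nodev 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,relatime,cpuacct,cpu 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0
"""


def cgroup_mount_points(mounts_text):
    """Return the mount points of every filesystem of type 'cgroup'."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[2] == "cgroup":
            points.append(fields[1])
    return points


print(cgroup_mount_points(SAMPLE_MOUNTS))
# On a live host: cgroup_mount_points(open("/proc/mounts").read())
```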

Running from the GIT source tree

Sometimes when troubleshooting a particularly hard problem it might be desirable to build libvirt from the latest GIT source and run that. When doing this it is a good idea not to overwrite your distro provided installation with a GIT build, but instead to run libvirt directly from the source tree. The first thing to be careful of is that the custom build uses the right installation prefix (ie /etc, /usr, /var and not /usr/local). To simplify this libvirt provides an ‘autogen.sh’ script to run all the right libtool commands and set the correct prefixes. So to build libvirt from GIT, in a way that is compatible with a typical distro build, use:

./autogen.sh --system --enable-compile-warnings=error
make

Hint: use make -j 4 (or larger) to significantly speed up the build on multi-core systems

To run libvirtd from the source tree, as root, stop the existing daemon and invoke the libtool wrapper script

# service libvirtd stop
# ./daemon/libvirtd

Or to run with SELinux contexts

# service libvirtd stop
# runcon system_u:system_r:virtd_t:s0-s0:c0.c1023 ./daemon/libvirtd

virsh can easily be run from the source tree in the same way

# ./tools/virsh ....normal args...

Running python programs against a non-installed libvirt gets a little harder, but that can be overcome too

$ export PYTHONPATH=$SOURCETREE/python:$SOURCETREE/python/.libs
$ export LD_LIBRARY_PATH=$SOURCETREE/src/.libs
$ python virt-manager --no-fork
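It is easy to get PYTHONPATH slightly wrong and silently pick up the distro provided bindings instead. A short check such as the Python 3 sketch below reports where a module would be loaded from, without importing it; the helper name is illustrative, and it returns None if the module cannot be found at all.

```python
import importlib.util


def module_origin(name):
    """Return the file a top level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# With the exports above in effect, this should point into $SOURCETREE/python.
print(module_origin("libvirt"))
```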

When running the LXC driver, it is necessary to make a change to the guest XML to point it to a different emulator. Run ‘virsh edit $GUEST’ and change

  <emulator>/usr/libexec/libvirt_lxc</emulator>

to

  <emulator>$SOURCETREE/src/libvirt_lxc</emulator>

(expand $SOURCETREE to be the actual path of the GIT checkout – libvirt won’t interpret env vars in the XML)