Many of the management problems in virtualization are caused by the annoyingly popular & desirable host migration feature! I previously talked about PCI device addressing problems, but this time the topic to consider is that of CPU models. Every hypervisor has its own policies for what a guest will see for its CPUs by default, Xen just passes through the host CPU, with QEMU/KVM the guest sees a generic model called “qemu32” or “qemu64”. VMWare does something more advanced, classifying all physical CPUs into a handful of groups and has one baseline CPU model for each group that’s exposed to the guest. VMWare’s behaviour lets guests safely migrate between hosts provided they all have physical CPUs that classify into the same group. libvirt does not like to enforce policy itself, preferring just to provide the mechanism on which the higher layers define their own desired policy. CPU models are a complex subject, so it has taken longer than desirable to support their configuration in libvirt. In the 0.7.5 release that will be in Fedora 13, there is finally a comprehensive mechanism for controlling guest CPUs.
Learning about the host CPU model
If you have been following earlier articles (or otherwise know a bit about libvirt) you’ll know that the “virsh capabilities” command displays an XML document describing the capabilities of the hypervisor connection & host. It should thus come as no surprise that this XML schema has been extended to provide information about the host CPU model. One of the big challenges in describing a CPU models is that every architecture has different approach to exposing their capabilities. On x86, a modern CPUs’ capabilities are exposed via the CPUID instruction. Essentially this comes down to a set of 32-bit integers with each bit given a specific meaning. Fortunately AMD & Intel agree on common semantics for these bits. VMWare and Xen both expose the notion of CPUID masks directly in their guest configuration format. Unfortunately (or fortunately depending on your POV) QEMU/KVM support far more than just the x86 architecture, so CPUID is clearly not suitable as the canonical configuration format. QEMU ended up using a scheme which combines a CPU model name string, with a set of named flags. On x86 the CPU model maps to a baseline CPUID mask, and the flags can be used to then toggle bits in the mask on or off. libvirt decided to follow this lead and use a combination of a model name and flags. Without further ado, here is an example of what libvirt reports as the capabilities of my laptop’s CPU
# virsh capabilities <capabilities> <host> <cpu> <arch>i686</arch> <model>pentium3</model> <topology sockets='1' cores='2' threads='1'/> <feature name='lahf_lm'/> <feature name='lm'/> <feature name='xtpr'/> <feature name='cx16'/> <feature name='ssse3'/> <feature name='tm2'/> <feature name='est'/> <feature name='vmx'/> <feature name='ds_cpl'/> <feature name='monitor'/> <feature name='pni'/> <feature name='pbe'/> <feature name='tm'/> <feature name='ht'/> <feature name='ss'/> <feature name='sse2'/> <feature name='acpi'/> <feature name='ds'/> <feature name='clflush'/> <feature name='apic'/> </cpu> ...snip...
In it not practical to have a database listing all known CPU models, so libvirt has a small list of baseline CPU model names. It picks the one that shares the greatest number of CPUID bits with the actual host CPU and then lists the remaining bits as named features. Notice that libvirt does not tell you what features the baseline CPU contains. This might seem like a flaw at first, but as will be shown next, it is not actually necessary to know this information.
Determining a compatible CPU model to suit a pool of hosts
Now that it is possible to find out what CPU capabilities a single host has, the next problem is to determine what CPU capabilities are best to expose to the guest. If it is known that the guest will never need to be migrated to another host, the host CPU model can be passed straight through unmodified. Some lucky people might have a virtualized data center where they can guarantee all servers will have 100% identical CPUs. Again the host CPU model can be passed straight through unmodified. The interesting case though, is where there is variation in CPUs between hosts. In this case the lowest common denominator CPU must be determined. This is not entirely straightforward, so libvirt provides an API for exactly this task. Provide libvirt with a list of XML documents, each describing a host’s CPU model, and it will internally convert these to CPUID masks, calculate their intersection, finally converting the CPUID mask result back into an XML CPU description. Taking the CPU description from a random server
<capabilities> <host> <cpu> <arch>x86_64</arch> <model>phenom</model> <topology sockets='2' cores='4' threads='1'/> <feature name='osvw'/> <feature name='3dnowprefetch'/> <feature name='misalignsse'/> <feature name='sse4a'/> <feature name='abm'/> <feature name='cr8legacy'/> <feature name='extapic'/> <feature name='cmp_legacy'/> <feature name='lahf_lm'/> <feature name='rdtscp'/> <feature name='pdpe1gb'/> <feature name='popcnt'/> <feature name='cx16'/> <feature name='ht'/> <feature name='vme'/> </cpu> ...snip...
As a quick check is it possible to ask libvirt whether this CPU description is compatible with the previous laptop CPU description, using the “
virsh cpu-compare” command
$ ./tools/virsh cpu-compare cpu-server.xml CPU described in cpu-server.xml is incompatible with host CPU
libvirt is correctly reporting the CPUs are incompatible, because there are several features in the laptop CPU that are missing in the server CPU. To be able to migrate between the laptop and the server, it will be necessary to mask out some features, but which ones ? Again libvirt provides an API for this, also exposed via the “
virsh cpu-baseline” command
# virsh cpu-baseline both-cpus.xml <cpu match='exact'> <model>pentium3</model> <feature policy='require' name='lahf_lm'/> <feature policy='require' name='lm'/> <feature policy='require' name='cx16'/> <feature policy='require' name='monitor'/> <feature policy='require' name='pni'/> <feature policy='require' name='ht'/> <feature policy='require' name='sse2'/> <feature policy='require' name='clflush'/> <feature policy='require' name='apic'/> </cpu>
libvirt has determined that in order to safely migrate a guest between the laptop and the server, it is neccesary to mask out 11 features from the laptop’s XML description.
Configuring the guest CPU model
To simplify life, the guest CPU configuration accepts the same basic XML representation as the host capabilities XML exposes. In other words, the XML from that “cpu-baseline” virsh command, can now be copied directly into the guest XML at the top level under the <domain> element. As the observant reader will have noticed from that last XML snippet, there are a few extra attributes available when describing a CPU in the guest XML. These can mostly be ignored, but for the curious here’s a quick description of what they do. The top level <cpu> element gets an attribute called “match” with possible values
- match=”minimum” – the host CPU must have at least the CPU features described in the guest XML. If the host has additional features beyond the guest configuration, these will also be exposed to the guest
- match=”exact” – the host CPU must have at least the CPU features described in the guest XML. If the host has additional features beyond the guest configuration, these will be masked out from the guest
- match=”strict” – the host CPU must have exactly the same CPU features described in the guest XML. No more, no less.
The next enhancement is that the <feature> elements can each have an extra “policy” attribute with possible values
- policy=”force” – expose the feature to the guest even if the host does not have it. This is kind of crazy, except in the case of software emulation.
- policy=”require” – expose the feature to the guest and fail if the host does not have it. This is the sensible default.
- policy=”optional” – expose the feature to the guest if it happens to support it.
- policy=”disable” – if the host has this feature, then hide it from the guest
- policy=”forbid” – if the host has this feature, then fail and refuse to start the guest
The “forbid” policy is for a niche scenario where a badly behaved application will try to use a feature even if it is not in the CPUID mask, and you wish to prevent accidentally running the guest on a host with that feature. The “optional” policy has special behaviour wrt migration. When the guest is initially started the flag is optional, but when the guest is live migrated, this policy turns into “require”, since you can’t have features disappearing across migration.
All the stuff described in this posting is currently implemented for libvirt’s QEMU/KVM driver, basic code in the 0.7.5/6 releases, but the final ‘cpu-baseline’ stuff is arriving in 0.7.7. Needless to say this will all be available in Fedora 13 and future RHEL. This obviously also needs to be ported over to the Xen and VMWare ESX drivers in libvirt, which isn’t as hard as it sounds, because libvirt has a very good internal API for processed CPUID masks now. Kudos to Jiri Denemark for doing all the really hardwork on this CPU modelling system!