CPU model configuration for QEMU/KVM on x86 hosts

Posted: June 29th, 2018 | Filed under: Fedora, libvirt, OpenStack, Security, Virt Tools

With the various CPU hardware vulnerabilities reported this year, guest CPU configuration is now a security-critical task. This blog post contains content I’ve written that is on its way to becoming part of the QEMU documentation.

QEMU / KVM virtualization supports two ways to configure CPU models:

Host passthrough
   This passes the host CPU model, stepping and features exactly to the guest. Note that KVM may filter out some host CPU model features if they cannot be supported with virtualization. Live migration is unsafe when this mode is used, as libvirt / QEMU cannot guarantee a stable CPU is exposed to the guest across hosts. This is the recommended CPU to use, provided live migration is not required.
Named model
   QEMU comes with a number of predefined named CPU models, which typically refer to specific generations of hardware released by Intel and AMD. These allow the guest VMs to have a degree of isolation from the host CPU, allowing greater flexibility in live migrating between hosts with differing hardware.

In both cases, it is possible to optionally add or remove individual CPU features, to alter what is presented to the guest by default.

Libvirt supports a third way to configure CPU models known as “Host model”. This uses the QEMU “Named model” feature, automatically picking a CPU model that is similar to the host CPU, and then adding extra features to approximate the host model as closely as possible. This does not guarantee the CPU family, stepping, etc. will precisely match the host CPU, as they would with “Host passthrough”, but gives much of the benefit of passthrough, while making live migration safe.

Recommendations for KVM CPU model configuration on x86 hosts

The information that follows provides recommendations for configuring CPU models on x86 hosts. The goals are to maximise performance, while protecting the guest OS against various CPU hardware flaws, and optionally enabling live migration between hosts with heterogeneous CPU models.

Preferred CPU models for Intel x86 hosts

The following CPU models are preferred for use on Intel hosts. Administrators / applications are recommended to use the CPU model that matches the generation of the host CPUs in use. In a deployment with a mixture of host CPU models between machines, if live migration compatibility is required, use the newest CPU model that is compatible across all desired hosts.

Skylake-Server
Skylake-Server-IBRS
   Intel Xeon Processor (Skylake, 2016)
Skylake-Client
Skylake-Client-IBRS
   Intel Core Processor (Skylake, 2015)
Broadwell
Broadwell-IBRS
Broadwell-noTSX
Broadwell-noTSX-IBRS
   Intel Core Processor (Broadwell, 2014)
Haswell
Haswell-IBRS
Haswell-noTSX
Haswell-noTSX-IBRS
   Intel Core Processor (Haswell, 2013)
IvyBridge
IvyBridge-IBRS
   Intel Xeon E3-12xx v2 (Ivy Bridge, 2012)
SandyBridge
SandyBridge-IBRS
   Intel Xeon E312xx (Sandy Bridge, 2011)
Westmere
Westmere-IBRS
   Westmere E56xx/L56xx/X56xx (Nehalem-C, 2010)
Nehalem
Nehalem-IBRS
   Intel Core i7 9xx (Nehalem Class Core i7, 2008)
Penryn
   Intel Core 2 Duo P9xxx (Penryn Class Core 2, 2007)
Conroe
   Intel Celeron_4x0 (Conroe/Merom Class Core 2, 2006)
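
For a mixed fleet, the “newest model compatible across all hosts” rule amounts to intersecting the sets of models each host can run and taking the newest survivor. Here is a minimal Python sketch of that idea. The function name and the input mapping are my own illustration, not part of QEMU or libvirt, and it assumes you have already collected the set of usable models for each host by some other means. The ordering simply follows the list above, base names only.

```python
# Base names of the Intel models listed above, newest first.
INTEL_MODELS_NEWEST_FIRST = [
    "Skylake-Server", "Skylake-Client", "Broadwell", "Haswell",
    "IvyBridge", "SandyBridge", "Westmere", "Nehalem", "Penryn", "Conroe",
]


def newest_common_model(usable_by_host, preference=INTEL_MODELS_NEWEST_FIRST):
    """Return the newest model usable on every host, or None if there is none.

    usable_by_host maps a host name to the collection of model names that
    host can run (hypothetical input, gathered elsewhere).
    """
    if not usable_by_host:
        return None
    # Only models that every host can run are candidates.
    common = set.intersection(*(set(models) for models in usable_by_host.values()))
    for model in preference:
        if model in common:
            return model
    return None


# Example: a Broadwell-era host and an Ivy Bridge-era host.
hosts = {
    "host-a": {"Broadwell", "Haswell", "IvyBridge", "SandyBridge", "Westmere"},
    "host-b": {"IvyBridge", "SandyBridge", "Westmere"},
}
chosen = newest_common_model(hosts)  # "IvyBridge"
```

The same approach works for the -IBRS and -noTSX variants if they are added to the preference list in whatever order suits the deployment.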

Important CPU features for Intel x86 hosts

The following are important CPU features that should be used on Intel x86 hosts, when available in the host CPU. Some of them require explicit configuration to enable, as they are not included by default in some, or all, of the named CPU models listed above. In general all of these features are included if using “Host passthrough” or “Host model”.

pcid
   Recommended to mitigate the cost of the Meltdown (CVE-2017-5754) fix. Included by default in Haswell, Broadwell & Skylake Intel CPU models. Should be explicitly turned on for Westmere, SandyBridge, and IvyBridge Intel CPU models. Note that some desktop/mobile Westmere CPUs cannot support this feature.
spec-ctrl
   Required to enable the Spectre (CVE-2017-5753 and CVE-2017-5715) fix, in cases where retpolines are not sufficient. Included by default in Intel CPU models with -IBRS suffix. Must be explicitly turned on for Intel CPU models without -IBRS suffix. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
ssbd
   Required to enable the CVE-2018-3639 fix. Not included by default in any Intel CPU model. Must be explicitly turned on for all Intel CPU models. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
pdpe1gb
   Recommended to allow guest OS to use 1GB size pages. Not included by default in any Intel CPU model. Should be explicitly turned on for all Intel CPU models. Note that not all CPU hardware will support this feature.
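
The notes above boil down to a small per-model rule table. Here is a minimal Python sketch of it. The helper name `intel_extra_features` is hypothetical, and the sketch only encodes the defaults described in this section: it does not check whether the host CPU or its microcode actually supports each feature, which you still need to verify yourself.

```python
# Models for which the text above says pcid should be turned on explicitly.
PCID_EXPLICIT = ("Westmere", "SandyBridge", "IvyBridge")


def intel_extra_features(model):
    """Return the features to add explicitly to a named Intel CPU model."""
    features = []
    if model.startswith(PCID_EXPLICIT):
        features.append("pcid")       # Haswell, Broadwell, Skylake already include it
    if not model.endswith("-IBRS"):
        features.append("spec-ctrl")  # the -IBRS variants include it by default
    features.append("ssbd")           # not in any Intel model by default
    features.append("pdpe1gb")        # not in any Intel model by default
    return features


westmere = intel_extra_features("Westmere")
haswell_ibrs = intel_extra_features("Haswell-noTSX-IBRS")
```

Older models such as Nehalem, Penryn and Conroe get no pcid here, because the text only recommends turning it on for Westmere, SandyBridge and IvyBridge.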

Preferred CPU models for AMD x86 hosts

The following CPU models are preferred for use on AMD hosts. Administrators / applications are recommended to use the CPU model that matches the generation of the host CPUs in use. In a deployment with a mixture of host CPU models between machines, if live migration compatibility is required, use the newest CPU model that is compatible across all desired hosts.

EPYC
EPYC-IBPB
   AMD EPYC Processor (2017)
Opteron_G5
   AMD Opteron 63xx class CPU (2012)
Opteron_G4
   AMD Opteron 62xx class CPU (2011)
Opteron_G3
   AMD Opteron 23xx (Gen 3 Class Opteron, 2009)
Opteron_G2
   AMD Opteron 22xx (Gen 2 Class Opteron, 2006)
Opteron_G1
   AMD Opteron 240 (Gen 1 Class Opteron, 2004)

Important CPU features for AMD x86 hosts

The following are important CPU features that should be used on AMD x86 hosts, when available in the host CPU. Some of them require explicit configuration to enable, as they are not included by default in some, or all, of the named CPU models listed above. In general all of these features are included if using “Host passthrough” or “Host model”.

ibpb
   Required to enable the Spectre (CVE-2017-5753 and CVE-2017-5715) fix, in cases where retpolines are not sufficient. Included by default in AMD CPU models with -IBPB suffix. Must be explicitly turned on for AMD CPU models without -IBPB suffix. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
virt-ssbd
   Required to enable the CVE-2018-3639 fix. Not included by default in any AMD CPU model. Must be explicitly turned on for all AMD CPU models. This should be provided to guests, even if amd-ssbd is also provided, for maximum guest compatibility. Note that for some QEMU / libvirt versions, this must be force enabled when using “Host model”, because this is a virtual feature that doesn’t exist in the physical host CPUs.
amd-ssbd
   Required to enable the CVE-2018-3639 fix. Not included by default in any AMD CPU model. Must be explicitly turned on for all AMD CPU models. This provides higher performance than virt-ssbd so should be exposed to guests whenever available in the host. virt-ssbd should nonetheless also be exposed for maximum guest compatibility, as some kernels only know about virt-ssbd.
amd-no-ssb
   Recommended to indicate the host is not vulnerable to CVE-2018-3639. Not included by default in any AMD CPU model. Future hardware generations of CPU will not be vulnerable to CVE-2018-3639, and thus the guest should be told not to enable its mitigations, by exposing amd-no-ssb. This is mutually exclusive with virt-ssbd and amd-ssbd.
pdpe1gb
   Recommended to allow guest OS to use 1GB size pages. Not included by default in any AMD CPU model. Should be explicitly turned on for all AMD CPU models. Note that not all CPU hardware will support this feature.
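
The interplay between virt-ssbd, amd-ssbd and amd-no-ssb is the easiest part of this list to get wrong, so here is a minimal Python sketch of the selection logic as I read it from the descriptions above. The helper name and its two boolean parameters are hypothetical: how you find out whether the host offers amd-ssbd, or is not vulnerable at all, is outside the sketch.

```python
def amd_extra_features(model, host_has_amd_ssbd=False, host_not_vulnerable=False):
    """Return the features to add explicitly to a named AMD CPU model."""
    features = []
    if not model.endswith("-IBPB"):
        features.append("ibpb")            # the -IBPB variants include it by default
    if host_not_vulnerable:
        # Mutually exclusive with virt-ssbd and amd-ssbd.
        features.append("amd-no-ssb")
    else:
        features.append("virt-ssbd")       # always exposed, for guest compatibility
        if host_has_amd_ssbd:
            features.append("amd-ssbd")    # faster, exposed in addition when available
    features.append("pdpe1gb")             # not in any AMD model by default
    return features


epyc = amd_extra_features("EPYC")
epyc_ibpb = amd_extra_features("EPYC-IBPB", host_has_amd_ssbd=True)
```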

Default x86 CPU models

The default QEMU CPU models are designed such that they can run on all hosts. If an application does not wish to perform any host compatibility checks before launching guests, the default is guaranteed to work.

The default CPU models will, however, leave the guest OS vulnerable to various CPU hardware flaws, so their use is strongly discouraged. Applications should follow the earlier guidance to set up a better CPU configuration, with host passthrough recommended if live migration is not needed.

qemu32
qemu64
   QEMU Virtual CPU version 2.5+ (32 & 64 bit variants). qemu64 is used for x86_64 guests and qemu32 is used for i686 guests, when no -cpu argument is given to QEMU, or no <cpu> is provided in libvirt XML.

Other non-recommended x86 CPUs

The following CPU models are compatible with most AMD and Intel x86 hosts, but their usage is discouraged, as they expose a very limited feature set, which prevents guests from having optimal performance.

kvm32
kvm64
   Common KVM processor (32 & 64 bit variants). Legacy models just for historical compatibility with ancient QEMU versions.
486
athlon
phenom
coreduo
core2duo
n270
pentium
pentium2
pentium3
   Various very old x86 CPU models, mostly predating the introduction of hardware assisted virtualization, that should thus not be required for running virtual machines.

Syntax for configuring CPU models

The examples below illustrate the approach to configuring the various CPU models / features in QEMU and libvirt.

QEMU command line

Host passthrough
   $ qemu-system-x86_64 -cpu host

With feature customization:

   $ qemu-system-x86_64 -cpu host,-vmx,...
Named CPU models
   $ qemu-system-x86_64 -cpu Westmere

With feature customization:

   $ qemu-system-x86_64 -cpu Westmere,+pcid,...
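
If a management tool assembles the QEMU command line itself, the -cpu value is just the model name followed by comma-separated +feature / -feature entries, as in the examples above. A minimal Python sketch, with a helper name of my own choosing:

```python
def qemu_cpu_arg(model, enable=(), disable=()):
    """Build the value for QEMU's -cpu option, e.g. 'Westmere,+pcid'."""
    parts = [model]
    parts += ["+" + name for name in enable]    # features to turn on
    parts += ["-" + name for name in disable]   # features to turn off
    return ",".join(parts)


# Host passthrough with vmx hidden from the guest, as in the example above.
argv = ["qemu-system-x86_64", "-cpu", qemu_cpu_arg("host", disable=["vmx"])]
```

Passing the result as a separate argv element, rather than pasting it into a shell string, avoids any quoting issues.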

Libvirt guest XML

Host passthrough
   <cpu mode='host-passthrough'/>

With feature customization:

   <cpu mode='host-passthrough'>
       <feature name="vmx" policy="disable"/>
       ...
   </cpu>
Host model
   <cpu mode='host-model'/>

With feature customization:

   <cpu mode='host-model'>
       <feature name="vmx" policy="disable"/>
       ...
   </cpu>
Named model
   <cpu mode='custom'>
       <model>Westmere</model>
   </cpu>

With feature customization:

   <cpu mode='custom'>
       <model>Westmere</model>
       <feature name="pcid" policy="require"/>
       ...
   </cpu>
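
Applications that generate guest XML programmatically can build the <cpu> element with any XML library. Here is a minimal sketch using Python’s standard xml.etree.ElementTree; the function name and its parameters are hypothetical. The model name is written as the text of the <model> element, which is how libvirt’s domain XML format expects it.

```python
import xml.etree.ElementTree as ET


def libvirt_cpu_xml(mode, model=None, features=None):
    """Return a libvirt <cpu> element as a string.

    mode is 'host-passthrough', 'host-model' or 'custom'; model is only
    meaningful for 'custom'; features maps a feature name to a policy
    such as 'require' or 'disable'.
    """
    cpu = ET.Element("cpu", {"mode": mode})
    if model is not None:
        ET.SubElement(cpu, "model").text = model
    for name, policy in (features or {}).items():
        ET.SubElement(cpu, "feature", {"name": name, "policy": policy})
    return ET.tostring(cpu, encoding="unicode")


named = libvirt_cpu_xml("custom", model="Westmere", features={"pcid": "require"})
passthrough = libvirt_cpu_xml("host-passthrough", features={"vmx": "disable"})
```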

 

Announce: libvirt-sandbox “Dashti Margo” 0.6.0 release – an application sandbox toolkit

Posted: July 1st, 2015 | Filed under: Fedora, libvirt, Security, Virt Tools

I am pleased to announce that a new public release of libvirt-sandbox, version 0.6.0, is now available from:

http://sandbox.libvirt.org/download/

The packages are GPG signed with

  Key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF (4096R)

The libvirt-sandbox package provides an API layer on top of libvirt-gobject which facilitates the creation of application sandboxes using virtualization technology. An application sandbox is a virtual machine or container that runs a single application binary, directly from the host OS filesystem. In other words there is no separate guest operating system install to build or manage.

At this point in time libvirt-sandbox can create sandboxes using either LXC or KVM, and should in theory be extendable to any libvirt driver.

This release contains a mixture of new features and bugfixes.

The first major feature is the ability to provide block devices to sandboxes. Most of the time sandboxes only want/need filesystems, but there are some use cases where block devices are useful. For example, some applications (like databases) can directly use raw block devices for storage. Another one is where a tool actually wishes to be able to format filesystems and have this done inside the container. The complexity with exposing block devices is giving the sandbox tools a predictable path for accessing the device which does not change across hypervisors. To solve this, instead of allowing users of virt-sandbox to specify a block device name, they provide an opaque tag name. The block device is then made available at a path /dev/disk/by-tag/TAGNAME, which symlinks back to whatever hypervisor specific disk name was used.
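
From inside the sandbox, an application only ever needs the stable by-tag path and can follow the symlink if it wants to know the real device. A minimal Python sketch of that lookup follows; the helper name and the “dbdata” tag are made up for illustration, and the demonstration uses a throwaway directory standing in for /dev/disk/by-tag so that it also runs outside a sandbox.

```python
import os
import tempfile


def resolve_tag(tag, by_tag_dir="/dev/disk/by-tag"):
    """Return (stable_path, real_device) for a block device tag."""
    stable_path = os.path.join(by_tag_dir, tag)
    # realpath follows the symlink back to the hypervisor specific device name.
    return stable_path, os.path.realpath(stable_path)


# Demonstration with fake paths; nothing here touches real devices.
with tempfile.TemporaryDirectory() as tmp:
    device = os.path.join(tmp, "vdb")     # stand-in for a hypervisor specific name
    open(device, "w").close()
    by_tag = os.path.join(tmp, "by-tag")
    os.mkdir(by_tag)
    os.symlink(device, os.path.join(by_tag, "dbdata"))
    stable, real = resolve_tag("dbdata", by_tag)
    same_device = os.path.samefile(real, device)
```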

The second major feature is the ability to provide a custom root filesystem for the sandbox. The original intent of the sandbox tool was that it provide an easy way to confine and execute applications that are installed on the host filesystem, so by default the host / filesystem is mapped to the sandbox / filesystem read-only. There are some use cases, however, where the user may wish to have a completely different root filesystem. For example, they may wish to execute applications from some separate disk image. So virt-sandbox now allows the user to map in a different root filesystem for the sandbox.

Both of these features were developed as part of a Google Summer of Code 2015 project which is aiming to enhance libvirt-sandbox so that it is capable of executing images distributed by the Docker container image repository service. The motivation for this goes back to the original reason for creating the libvirt-sandbox project in the first place, which was to provide a hypervisor agnostic framework for sandboxing applications, as a higher level above the libvirt API. Once this work is complete it’ll be possible to launch Docker images via libvirt QEMU, KVM or LXC, with no need for the Docker toolchain itself.

The detailed list of changes in this release is:

  • API/ABI incompatible change, soname increased
  • Prevent use of virt-sandbox-service as non-root upfront
  • Fix misc memory leaks
  • Block SIGHUP from the dhclient binary to prevent accidental death if the controlling terminal is closed & reopened
  • Add support for re-creating libvirt XML from sandbox config to facilitate upgrades
  • Switch to standard gobject introspection autoconf macros
  • Add ability to set filters on network interfaces
  • Search /usr/lib instead of /lib for systemd unit files, as the former is the canonical location even when / and /usr are merged
  • Only set SELinux labels on hosts that support SELinux
  • Explicitly link to selinux, instead of relying on indirect linkage
  • Update compiler warning flags
  • Fix misc docs comments
  • Don’t assume use of SELinux in virt-sandbox-service
  • Fix path checks for SUSE in virt-sandbox-service
  • Add support for AppArmor profiles
  • Mount /var after other FS to ensure host image is available
  • Ensure state/config dirs can be accessed when QEMU is running non-root for qemu:///system
  • Fix mounting of host images in QEMU sandboxes
  • Mount images as ext4 instead of ext3
  • Allow use of non-raw disk images as filesystem mounts
  • Check if required static libs are available at configure time to prevent silent fallback to shared linking
  • Require libvirt-glib >= 0.2.1
  • Add support for loading lzma and gzip compressed kmods
  • Check for supported libvirt URIs when starting guests to ensure clear error message upfront
  • Add LIBVIRT_SANDBOX_INIT_DEBUG env variable to allow debugging of kernel boot messages and sandbox init process setup
  • Add support for exposing block devices to sandboxes with a predictable name under /dev/disk/by-tag/TAGNAME
  • Use devtmpfs instead of tmpfs for auto-populating /dev in QEMU sandboxes
  • Allow setup of sandbox with custom root filesystem instead of inheriting from host’s root.
  • Allow execution of apps from non-matched ld-linux.so / libc.so, e.g. executing F19 binaries on F22 host
  • Use passthrough mode for all QEMU filesystems

Announce: libvirt-sandbox “Cholistan” 0.5.1 release – an application sandbox toolkit

Posted: November 19th, 2013 | Filed under: Fedora, libvirt, Security, Virt Tools

I am pleased to announce that a new public release of libvirt-sandbox, version 0.5.1, is now available from:

http://sandbox.libvirt.org/download/

The packages are GPG signed with

  Key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF (4096R)

The libvirt-sandbox package provides an API layer on top of libvirt-gobject which facilitates the creation of application sandboxes using virtualization technology. An application sandbox is a virtual machine or container that runs a single application binary, directly from the host OS filesystem. In other words there is no separate guest operating system install to build or manage.

At this point in time libvirt-sandbox can create sandboxes using either LXC or KVM, and should in theory be extendable to any libvirt driver.

This release focused exclusively on bugfixing.

Changed in this release:

  • Fix path to systemd binary (prefers dir /lib/systemd not /bin)
  • Remove obsolete commands from virt-sandbox-service man page
  • Fix delete of running service container
  • Allow use of custom root dirs with ‘virt-sandbox --root DIR’
  • Fix ‘upgrade’ command for virt-sandbox-service generic services
  • Fix logrotate script to use virsh for listing sandboxed services
  • Add ‘inherit’ option for virt-sandbox ‘-s’ security context option, to auto-copy calling process’ context
  • Remove non-existent ‘-S’ option from virt-sandbox-service man page
  • Fix line break formatting of man page
  • Mention LIBVIRT_DEFAULT_URI in virt-sandbox-service man page
  • Check some return values in libvirt-sandbox-init-qemu
  • Remove unused variables
  • Fix crash with partially specified mount option string
  • Add man page docs for ‘ram’ mount type
  • Avoid close of un-opened file descriptor
  • Fix leak of file handles in init helpers
  • Log a message if sandbox cleanup fails
  • Cope with domain being missing when deleting container
  • Improve stack trace diagnostics in virt-sandbox-service
  • Fix virt-sandbox-service content copying code when faced with non-regular files.
  • Improve error reporting if kernel does not exist
  • Allow kernel version/path/kmod to be set with virt-sandbox
  • Don’t overmount ‘/root’ in QEMU sandboxes by default
  • Fix nosuid / nodev mount options for tmpfs
  • Force 9p2000.u protocol version to avoid QEMU bugs
  • Fix cleanup when failing to start interactive sandbox
  • Create copy of kernel from /boot to allow relabelling
  • Bulk re-indent of code
  • Avoid crash when gateway is missing in network options
  • Fix symlink target created in multi-user.target.wants
  • Add ‘-p PATH’ option for virt-sandbox-service clone/delete to match ‘create’ command option.
  • Only allow ‘lxc:///’ URIs with virt-sandbox-service until further notice
  • Rollback state if cloning a service sandbox fails
  • Add more kernel modules instead of assuming they are all builtins
  • Don’t complain if some kmods are missing, as they may be builtins
  • Allow --mount to be repeated with virt-sandbox-service

Thanks to everyone who contributed to this release

Creating a “head outline” image for team photographs with Fedora and GIMP

Posted: November 26th, 2012 | Filed under: Fedora, libvirt, Photography, Virt Tools

Two weeks back, I was in Barcelona for LinuxCon Europe / KVM Forum 2012. While there, Jeff Cody acquired a photo of many of the KVM community developers. Although it is already visible on Google+, along with tags to identify all the faces, I wanted to put up an outline view of the photo too, mostly so that I could then write this blog post describing how to create the head outline :-) The steps on this page were all performed using Fedora 17 and GIMP 2.8.2, but this should work with pretty much every version of GIMP out there since there’s nothing fancy going on.

The master photo

The master photo that we’ll be working with is:

[image: master photo]

Step 1: Edge detect

I thought that one of the edge detection algorithms available in GIMP would be a good basis for providing a head outline. After a little trial & error, I picked ‘Filters -> Edge-detect -> Edge…’, then chose the ‘Laplace’ algorithm.

This resulted in the following image:

[image: result of the edge detection]

Step 2: Invert colours

The previous image shows the outlines quite effectively, but my desire is for a primarily white image, with black outlines. This is easily achieved using the menu option ‘Colours -> Invert’

Step 3: Desaturate

The edge detection algorithm leaves some colour artifacts in the images, which are trivially dealt with by desaturating the image using ‘Colours -> Desaturate…’ and any one of the desaturation algorithms GIMP offers.

Step 4: Boost contrast

The outline looks pretty good, but there is still a fair amount of fine detail “noise”. There are a few ways we might get rid of this – in particular some of GIMP’s noise removal filters. I went for the easy option of simply boosting the overall image contrast, using ‘Colours -> Brightness/Contrast…’

For this image, setting the contrast to ’40’ worked well; vary this according to the particular characteristics of your image.
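
For anyone who would rather script this than click through menus, steps 1 to 4 can be approximated in plain Python. This is only a rough sketch under my own assumptions: the image is a list of rows of (r, g, b) tuples, the edge detector is a simple 4-neighbour Laplacian rather than GIMP’s actual filter, and the contrast factor of 1.5 is an arbitrary stand-in for GIMP’s ’40’ setting, not an equivalent of it.

```python
def laplace_edges(img):
    """Step 1: absolute 4-neighbour Laplacian per channel; border pixels stay black."""
    h, w = len(img), len(img[0])
    out = [[(0, 0, 0)] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = tuple(
                min(255, abs(4 * img[y][x][c]
                             - img[y - 1][x][c] - img[y + 1][x][c]
                             - img[y][x - 1][c] - img[y][x + 1][c]))
                for c in range(3)
            )
    return out


def invert(img):
    """Step 2: turn white-on-black edges into black-on-white."""
    return [[tuple(255 - v for v in px) for px in row] for row in img]


def desaturate(img):
    """Step 3: average the channels to drop colour artifacts."""
    return [[sum(px) // 3 for px in row] for row in img]


def boost_contrast(gray, factor=1.5):
    """Step 4: push values away from mid-grey, clamped to 0..255."""
    return [[max(0, min(255, round(128 + (v - 128) * factor))) for v in row]
            for row in gray]


def head_outline(img):
    """Run steps 1 to 4 in the same order as the GIMP walkthrough."""
    return boost_contrast(desaturate(invert(laplace_edges(img))))


# Tiny 3x3 test image: a bright pixel on a dark background.
dark, bright = (10, 10, 10), (110, 110, 110)
tiny = [[dark, dark, dark], [dark, bright, dark], [dark, dark, dark]]
outline = head_outline(tiny)
```

On a real photo you would load the pixels with an imaging library of your choice and write the resulting greyscale rows back out.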

Step 5: Add numbers

The outline view is where we want to be, but the whole point of the exercise is to make it easy to put names to faces. Thus the final step is to simply number each head. GIMP’s text tool is the perfect way to do this, just click on each face in turn and type in a number.

No need to worry about perfect placement, since each piece of text becomes a new layer. Once done, the layer positions can be moved around to fit well.

And that’s the final image completed. In the page I created on the KVM website, a little javascript handled swapping between the original & outline views on mouse over, but that’s all there is to it. The hardest part of the whole exercise is actually remembering who everyone is :-P

KVM Forum: building application sandboxes on top of KVM or LXC using libvirt

Posted: November 8th, 2012 | Filed under: Fedora, libvirt, Virt Tools

This week I have spent my time at LinuxCon Europe and KVM Forum 2012. I gave a talk titled “Building application sandboxes on top of KVM or LXC using libvirt”. For those who enquired afterwards, the slides are now available.