This blog is part 7 of a series I am writing about work I’ve completed over the past few releases to improve QEMU security related features.
The live migration feature in QEMU allows a running VM to be moved from one host to another with no noticeable interruption in service and minimal performance impact. The live migration data stream will contain a serialized copy of state of all emulated devices, along with all the guest RAM. In some versions of QEMU it is also used to transfer disk image content, but in modern QEMU use of the NBD protocol is preferred for this purpose. The guest RAM in particular can contain sensitive data that needs to be protected against any would be attackers on the network between source and target hosts. There are a number of ways to provide such security using external tools/services including VPNs, IPsec, SSH/stunnel tunnelling. The libvirtd daemon often already has a secure connection between the source and destination hosts for its own purposes, so many years back support was added to libvirt to automatically tunnel the live migration data stream over libvirt’s own secure connection. This solved both the encryption and authentication problems at once, but there are some downsides to this approach. Tunnelling the connection means extra data copies for the live migration traffic and when we look at guests with RAM many GB in size, the number of data copies will start to matter. The libvirt tunnel only supports a tunnelling of a single data connection and in future QEMU may well wish to use multiple TCP connections for the migration data stream to improve performance of post-copy. The use of NBD for storage migration is not supported with tunnelling via libvirt, since it would require extra connections too. IOW while tunnelling over libvirt was a useful short term hack to provide security, it has outlived its practicality.
It is clear that QEMU needs to support TLS encryption natively on its live migration connections. The QEMU migration code has historically had its own distinct I/O layer called QEMUFile which mixes up tracking of migration state with the connection establishment and I/O transfer support. As mentioned in previous blog post, QEMU now has a general purpose I/O channel framework, so the bulk of the work involved converting the migration code over to use the QIOChannel classes and APIs, which greatly reduced the amount of code in the QEMU
migration/ sub-folder as well as simplifying it somewhat. The TLS support involves the addition of two new parameters to the migration code. First the “
tls-creds” parameter provides the ID of a previously created TLS credential object, thus enabling use of TLS on the migration channel. This must be set on both the source and target QEMU’s involved in the migration.
On the target host, QEMU would be launched with a set of TLS credentials for a server endpoint:
$ qemu-system-x86_64 -monitor stdio -incoming defer \
-object tls-creds-x509,dir=/home/berrange/security/qemutls,endpoint=server,id=tls0 \
To enable incoming TLS migration 2 monitor commands are then used
(qemu) migrate_set_str_parameter tls-creds tls0
(qemu) migrate_incoming tcp:myhostname:9000
On the source host, QEMU is launched in a similar manner but using client endpoint credentials
$ qemu-system-x86_64 -monitor stdio \
-object tls-creds-x509,dir=/home/berrange/security/qemutls,endpoint=client,id=tls0 \
To enable outgoing TLS migration 2 monitor commands are then used
(qemu) migrate_set_str_parameter tls-creds tls0
(qemu) migrate tcp:otherhostname:9000
The migration code supports a number of different protocols besides just “
tcp:“. In particular it allows an “
fd:” protocol to tell QEMU to use a passed-in file descriptor, and an “
exec:” protocol to tell QEMU to launch an external command to tunnel the connection. It is desirable to be able to use TLS with these protocols too, but when using TLS the client QEMU needs to know the hostname of the target QEMU in order to correctly validate the x509 certificate it receives. Thus, a second “
tls-hostname” parameter was added to allow QEMU to be informed of the hostname to use for x509 certificate validation when using a non-tcp migration protocol. This can be set on the source QEMU prior to starting the migration using the “
migrate_set_str_parameter” monitor command
(qemu) migrate_set_str_parameter tls-hostname myhost.mydomain
This feature has been under development for a while and finally merged into QEMU GIT early in the 2.7.0 development cycle, so will be available for use when 2.7.0 is released in a few weeks. With the arrival of the 2.7.0 release there will finally be TLS support across all QEMU host services where TCP connections are commonly used, namely VNC, SPICE, NBD, migration and character devices.
In this blog series:
This blog is part 5 of a series I am writing about work I’ve completed over the past few releases to improve QEMU security related features.
For many years now QEMU has had code to support the NBD protocol, either as a client or as a server. The qemu-nbd command line tool can be used to export a disk image over NBD to a remote machine, or connect it directly to the local kernel’s NBD block device driver. The QEMU system emulators also have a block driver that acts as an NBD client, allowing VMs to be run from NBD volumes. More recently the QEMU system emulators gained the ability to export the disks from a running VM as named NBD volumes. The latter is particularly interesting because it is the foundation of live migration with block device replication, allowing VMs to be migrated even if you don’t have shared storage between the two hosts. In common with most network block device protocols, NBD has never offered any kind of data security capability. Administrators are recommended to run NBD over a private LAN/vLAN, use network layer security like IPSec, or tunnel it over some other kind of secure channel. While all these options are capable of working, none are very convenient to use because they require extra setup steps outside of the basic operation of the NBD server/clients. Libvirt has long had the ability to tunnel the QEMU migration channel over its own secure connection to the target host, but this has not been extended to cover the NBD channel(s) opened when doing block migration. While it could theoretically be extended to cover NBD, it would not be ideal from a performance POV because the libvirtd architecture means that the TLS encryption/decryption for multiple separate network connections would be handled by a single thread. For fast networks (10-GigE), libvirt will quickly become the bottleneck on performance even if the CPU has native support for AES.
Thus it was decided that the QEMU NBD client & server would need to be extended to support TLS encryption of the data channel natively. Initially the thought was to just add a flag to the client/server code to indicate that TLS was desired and run the TLS handshake before even starting the NBD protocol. After some discussion with the NBD maintainers though, it was decided to explicitly define a way to support TLS in the NBD protocol negotiation phase. The primary benefit of doing this is to allow clearer error reporting to the user if the client connects to a server requiring use of TLS and the client itself does not support TLS, or vica-verca – ie instead of just seeing what appears to be a mangled NBD handshake and not knowing what it means, the client can clearly report “This NBD server requires use of TLS encryption”.
The extension to the NBD protocol was fairly straightforward. After the initial NBD greeting (where the client & server agree the NBD protocol variant to be used) the client is able to request a number of protocol options. A new option was defined to allow the client to request TLS support. If the server agrees to use TLS, then they perform a standard TLS handshake and the rest of the NBD protocol carries on as normal. To prevent downgrade attacks, if the NBD server requires TLS and the client does not request the TLS option, then it will respond with an error and drop the client. In addition if the server requires TLS, then TLS must be the first option that the client requests – other options are only permitted once the TLS session is active & the server will again drop the client if it tries to request non-TLS options first.
The QEMU NBD implementation was originally using plain POSIX sockets APIs for all its I/O. So the first step in enabling TLS was to update the NBD code so that it used the new general purpose QEMU I/O channel APIs instead. With that done it was simply a matter of instantiating a new QIOChannelTLS object at the correct part of the protocol handshake and adding various command line options to the QEMU system emulator and qemu-nbd program to allow the user to turn on TLS and configure x509 certificates.
Running a NBD server using TLS can be done as follows:
$ qemu-nbd --object tls-creds-x509,id=tls0,endpoint=server,dir=/home/berrange/qemutls \
--tls-creds tls0 /path/to/disk/image.qcow2
On the client host, a QEMU guest can then be launched, connecting to this NBD server:
$ qemu-system-x86_64 -object tls-creds-x509,id=tls0,endpoint=client,dir=/home/berrange/qemutls \
-drive driver=nbd,host=theotherhost,port=10809,tls-creds=tls0 \
...other QEMU options...
Finally to enable support for live migration with block device replication, the QEMU system monitor APIs gained support for a new parameter when starting the internal NBD server. All of this code was merged in time for the forthcoming QEMU 2.6 release. Work has not yet started to enable TLS with NBD in libvirt, as there is little point securing the NBD protocol streams, until the primary live migration stream is using TLS. More on live migration in a future blog post, as that’s going to be QEMU 2.7 material now.
In this blog series: