ANNOUNCE: new libvirt console proxy project

Posted: January 26th, 2017 | Filed under: Fedora, libvirt, OpenStack, Security

This post is to announce the existence of a new small project to provide a websockets proxy explicitly targeting virtual machine serial consoles and SPICE/VNC graphical displays.

Background

Virtual machines will typically expose one or more consoles for the user to interact with, whether plain text consoles or graphical consoles via VNC/SPICE. With KVM, the network server for these consoles is run as part of the QEMU process on the compute node. It has become common practice not to expose these network services directly to the end user. Instead, large scale deployments will typically run some kind of proxy service sitting between the end user and their VM’s console(s), and this proxy will typically tunnel the connection over the websockets protocol too. This has a number of advantages for users and admins alike. By using websockets, it is possible to use an HTTP parameter or cookie to identify which VM to connect to. This means that a single public network port can multiplex access to all the virtual machines, dynamically figuring out which one to connect to. The use of websockets also makes it possible to write in-browser clients (noVNC, SPICE-HTML5) to display the console, avoiding the need to install local desktop apps.

There are already quite a few implementations of the websockets proxy idea out there used by virt/cloud management apps, so one might wonder why another one is needed. The existing implementations generally all work at the TCP layer, meaning they treat the content being carried as opaque data. This means that while they provide security between the end user & proxy server by use of HTTPS for the websockets connection, the internal communication between the proxy and the QEMU process is still typically running in cleartext. For SPICE and serial consoles it is possible to add security between the proxy and QEMU by simply layering TLS over the entire protocol, so again the data transported can be considered opaque. This isn’t possible for VNC though. Getting encryption between the proxy server and QEMU requires interpreting the VNC protocol in order to intercept the authentication scheme negotiation and turn on TLS support. IOW, the proxy server cannot treat the VNC data stream as opaque.

The libvirt-console-proxy project was started specifically to address this requirement for VNC security. The proxy will interpret the VNC protocol handshake, responding to the client user’s auth request (which is for no auth at the VNC layer, since the websockets layer already handled auth from the client’s POV). The proxy will start a new auth scheme handshake with QEMU and request the VeNCrypt scheme be used, which enables x509 certificate mutual validation and TLS encryption of the data stream. Once the security handshake is complete, the proxy will allow the rest of the VNC protocol to run as a black-box once more – it simply adds TLS encryption when talking to QEMU. This VNC MITM support is already implemented in the libvirt-console-proxy code and enabled by default.
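
To make that handshake interception more concrete, here is a rough sketch in Go of the QEMU-facing side of the negotiation. This is an illustrative pseudo-implementation rather than code from libvirt-console-proxy: the constant values follow the published RFB and VeNCrypt specifications, but error handling and capability checks are heavily abbreviated:

package vncproxy

import (
	"crypto/tls"
	"encoding/binary"
	"fmt"
	"io"
	"net"
)

const (
	secTypeVeNCrypt    = 19  // RFB security type number for VeNCrypt
	vencryptSubX509VNC = 261 // VeNCrypt sub-type: x509 certs + TLS + VNC auth
)

// upgradeToVeNCrypt is called after the RFB protocol version exchange with
// QEMU. It selects the VeNCrypt security type, negotiates an x509 sub-type
// and returns a TLS wrapped connection, over which the rest of the VNC
// protocol can once again be relayed as opaque data.
func upgradeToVeNCrypt(conn net.Conn, tlsCfg *tls.Config) (net.Conn, error) {
	// RFB 3.8: the server sends a count byte followed by its security types
	buf := make([]byte, 1)
	if _, err := io.ReadFull(conn, buf); err != nil {
		return nil, err
	}
	types := make([]byte, buf[0])
	if _, err := io.ReadFull(conn, types); err != nil {
		return nil, err
	}
	// (a real implementation would check VeNCrypt is actually offered)
	if _, err := conn.Write([]byte{secTypeVeNCrypt}); err != nil {
		return nil, err
	}

	// VeNCrypt handshake: agree on version 0.2, then pick a sub-type
	ver := make([]byte, 2)
	io.ReadFull(conn, ver)   // server's highest supported version
	conn.Write([]byte{0, 2}) // request version 0.2
	io.ReadFull(conn, buf)   // 0 == version accepted
	io.ReadFull(conn, buf)   // number of sub-types on offer
	subs := make([]byte, 4*int(buf[0]))
	io.ReadFull(conn, subs) // each sub-type is a big-endian u32
	choice := make([]byte, 4)
	binary.BigEndian.PutUint32(choice, vencryptSubX509VNC)
	conn.Write(choice)
	io.ReadFull(conn, buf) // 1 == sub-type accepted
	if buf[0] != 1 {
		return nil, fmt.Errorf("VeNCrypt sub-type rejected by QEMU")
	}

	// From here on everything sent to QEMU is TLS encrypted
	tlsConn := tls.Client(conn, tlsCfg)
	return tlsConn, tlsConn.Handshake()
}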

As noted earlier, SPICE is easier to deal with, since it has separate TCP ports for plain vs TLS traffic. The proxy simply needs to connect to the TLS port instead of the plain TCP port, and does not need to speak the SPICE protocol. There is, however, a different complication with SPICE. A logical SPICE connection from a client to the server actually consists of many network connections, as SPICE uses a separate socket for each type of data (keyboard events, framebuffer updates, audio, usb tunnelling, etc). In fact there can be as many as 10 network connections for a single SPICE connection. The websockets proxy will normally expect some kind of secret token to be provided by the client, both as authentication and to identify which SPICE server to connect to. It can be highly desirable for such tokens to be single-use only. Since each SPICE connection requires multiple TCP connections to be established, the proxy has no way of knowing when it can invalidate further use of the token. To deal with this, the proxy needs to insert itself into the SPICE protocol negotiation. This allows it to see the protocol message from the server enumerating which channels are available, and also see the unique session identifier. When the additional SPICE network connections arrive, the proxy can now validate that the session ID matches the previously identified session ID, and reject re-use of the websockets security token if they don’t match. It can also permanently invalidate the security token, once all expected secondary connections are established. At the time of writing, this SPICE MITM support is not yet implemented in the libvirt-console-proxy code, but it is planned.
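
Although that SPICE support does not exist yet, the bookkeeping involved is easy to sketch. The following hypothetical Go fragment (not from the libvirt-console-proxy codebase) only illustrates the idea just described: remember the SPICE session ID seen for a given token, insist that every follow-on channel connection presents the same session ID, and retire the token once all expected channels have arrived:

package spiceproxy

// spiceSession tracks the state needed to police re-use of a websockets
// security token across the multiple TCP connections of one SPICE session.
type spiceSession struct {
	sessionID uint32 // session identifier reported by the SPICE server
	pending   int    // secondary channel connections still expected
}

var sessions = map[string]*spiceSession{}

// validateChannel is called for each new websockets connection carrying a
// token that was already used to open the main SPICE channel.
func validateChannel(token string, sessionID uint32) bool {
	sess, ok := sessions[token]
	if !ok || sess.sessionID != sessionID {
		return false // unknown token, or session ID mismatch: reject
	}
	sess.pending--
	if sess.pending == 0 {
		delete(sessions, token) // all channels connected: token now invalid
	}
	return true
}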

Usage

The libvirt-console-proxy project is designed to be reusable by any virtualization management project and so is split into two parts. The general purpose “virtconsoleproxyd” daemon exposes the websockets server and actually runs the connections to the remote QEMU server(s). This daemon is designed to be secure by default, so it mandates configuration of x509 certificates for both the websockets server and the internal VNC, SPICE & serial console connections to QEMU. Running in cleartext requires explicit “--listen-insecure” and “--connect-insecure” flags to make it clear this is a stupid thing to be doing.

The “virtconsoleproxyd” still needs some way to identify which QEMU server(s) it should connect to for a given incoming websockets connection. To address this, there is the concept of an external “token resolver” service. This is a daemon running somewhere (either co-located or on a remote host), providing a trivial REST service. The proxy server does a GET to this external resolver service, passing along the token value. The resolver validates the token, determines what QEMU server it corresponds to and passes this info back to “virtconsoleproxyd“. This way, each virt management project can deploy the “virtconsoleproxyd” service as-is, and simply provide a custom “token resolver” service to integrate with whatever application specific logic is needed to identify QEMU.
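
To give a feel for how small such a resolver can be, here is a hypothetical sketch of one in Go. The URL layout and JSON fields are invented for this example – they are not the actual interface that “virtconsoleproxyd” defines, so consult the project documentation for the real wire format:

package main

import (
	"encoding/json"
	"net/http"
	"strings"
)

// consoleInfo is a made-up response describing where the proxy should
// connect once a token has been validated.
type consoleInfo struct {
	Type     string `json:"type"` // "vnc", "spice" or "serial"
	Host     string `json:"host"` // compute node running the QEMU process
	Port     string `json:"port"` // console port on that node
	Insecure bool   `json:"insecure"`
}

// lookupToken is where the application specific logic lives: validate the
// token against whatever the management app stores and map it to a console.
func lookupToken(token string) (consoleInfo, bool) {
	if token == "750d78ed-8837-4e59-91ce-eef6800227fd" {
		return consoleInfo{Type: "spice", Host: "compute1.example.com", Port: "5901"}, true
	}
	return consoleInfo{}, false
}

func main() {
	http.HandleFunc("/consoles/", func(w http.ResponseWriter, r *http.Request) {
		token := strings.TrimPrefix(r.URL.Path, "/consoles/")
		info, ok := lookupToken(token)
		if !ok {
			http.Error(w, "unknown token", http.StatusForbidden)
			return
		}
		json.NewEncoder(w).Encode(info)
	})
	// A real deployment would use ListenAndServeTLS with proper certificates
	http.ListenAndServe("127.0.0.1:8090", nil)
}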

There is also a reference implementation of the token resolver interface called “virtconsoleresolverd“. This resolver has a connection to one or more libvirtd daemons and monitors for running virtual machines on each one. When a virtual machine starts, it’ll query the libvirt XML for that machine and extract a specific piece of metadata from the XML under the namespace http://libvirt.org/schemas/console-proxy/1.0. This metadata identifies a libvirt secret object that provides the token for connecting to that guest console. As should be expected, the “virtconsoleresolverd” requires use of HTTPS by default, refusing to run cleartext unless the “--listen-insecure” option is given.

For example, on a guest which has a SPICE graphics console, a serial port and two virtio-console devices, the metadata would look like

$ virsh metadata demo http://libvirt.org/schemas/console-proxy/1.0
<consoles>
  <console token="999f5742-2fb5-491c-832b-282b3afdfe0c" type="spice" port="0" insecure="no"/>
  <console token="6a92ef00-6f54-4c18-820d-2a2eaf9ac309" type="serial" port="0" insecure="no"/>
  <console token="3d7bbde9-b9eb-4548-a414-d17fa1968aae" type="console" port="0" insecure="no"/>
  <console token="393c6fdd-dbf7-4da9-9ea7-472d2f5ad34c" type="console" port="1" insecure="no"/>
</consoles>

Each of those tokens refers to a libvirt secret

$ virsh -c sys secret-dumpxml 999f5742-2fb5-491c-832b-282b3afdfe0c
<secret ephemeral='no' private='no'>
  <uuid>999f5742-2fb5-491c-832b-282b3afdfe0c</uuid>
  <description>Token for spice console proxy domain d54df46f-1ab5-4a22-8618-4560ef5fac2c</description>
</secret>

And each of those secrets has an associated value, which is the actual security token to be provided when connecting to the websockets proxy

$ virsh -c sys secret-get-value 999f5742-2fb5-491c-832b-282b3afdfe0c | base64 -d
750d78ed-8837-4e59-91ce-eef6800227fd

So in this example, to access the SPICE console, the remote client would provide the security token string “750d78ed-8837-4e59-91ce-eef6800227fd“. It happens that I’ve used UUIDs as the security tokens, but they can be in any format desired – random UUIDs just happen to be reasonably good high entropy secrets.
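
As a purely illustrative example, a client might present that token when opening the websockets connection along the lines of the Go snippet below. Whether the proxy expects the token in a URL query parameter, a cookie or an HTTP header is a deployment choice, so the "token" parameter name here is just an assumption:

package main

import (
	"log"

	"golang.org/x/net/websocket"
)

func main() {
	// The token value identifies which guest console the proxy should connect to
	url := "wss://proxy.example.com/console?token=750d78ed-8837-4e59-91ce-eef6800227fd"
	ws, err := websocket.Dial(url, "", "https://proxy.example.com/")
	if err != nil {
		log.Fatal(err)
	}
	defer ws.Close()
	// ... speak the VNC/SPICE/serial protocol over 'ws' from here on
}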

Setting up this metadata is a little bit tedious, so there is also a helper program that can be used to automatically create the secrets, set a random token value and write the metadata into the guest XML

$ virtconsoleresolveradm enable demo
Enabled access to domain 'demo'

Or to stop it again

$ virtconsoleresolveradm disable demo
Disabled access to domain 'demo'

If you noticed my previous announcements of libvirt-go and libvirt-go-xml, it won’t come as a surprise to learn that this new project is also written in Go. In fact the integration of libvirt support in this console proxy project is what triggered my work on those other two projects. Having previously worked on adding the same kind of VNC MITM protocol support to the OpenStack nova-novncproxy server, I find I’m liking Go more and more compared to Python.

Improving QEMU security part 4: generic I/O channel framework to simplify TLS

Posted: April 4th, 2016 | Filed under: Coding Tips, Fedora, libvirt, OpenStack, Security, Virt Tools

This blog is part 4 of a series I am writing about work I’ve completed over the past few releases to improve QEMU security related features.

Part 2 of this series described the creation of a general purpose API for simplifying TLS session handling inside QEMU, particularly with a view to hiding the complexity of the handshake and x509 certificate validation. The VNC server was converted to use this API, which was a big benefit, but there was still a need to add extra code to support TLS in the I/O paths. Specifically, anywhere that the VNC server would read/write on the network socket had to be made TLS aware, so that it would use either the plain POSIX send/recv functions or the TLS wrapped send/recv functions as appropriate. For the VNC server it is actually even more complex, because it also supports websockets, so each I/O point had to choose between plain, TLS, websockets and websockets plus TLS. As TLS support extends to other areas of QEMU, this pattern would continue to complicate the I/O paths in each backend.

Clearly there was a need for some form of I/O channel abstraction that would allow TLS to be enabled in each QEMU network backend without having to add conditional logic at every I/O send/recv call. Looking around at the QEMU subsystems that would ultimately need TLS support showed a variety of approaches currently in use:

  • Character devices use a combination of the POSIX sockets APIs to establish connections and GIOChannel for performing I/O on them
  • Migration has a QEMUFile abstraction which provides read/write facilities for a number of underlying transports: TCP sockets, UNIX sockets, STDIO, external command, in-memory buffer and RDMA. The various QEMUFile impls all use the plain POSIX sockets APIs, and for TCP/UNIX sockets the sendmsg/recvmsg functions for I/O
  • NBD client & server use plain POSIX sockets APIs and sendmsg/recvmsg for I/O
  • VNC server uses plain POSIX sockets APIs and sendmsg/recvmsg for I/O

The GIOChannel APIs used by the character device backend theoretically provide an extensible framework for I/O and there is even a TLS implementation of the GIOChannel API. The two limitations of GIOChannel for QEMU though are that it does not support scatter / gather / vectored I/O APIs and that it does not support file descriptor passing over UNIX sockets. The latter is not a show stopper, since you can still access the socket handle directly to send/recv file descriptors. The lack of vectored I/O though would be a significant issue for migration and NBD servers where performance is very important. While we could potentially extend GIOChannel to add support for new callbacks to do vectored I/O, by the time you’ve done that most of the original GIOChannel code isn’t going to be used, limiting the benefit of starting from GIOChannel as a base. It is also clear that GIOChannel is really not something that is going to get any further development from the GLib maintainers, since their focus is on the new and much better GIO library. This supports file descriptor passing and TLS encryption, but again lacks support for vectored I/O. The bigger show stopper though is that getting access to the TLS support requires depending on a version of GLib that is much newer than what QEMU is willing to use. The existing QEMUFile APIs could form the basis of a general purpose I/O channel system if they were untangled & extracted from the migration codebase. One limitation is that QEMUFile only concerns itself with I/O, not the initial channel establishment, which is left to the migration core code to deal with, so it did not actually provide very much of a foundation on which to build.

After looking through the various approaches in use in QEMU, and those potentially available from GLib, it was decided that QEMU would be best served by creating a new general purpose I/O channel API. Thus a new QEMU subsystem was added in the io/ and include/io/ directories to provide a set of classes for I/O over a variety of different data channels. The core design aims were to use the QEMU object model (QOM) framework to provide a standard pattern for extending / subclassing, to use the QEMU Error object for all error reporting, and to support file descriptor passing, main loop watch integration and coroutine integration. Overall the new design took many elements from GIOChannel and the GIO library, and blended them with QEMU’s own codebase design. The initial goal was to provide enough functionality to convert the VNC server as a proof of concept. To this end the following classes were created:

  • QIOChannel – the abstract base defining the overall interface for the I/O framework
  • QIOChannelSocket – implementation targeting TCP, UDP and UNIX sockets
  • QIOChannelTLS – layer that can provide a TLS session over any other channel
  • QIOChannelWebsock – layer that can run the websockets protocol over any other channel

To avoid making this blog posting even larger, I won’t go into details of these (the code is available in QEMU git for anyone who’s really interested), but instead illustrate it with a comparison of the VNC code before & after. First consider the original code in the VNC server for dealing with writing a buffer of data over either a plain socket or a websocket, with or without TLS enabled. The following functions existed in the VNC server code to handle all the combinations:

ssize_t vnc_tls_push(const char *buf, size_t len, void *opaque)
{
    VncState *vs = opaque;
    ssize_t ret;

 retry:
    ret = send(vs->csock, buf, len, 0);
    if (ret < 0) {
        if (errno == EINTR) {
            goto retry;
        }
        return -1;
    }
    return ret;
}

ssize_t vnc_client_write_buf(VncState *vs, const uint8_t *data, size_t datalen)
{
    ssize_t ret;
    int err = 0;
    if (vs->tls) {
        ret = qcrypto_tls_session_write(vs->tls, (const char *)data, datalen);
        if (ret < 0) {
            err = errno;
        }
    } else {
        ret = send(vs->csock, (const void *)data, datalen, 0);
        if (ret < 0) {
            err = socket_error();
        }
    }
    return vnc_client_io_error(vs, ret, err);
}

long vnc_client_write_ws(VncState *vs)
{
    long ret;
    vncws_encode_frame(&vs->ws_output, vs->output.buffer, vs->output.offset);
    buffer_reset(&vs->output);
    return vnc_client_write_buf(vs, vs->ws_output.buffer, vs->ws_output.offset);
}

static void vnc_client_write_locked(void *opaque)
{
    VncState *vs = opaque;

    if (vs->encode_ws) {
        vnc_client_write_ws(vs);
    } else {
        vnc_client_write_plain(vs);
    }
}

After conversion to use the new QIOChannel classes for sockets, websockets and TLS, all of the VNC server code above turned into

ssize_t vnc_client_write_buf(VncState *vs, const uint8_t *data, size_t datalen)
{
    Error *err = NULL;
    ssize_t ret;
    ret = qio_channel_write(vs->ioc, (const char *)data, datalen, &err);
    return vnc_client_io_error(vs, ret, &err);
}

It is clearly a major win for maintainability of the VNC server code to have all the TLS and websockets I/O support handled by the QIOChannel APIs. Supporting TLS and websockets now has no impact anywhere in the VNC server I/O paths. The only place where there is new code is the point where the TLS or websockets session is initiated, and this now only requires instantiating a suitable QIOChannel subclass and registering a callback to be run when the session handshake completes (or fails).

tls = qio_channel_tls_new_server(vs->ioc, vs->vd->tlscreds, vs->vd->tlsaclname, &err);
if (!tls) {
    vnc_client_error(vs);
    return 0;
}

object_unref(OBJECT(vs->ioc));
vs->ioc = QIO_CHANNEL(tls);

qio_channel_tls_handshake(tls, vnc_tls_handshake_done, vs, NULL);

Notice that the code is simply replacing the current QIOChannel handle ‘vs->ioc’ with an instance of the QIOChannelTLS class. The vnc_tls_handshake_done method is invoked when the TLS handshake completes or fails, and lets the VNC server continue with the next part of its authentication protocol, or drop the client connection, as appropriate. So adding TLS session support to the VNC server comes in at about 10 lines of code now.


More than you (or I) ever wanted to know about virtual keyboard handling

Posted: July 4th, 2010 | Filed under: Gtk-Vnc, Virt Tools

As a general rule, people using virtual machines have only one requirement when it comes to keyboard handling: any key they press should generate the same output in the host OS and the guest OS. Unfortunately, this is a surprisingly difficult requirement to satisfy. For a long time when using either Xen or KVM, to get workable keyboard handling it was necessary to configure the keymap in three places: 1. the VNC client OS, 2. the guest OS, 3. QEMU itself. The third item was a particular pain because it meant that a regular guest OS user would need administrative access to the host to change the keymap of their guest. Not good for delegation of control. In Fedora 11 we introduced a special VNC extension which allowed us to remove the need to configure keymaps in QEMU, so now it is merely necessary to configure the guest OS to match the client OS. One day when we get a general purpose guest OS agent, we might be able to automatically set the guest OS keymap to match the client OS whenever connecting via VNC, removing the last manual step. This post aims to give background on how keyboards work, what we’ve done in VNC to improve the guest keyboard handling and what problems we still have.

Keyboard hardware scan codes

Between the time of pressing the physical key and text appearing on the screen, there are several steps in processing the input with data conversions along the way. The keyboard generates what are known as scan codes, a sequence of one or more bytes, which uniquely identifies the physical key and whether it is a make or break (press or release) event. Scan codes are invariant across all keyboards of the same type, regardless of what label is printed on the key. In the PC world, IBM defined the first scan code set with their IBM XT, followed later by AT scan codes and PS/2 scan codes. Other manufacturers adopted the IBM scan codes for their own products to ensure they worked out of the box. In the USB world, the HID specification defines the standard scan codes that manufacturers must use.

Operating system key codes

For operating systems wanting to support more than one type of keyboard, scan codes are not a particularly good representation. They are also often unwieldy as a result of encoding both the key & its make/break state into the same byte(s). Thus operating systems typically define their own standard set of key codes, which is able to represent any possible key on all known keyboards. They will also track the make/break state separately, now using press/release or up/down as terminology. Thus the first task of the keyboard driver is to convert from the hardware specific scan codes to the operating system specific key codes. This is an easily reversible, lossless mapping.

Display / toolkit key symbols & modifiers

Key codes still aren’t a concept that is particularly useful for (most) applications, which would rather know what the user’s intended symbol was, rather than the physical key. Thus the display service (X11, Win32 GUI, etc) or application toolkit (GTK) defines what are known as key symbols. To convert from key codes to key symbols, a key map is required for the language specific keyboard layout. The key map declares modifier keys (shift, alt, control, etc) and provides a list of key symbols that are associated with each key code. This mapping is only reversible if you know the original key map. It is also a lossy mapping, because it is possible for several different key codes to map to the same key symbol.

Consider an end-to-end example, starting with the user pressing the key in row 4, column 2, which is labelled ‘A’ on a US layout, XT compatible keyboard. The operating system keyboard driver receives XT scan code 0x1e, which it converts to Linux key code 30 (KEY_A), the Xorg server keyboard driver further converts to X11 key symbol 0x0061 (XK_a), the GTK toolkit converts this to GDK key symbol 0x0061 (GDK_a), and finally the application displays the character ‘a’ in the text entry field on screen. There are actually a couple of conversions I’ve left out here, specifically how X11 gets the key codes from the OS and how X11 handles key codes internally, which I’ll come back to another time.
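
Expressed as toy code, each step in that chain is just a table lookup. This little Go sketch only illustrates the conversions described above; note that only the final step (key code to key symbol) depends on the configured keymap:

package main

import "fmt"

func main() {
	xtScanCode := 0x1e // emitted by an XT compatible keyboard for that key

	// Keyboard driver: hardware scan code -> OS key code (lossless)
	linuxKeyCodes := map[int]int{0x1e: 30 /* KEY_A */}

	// Keymap for a US layout: key code -> key symbol (lossy, layout specific)
	keySymbols := map[int]rune{30: 0x0061 /* XK_a */}

	keyCode := linuxKeyCodes[xtScanCode]
	keySym := keySymbols[keyCode]
	fmt.Printf("scan code 0x%x -> key code %d -> key symbol %#04x (%c)\n",
		xtScanCode, keyCode, keySym, keySym)
}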

The problem with virtualization

For 99.99% of applications all these different steps / conversions are no problem at all, because they are only interested in text entry. Virtualization, as ever, introduces fun new problems wherever it goes. The problem occurs at the interface between the host virtual desktop client and the hardware emulation. The virtual desktop may be a local fat client using a toolkit like SDL, or it may be a remote network client using a protocol like VNC, RFB or SPICE. In both cases, a naive virtual desktop client will be getting key symbols with their key events. The hardware emulation layer will usually want to provide something like a virtualized PS/2 or USB keyboard. This implies that there needs to be a conversion from key symbols back to hardware specific scan codes. Remember a couple of paragraphs ago where it was noted that the key code -> key symbol conversion is lossy. That is now a big problem. In the case of a network client it is even worse, because the virtualization host does not even know what language specific keymap was used.

Faced with these obstacles, the initial approach QEMU took was to just add a command line parameter ‘-k $KEYMAP’. Without this parameter set, it will assume the virtual desktop client is using a US layout; otherwise it will use the specified keymap. There is still the problem that many key codes can map to the same key symbol. It is impossible to get around this problem – QEMU just has to pick one of the many possible reverse mappings & use it. This means that, even if the user configures matching keymaps on their client, QEMU and the guest OS, there may be certain keys that will never work in the guest OS. There is also a burden for QEMU to maintain a set of keymaps for every language layout. This set is inevitably incomplete.

The solution with virtualization

Fortunately, there is a get out of jail free card available. When passing key events to an application, all graphical windowing systems will provide both the key symbol and a layout independent key code. Remember from many paragraphs earlier that the scan code -> key code conversion is lossless and easily reversible. For local fat clients, the only stumbling block is knowing what key code set is in use. Windows and OS-X both define a standard virtual key set, but sadly X11 does not :-( Instead the key codes are specific to the X11 keyboard driver that is in use; with a Linux based Xorg this is typically ‘kbd’ or ‘evdev’. QEMU has to use heuristics on X11 to decide which key codes it is receiving, but at least once this is identified, the conversion back to scan codes is trivial.

For the VNC network client though, there was one additional stumbling block. The RFB protocol encoding of key events only includes the key symbol, not the key code. So even though the VNC client has all the information the VNC server needs, there is no way to send it. Leading up to the development of Fedora 11, the upstream GTK-VNC and QEMU communities collaborated to define an official extension to the RFB protocol for an extended key event that includes both the key symbol and the key code. Since every windowing system has its own set of key codes & the RFB protocol needs to be platform independent, the protocol extension defined that the keycode set on the wire will be a special 32-bit encoding of the traditional XT scancodes. It is not a coincidence that QEMU already uses a 32-bit encoding of traditional XT scan codes internally :-) With this in place, the RFB client merely has to identify what operating system specific key codes it is receiving and then apply the suitable lossless mapping back to XT scancodes.
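
As a rough picture of what travels on the wire with this extension, each extended key event carries both values, along the lines of the hypothetical Go struct below (the field names are mine and the layout is illustrative; the rfbproto specification is the normative reference):

package rfbsketch

// ExtendedKeyEvent mirrors the information in the RFB extended key event:
// the lossy key symbol is still sent for servers that want it, while the
// lossless, layout independent key code is what QEMU actually consumes.
type ExtendedKeyEvent struct {
	DownFlag uint16 // non-zero for key press, zero for key release
	Keysym   uint32 // X11 style key symbol (ignored by QEMU's VNC server)
	Keycode  uint32 // 32-bit RFB encoding of the traditional XT scan code
}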

Considering the same end-to-end example again, starting with the user pressing the key in row 4, column 2, which is labelled ‘A’ on a US layout, XT compatible keyboard. The operating system keyboard driver receives XT scan code 0x1e, which it converts to Linux key code 30 (KEY_A), the Xorg server keyboard driver further converts to X11 key symbol 0x0061 (XK_a) and an evdev key code, and the GTK toolkit converts the key symbol to GDK key symbol 0x0061 (GDK_a) but passes the evdev key code unchanged. The VNC client converts the evdev key code to the RFB key code and sends it to the QEMU VNC server along with the key symbol. The QEMU VNC server totally ignores the key symbol and does the no-op conversion from the RFB key code to the QEMU key code. The QEMU PS/2 emulation converts the QEMU keycode to either the XT scan code, AT scan code or PS/2 scan code, depending on how the guest OS has configured the keyboard controller. Finally the conversion mentioned much earlier takes place in the guest OS and the letter ‘a’ appears in a text field in the guest. The important bit to note is that although the key symbol was present, it was never used in the host OS or remote network client at any point. The only conversions performed were between scan codes and key codes, all of which were lossless. The user is able to generate all the same key sequences in the guest OS as they can in the host OS. The user is happy their keyboard works as expected; the virtualization developer is happy at the lack of bug reports about broken keyboard handling.