ANNOUNCE: new libvirt console proxy project

Posted: January 26th, 2017 | Filed under: Fedora, libvirt, OpenStack, Security | Tags: , , , , | No Comments »

This post is to announce the existence of a new small project to provide a websockets proxy explicitly targeting virtual machines serial consoles and SPICE/VNC graphical displays.

Background

Virtual machines will typically expose one or more consoles for the user to interact on, whether plain text consoles or graphical consoles via VNC/SPICE. With KVM, the network server for these consoles is run as part of the QEMU process on the compute node. It has become common practice to not directly expose these network services to the end user. Instead large scale deployments will typically run some kind of proxy service sitting between the end user and their VM’s console(s). Typically this proxy will tunnel the connection over the websockets protocol too. This has a number of advantages to users and admins alike. By using websockets, it is possible to use an HTTP parameter or cookie to identify which VM to connect to. This means that a single public network port can multiplex access to all the virtual machines, dynamically figuring out which to connect to. The use of websockets also makes it possible to write in-browser clients (noVNC, SPICE-HTML5) to display the console, avoiding the need to install local desktop apps.

There are already quite a few implementations of the websockets proxy idea out there used by virt/cloud management apps, so one might wonder why another one is needed. The existing implementations generally all work at the TCP layer, meaning they treat the content being carried as opaque data. This means that while they provide security between the end user & proxy server by use of HTTPS for the websockets connection, the internal communication between the proxy and QEMU process is still typically running in cleartext. For SPICE and serial consoles it is possible to add security between the proxy and QEMU by simply layering TLS over the entire protocol, so again the data transported can be considered opaque. This isn’t possible for VNC though. To get encryption between the proxy server and QEMU requires interpreting the VNC protocol to intercept the authentication scheme negotiation, turning on TLS support. IOW, the proxy server cannot treat the VNC data stream as opaque.

The libvirt-console-proxy project was started specifically to address this requirement for VNC security. The proxy will interpret the VNC protocol handshake, responding to the client user’s auth request (which is for no auth at the VNC layer, since the websockets layer already handled auth from the client’s POV). The proxy will start a new auth scheme handshake with QEMU and request the VeNCrypt scheme be used, which enables x509 certificate mutual validation and TLS encryption of the data stream. Once the security handshake is complete, the proxy will allow the rest of the VNC protocol to run as a black-box once more – it simply adds TLS encryption when talking to QEMU. This VNC MITM support is already implemented in the libvirt-console-proxy code and enabled by default.

As noted earlier, SPICE is easier to deal with, since it has separate TCP ports for plain vs TLS traffic. The proxy simply needs to connect to the TLS port instead of the TCP port, and does not need to speak the SPICE protocol. There is, however, a different complication with SPICE. A logical SPICE connection from a client to the server actually consists of many network connections as SPICE uses a separate socket for each type of data (keyboard events, framebuffer updates, audio, usb tunnelling, etc). In fact there can be as many as 10 network connections for a single SPICE connection. The websockets proxy will normally expect some kind of secret token to be provided by the client both as authentication, and to identify which SPICE server to connect to. It can be highly desirable for such tokens to be single-use only. Since each SPICE connection requires multiple TCP connections to be established, the proxy has no way of knowing when it can invalidate further use of the token. To deal with this it needs to insert itself into the SPICE protocol negotiation. This allows it to see the protocol message from the server enumerating which channels are available, and also see the unique session identifier. When the additional SPICE network connections arrive, the proxy can now validate that the session ID matches the previously identified session ID, and reject re-use of the websockets security token if they don’t match. It can also permanently invalidate the security token, once all expected secondary connections are established. At time of writing, this SPICE MITM support is not yet implemented in the libvirt-console-proxy code, but is is planned.

Usage

The libvirt-console-proxy project is designed to be reusable by any virtualization management project and so is split into two parts. The general purpose “virtconsoleproxyd” daemon exposes the websockets server and actually runs the connections to the remote QEMU server(s). This daemon is designed to be secure by default, so it mandates configuration of x509 certificates for both the websockets server and the internal VNC, SPICE & serial console connections to QEMU. Running in cleartext requires explicit “--listen-insecure” and “--connect-insecure” flags to make it clear this is a stupid thing to be doing.

The “virtconsoleproxyd” still needs some way to identify which QEMU server(s) it should connect to for a given incoming websockets connection. To address this there is the concept of an external “token resolver” service. This is a daemon running somewhere (either co-located, or on a remote host), providing a trivial REST service. The proxy server does a GET to this external resolver service, passing along the token value. The resolver validates the token, determines what QEMU server it corresponds to and passes this info back to “virtconsoleproxyd“. This way, each project virt management project can deploy the “virtconsoleproxyd” service as is, and simply provide a custom “token resolver” service to integrate with whatever application specific logic is needed to identify QEMU.

There is also a reference implementation of the token resolver interface called “virtconsoleresolverd“. This resolver has a connection to one or more libvirtd daemons and monitors for running virtual machines on each one. When a virtual machine starts, it’ll query the libvirt XML for that machine and extract a specific piece of metadata from the XML under the namespace http://libvirt.org/schemas/console-proxy/1.0. This metadata identifies a libvirt secret object that provides the token for connecting to that guest console. As should be expected, the “virtconsoleresolverd” requires use of HTTPS by default, refusing to run cleartext unless the “--listen-insecure” option is given.

For example on a guest which has a SPICE graphics console, a serial ports and two virtio-console devices, the metadata would look like

$ virsh  metadata demo http://libvirt.org/schemas/console-proxy/1.0
<consoles>
  <console token="999f5742-2fb5-491c-832b-282b3afdfe0c" type="spice" port="0" insecure="no"/>
  <console token="6a92ef00-6f54-4c18-820d-2a2eaf9ac309" type="serial" port="0" insecure="no"/>
  <console token="3d7bbde9-b9eb-4548-a414-d17fa1968aae" type="console" port="0" insecure="no"/>
  <console token="393c6fdd-dbf7-4da9-9ea7-472d2f5ad34c" type="console" port="1" insecure="no"/>
</consoles>

Each of those tokens refers to a libvirt secret

$ virsh -c sys secret-dumpxml 999f5742-2fb5-491c-832b-282b3afdfe0c
<secret ephemeral='no' private='no'>
  <uuid>999f5742-2fb5-491c-832b-282b3afdfe0c</uuid>
  <description>Token for spice console proxy domain d54df46f-1ab5-4a22-8618-4560ef5fac2c</description>
</secret>

And that secret has a value associated which is the actual security token to be provided when connecting to the websockets proxy

$ virsh -c sys secret-get-value 999f5742-2fb5-491c-832b-282b3afdfe0c | base64 -d
750d78ed-8837-4e59-91ce-eef6800227fd

So in this example, to access the SPICE console, the remote client would provide the security token string “750d78ed-8837-4e59-91ce-eef6800227fd“. It happens that I’ve used UUIDs as the actual security token, but they can actually be in any format desired – random UUIDs just happen to be reasonably good high entropy secrets.

Setting up this metadata is a little bit tedious, so there is also a help program that can be used to automatically create the secrets, set a random value and write the metadata into the guest XML

$ virtconsoleresolveradm enable demo
Enabled access to domain 'demo'

Or to stop it again

$ virtconsoleresolveradm disable demo
Disabled access to domain 'demo'

If you noticed my previous announcements of libvirt-go and libvirt-go-xml it won’t come as a surprise to learn that this new project is also written in Go. In fact the integration of libvirt support in this console proxy project is what triggered my work on those other two projects. Having previously worked on adding the same kind of VNC MITM protocol support to the OpenStack nova-novncproxy server, I find I’m liking Go more and more compared to Python.

Improving QEMU security part 4: generic I/O channel framework to simplify TLS

Posted: April 4th, 2016 | Filed under: Coding Tips, Fedora, libvirt, OpenStack, Security, Virt Tools | Tags: , , , , | No Comments »

This blog is part 4 of a series I am writing about work I’ve completed over the past few releases to improve QEMU security related features.

Part 2 of this series described the creation of a general purpose API for simplifying TLS session handling inside QEMU, particularly with a view to hiding the complexity of the handshake and x509 certificate validation. The VNC server was converted to use this API, which was a big benefit, but there was still a need to add extra code to support TLS in the I/O paths. Specifically, anywhere that the VNC server would read/write on the network socket, had to be made TLS aware so that it would use plain POSIX send/recv functions vs the TLS wrapped send/recv functions as appropriate. For the VNC server it is actually even more complex, because it also supports websockets, so each I/O point had to choose between plain, TLS, websockets and websockets plus TLS.  As TLS support extends to other areas of QEMU this pattern would continue to complicate I/O paths in each backend.

Clearly there was a need for some form of I/O channel abstraction that would allow TLS to be enabled in each QEMU network backend without having to add conditional logic at every I/O send/recv call. Looking around at the QEMU subsystems that would ultimately need TLS support, showed a variety of approaches currently in use

  • Character devices use combination of POSIX sockets APIs to establish connections and GIOChannel for performing I/O on them
  • Migration has a QEMUFile abstraction which provides read/write facilities for a number of underlying transports, TCP sockets, UNIX sockets, STDIO, external command, in memory buffer and RDMA. The various QEMUFile impls all uses the plain POSIX sockets APIs and for TCP/UNIX sockets the sendmsg/recvmsg functions for I/O
  • NBD client & server use plain POSIX sockets APIs and sendmsg/recvmsg for I/O
  • VNC server uses plain POSIX sockets APIs and sendmsg/recvmsg for I/O

The GIOChannel APIs used by the character device backend theoretically provide an extensible framework for I/O and there is even a TLS implementation of the GIOChannel API. The two limitations of GIOChannel for QEMU though are that it does not support scatter / gather / vectored I/O APIs and that it does not support file descriptor passing over UNIX sockets. The latter is not a show stopper, since you can still access the socket handle directly to send/recv file descriptors. The lack of vectored I/O though would be a significant issue for migration and NBD servers where performance is very important. While we could potentially extend GIOChannel to add support for new callbacks to do vectored I/O, by the time you’ve done that most of the original GIOChannel code isn’t going to be used, limiting the benefit of starting from GIOChannel as a base. It is also clear that GIOChannel is really not something that is going to get any further development from the GLib maintainers, since their focus is on the new and much better GIO library. This supports file descriptor passing and TLS encryption, but again lacks support for vectored I/O. The bigger show stopper though is that to get access to the TLS support requires depending on a version on GLib that is much newer than what QEMU is willing to use. The existing QEMUFile APIs could form the basis of a general purpose I/O channel system if they were untangled & extracted from migration codebase. One limitation is that QEMUFile only concerns itself with I/O, not the initial channel establishment which is left to the migration core code to deal with, so did not actually provide very much of a foundation on which to build.

After looking through the various approaches in use in QEMU, and potentially available from GLib, it was decided that QEMU would be best served by creating a new general purpose I/O channel API. Thus a new QEMU subsystem was added in the io/ and include/io/ directories to provide a set of classes for I/O over a variety of different data channels. The core design aims were to use the QEMU object model (QOM) framework to provide a standard pattern for extending / subclassing, use the QEMU Error object for all error reporting, file  descriptor passing, main loop watch integration and coroutine integration. Overall the new design took many elements of its design from GIOChannel and the GIO library, and blended them with QEMU’s own codebase design. The initial goal was to provide enough functionality to convert the VNC server as a proof of concept. To this end the following classes were created

  • QIOChannel – the abstract base defining the overall interface for the I/O framework
  • QIOChannelSocket – implementation targeting TCP, UDP and UNIX sockets
  • QIOChannelTLS – layer that can provide a TLS session over any other channel
  • QIOChannelWebsock – layer that can run the websockets protocol over any other channel

To avoid making this blog posting even larger, I won’t go into details of these (the code is available in QEMU git for anyone who’s really interesting), but instead illustrate it with a comparison of the VNC code before & after. First consider the original code in the VNC server for dealing with writing a buffer of data over a plain socket or websocket either with TLS enabled. The following functions existed in the VNC server code to handle all the combinations:

ssize_t vnc_tls_push(const char *buf, size_t len, void *opaque)
{
    VncState *vs = opaque;
    ssize_t ret;

 retry:
    ret = send(vs->csock, buf, len, 0);
    if (ret < 0) {
        if (errno == EINTR) {
            goto retry;
        }
        return -1;
    }
    return ret;
}

ssize_t vnc_client_write_buf(VncState *vs, const uint8_t *data, size_t datalen)
{
    ssize_t ret;
    int err = 0;
    if (vs->tls) {
        ret = qcrypto_tls_session_write(vs->tls, (const char *)data, datalen);
        if (ret < 0) {
            err = errno;
        }
    } else {
        ret = send(vs->csock, (const void *)data, datalen, 0);
        if (ret < 0) {
            err = socket_error();
        }
    }
    return vnc_client_io_error(vs, ret, err);
}

long vnc_client_write_ws(VncState *vs)
{
    long ret;
    vncws_encode_frame(&vs->ws_output, vs->output.buffer, vs->output.offset);
    buffer_reset(&vs->output);
    return vnc_client_write_buf(vs, vs->ws_output.buffer, vs->ws_output.offset);
}

static void vnc_client_write_locked(void *opaque)
{
    VncState *vs = opaque;

    if (vs->encode_ws) {
        vnc_client_write_ws(vs);
    } else {
        vnc_client_write_plain(vs);
    }
}

After conversion to use the new QIOChannel classes for sockets, websockets and TLS, all of the VNC server code above turned into

ssize_t vnc_client_write_buf(VncState *vs, const uint8_t *data, size_t datalen)
{
    Error *err = NULL;
    ssize_t ret;
    ret = qio_channel_write(vs->ioc, (const char *)data, datalen, &err);
    return vnc_client_io_error(vs, ret, &err);
}

It is clearly a major win for maintainability of the VNC server code to have all the TLS and websockets I/O support handled by the QIOChannel APIs. There is no impact to supporting TLS and websockets anywhere in the VNC server I/O paths now. The only place where there is new code is the point where the TLS or websockets session is initiated and this now only requires instantiation of a suitable QIOChannel subclass and registering a callback to be run when the session handshake completes (or fails).

tls = qio_channel_tls_new_server(vs->ioc, vs->vd->tlscreds, vs->vd->tlsaclname, &err);
if (!tls) {
    vnc_client_error(vs);
    return 0;
}

object_unref(OBJECT(vs->ioc));
vs->ioc = QIO_CHANNEL(tls);

qio_channel_tls_handshake(tls, vnc_tls_handshake_done, vs, NULL);

Notice that the code is simply replacing the current QIOChannel handle ‘vs->ioc’ with an instance of the QIOChannelTLS class. The vnc_tls_handshake_done method is invoked when the TLS handshake is complete or failed and lets the VNC server continue with the next part of its authentication protocol, or drop the client connection as appropriate. So adding TLS session support to the VNC server comes in at about 10 lines of code now.

In this blog series: