Injecting fake keyboard events to KVM guests via libvirt

Posted: September 23rd, 2011 | Filed under: Fedora, Gtk-Vnc, libvirt, Virt Tools | 3 Comments »

I’ve written before about how virtualization causes pain wrt keyboard handling and about the huge number of scancode/keycode sets you have to worry about. Following on from that investigative work I completely rewrote GTK-VNC’s keycode handling, so it is able to correctly translate the keycodes it receives from GTK on Linux, Win32 and OS-X, even when running against a remote X11 server on a different platform. In doing so I made sure that the tables used for converting between keycode sets were not just big arrays of magic numbers in the code, as is common practice in the kernel and QEMU codebases. Instead GTK-VNC now has a CSV file containing the unadulterated mapping data, along with a simple script to split out the individual mapping tables. This data file and script have already been reused to solve the same keycode mapping problem in SPICE-GTK.

Fast-forward a year, and a libvirt developer from Fujitsu is working on a patch to wire up QEMU’s “sendkey” monitor command to a formal libvirt API. The first design question is how the API should accept the list of keys to be injected into the guest. The QEMU monitor command accepts a list of keys, either as keycode names (strings) or as keycode values (hex-encoded strings). The QEMU keycode values come from what I term the “RFB” codeset, which is just the XT codeset with a slightly unusual encoding of extended keycodes. VirtualBox, meanwhile, has an API which wants integer keycode values from the regular XT codeset.
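
As I understand it, the “unusual” part is simply that instead of the two-byte 0xe0-prefixed sequence used for extended keys in the raw XT set, the value is collapsed into a single byte with the high bit set. A minimal sketch of that convention (my own illustration, not code from QEMU or libvirt):

/* Illustrative only: map an XT scancode to the single-value "RFB"/QEMU
 * style encoding, where the 0xe0 prefix used for extended keys in the
 * raw XT set is folded into the 0x80 bit. */
#include <stdio.h>

static unsigned int xt_to_rfb_keycode(unsigned int xt_code, int is_extended)
{
    return is_extended ? (0x80 | (xt_code & 0x7f)) : xt_code;
}

int main(void)
{
    /* Left ctrl is the plain scancode 0x1d; right ctrl is the extended
     * pair 0xe0 0x1d, which becomes the single value 0x9d. */
    printf("left-ctrl  = 0x%02x\n", xt_to_rfb_keycode(0x1d, 0));
    printf("right-ctrl = 0x%02x\n", xt_to_rfb_keycode(0x1d, 1));
    return 0;
}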

One of the problems with the XT codeset is that no one can ever quite agree on the official way to encode extended keycodes, or whether it is even possible to encode certain types of key. There is also a usability problem with having the API require a low-level, hardware-oriented keycode set as input: as an application developer you might know what Win32 virtual keycode you want to generate, but have no idea what the corresponding XT keycode is. It would be preferable if you could simply inject a Win32 keycode directly to a Windows guest, or a Linux keycode directly to a Linux guest, and so on.

After a little bit of discussion we came to the conclusion that the libvirt API should accept an array of integer keycodes, along with an enum parameter specifying which keycode set they belong to. Internally libvirt would then translate from whatever keycode set the application used to the keycode set required by the hypervisor’s own API. Thus we got an API that looks like:

typedef enum {
   VIR_KEYCODE_SET_LINUX          = 0,
   VIR_KEYCODE_SET_XT             = 1,
   VIR_KEYCODE_SET_ATSET1         = 2,
   VIR_KEYCODE_SET_ATSET2         = 3,
   VIR_KEYCODE_SET_ATSET3         = 4,
   VIR_KEYCODE_SET_OSX            = 5,
   VIR_KEYCODE_SET_XT_KBD         = 6,
   VIR_KEYCODE_SET_USB            = 7,
   VIR_KEYCODE_SET_WIN32          = 8,
   VIR_KEYCODE_SET_RFB            = 9,

   VIR_KEYCODE_SET_LAST,
} virKeycodeSet;

int virDomainSendKey(virDomainPtr domain,
                     unsigned int codeset,
                     unsigned int holdtime,
                     unsigned int *keycodes,
                     int nkeycodes,
                     unsigned int flags);
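
To make the usage concrete, here is a minimal C sketch of injecting the same ‘right-ctrl+c’ stroke shown in the virsh examples below, using Linux keycodes; the domain name "demo" and the 100ms hold time are arbitrary choices for illustration:

/* Minimal illustration of calling the new API; error handling is cut to
 * the bone and the domain name "demo" is an arbitrary choice. */
#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (!conn) {
        fprintf(stderr, "Unable to connect to the hypervisor\n");
        return 1;
    }

    virDomainPtr dom = virDomainLookupByName(conn, "demo");
    if (!dom) {
        fprintf(stderr, "Unable to find guest 'demo'\n");
        virConnectClose(conn);
        return 1;
    }

    /* KEY_RIGHTCTRL (97) and KEY_C (46) from <linux/input.h>,
     * held down together for 100 milliseconds */
    unsigned int keys[] = { 97, 46 };
    if (virDomainSendKey(dom, VIR_KEYCODE_SET_LINUX, 100, keys, 2, 0) < 0)
        fprintf(stderr, "Unable to inject keys\n");

    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}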

As with all libvirt APIs, this is also exposed in the virsh command line tool, via a new “send-key” command. As you might expect, this accepts a list of integer keycodes as parameters, along with a keycode set name. If the keycode set is omitted, the Linux keycode set is assumed by default. To be slightly more user friendly, for the Linux, Win32 & OS-X keycode sets we also support symbolic keycode names as an alternative to the integer values. These names are simply the names of the #define constants from the corresponding header files.

Some examples of how to use the new virsh command are:

# send three strokes 'k', 'e', 'y', using xt codeset
virsh send-key dom --codeset xt 37 18 21

# send one stroke 'right-ctrl+C'
virsh send-key dom KEY_RIGHTCTRL KEY_C

# send a tab, held for 1 second
virsh send-key dom --holdtime 1000 0xf

So when interacting with virtual guests you now have a choice of how to send fake keycodes. If you have a VNC or SPICE connection directly to the guest in question, you can inject keycodes over that channel, while if you have a libvirt connection to the hypervisor you can inject keycodes over that channel.

An oddity delaying kernel shutdown

Posted: September 22nd, 2011 | Filed under: Fedora, libvirt, Virt Tools | 2 Comments »

A couple of years ago Dan Walsh introduced the SELinux sandbox, which is a way to confine what resources an application can access using SELinux and the Linux filesystem namespace functionality. Meanwhile we developed sVirt in libvirt to confine QEMU virtual machines, and QEMU itself has gained support for passing host filesystems straight through to the guest operating system, using a VirtIO based transport for the 9p filesystem. This got me thinking about whether it was now practical to create a sandbox based on QEMU, or rather KVM, by booting a guest with a root filesystem pointing to the host’s root filesystem (readonly of course), combined with a couple of overlays for /tmp and /home, all protected by sVirt.

One prominent factor in the practicality is how much time the KVM and kernel startup sequences add to the overall execution time of the command being sandboxed. From Richard Jones’ work on libguestfs I know that it is possible to boot to a functioning application inside KVM in < 5 seconds. The approach I take with 9pfs has a slight advantage over libguestfs, because it does not incur the initial (one-time only per kernel version) delay for building a virtual appliance based on the host filesystem, since we’re able to directly access the host filesystem from the guest. The fine details will have to wait for a future blog post, but suffice to say, a stock Fedora kernel can be made to boot to the point of exec()ing the ‘init’ binary in the ramdisk in ~0.6 seconds, and the custom ‘init’ binary I use for mounting the 9p filesystems takes another ~0.2 seconds, giving a total boot time of 0.8 seconds.
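
To give a flavour of what such an ‘init’ binary has to do, here is a hypothetical sketch; the "hostfs" mount tag, the mount options and the choice to exec a shell are illustrative assumptions on my part, not the actual code (that will have to wait for the future post):

/* Hypothetical sketch of a 9p-mounting 'init'; the mount tag, options
 * and final exec target are illustrative assumptions only. */
#include <stdio.h>
#include <unistd.h>
#include <sys/mount.h>

int main(void)
{
    /* Mount the host root exported by QEMU over virtio-9p, read-only */
    if (mount("hostfs", "/mnt", "9p", MS_RDONLY,
              "trans=virtio,version=9p2000.u") < 0) {
        perror("mount 9p root");
        return 1;
    }

    /* A real sandbox would also mount the /tmp and /home overlays here */

    if (chroot("/mnt") < 0 || chdir("/") < 0) {
        perror("chroot");
        return 1;
    }

    execl("/bin/sh", "sh", (char *)NULL);
    perror("exec");
    return 1;
}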

Boot up time, however, is only one side of the story. For some application sandboxing scenarios, the shutdown time might be just as important as startup time. I naively thought that the kernel shutdown time would be too short to measure. It turns out I was wrong, big time. Timestamps on the printk messages showed that the shutdown time was in fact longer than the bootup time! The telling messages were:

[    1.486287] md: stopping all md devices.
[    2.492737] ACPI: Preparing to enter system sleep state S5
[    2.493129] Disabling non-boot CPUs ...
[    2.493129] Power down.
[    2.493129] acpi_power_off called

which pointed the finger at the MD driver. I was sceptical that the MD driver could be to blame, since my virtual machine does not have any block devices at all, let alone MD devices. To be sure though, I took a look at the MD driver code to see just what it does during kernel shutdown. To my surprise the answer was blindingly obvious:

static int md_notify_reboot(struct notifier_block *this, unsigned long code, void *x)
{
  struct list_head *tmp;
  mddev_t *mddev;

  if ((code == SYS_DOWN) || (code == SYS_HALT) || (code == SYS_POWER_OFF)) {

    printk(KERN_INFO "md: stopping all md devices.\n");

    for_each_mddev(mddev, tmp)
      if (mddev_trylock(mddev)) {
          /* Force a switch to readonly even array
           * appears to still be in use.  Hence
           * the '100'.
           */
          md_set_readonly(mddev, 100);
          mddev_unlock(mddev);
      }

    /*
     * certain more exotic SCSI devices are known to be
     * volatile wrt too early system reboots. While the
     * right place to handle this issue is the given
     * driver, we do want to have a safe RAID driver ...
     */
    mdelay(1000*1);
  }
  return NOTIFY_DONE;
}

In other words, regardless of whether you actually have any MD devices, it’ll impose a fixed 1 second delay on your shutdown sequence :-(
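
The obvious shape of a fix is to only pay that delay when an MD array was actually switched to read-only. A sketch of the idea (not necessarily the exact patch that went upstream):

static int md_notify_reboot(struct notifier_block *this,
                            unsigned long code, void *x)
{
  struct list_head *tmp;
  mddev_t *mddev;
  int need_delay = 0;   /* only wait if we really touched an array */

  if ((code == SYS_DOWN) || (code == SYS_HALT) || (code == SYS_POWER_OFF)) {

    printk(KERN_INFO "md: stopping all md devices.\n");

    for_each_mddev(mddev, tmp)
      if (mddev_trylock(mddev)) {
          md_set_readonly(mddev, 100);
          mddev_unlock(mddev);
          need_delay = 1;
      }

    /* Only give exotic SCSI devices time to settle if an array
     * was actually stopped above */
    if (need_delay)
      mdelay(1000*1);
  }
  return NOTIFY_DONE;
}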

With this kernel bug fixed, the total time my KVM sandbox spends running the kernel is reduced by more than 50%, from 1.9s to 0.9s. The biggest delay is now down to SeaBIOS & QEMU, which together take 2s to get from the start of QEMU’s main() to finally jumping into the kernel entry point.