Announce: Release of Entangle v0.2.0 – An app for tethered camera control & capture

Posted: September 18th, 2010 | Filed under: Entangle | 2 Comments »

A few months on from the first release, I’ve polished off the latest changes in Entangle and pushed out a new release version 0.2.0. This is primarily a bug fix and code maintenance release, with the following important changes

  • Better compatibility with cameras which don’t support event notifications, allowing use of capture functionality
  • Rewrote the preview mode to actually be useful, providing a continuously updating live view until ready to capture
  • Directly offers to unmount the camera if it is locked by the GVFS gphoto2 plugin preventing its use for capture.
  • Replaces custom plugin code with use of libpeas.
  • Vastly improved the error reporting and logging to make diagnosis of camera problems easier
  • Fixed a nasty infinite loop in the camera capture code where a camera event could keep resetting the event timeout
  • Fixed a crash in accessing udev properties

It is already built for Fedora 14 and on its way into updates-testing, Fedora 15 will have to wait due to broken build roots caused by mutually conflicting upstart/systemd packages:-(  Alternatively grab the source code from the download area. For the next release I’m aiming to improve the full screen mode to give access to some of the capture & config controls, and generally improve the UI for camera configuration controls.

A summary of scan code & key codes sets used in the PC virtualization stack

Posted: July 4th, 2010 | Filed under: Gtk-Vnc, Virt Tools | Tags: , | 1 Comment »

In learning about virtual keyboard handling for my previous post, I spent alot of time researching just what scan code & key code sets are used at each point in the stack. I trawled across many websites and source code trees to discover this information, so I figure I should write it down in case I forget it all in 12 months time when I next get a keyboard handling bug report. I can’t guarantee I’ve got all the details here correct, so if you  spot mistakes, leave a comment.

IBM XT keyboard

In the beginning the PC was invented. This has made a lot of people very angry and has been widely regarded as a bad move. The IBM XT scan codes are first set encountered. Most of the scan codes were a  just a single byte, with the high bit clear for a make event and set for a break event. The keycodes don’t correspond at all to ASCII values, rather they were assigned incrementally across rows. As an example, the key in the 4th row, second column (labelled A on a us layout) has value 0x1e for make code and 0x9e for break code.  Keys on the numeric keypad would generate multi-byte sequences, where the leading byte is “0xe0” (sometimes referred to as the grey code, I guess because the keypad was a darker grey plastic)

IBM AT & PS/2 keyboards

The IBM AT keyboard was incompatible with the XT at a wire protocol level. In terms of scan codes though, it can support upto 3 sets. The first set is the original XT scan codes. The second set is a new AT scan codes. The third set is so called PS/2 scan codes. In all cases though, the second set is the default and the first & third sets may not even be supported. There is no resemblance between AT and XT scan codes. The use of the high bit for break code was abandoned. The same scan code is used by make and break, but in the break case it is prefixed by a byte 0xf0. As an example, the key in the 4th row, second column (labelled A on a us layout) has the value 0x1c. Again the keys on the numeric keyboard generated multi-byte sequences with a leading “0xe0”.

For added fun, regardless of what scan code set the keyboard is generating the i8042 keyboard controller will, by default, provide the OS with XT scan codes unless told otherwise. This was for backwards compatible with software which only knows about the previous XT scan code set.

Over time Microsoft introduced the Windows keys, then Internet keys arrived followed by all sorts of other function keys on laptops. The extended 0xe0 (“grey code”) sequence space was used for these new scan codes. Of course these keys needed scan codes in both the XT and AT sets to be defined. So for example, the left Windows key got the XT make/break scan codes 0xe0+0x5c and 0xe0+0xdc and the AT make/break scan codes 0xe0+0x27 and 0xe0+0xf0+0x27.

USB keyboards

The introduction of USB input devices, allowed for a clean break with the past, so XT/AT scan codes are no more in the USB world. The USB HID spec defines a standard 16 bit scan code set for all compliant keyboards to use. The USB HID spec gives scan codes names based on the US ASCII key cap labels. Thus the US ASCII key labelled ‘A’ has HID page ID 0x07 and HID usage ID 0x04. Where both the XT/AT sets and USB HID sets support a common key it is possible to perform a lossless mapping in either direction. There are, however, some keys in the USB HID scan code set for which there isn’t a XT/AT scan code.

Linux internals

The Linux internal event subsystem has defined a standard set of key codes that are hardware independant, able to represent any scan code from any type of keyboard whether AT, XT or USB. There are names assigned to the key codes based on the common US ASCII key cap labels. The key codes are defined in /usr/include/linux/input.h. For an example ‘#define KEY_A 30’. For convenience, key codes 0-88 map directly to XT scan codes 0-88. All the low level hardware drivers for AT, XT, USB keyboards, etc have to translate from scan codes to the Linux key codes when queueing key events for dispatch by the input subsystem. In “struct input_event”, the “code” field contains the key code while the “value” field indicates the state (press/release). Mouse buttons are also represented as key codes in this set.

Linux console

The Linux console (eg /dev/console or /dev/tty*) can operate in several different modes.

  • RAW: turns linux keycodes back into XT scancodes
  • MEDIUMRAW: an encoding of the linux key codes. key codes < 127 generated directly, with high bit set for a release event. key codes >= 127 are generated as multi-byte sequences. The leading byte is 0x0 or 0x1 depending on whether a key press or release. The second byte contains the low 7 bits and third byte contains the high 7 bits. Both second and third bytes always have the 8th bit set. This allows room for future expansion upto 16384 different key codes
  • ASCII: the ascii value associated with the key code + current modifiers, according to the loaded keymap
  • UNICODE: the UTF-8 byte sequence associated with the key code + current modifiers, according to the loaded keymap

The state of the current console can be seen and changed using the “kbd_mode” command line tool. Changing the mode is not recommended except to switch between ASCII & UNICODE modes

Linux evdev

The evdev driver is a new way of exposing input events to userspace, bypassing the traditional console/tty devices. The “struct input_event” data is provided directly to userspace without any magic encodings. Thus apps using evdev will always be operating with the Linux key code set initially. They can of course convert this into the XT scan codes if desired, in exactly the same way that the console device already does when in RAW mode.

Xorg XKB

In modern Xorg servers, XKB is in charge of all keyboard event handling. Unusually, XKB does not actually define any standard key code set. At least not numerically. In the XKB configuration files there are logical names for every key, based on their position. Some names may be obvious ‘<TAB>’, while others are purely reflecting a physical row/column location ‘<AE01>’. Each Xorg keyboard driver is free to define whatever key code set it desires. The driver must provide a mapping from its key code values to the XKB key code names (in /usr/share/X11/xkb/keycodes/). There is then a mapping from XKB key code names to key symbols (in /usr/share/X11/xkb/keysyms). The problem with this is that the key code seen in the XKeyEvent can come from an arbitrary set of which the application is not explicitly aware. Three common key code sets from Xorg will now be mentioned.

Xorg kbd driver

The traditional “kbd” driver uses the traditional Linux console/tty devices for receiving input, configuring the devices in RAW mode. Thus it is accepting XT scan codes initially. The kdb keycodes are defined in its source code at src/atKeynames.h  For scan codes below 89, a key code is formed simply by adding 8. Scan codes above that have a set of re-mapping rules that can’t be quickly illustrated here. The mapping is reverseable though, given a suitable mapping table.

Xorg evdev driver

The new “evdev” driver of course uses the new Linux evdev devices for receiving input, as Linux key codes. It forms its scan codes simply by adding 8 to the Linux key code. This is trivially reverseable.

Xorg XQuartz driver

The Xorg build for OS-X, known as XQuartz, does not use either of the previously mentioned drivers. Instead it has a OS-X specific keyboard driver that receives input directly from the OS-X graphical toolkit. It receives OS-X virtual key codes and produces X key codes simply by adding 8. This is again trivially reversable.

RFB protocol extended key event

The RFB protocol extension defined by GTK-VNC and QEMU allows for providing a key code in addition to the key symbol for all keypress/release events sent over the wire. The key codes are defined to be based on a simple encoding of XT scan code set. Single byte XT scan codes 0-127 are sent directly. Two-byte scan codes with the grey code (0xe0) are sent as single bytes with the high bit set. eg 0x1e is sent as 0x1e, while 0xe0+0x1e is sent as 0x9e (ie 0x1e | 0x80). The press/release state is sent as a separate value.

QEMU internals

QEMU’s internal input subsystem for keyboard events uses raw XT scan codes directly. The hardware device emulation maps to other scan code sets as required. It is no coincidence that the RFB protocol extension for sending key codes is a trivial mapping to QEMU scan codes.

Microsoft Windows

The Windows GDI defines a standard set of virtual key codes that are independent of any particular keyboard hardware.

Apple OS-X

OS-X defines a standard set of virtual key codes that are based on the original Apple extended keyboard scan codes.

SDL

The SDL_KeyboardEvent struct contains a “scancode” field. This provides the operating system dependant key code corresponding to the key event. This is easily interpreted on OS-X/Windows, but on Linux X11 this requires knowledge of what keyboard driver is used by Xorg.

GTK/GDK

The GDKKeyEvent struct contains a “hardware_keycode” field. This provides the operating system dependant key code corresponding to the key event. This is easily interpreted on OS-X/Windows, but on Linux X11 this requires knowledge of what keyboard driver is used by Xorg.

Future ideas

The number of different scan code and key code sets is really impressive / depressing / horrific. The good news is that given the right information, it is possible to map between them all fairly easily, in a lossless fashion. The bad news is that the data for performing such mappings is hidden across many web pages and many, many source trees. It would be great to have a single point of reference providing all the scan/key code sets, along with a tool that can generate the mapping tables & example code to convert between any 2 sets.

More than you (or I) ever wanted to know about virtual keyboard handling

Posted: July 4th, 2010 | Filed under: Gtk-Vnc, Virt Tools | Tags: , , , , , , , | 6 Comments »

As a general rule, people using virtual machines have only one requirement when it comes to keyboard handling: any key they press should generate the same output in the host OS and the guest OS. Unfortunately, this is a surprisingly difficult requirement to satisfy. For a long time when using either Xen or KVM, to get workable keyboard handling it was necessary to configure the keymap in three places, 1. the VNC client OS, 2. the guest OS, 3. QEMU itself. The third item was a particular pain because it meant that a regular guest OS user would need administrative access to the host to change the keymap of their guest. Not good for delegation of control. In Fedora 11 we introduced a special VNC extension which allowed us to remove the need to configure keymaps in QEMU, so now it is merely necessary to configure the guest OS to match the client OS. One day when we get a general purpose guest OS agent, we might be able to automatically set the guest OS keymap to match client OS, whenever connecting via VNC, removing the last manual step. This post aims to give background on how keyboards work, what we done in VNC to improve the guest keyboard handling and what problems we still have.

Keyboard hardware scan codes

Between the time of pressing the physical key and text appearing on the screen, there are several steps in processing the input with data conversions along the way. The keyboard generates what are known as scan codes, a sequence of one or more bytes, which uniquely identifies the physical key and whether it is a make or break (press or release) event. Scan codes are invariant across all keyboards of the same type, regardless of what label is printed on the key. In the PC world, IBM defined the first scan code set with their IBM XT, followed later by AT scan codes and PS/2 scan codes. Other manufacturers adopted the IBM scan codes for their own products to ensure they worked out of the box. In the USB world, the HID specification defines the standard scan codes that manufacturers must use.

Operating system key codes

For operating systems wanted to support more than one different type of keyboard, scan codes are not a particularly good representation. They are also often unwieldly as a result of encoding both the key & its make/break state into the same byte(s). Thus operating systems typically define their own standard set of key codes, which is able to represent any possible keys on all known keyboards. They will also track the make/break state separately, now using press/release or up/down as terminology. Thus the first task of the keyboard driver is to convert from the hardware specific scan codes to the operating system specific key code. This is an easily reverseable, lossless mapping.

Display / toolkit key symbols & modifiers

Key codes still aren’t a concept that is particularly useful for (most) applications, which would rather known what user’s intended symbol was, rather than the physical key. Thus the display service (X11, Win32 GUI, etc) or application toolkit (GTK) define what are known as key symbols. To convert from key codes to key symbols, a key map is required for the language specific keyboard layout. The key map declares modifier keys (shift, alt, control, etc) and provides a list of key symbols that are associated with each key code. This mapping is only reverseable if you know the original key map. This is also a lossy mapping, because it is possible for several different key codes to map to the same key symbol.

Considering an end-to-end example, starting with the user pressing the key in row 4, column 2 which is labelled ‘A’ in a US layout, XT compatible keyboard. The operating system keyboard driver receives XT scan code 0x1e, which it converts to Linux key code 30 (KEY_A), Xorg server keyboard driver further converts to X11 key symbol 0x0061 (XK_a), GTK toolkit converts this to GDK key symbol 0x0061 (GDK_a), and finally the application displays the character ‘a’ in the text entry field on screen. There are actually a couple of conversions I’ve left out here, specifically how X11 gets the key codes from the OS and how X11 handles key codes internally, which I’ll come back to another time.

The problem with virtualization

For 99.99% of applications all these different steps / conversions are no problem at all, because they are only interested in text entry. Virtualization, as ever, introduces fun new problems where ever it goes. The problem occurs at the interface between the host virtual desktop client and the hardware emulation. The virtual desktop may be a local fat client using a toolkit like SDL, or it may be a remote network client using a protocol like VNC, RFB or SPICE. In both cases, a naive virtual desktop client will be getting key symbols with their key events. The hardware emulation layer will usually want to provide something like a virtualizated PS/2 or USB keyboard. This implies that there needs to be a conversion from key symbols back to hardware specific scan codes. Remember a couple of paragraphs ago where it was noted that the key code -> key symbol conversion is lossy. That is a now a big problem. In the case of a network client it is even worse, because the virtualization host does not even know what language specific keymap was used.

Faced with these obstacles, the initial approach QEMU took was to just add a command line parameter ‘-k $KEYMAP’. Without this parameter set, it will assume the virtuall desktop client is using a US layout, otherwise it will use the specified keymap. There is still the problem that many key codes can map to the same key symbol. It is impossible to get around this problem – QEMU just has to pick one of the many possible reverse mappings & use it. This means hat, even if the user configures matching keymaps on their client, QEMU and the guest OS, there may be certain keys that will never work in the guest OS. There is also a burden for QEMU to maintain a set of keymaps for every language layout. This set is inevitably incomplete.

The solution with virtualization

Fortunately, there is a get out of jail free card available. When passing key events to an application, all graphical windowing systems will provide both the key symbol and layout independent key code. Remember from many paragraphs earlier that scan code -> key code conversion is lossless and easily reverseable. For local fat clients, the only stumbling block is knowing what key code set is in use. Windows and OS-X both define a standard virtual key set, but sadly X11 does not :-( Instead the key codes are specific to the X11 keyboard driver that is in use, with a Linux based Xorg this is typically ‘kbd’ or ‘evdev’. QEMU has to use heuristics on X11 to decide which key codes it is receiving, but at least once this is identified, the conversion back to scan codes is trivial.

For the VNC network client though, there was one additional stumbling block. The RFB protocol encoding of key events only includes the key symbol, not the key code. So even though the VNC client has all the information the VNC server needs, there is no way to send it. Leading upto the development of Fedora 11, the upstream GTK-VNC and QEMU communities collaborated to define an official extension to the RFB protocol for an extended key event, that includes both key symbol and key code. Since every windowing system has its own set of key codes & the RFB needs to be platform independent, the protocol extension defined that the keycode set on the wire will be a special 32-bit encoding of the traditional XT scancodes. It is not a coincidence that QEMU already uses a 32-bit encoding of traditional XT scan codes internally :-) With this in place the RFB client, merely has to identify what operating system specific key codes it is receiving and then apply the suitable lossless mapping back to XT scancodes.

Considering an end-to-end example, starting with the user pressing the key in row 4, column 2 which is labelled ‘A’ in a US layout, XT compatible keyboard. The operating system keyboard driver receives XT scan code 0x1e, which it converts to Linux key code 30 (KEY_A), Xorg server keyboard driver further converts to X11 key symbol 0x0061 (XK_a) and an evdev key code, GTK toolkit converts the key symbol to GDK key symbol 0x0061 (GDK_a) but passes the evdev key code unchanged. The VNC client converts the evdev key code to the RFB key code and sends it to the QEMU VNC server along with the key symbol. The QEMU VNC server totally ignores the key symbol and does the no-op conversion from the RFB key code to the QEMU key code. The QEMU PS/2 emulation converts the QEMU keycode to either the XT scan code, AT scan code or PS/2 scan code, depending on how the guest OS has configured the keyboard controller. Finally the conversion mentioned much earlier takes place in the guest OS and the letter ‘a’ appears in a text field in the guest. The important bit to note is that although the key symbol was present, it was never used in the host OS or remote network client at any point. The only conversions performed were between scan codes and key codes all of which were lossless. The user is able to generate all the same key sequences in the guest OS as they can in the host OS. The user is happy their keyboard works as expected; the virtualization developer is happy at lack of bug reports about broken keyboard handling.

Provisioning KVM virtual machines on iSCSI the hard way (Part 2 of 2)

Posted: May 5th, 2010 | Filed under: libvirt, Virt Tools | Tags: , , | 4 Comments »

The previous post described how to setup an iSCSI target on Fedora/RHEL the hard way. This post demonstrates how to configure iSCSI on a libvirt KVM host using virsh and then provison a guest using virt-install.

Defining the storage pool

libvirt manages all storage through an object known as a “storage pool”. There are many types of storage pools SCSI, NFS, ext4, plain directory and, interesting for this article, iSCSI. All libvirt objects are configured via an XML description and storage pools are no exception. For an iSCSI storage pool there are three pieces of information to provide. The “Target path” determines how libvirt will expose device paths for the pool. Paths like /dev/sda, /dev/sdb, etc are not a good choice because they are not stable across reboots, or across machines in a cluster, the names are assigned on a first come, first served basis by the kernel. It is strongly recommended that “/dev/disk/by-path” by used unless you know what you’re doing. This results in a naming scheme that will be stable across all machines. The “Host Name” is simply the fully qualified DNS name of the iSCSI server. Finally the “Source Path” is that adorable IQN seen earlier when creating the iSCSI target (“iqn.2004-04.fedora:fedora13:iscsi.kvmguests“). This isn’t the place to describe the full XML schema for storage pools, it suffices to say that an iSCSI config looks like this

<pool type='iscsi'>
    <name>kvmguests</name>
    <source>
        <host name='myiscsi.server.com'/>
        <device path='iqn.2004-04.fedora:fedora13:iscsi.kvmguests'/>
    </source>
    <target>
        <path>/dev/disk/by-path</path>
    </target>
</pool>

Save this XML snippet to a file named ‘kvmguests.xml’ and then load it into libvirt using the “pool-define” command.

# virsh pool-define kvmguests.xml
Pool kvmguests defined from kvmguests.xml
# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
kvmguests           inactive   no

Starting the storage pool

This has saved the configuration, but has not actually logged into the iSCSI target, so no LUNs are yet visible on the virtualization host. Todo this requires running the “pool-start” command, at which point LUNs should be visible using the “vol-list” command:

# virsh pool-start kvmguests
Pool kvmguests2 started
# virsh vol-list kvmguests
Name                 Path
-----------------------------------------
10.0.0.1             /dev/disk/by-path/ip-192.168.122.2:3260-iscsi-iqn.2004-04.fedora:fedora13:iscsi.kvmguests-lun-1
10.0.0.2             /dev/disk/by-path/ip-192.168.122.2:3260-iscsi-iqn.2004-04.fedora:fedora13:iscsi.kvmguests-lun-2

The volume names shown there are what will be required later in order to install a guest with virt-install.

Querying LUN information

Further information about each LUN can be obtained using the “vol-info” and “vol-dumpxml” commands

# virsh vol-info /dev/disk/by-path/ip-192.168.122.2:3260-iscsi-iqn.2004-04.fedora:fedora13:iscsi.kvmguests-lun-1
Name:           10.0.0.1
Type:           block
Capacity:       10.00 GB
Allocation:     10.00 GB

# virsh vol-dumpxml /dev/disk/by-path/ip-192.168.122.2:3260-iscsi-iqn.2004-04.fedora:fedora13:iscsi.kvmguests-lun-1
<volume>
  <name>10.0.0.1</name>
  <key>/dev/disk/by-path/ip-192.168.122.2:3260-iscsi-iqn.2004-04.fedora:fedora13:iscsi.kvmguests-lun-1</key>
  <source>
  </source>
  <capacity>10737418240</capacity>
  <allocation>10737418240</allocation>
  <target>
    <path>/dev/disk/by-path/ip-192.168.122.2:3260-iscsi-iqn.2004-04.fedora:fedora13:iscsi.kvmguests-lun-1</path>
    <format type='unknown'/>
    <permissions>
      <mode>0660</mode>
      <owner>0</owner>
      <group>6</group>
      <label>system_u:object_r:fixed_disk_device_t:s0</label>
    </permissions>
  </target>
</volume>

Activating the storage at boot time

Finally, if everything is looking in order, then the pool can be set to start automatically upon host boot.

# virsh pool-autostart kvmguests
Pool kvmguests2 marked as autostarted

Provisioning a guest on iSCSI

The virt-install command is a convenient way to install new guests from the command line. It has support for configuring a guest to use volumes from a storage pool via its –disk argument. This arg takes the name of the storage pool, followed by the name of the volume within it. It is now time to install a guest with two disks, the first exclusive use for its root filesystem, the second to be shareable between several guests for data:

# virt-install --accelerate --name rhel6x86_64 --ram 800 --vnc --hvm --disk vol=kvmguests/10.0.0.1 --disk vol=kvmguests/10.0.0.2,perms=sh --pxe

Once this is up and running, take a look at the guest XML that virt-install used to associate the guest with the iSCSI LUNs:

# virsh dumpxml rhel6x86_64
<domain type='kvm' id='4'>
  <name>rhel6x86_64</name>
  <uuid>ad8961e9-156f-746f-5a4e-f220bfafd08d</uuid>
  <memory>819200</memory>
  <currentMemory>819200</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.0.0'>hvm</type>
    <boot dev='network'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/disk/by-path/ip-192.168.122.170:3260-iscsi-iqn.2004-04.fedora:fedora13:iscsi.kvmguests-lun-1'/>
      <target dev='hda' bus='ide'/>
      <alias name='ide0-0-0'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/disk/by-path/ip-192.168.122.170:3260-iscsi-iqn.2004-04.fedora:fedora13:iscsi.kvmguests-lun-2'/>
      <target dev='hdb' bus='ide'/>
      <shareable/>
      <alias name='ide0-0-1'/>
      <address type='drive' controller='0' bus='0' unit='1'/>
    </disk>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:0a:ca:84'/>
      <source network='default'/>
      <target dev='vnet1'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/28'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/28'>
      <source path='/dev/pts/28'/>
      <target port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5901' autoport='yes' keymap='en-us'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
  </devices>
</domain>

In particular, notice how the guest uses the huge paths under /dev/disk/by-path to refer to the LUNs, and also that the second disk has the <shareable/> flag set. This ensures the SELinux labelling allows multiple guests to access the disk and that all I/O caching is disabled on the host. Both critically important if you want the disk to be safely shared between guests.

Summing up

To allow migration of guests between hosts, some form of shared storage is required. People often turn to NFS at first, but this has security shortcomings because it does not allow for SELinux labelling, which means that there is limited sVirt protection between guests. ie one guest can access another guests’s disks. By choosing iSCSI as the shared storage platform, full sVirt isolation between guests is maintained on a par with non-shared storage setups. Hopefully this series of iSCSI related blog posts have shown that even provisioning KVM guests on iSCSI the hard way, is not actually very hard at all. It also shows that you don’t need a expensive commercial NAS to make use of iSCSI, any server with a bunch of disks running Fedora or RHEL can easily be turned into an iSCSI storage server for your virtual machines, though you will have to be prepared to get your hands dirty!

Provisioning KVM virtual machines on iSCSI the hard way (Part 1 of 2)

Posted: May 5th, 2010 | Filed under: libvirt, Virt Tools | Tags: , | 2 Comments »

The previous articles showed how to provision a guest on iSCSI the nice & easy way using a QNAP NAS and virt-manager. This article and the one that follows, will show how to provision a guest on iSCSI the “hard way”, using the low level command line tools tgtadm, virsh and virt-install. The iSCSI server in this case is going to be provided by a guest running Fedora 13, x86_64.

Enabling the iSCSI target service

The first task is to install and enable the iSCSI target service. On Fedora and RHEL servers, the iSCSI target service is provided by the ‘scsi-target-utils’ RPM package, so install that now and set the service to start on boot

# yum install scsi-target-utils
# chkconfig tgtd on
# service tgtd start

Allocating storage for the LUNs

The  Linux SCSI target service does not care whether the LUNs exported are backed by plain files, LVM volumes or raw block devices, though obviously there is some performance overhead from introducing the LVM and/or filesystem layers as compared to block devices. Since the guest providing the iSCSI service in this example has no spare block device or LVM space, raw files will have to be used. In this example, two LUNs will be created one thin provisioned (aka sparse file) 10 GB LUN and one fully allocated 500 MB LUN

# mkdir -p /var/lib/tgtd/kvmguests
# dd if=/dev/zero of=/var/lib/tgtd/kvmguests/rhel6x86_64.img bs=1M seek=10240 count=0
# dd if=/dev/zero of=/var/lib/tgtd/kvmguests/shareddata.img bs=1M count=512
# restorecon -R /var/lib/tgtd

Exporting an iSCSI target and LUNs (the manual way)

Historically, you had to invoke a series of tgtadm commands to setup the iSCSI target and LUNs and then add them to /etc/rc.d/rc.sysinit to make sure they run on every boot. This is true of RHEL5 vintage scsi-target-utils at least. If you have a more recent version circa Fedora 13 / RHEL-6 there is finally a nice configuration file to handle this setup, so those lucky readers can skip ahead. The first step is to add a target, for this the adorable IQNs make a re-appearance

# tgtadm --lld iscsi --op new --mode target --tid 1 --targetname iqn.2004-04.fedora:fedora13:iscsi.kvmguests

Next step is to associate the storage volumes, just created, with LUNs in the iSCSI target.

# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 --backing-store /var/lib/tgtd/kvmguests/rhel6x86_64.img
# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 2 --backing-store /var/lib/tgtd/kvmguests/shareddata.img

To confirm that all went to plan, query the iSCSI target setup

# tgtadm  --lld iscsi --op show --mode target
Target 1: iqn.2004-04.fedora:fedora13:iscsi.kvmguests
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00010000
            SCSI SN: beaf10
            Size: 0 MB
            Online: Yes
            Removable media: No
            Backing store type: rdwr
            Backing store path: None
        LUN: 1
            Type: disk
            SCSI ID: IET     00010001
            SCSI SN: beaf11
            Size: 10737 MB
            Online: Yes
            Removable media: No
            Backing store type: rdwr
            Backing store path: /var/lib/tgtd/kvmguests/rhel6x86_64.img
        LUN: 2
            Type: disk
            SCSI ID: IET     00010002
            SCSI SN: beaf12
            Size: 537 MB
            Online: Yes
            Removable media: No
            Backing store type: rdwr
            Backing store path: /var/lib/tgtd/kvmguests/shareddata.img
    Account information:
    ACL information:

Finally, allow client access to the target. This example allows access to all clients without any authentication.

# tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address ALL

Exporting an iSCSI target and LUNs (with a config file)

As mentioned earlier, modern versions of scsi-target-utils now include a configuration file for setting up targets and LUNs. The master configuration is /etc/tgt/targets.conf and is full of example configurations. To replicate the manual setup from above requires adding a configuration block that looks like this

<target iqn.2004-04.fedora:fedora13:iscsi.kvmguests>
backing-store /var/lib/tgtd/kvmguests/rhel6x86_64.img
backing-store /var/lib/tgtd/kvmguests/shareddata.img
</target>

With the configuration update, load it into the iSCSI target daemon

#tgt-admin --execute

Two common mistakes

The two most likely places to trip up when configuring the iSCSI target are SELinux and iptables. If adding plain files as LUNs in an iSCSI target, make sure the files are labelled suitably with system_u:object_r:tgtd_var_lib_t:s0. For iptables, ensure that port 3260 is open.

That is a very quick guide to setting up an iSCSI target on Fedora 13. The next step is to switch back to the virtualization host and provision a new guest using iSCSI for its virtual disk. This is covered in Part II