Improving libvirt firewall performance

Previously I talked about improving the performance of logging in libvirt. This article will consider the second, even bigger, performance problem I’ve tackled over the past month or so: the management of firewall rules.

There are three areas of libvirt which need to interact with the host firewall:

  • Virtual networks – add rules to iptables/ip6tables to control IPv4/IPv6 forwarding and masquerading of traffic from virtual networks (eg the virbr0 device)
  • MAC filtering – adds rules to ebtables to reject spoofed MAC addresses from guest NICs (pretty much obsoleted by general filtering)
  • General network filtering – adds almost arbitrary rules to ebtables/iptables/ip6tables to filter guest NIC traffic

When first written, all of these areas of libvirt code would directly invoke the iptables/ip6tables/ebtables command line tools to add/remove the rules needed. Then along came firewalld. While we could pretend firewalld didn’t exist and continue invoking the CLI tools directly, this would end in tears when firewalld flushes its chains/tables. When we first realized that libvirt needed to add firewalld support, time was short wrt the forthcoming Fedora release schedule, which would include firewalld by default. Thus an expedient decision was made, simply replacing any calls to iptables/ip6tables/ebtables with a call to ‘firewall-cmd --passthrough’ instead. This allowed libvirt to integrate with firewalld, so its rules would not get lost, with the minimum possible number of code changes to libvirt.

Unfortunately, it quickly became apparent that using ‘firewall-cmd’ imposes quite a severe performance penalty. Testing with libvirt’s virtual network feature showed that to start 20 virtual networks took 3 seconds with direct iptables usage and 42 seconds with “firewall-cmd --passthrough”. IOW using firewall-cmd is over x10 slower than direct invocation. The network filtering code is similarly badly affected; running our integration test suite went from 28 seconds to 479 seconds, almost x18 slower with firewall-cmd.

We’ve lived with this performance degradation in libvirt for a little while now, but it was clear that this was never going to be acceptable in the long term. This kind of performance bottleneck in firewall manipulation really hurts applications using libvirt, slowing down guest creation in OpenStack noticeably, and even making libvirtd much slower to start up. With a little slack time in my schedule I did some microbenchmarks comparing repeated invocation of firewall-cmd to a script that instead directly invoked the underlying firewalld DBus API repeatedly. While I don’t have the results anymore, it suffices to say that the tests strongly showed the overhead lay in the firewall-cmd tool, as opposed to the DBus API.
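
For anyone wanting to reproduce that kind of comparison, a microbenchmark along the following lines will do the job. This is my own illustrative Python sketch, not the original test script: it calls firewalld’s org.fedoraproject.FirewallD1.direct passthrough method over DBus, must run as root with firewalld active, and the exact firewall-cmd flags may need adjusting for your firewalld version.

import subprocess
import time

import dbus

N = 50
rule = ["-L"]  # a cheap read-only iptables operation; we only care about call overhead

# One process spawn per rule, as libvirt used to do
# (newer firewalld spells this 'firewall-cmd --direct --passthrough')
start = time.time()
for _ in range(N):
    subprocess.check_call(["firewall-cmd", "--direct", "--passthrough", "ipv4"] + rule,
                          stdout=subprocess.DEVNULL)
print("firewall-cmd: %.2fs" % (time.time() - start))

# The same N calls issued over DBus from a single process
bus = dbus.SystemBus()
obj = bus.get_object("org.fedoraproject.FirewallD1", "/org/fedoraproject/FirewallD1")
direct = dbus.Interface(obj, dbus_interface="org.fedoraproject.FirewallD1.direct")

start = time.time()
for _ in range(N):
    direct.passthrough("ipv4", rule)
print("DBus API    : %.2fs" % (time.time() - start))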

Based on the test results it was clear that libvirt needed to switch to talking to firewalld directly via DBus, instead of spawning an external program to do this indirectly. Even without the performance overhead of firewall-cmd, this would just be good engineering practice, particularly wrt error handling, since parsing errors out of stderr is truly horrible. The virtual network code had a lot of complexity in it to cope with the fact that applying any single firewall rule might fail, and thus require a set of rollback steps to restore the previous state of the firewall. This was nothing compared to the complexity of the network filtering code, which would dynamically generate hairy/scary shell scripts with conditionals to either report or ignore errors that occurred, and to dynamically query existing rules to decide what to create next.

What libvirt needed was an internal API for interacting with the firewall which would magically talk either to the DBus API, or to the iptables/ip6tables/ebtables tools directly, as appropriate for the host being managed. To further reduce the burden on users of the API, it would also need to have the concept of “transactions”. That is, a user of the API can define a set of rules that it wants applied, and if applying any of those rules fails, a set of “rollback” steps is applied to clean up. Finally, it would need the ability to register hooks which query the rule state and dynamically change later rules.
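
To make the transaction idea concrete, here is a rough Python sketch of the concept. It is purely illustrative; libvirt’s real implementation is an internal C API with rather more capabilities, including the error-ignoring rules and query hooks described above.

import subprocess

class FirewallTransaction:
    """Queue firewall rules, apply them in order, roll back on any failure."""

    def __init__(self):
        self.rules = []  # list of (argv, rollback argv or None) pairs

    def add_rule(self, argv, rollback=None):
        self.rules.append((argv, rollback))

    def apply(self):
        applied = []
        try:
            for argv, rollback in self.rules:
                subprocess.check_call(argv)
                applied.append(rollback)
        except subprocess.CalledProcessError:
            # Undo whatever did get applied, in reverse order
            for rollback in reversed(applied):
                if rollback is not None:
                    subprocess.call(rollback)
            raise

tx = FirewallTransaction()
tx.add_rule(["iptables", "-A", "FORWARD", "-i", "virbr0", "-j", "ACCEPT"],
            rollback=["iptables", "-D", "FORWARD", "-i", "virbr0", "-j", "ACCEPT"])
tx.apply()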

It took a while to figure out the right design for this libvirt internal firewall API, but once that was done, the task of converting all the libvirt code still lay ahead. The virtual network and MAC filtering code was fairly easy to convert, over the space of a day or so. The same could not be said of the network filtering code. The task of changing it from dynamically writing shell scripts to using the new firewall API lasted days, which turned into weeks, eventually consuming the best part of a month’s work. What saved me from insanity was that the original creator of this firewall code had the good sense to create a comprehensive set of integration tests for it. This made it much easier to identify places where I’d screwed up, and has dramatically reduced the number of regressions I am likely to have created. Hopefully there are no regressions at all.

At the end of this month’s work, as well as the original integration tests, libvirt now also has a large number of new unit tests which can validate the operation of our firewall code. These will demonstrate that future changes in this area do not alter the commands used to build the firewall, complementing the integration tests which validate the final firewall ruleset. Now back to the performance. Creating 20 virtual networks, which originally took 42 seconds with firewall-cmd, now takes only 3 seconds with the firewalld DBus APIs. So there is no measurable difference from direct invocation of iptables there. Running the network filter integration test suite, which took 479 seconds with firewall-cmd, now takes only 37 seconds with the firewalld DBus APIs. This is slightly slower than direct ebtables invocation, which took 28 seconds, but I think that delta is now at an acceptably low level.
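
To illustrate how unit tests of that kind can work, the transaction sketch from earlier can be extended so that commands are captured rather than executed, and then compared against the expected list. Again this is purely illustrative; libvirt’s own tests exercise its internal C API.

# Subclass that records the argv lists instead of executing them, so the
# expected commands can be asserted without ever touching the real firewall
class RecordingTransaction(FirewallTransaction):
    def __init__(self):
        super().__init__()
        self.captured = []

    def apply(self):
        for argv, _rollback in self.rules:
            self.captured.append(argv)

tx = RecordingTransaction()
tx.add_rule(["iptables", "-A", "FORWARD", "-i", "virbr0", "-j", "ACCEPT"])
tx.apply()
assert tx.captured == [["iptables", "-A", "FORWARD", "-i", "virbr0", "-j", "ACCEPT"]]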

This code is all merged into current libvirt GIT and will be included in the 1.2.4 release due out next week, heading to Fedora rawhide immediately thereafter. So come Fedora 21, anything firewall related in libvirt should be noticeably faster. If any application or script you work on happens to make use of firewall-cmd, then I’d strongly recommend changing it to use the DBus API directly. I don’t know why firewall-cmd is so terribly slow, but as it stands today, I can’t recommend that anyone use it if they have a non-trivial number of rules to define.

Guest MAC spoofing denial of service and preventing it with libvirt and KVM

I was recently asked to outline some of the risks of virtualization wrt networking, in particular, how guests running on the same network could attack each other’s network traffic. The examples in this blog post will consider a scenario with three guests running on the same host, connected to the libvirt default virtual network (backed by the virbr0 bridge device). As is traditional, the two guests trying to communicate shall be called alice and bob, while the attacker/eavesdropper shall be eve. Provision three guests with those names, and make sure their network configurations look like this:

  <interface type='network'>                 (for the VM 'alice')
    <mac address='52:54:00:00:00:11'/>
    <source network='default'/>
    <target dev='vnic-alice'/>
    <model type='virtio'/>
  </interface>

  <interface type='network'>                 (for the VM 'bob')
    <mac address='52:54:00:00:00:22'/>
    <source network='default'/>
    <target dev='vnic-bob'/>
    <model type='virtio'/>
  </interface>

  <interface type='network'>                 (for the VM 'eve')
    <mac address='52:54:00:00:00:33'/>
    <source network='default'/>
    <target dev='vnic-eve'/>
    <model type='virtio'/>
  </interface>

If the guest interfaces are to be configured using DHCP, it is desirable to have predictable IP addresses for alice, bob & eve. This can be achieved by altering the default network configuration:

# virsh net-destroy default
# virsh net-edit default

In the editor change the IP configuration to look like

<ip address='192.168.122.1' netmask='255.255.255.0'>
  <dhcp>
    <range start='192.168.122.2' end='192.168.122.254' />
    <host mac='52:54:00:00:00:11' name='alice' ip='192.168.122.11' />
    <host mac='52:54:00:00:00:22' name='bob' ip='192.168.122.22' />
    <host mac='52:54:00:00:00:33' name='eve' ip='192.168.122.33' />
  </dhcp>
</ip>

With all these changes made, start the network and the guests

# virsh net-start default
# virsh start alice
# virsh start bob
# virsh start eve

After starting these three guests, the host sees the following bridge configuration

# brctl show
bridge name	bridge id		STP enabled	interfaces
virbr0		8000.fe5200000033	yes		vnic-alice
							vnic-bob
							vnic-eve

For the sake of testing, the “very important” communication between alice and bob will be a repeating ICMP ping. So log in to ‘alice’ (via the console, not the network) and leave the following command running forever:

# ping bob
PING bob.test.berrange.com (192.168.122.22) 56(84) bytes of data.
64 bytes from bob.test.berrange.com (192.168.122.22): icmp_req=1 ttl=64 time=0.790 ms
64 bytes from bob.test.berrange.com (192.168.122.22): icmp_req=2 ttl=64 time=0.933 ms
64 bytes from bob.test.berrange.com (192.168.122.22): icmp_req=3 ttl=64 time=0.854 ms
...

Attacking VMs on a hub

The first thought might be for eve to just run ‘tcpdump’ (again via the console shell, not a network shell):

# tcpdump icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
...nothing captured...

Fortunately Linux bridge devices act as switches by default, so eve won’t see any traffic flowing between alice and bob. For the sake of completeness though, I should point out that it is possible to make a Linux bridge act as a hub instead of a switch, by setting its MAC ageing time to zero: the bridge then immediately forgets which port each MAC address was last seen on, and so floods every frame out of all ports. This can be done as follows:

# brctl setfd virbr0 0
# brctl setageing virbr0 0

Switching back to the tcpdump session in eve should now show traffic between alice and bob being captured:

10:38:15.644181 IP alice.test.berrange.com > bob.test.berrange.com: ICMP echo request, id 8053, seq 29, length 64
10:38:15.644620 IP bob.test.berrange.com > alice.test.berrange.com: ICMP echo reply, id 8053, seq 29, length 64
10:38:16.645523 IP alice.test.berrange.com > bob.test.berrange.com: ICMP echo request, id 8053, seq 30, length 64
10:38:16.645886 IP bob.test.berrange.com > alice.test.berrange.com: ICMP echo reply, id 8053, seq 30, length 64

Attacking VMs on a switch using MAC spoofing

Putting the bridge into ‘hub mode’ was cheating though, so reverse that setting on the host

# brctl setageing virbr0 300

Since the switch is clever enough to only send traffic out of the port where it has seen the corresponding MAC address, perhaps eve can impersonate bob by spoofing his MAC address. MAC spoofing is quite straightforward; in the console for eve run

# ifdown eth0
# ifconfig eth0 hw ether 52:54:00:00:00:22
# ifconfig eth0 up
# ifconfig eth0 192.168.122.33/24
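
On hosts that lack the legacy ifdown/ifconfig tools, the iproute2 equivalents of the above are:

# ip link set eth0 down
# ip link set eth0 address 52:54:00:00:00:22
# ip link set eth0 up
# ip addr add 192.168.122.33/24 dev eth0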

Now that the interface is up with eve’s IP address, but bob’s MAC address, the final step is to poison the host switch’s MAC address/port mapping. A couple of ping packets sent to an invented IP address (so alice/bob don’t see any direct traffic from eve) suffice to do this:

# ping -c 5 192.168.122.44

To see whether eve is now receiving bob’s traffic, launch tcpdump again in eve’s console:

# tcpdump icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
11:02:41.981567 IP alice.test.berrange.com > bob.test.berrange.com: ICMP echo request, id 8053, seq 1493, length 64
11:02:42.981624 IP alice.test.berrange.com > bob.test.berrange.com: ICMP echo request, id 8053, seq 1494, length 64
11:02:43.981785 IP alice.test.berrange.com > bob.test.berrange.com: ICMP echo request, id 8053, seq 1495, length 64
...

The original ‘ping’ session, back in alice’s console, should have stopped receiving any replies from bob, since all traffic destined for him is being redirected to eve. Occasionally bob’s OS might send out some packet of its own accord, which re-populates the host bridge’s MAC address/port mapping, causing the pings to start flowing again. eve can trivially re-poison the mapping at any time by sending out further packets of her own.

Attacking VMs on a switch using MAC and IP spoofing

The problem with only using MAC spoofing is that traffic from alice to bob goes into a black hole; the ping packet loss quickly shows alice that something is wrong. To try and address this, eve could also spoof bob’s IP address by running:

# ifconfig eth0 192.168.122.22/24

The tcpdump session in eve should now show replies being sent back out in response to alice’s ping requests:

# tcpdump icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
11:10:55.797471 IP alice.test.berrange.com > bob.test.berrange.com: ICMP echo request, id 8053, seq 1986, length 64
11:10:55.797521 IP bob.test.berrange.com > alice.test.berrange.com: ICMP echo reply, id 8053, seq 1986, length 64
11:10:56.798914 IP alice.test.berrange.com > bob.test.berrange.com: ICMP echo request, id 8053, seq 1987, length 64
11:10:56.799031 IP bob.test.berrange.com > alice.test.berrange.com: ICMP echo reply, id 8053, seq 1987, length 64

alice’s ping session will now be receiving replies just as she expects, except that, unbeknown to her, the replies are actually being sent by eve, not bob.

Protecting VMs against MAC/IP spoofing

So eve can impersonate a ping response from bob; big deal? What about real application level protocols like SSH or HTTPS, which have security built in? These are no doubt harder to attack, but by no means impossible, particularly if you are willing to bet/rely on human/organizational weakness. For MITM attacks like this, the SSH host key fingerprint is critical. How many people actually go to the trouble of checking that the SSH host key matches what it is supposed to be when first connecting to a new host? I’d wager very few. Rather more users will question the alert from SSH when a previously known host key changes, but I’d still put money on a non-trivial number ignoring the warning. For HTTPS, the key to avoiding MITM attacks is the x509 certificate authority system. Everyone knows that this is absolutely flawless, without any compromised/rogue CAs ;-P

What can we do about these risks for virtual machines running on the same host? libvirt provides a reasonably advanced firewall capability in both its KVM and LXC drivers. This capability is built upon the standard Linux ebtables, iptables and ip6tables infrastructure, and enables rules to be set per guest TAP device. The example firewall filters that are present out of the box provide a so-called “clean traffic” ruleset. Amongst other things, these filters prevent MAC and IP address spoofing by virtual machines. Enabling this requires a very simple change to the guest domain network interface configuration.
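
The filters shipped with libvirt can be listed and inspected with virsh, which is a good way to see exactly what the clean-traffic ruleset contains:

# virsh nwfilter-list
# virsh nwfilter-dumpxml clean-traffic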

Shutdown alice, bob and eve and then alter their XML configuration (using virsh edit) so that each one now contains the following:

  <interface type='network'>                 (for the VM 'alice')
    <mac address='52:54:00:00:00:11'/>
    <source network='default'/>
    <target dev='vnic-alice'/>
    <model type='virtio'/>
    <filterref filter='clean-traffic'/>
  </interface>

  <interface type='network'>                 (for the VM 'bob')
    <mac address='52:54:00:00:00:22'/>
    <source network='default'/>
    <target dev='vnic-bob'/>
    <model type='virtio'/>
    <filterref filter='clean-traffic'/>
  </interface>

  <interface type='network'>                 (for the VM 'eve')
    <mac address='52:54:00:00:00:33'/>
    <source network='default'/>
    <target dev='vnic-eve'/>
    <model type='virtio'/>
    <filterref filter='clean-traffic'/>
  </interface>

Start the guests again and now try to repeat the previous MAC and IP spoofing attacks from eve. If all is working as intended, it should be impossible for eve to capture any traffic between alice and bob, or disrupt it in any way.

The clean-traffic filter rules are written to require two configuration parameters, the whitelisted MAC address and the whitelisted IP address. The MAC address is inserted by libvirt automatically, based on the declared MAC in the XML configuration. For the IP address, libvirt will sniff the DHCPOFFER responses from the DHCP server running on the host to learn the assigned IP address. There is a fairly obvious attack with this, whereby someone just runs a rogue DHCP server. It is possible to alter the design of the filter rules so that any rogue DHCP servers are blocked; however, there is one additional problem. Upon migration of guests, the new host needs to learn the IP address, but guests don’t re-run DHCP upon migration, because migration is supposed to be totally seamless. Thus in most cases, when using filters, the host admin will want to explicitly specify the guest’s IP address in the XML:

    <filterref filter='clean-traffic'>
      <parameter name='IP' value='192.168.122.33'/>
    </filterref>

There is quite a lot more that can be done using libvirt’s guest network filtering capabilities. One idea would be to block outbound SMTP traffic, to prevent compromised guests being turned into spambots. In fact, almost anything that an administrator might wish to do inside the guest using iptables could be done in the host using libvirt’s network filtering, to provide additional protection against guest OS compromise.
This will be left as an exercise for the reader…
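
As a rough starting point for that exercise, a custom filter for the SMTP example might look something like this (a sketch: the rule syntax follows libvirt’s nwfilter XML format, but the filter name ‘no-smtp’ is invented here):

<filter name='no-smtp' chain='ipv4'>
  <rule action='drop' direction='out'>
    <tcp dstportstart='25'/>
  </rule>
</filter>

Define it on the host with ‘virsh nwfilter-define’, then reference it from a guest interface with <filterref filter='no-smtp'/>, or include it from a larger custom filter alongside clean-traffic.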