Improving libvirt logging performance

Posted: May 2nd, 2014 | Filed under: Coding Tips, Fedora, libvirt, OpenStack | Tags: , , , , | No Comments »

Over the past month I’ve been working on a few different areas of libvirt to address some performance bottlenecks the code has. This article is about improvements to the logging subsystem in libvirt, primarily to reduce the CPU overhead that the libvirtd daemon imposes on virtualization hosts. This might not seem like a big deal, but when you have many 100’s or 1000’s of containers packed into a machine, libvirtd CPU usage could start to become quite noticable at times.

Over the years the libvirt codebase has gained more and more debugging statements which have been worth their weight in gold when it comes to troubleshooting problems in deployments. Through the configuration files or environment variables it is possible to tailor logging to output to stderr, plain files, syslog or, most recently, the systemd journal, as well as filters to restrict which files will emit log messages. One less well known feature though is that libvirt has an internal logging ring buffer which records every single log message, no matter what logging output / filter settings are configured. It was possible to send libvirt a signal to dump this ring buffer to stderr, and it would also be dumped automatically upon crash.

For some reason, we had never previously thought to measure the impact of our logging framework on the performance of libvirt. Investigating an unrelated performance problem one day I noticed that oprofile was showing unexpectedly high counts against asprintf/memcpy/malloc functions in an area of code that I wasn’t expecting to be utilizing those functions. I recently had time to investigate this in more detail and discovered that the cause was the internal logging ring buffer. We should have realized long ago that the task of asprintf formatting the log messages and copying them into the ring buffer, would have a non-negligible CPU overhead as the number of log statements in our codebase increased.

To enable the overhead to be more precisely quantified, I wrote a short benchmark program for libvirt which simulated the behaviour of the libvirtd main thread running the event loop with 500 virtual machines / containers all doing simultaneous virtual console I/O. With the existing code, running ~50,000 iterations of this simulated event loop took 1 minute 40 seconds. After a quick hack to turn our logging code into a complete no-op, the demo program completed in only 3 seconds. IOW, the logging code was responsible for 97% of the CPU time consumed by the benchmark. This is somewhat embarrassing, to say the least :-)

Obviously we can not leave the logging code as a complete no-op permanently, since it is incredibly valuable to have it. What we need is for it to be as close to a no-op as possible, unless a specific logging output / filter is actually enabled. With this goal, unfortunately, it became clear that the idea of an always-on internal ring buffer for collecting all logs just had to go. There is simply no way to reduce the overhead of this ring buffer, since it inherently has to asprintf the log messages, which is where the majority of the performance hit comes from. At first I just tweaked the way the ring buffer operated so that it would only collect messages that were already generated for an explicitly configured log output. This reduced the runtime of the benchmark program to 4.6 seconds which is pretty decent. After doing this though, we came to the conclusion that the ring buffer as a concept was now pretty much worthless, since the messages it contains were already being output to another log target. So in the end I just deleted the ring buffer code entirely.

The remaining delta of 1.6 seconds over the base line 3 second execution time was potentially a result of two things. First, for each log message generated, there would be multiple string comparisons against the filename of the code emitting it, in order to determine if log messages for that file were requested for output to a log target. Second, the logging code is inside a mutex protected block to ensure serialization of outputs. The latter did not affect the benchmark program since it was single threaded, and uncontended mutex locks on Linux have negligible overhead. It would, however, affect real world usage of libvirtd since the daemon is heavily threaded. To address both of these potential performance problems, we introduced the idea of a log “category” object which has to be statically declared in each source file that intends to emit log messages. This category object has internal state which determines whether log messages from that source file will be sent on to a log output or just dropped. Without going into too much detail, the key benefit this brought, is that we can determine if any single log message needs to be output with only 2 lockless, integer comparisons in the common case. IOW this removed the repeated string comparisons (only need to be done once now) and removed the mutex locking. With this improvement the logging benchmark takes only 3 seconds. IOW there is no appreciable overhead from logging, except for those files where it is explicitly requested.

This performance improvement was released last month in the 1.2.3 release of libvirt, and thus in Fedora rawhide repositories.

EDIT: BTW, if you have an older libvirt you can edit /etc/libvirt/libvirtd.conf, set “log_buffer_size=0” and then restart libvirtd, which will disable the internal ring buffer and thus avoid the worst of the CPU wastage.