Easier QEMU live tracing using systemtap

Posted: February 18th, 2019 | Filed under: Coding Tips, Fedora, Virt Tools | Tags: , , , | No Comments »

QEMU is able to leverage a number of live tracing systems, the choice configurable at build time between

  • log – printf formatted string for each event sent into QEMU’s logging system which writes to stderr
  • syslog – printf formatted string for each event sent via syslog
  • simple – binary data stream for each event written to a file or fifo pipe
  • ftrace – printf formatted string for each event sent to kernel ftrace facility
  • dtrace – user space probe markers dynamically enabled via dtrace or systemtap
  • ust – user space probe markers dynamically enabled via LTT-ng

Upstream QEMU enables the “log” trace backend by default since it is cross-platform portable and very simple to use by adding “-d trace:PATTERN” on the QEMU command line. For example to enable logging of all trace events in the QEMU I/O subsystem (aka “qio“) we can

$ qemu -d trace:qio* ...some args...
23266@1547735759.137292:qio_channel_socket_new Socket new ioc=0x563a8a39d400
23266@1547735759.137305:qio_task_new Task new task=0x563a891d0570 source=0x563a8a39d400 func=0x563a86f1e6c0 opaque=0x563a89078000
23266@1547735759.137326:qio_task_thread_start Task thread start task=0x563a891d0570 worker=0x563a86f1ce50 opaque=0x563a891d9d90
23273@1547735759.137491:qio_task_thread_run Task thread run task=0x563a891d0570
23273@1547735759.137503:qio_channel_socket_connect_sync Socket connect sync ioc=0x563a8a39d400 addr=0x563a891d9d90
23273@1547735759.138108:qio_channel_socket_connect_fail Socket connect fail ioc=0x563a8a39d400

This is very simple and surprisingly effective much of the time, but it is not without its downsides

  • Inactive probes have non-negligible performance impact on hot codepaths
  • It is targetted to human consumption, so it is not easy to reliably process with machines
  • It requires adding arguments to QEMU’s command line so is not easy to enable in many cases
  • It is custom to QEMU so does not facilitate getting correlated traces across the whole system

For these reasons, some downstreams chose not to use the default “log” backend. Both Fedora and RHEL have instead enabled the “dtrace” backend which can be leveraged via systemtap on Linux. This provides a very powerful tracing system, but the cost is that the previous simple task of printing a formatted string when a probe point fires has become MUCH more complicated. For example to get equivalent output to that seen with QEMU’s log backend would require

# cat > trace.stp <<EOF
probe qemu.system.x86_64.qio_task_new {
    printf("%d@%d qio_task_new Task new task=%p source=%p func=%p opaque=%p\n", 
           pid(), gettimeofday_ns(), task, source, func, opaque)
}
EOF
# stap trace.stp
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400

Repeat that code snippet for every qio* probe point you want to watch, figuring out the set of args it has available to print.This quickly becomes tedious for what should be a simple logging job, especially if you need to reference null terminated strings from userspace.

After cursing this difficulty one time too many, it occurred to me that QEMU could easily do more to make life easier for systemtap users. The QEMU build system is already auto-generating all the trace backend specific code from a generic description of probes in the QEMU source tree. It has a format string which is used in the syslog, log and ftrace backends, but this is ignored for the dtrace backend. It did not take much to change the code generator so that it can use this format string to generate a convenient systemtap tapset representing the above manually written probe:

probe qemu.system.x86_64.log.qio_task_new = qemu.system.x86_64.qio_task_new ?
{
    printf("%d@%d qio_task_new Task new task=%p source=%p func=%p opaque=%p\n",
           pid(), gettimeofday_ns(), task, source, func, opaque)
}

This can be trivially executed with minimal knowledge of systemtap tapset language required

# stap -e "qemu.system.x86_64.log.qio_task_new{}"
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400

Even better, we have now gained the ability to use wildcards too

# stap -e "qemu.system.x86_64.log.qio*{}"
23266@1547735759.137292:qio_channel_socket_new Socket new ioc=0x563a8a39d400
23266@1547735759.137305:qio_task_new Task new task=0x563a891d0570 source=0x563a8a39d400 func=0x563a86f1e6c0 opaque=0x563a89078000
23266@1547735759.137326:qio_task_thread_start Task thread start task=0x563a891d0570 worker=0x563a86f1ce50 opaque=0x563a891d9d90
23273@1547735759.137491:qio_task_thread_run Task thread run task=0x563a891d0570
23273@1547735759.137503:qio_channel_socket_connect_sync Socket connect sync ioc=0x563a8a39d400 addr=0x563a891d9d90
23273@1547735759.138108:qio_channel_socket_connect_fail Socket connect fail ioc=0x563a8a39d400

Users still, however, need to be aware of the naming convention for QEMU’s systemtap tapsets and how it maps to the particular QEMU binary that is used, and don’t forget the trailing “{}”. Thus I decided to go one step further and ship a small helper tool to make it even easier to use

$ qemu-trace-stap run qemu-system-x86_64 'qio*'
22806@1547735341399856820 qio_channel_socket_new Socket new ioc=0x56135d1d7c00
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400
22806@1547735341399865943 qio_task_thread_start Task thread start task=0x56135cd66eb0 worker=0x56135af72e50 opaque=0x56135c071d70
22806@1547735341399976816 qio_task_thread_run Task thread run task=0x56135cd66eb0

The second argument to this tool is the QEMU binary filename to be traced, which can be relative (to search $PATH) or absolute. What is clever is that it will set the SYSTEMTAP_TAPSET env variable to point to the right location to find the corresponding tapset definition. This is very useful when you have multiple copies of QEMU on the system and need to make sure systemtap traces the right one.

The ‘qemu-trace-stap‘ script takes a verbose arg so you can understand what it is running behind the scenes:

$ qemu-trace-stap run /home/berrange/usr/qemu-git/bin/qemu-system-x86_64 'qio*'
Using tapset dir '/home/berrange/usr/qemu-git/share/systemtap/tapset' for binary '/home/berrange/usr/qemu-git/bin/qemu-system-x86_64'
Compiling script 'probe qemu.system.x86_64.log.qio* {}'
Running script, <Ctrl>-c to quit
...trace output...

It can enable multiple probes at once

$ qemu-trace-stap run qemu-system-x86_64 'qio*' 'qcrypto*' 'buffer*'

By default it monitors all existing running processes and all future launched proceses. This can be restricted to a specific PID using the –pid arg

$ qemu-trace-stap run --pid 2532 qemu-system-x86_64 'qio*'

Finally if you can’t remember what probes are valid it can tell you

$ qemu-trace-stap list qemu-system-x86_64
ahci_check_irq
ahci_cmd_done
ahci_dma_prepare_buf
ahci_dma_prepare_buf_fail
ahci_dma_rw_buf
ahci_irq_lower
...snip...

This new functionality merged into QEMU upstream a short while ago and will be included in the QEMU 4.0 release coming at the end of April.

Leave a Reply





Spam protection: Sum of tw0 plus s3ven ?: