Using SystemTAP for dynamically instrumented the Linux kernel

Posted: January 27th, 2006 | Filed under: Coding Tips, Uncategorized | No Comments »

It has been just over a year & 1/2 since I first blogged about DTrace suggesting that a similar tool would be very valuable to the Linux community. Well after a few long email threads, it turned out that a significant number of people within Red Hat agreed with this assessment and so in partnership with IBM and Intel the SystemTAP project came into life at the start of 2005. Starting with the previously developed KProbes dynamic instrumentation capability a huge amount of work has been done building out a high level language and runtime for safely, efficiently & reliably probing the kernel. It has seen a limited ‘technology preview’ in RHEL-4, and with its inclusion in the forthcoming Fedora Core 5 it will be exposed to a much wider community of users & potential developers.

On the very same day as Dave Jones was looking at the Fedora boot process via static kernel instrumentation, I was (completely co-incidentally ) playing around using SystemTAP to instrument the boot process. The probe I wrote looked at file opens, process fork/execve to enable a hierarchical view of startup to be pieced together. A simplified version of the script looked like:

global indent

function timestamp() {
  return string(gettimeofday_ms()) . indent[pid()] . " "
}

function proc() {
  return string(pid()) . " (" . execname() . ")"
}

function push(pid, ppid) {
  indent[pid] = indent[ppid] . "  "
}

function pop(pid) {
  delete indent[pid]
}

probe kernel.function("sys_clone").return {
  print(timestamp() . proc() . " forks " . string(retval()). "\n")
  push(retval(), pid())
}

probe kernel.function("do_execve") {
  print(timestamp() . proc() . " execs " . kernel_string($filename) . "\n")
}

probe kernel.function("sys_open") {
  if ($flags & 1) {
    print(timestamp() . proc() . " writes " . user_string($filename) . "\n")
  } else {
    print(timestamp() . proc() . " reads " . user_string($filename) . "\n")
  }
}

probe kernel.function("do_exit") {
  print(timestamp() . proc() . " exit\n")
  pop(pid())
}

A few tricks later it was running during boot, and having analysed the results with Perl one can display a summary of how many files each init script opened

 1    init                                        read  90 write  30 running...
   251  init                                      read  29 write   0 run 23.08s
     252  rc.sysinit                              read 1035 write  45 run 22.91s
       274  start_udev                            read 355 write 128 run 15.10s
         286  start_udev                          read  91 write   0 run 1.90s
           287  MAKEDEV                           read  91 write   0 run 1.88s
         291  udevstart                           read 177 write 124 run 3.95s
         614  usleep                              read   2 write   0 run 1.05s
       649  udev-stw.modules                      read  84 write   5 run 1.23s
       701  dmraid                                read 111 write   0 run 1.07s
   748  rc                                        read 235 write  13 run 14.57s
     753  S10network                              read 111 write  16 run 2.85s
     833  S12syslog                               read  44 write   3 run 0.43s
     844  S25netfs                                read  87 write   1 run 1.51s
     861  S55cups                                 read  31 write   2 run 1.70s
     878  S55sshd                                 read  52 write   1 run 0.86s
     892  S97messagebus                           read  31 write   2 run 0.44s
     900  S98NetworkManager                       read  96 write  10 run 0.58s
     910  S98NetworkManagerDispatcher             read  92 write   3 run 0.67s
     921  S98avahi-daemon                         read  29 write   0 run 0.41s
     929  S98haldaemon                            read  31 write   2 run 4.20s
     955  S99local                                read  17 write   1 run 0.16s

There are so many other interesting ways to analyse the data collected at boot which I don’t have space for in this blog, so I’ve put all the information (including how to run SystemTAP during boot) up on my Red Hat homepage