Debugging early startup of KVM with GDB, when launched by libvirtd

Posted: October 12th, 2011 | Filed under: Fedora, libvirt, Virt Tools | Tags: , , , , | 1 Comment »

Earlier today I was asked how one would go about debugging early startup of KVM under GDB, when launched by libvirtd. It was not possible to simply attach to KVM after it had been launched by libvirtd, since that was too late. In addition running the same KVM command outside libvirt did not exhibit the problem that was being investigated.

Fortunately, with a little cleverness, it is actually possible to debug a KVM guest launched by libvirtd, right from the start. The key is to combine a couple of breakpoints with use of follow-fork-mode. When libvirtd starts up a KVM guest, it runs QEMU a couple of times in order to detect which command line arguments are supported. This means the follow-fork-mode setting cannot be changed too early, otherwise GDB will end up following the wrong process.

I happen to know that there is only one place in the libvirt code which calls virCommandSetPreExecHook, and that is immediately before launching the real QEMU process. A nice thing about GDB is that when following forked/exec’d children, it will apply any existing breakpoints in the child, even if it is a new binary. So a break point set on ‘main’, while still in libvirtd will happily catch ‘main’ in the QEMU process. The only remaining problem is that if QEMU does not setup and activate the monitor quickly enough, libvirtd will try to kill it off again. Fortunately GDB lets you ignore SIGTERM, and even SIGKILL :-)

The start of the trick is this:

# pgrep libvirtd
# gdb
(gdb) attach 12345
(gdb) break virCommandSetPreExecHook
(gdb) cont

Now in a separate shell

# virsh start $GUESTNAME

Back in the GDB shell the breakpoint should have triggered, allowing the trick to be finished:

(gdb) break main
(gdb) handle SIGKILL nopass noprint nostop
Signal        Stop	Print	Pass to program	Description
SIGKILL       No	No	No		Killed
(gdb) handle SIGTERM nopass noprint nostop
Signal        Stop	Print	Pass to program	Description
SIGTERM       No	No	No		Terminated
(gdb) set follow-fork-mode child
(gdb) cont
process 3020 is executing new program: /usr/bin/qemu-kvm
[Thread debugging using libthread_db enabled]
[Switching to Thread 0x7f2a4064c700 (LWP 3020)]
Breakpoint 2, main (argc=38, argv=0x7fff71f85af8, envp=0x7fff71f85c30)
    at /usr/src/debug/qemu-kvm-0.14.0/vl.c:1968
1968	{

Bingo, you can now debug QEMU startup at your leisure