Code coverage testing – what it misses

Posted: April 12th, 2006 | Filed under: Coding Tips

For all my development projects I try to make use of code coverage tools to ensure the test suites are reasonably comprehensive. For example, with Test-AutoBuild I use the excellent Devel-Cover module. The nightly build runs the test suite and publishes a code coverage report giving a breakdown of test coverage for API documentation, functions, statements, and even conditional expressions. The colour coding of coverage makes it possible to quickly identify modules which are lacking coverage. Given knowledge of which modules contain the most complexity, limited resources for writing tests can then be directed to the areas of the code where they will have the biggest impact on application quality.

When using code coverage, however, one must be careful not to fall into the trap of writing tests simply to increase coverage. There are many aspects of the code which just aren’t worthwhile testing – for example, areas so simple that the time involved in writing tests is not offset by a meaningful rise in code quality. More importantly, there is a limit to what source code coverage analysis can tell you about real world test coverage. It is perfectly feasible to have 100% coverage over a region of code and still have serious bugs. The root of the problem is that the system being tested does not operate in isolation. No matter how controlled your test environment is, there are always external variables which can affect your code.

I encountered just such an example last weekend. A few months back I added a comprehensive set of tests for validating the checkout of code modules from Perforce, Subversion, and Mercurial. The code coverage report said: 100% covered. Great, I thought, I can finally forget about this bit of code for a while. And then we passed the Daylight Saving Time shift and all the tests started failing. It turned out that the modules were not correctly handling timezone information when parsing dates while DST was in effect. There is no easy way to test for this other than to run the same test suite over and over under at least 4 different timezones – UTC (GMT), BST (GMT+1), EST (GMT-5), EDT (GMT-4). Just setting $TZ isn’t really enough – to automate this reliably I would really need to run the builds on four geographically dispersed servers (or perhaps 4 Xen instances, each running in a different timezone).
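To make the failure mode concrete, here is a minimal sketch in Python (not the Perl that Test-AutoBuild is written in). Everything in it is an assumption for illustration: the timestamp format, the function names, and the use of fixed US Eastern offsets. It is not the actual checkout code, just one way a fully covered date parser can still be wrong once DST kicks in.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for US Eastern time: standard (EST) and daylight (EDT).
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

# Hypothetical format of a timestamp reported by an SCM client.
STAMP_FORMAT = "%Y/%m/%d %H:%M:%S"


def parse_assuming_est(stamp: str) -> datetime:
    """Buggy parser: always treats the timestamp as standard time (EST).

    Every line here runs for any input, so a single winter test case
    yields 100% statement coverage while the DST bug stays hidden.
    """
    local = datetime.strptime(stamp, STAMP_FORMAT)
    return local.replace(tzinfo=EST).astimezone(timezone.utc)


def parse_with_offset(stamp: str) -> datetime:
    """Safer parser: requires an explicit UTC offset such as -0400."""
    local = datetime.strptime(stamp, STAMP_FORMAT + " %z")
    return local.astimezone(timezone.utc)
```

A winter timestamp such as `2006/01/15 12:00:00` comes out of `parse_assuming_est` as 17:00 UTC, which is correct. `2006/04/12 12:00:00` also comes out as 17:00 UTC, which is an hour late because EDT was in effect by then. The coverage report is identical for both inputs; only the choice of test data exposes the problem.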

A second example: checking that no module has hardcoded the path separator is simply impossible within a single run of a test suite. Running the tests on UNIX may give a pass, and 100% coverage, but this merely tells me that no module has used ‘\’ or ‘:’ as a path separator. To validate that no module has used ‘/’ as a path separator, the only option is to re-run the test suite on Windows. Fortunately virtualization can come to the rescue again, this time in the form of QEMU, which allows emulation of an x86 CPU.
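The hardcoded separator problem is easy to show in a few lines. The sketch below is in Python and uses made-up function names and a made-up `Makefile` path. It leans on the fact that Python's standard library ships both the POSIX and the Windows flavour of its path module (`posixpath` and `ntpath`), so pure string handling can be compared on one machine. That only covers path construction; it says nothing about how a real Windows filesystem or client tool behaves, so it does not replace running the suite there.

```python
import ntpath
import posixpath


def makefile_path_hardcoded(root: str, module: str) -> str:
    """Buggy on Windows: bakes in '/' as the separator."""
    return root + "/" + module + "/" + "Makefile"


def makefile_path_portable(root: str, module: str, pathmod=posixpath) -> str:
    """Delegates the separator choice to the given path module.

    Production code would normally use os.path here; passing the module
    in explicitly is only done so both flavours can be exercised.
    """
    return pathmod.join(root, module, "Makefile")
```

On UNIX the two functions return the same string, so a test comparing them passes with full coverage. Passing `ntpath` to the portable version gives a backslash-separated path, which is where the hardcoded version would silently disagree.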

Going back to the example of checking out code from an SCM server, another problem in Test-AutoBuild (which I must address soon) is ensuring that the different failure conditions in talking to the SCM server are handled. Some of the things which can go wrong include: an incorrect host name being specified, a network outage breaking a connection mid-operation, an incorrect path for the module to check out, and a missing installation of the local SCM client tools. 100% test coverage of the code for checking out a module can’t tell you that there is a large chunk of error handling code missing altogether.
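As a rough idea of what that missing error handling could look like, here is a hedged Python sketch. `CheckoutError` and `run_scm` are hypothetical names, the timeout value is arbitrary, and this is not how Test-AutoBuild does it. It simply wraps an external client command and gives each failure mode its own explicit branch, so that each one can be tested deliberately.

```python
import shutil
import subprocess


class CheckoutError(Exception):
    """Hypothetical error type raised for any failed SCM operation."""


def run_scm(argv, timeout=60):
    """Run an SCM client command and return its standard output.

    argv is the full command line as a list, e.g. a client name
    followed by its arguments. Each failure mode gets its own branch.
    """
    # Missing installation of the local client tools.
    if shutil.which(argv[0]) is None:
        raise CheckoutError(f"client tool not installed: {argv[0]}")
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired as exc:
        # A hung connection, for example a network outage mid-operation.
        raise CheckoutError(f"timed out after {timeout}s") from exc
    # Bad host name, bad module path, etc. usually surface as a
    # non-zero exit status plus a message on stderr.
    if result.returncode != 0:
        raise CheckoutError(
            f"client exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

Once the branches exist, the coverage report becomes useful again: an untested `except` or `if` shows up as uncovered, whereas error handling that was never written cannot show up at all.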

In summary, no matter how comprehensive your test suite is, there is always room for improvement. Think about what code is not there – error handling code. Think about what external systems you interact with and the failure scenarios that can occur. Think about what environmental assumptions you might have made – OS path separators. Think about what environmental changes can occur – time zones. While code coverage is an incredibly valuable tool for identifying which areas of *existing* code are not covered, only use it to help prioritise ongoing development of a test suite, not as an end goal. There really is no substitute for running the tests under as many different environments as you can lay your hands on. And not having access to a large server farm is no longer an excuse – virtualization (take your pick of Xen, QEMU, UML, and VMware) will allow a single server to simulate dozens of different environments. The only limit to testing is your imagination…