Full coverage of libvirt XML schemas achieved in libvirt-go-xml

Posted: December 7th, 2017 | Filed under: Coding Tips, Fedora, libvirt, OpenStack, Virt Tools | Tags: , | No Comments »

In recent times I have been aggressively working to expand the coverage of libvirt XML schemas in the libvirt-go-xml project. Today this work has finally come to a conclusion, when I achieved what I believe to be effectively 100% coverage of all of the libvirt XML schemas. More on this later, but first some background on Go and XML….

For those who aren’t familiar with Go, the core library’s encoding/xml module provides a very easy way to consume and produce XML documents in Go code. You simply define a set of struct types and annotate their fields to indicate what elements & attributes each should map to. For example, given the Go structs:

type Person struct {
    XMLName xml.Name `xml:"person"`
    Name string `xml:"name,attr"`
    Age string `xml:"age,attr"` 
    Home *Address `xml:"home"`
    Office *Address `xml:"office"`
} 
type Address struct { 
    Street string `xml:"street"`
    City string `xml:"city"` 
}

You can parse/format XML documents looking like

<person name="Joe Blogs" age="24">
  <home>
    <street>Some where</street><city>London</city>
  </home>
  <office>
    <street>Some where else</street><city>London</city>
  </office>  
</person>

Other programming languages I’ve used required a great deal more work when dealing with XML. For parsing, there’s typically a choice between an XML stream based parser where you have to react to tokens as they’re parsed and stuff them into structs, or a DOM object hierarchy from which you then have to pull data out into your structs. For outputting XML, apps either build up a DOM object hierarchy again, or dynamically format the XML document incrementally. Whichever approach is taken, it generally involves writing alot of tedious & error prone boilerplate code. In most cases, the Go encoding/xml module eliminates all the boilerplate code, only requiring the data type defintions. This really makes dealing with XML a much more enjoyable experience, because you effectively don’t deal with XML at all! There are some exceptions to this though, as the simple annotations can’t capture every nuance of many XML documents. For example, integer values are always parsed & formatted in base 10, so extra work is needed for base 16. There’s also no concept of unions in Go, or the XML annotations. In these edge cases custom marshaling / unmarshalling methods need to be written. BTW, this approach to XML is also taken for other serialization formats including JSON and YAML too, with one struct field able to have many annotations so it can be serialized to a range of formats.

Back to the point of the blog post, when I first started writing Go code using libvirt it was immediately obvious that everyone using libvirt from Go would end up re-inventing the wheel for XML handling. Thus about 1 year ago, I created the libvirt-go-xml project whose goal is to define a set of structs that can handle documents in every libvirt public XML schema. Initially the level of coverage was fairly light, and over the past year 18 different contributors have sent patches to expand the XML coverage in areas that their respective applications touched. It was clear, however, that taking an incremental approach would mean that libvirt-go-xml is forever trailing what libvirt itself supports. It needed an aggressive push to achieve 100% coverage of the XML schemas, or as near as practically identifiable.

Alongside each set of structs we had also been writing unit tests with a set of structs populated with data, and a corresponding expected XML document. The idea for writing the tests was that the author would copy a snippet of XML from a known good source, and then populate the structs that would generate this XML. In retrospect this was not a scalable approach, because there is an enourmous range of XML documents that libvirt supports. A further complexity is that Go doesn’t generate XML documents in the exact same manner. For example, it never generates self-closing tags, instead always outputting a full opening & closing pair. This is semantically equivalent, but makes a plain string comparison of two XML documents impractical in the general case.

Considering the need to expand the XML coverage, and provide a more scalable testing approach, I decided to change approach. The libvirt.git tests/ directory currently contains 2739 XML documents that are used to validate libvirt’s own native XML parsing & formatting code. There is no better data set to use for validating the libvirt-go-xml coverage than this. Thus I decided to apply a round-trip testing methodology. The libvirt-go-xml code would be used to parse the sample XML document from libvirt.git, and then immediately serialize them back into a new XML document. Both the original and new XML documents would then be parsed generically to form a DOM hierarchy which can be compared for equivalence. Any place where documents differ would cause the test to fail and print details of where the problem is. For example:

$ go test -tags xmlroundtrip
--- FAIL: TestRoundTrip (1.01s)
	xml_test.go:384: testdata/libvirt/tests/vircaps2xmldata/vircaps-aarch64-basic.xml: \
            /capabilities[0]/host[0]/topology[0]/cells[0]/cell[0]/pages[0]: \
            element in expected XML missing in actual XML

This shows the filename that failed to correctly roundtrip, and the position within the XML tree that didn’t match. Here the NUMA cell topology has a ‘<pages>‘  element expected but not present in the newly generated XML. Now it was simply a matter of running the roundtrip test over & over & over & over & over & over & over……….& over & over & over, adding structs / fields for each omission that the test identified.

After doing this for some time, libvirt-go-xml now has 586 structs defined containing 1816 fields, and has certified 100% coverage of all libvirt public XML schemas. Of course when I say 100% coverage, this is probably a lie, as I’m blindly assuming that the libvirt.git test suite has 100% coverage of all its own XML schemas. This is certainly a goal, but I’m confident there are cases where libvirt itself is missing test coverage. So if any omissions are identified in libvirt-go-xml, these are likely omissions in libvirt’s own testing.

On top of this, the XML roundtrip test is set to run in the libvirt jenkins and travis CI systems, so as libvirt extends its XML schemas, we’ll get build failures in libvirt-go-xml and thus know to add support there to keep up.

In expanding the coverage of XML schemas, a number of non-trivial changes were made to existing structs  defined by libvirt-go-xml. These were mostly in places where we have to handle a union concept defined by libvirt. Typically with libvirt an element will have a “type” attribute, whose value then determines what child elements are permitted. Previously we had been defining a single struct, whose fields represented all possible children across all the permitted type values. This did not scale well and gave the developer no clue what content is valid for each type value. In the new approach, for each distinct type attribute value, we now define a distinct Go struct to hold the contents. This will cause API breakage for apps already using libvirt-go-xml, but on balance it is worth it get a better structure over the long term. There were also cases where a child XML element previously represented a single value and this was mapped to a scalar struct field. Libvirt then added one or more attributes on this element, meaning the scalar struct field had to turn into a struct field that points to another struct. These kind of changes are unavoidable in any nice manner, so while we endeavour not to gratuitously change currently structs, if the libvirt XML schema gains new content, it might trigger further changes in the libvirt-go-xml structs that are not 100% backwards compatible.

Since we are now tracking libvirt.git XML schemas, going forward we’ll probably add tags in the libvirt-go-xml repo that correspond to each libvirt release. So for app developers we’ll encourage use of Go vendoring to pull in a precise version of libvirt-go-xml instead of blindly tracking master all the time.

ANNOUNCE: New libvirt project Go language bindings

Posted: December 15th, 2016 | Filed under: Fedora, libvirt, OpenStack, Virt Tools | Tags: , , , | No Comments »

I’m happy to announce that the libvirt project is now supporting Go language bindings as a primary deliverable, joining Python and Perl, as language bindings with 100% API coverage of libvirt C library. The master repository is available on the libvirt GIT server, but it is expected that Go projects will consume it via an import of the github mirror, since the Go ecosystem is heavilty github focused (e.g. godoc.org can’t produce docs for stuff hosted on libvirt.org git)

import (
    libvirt "github.com/libvirt/libvirt-go"
)

conn, err := libvirt.NewConnect("qemu:///system")
...

API documentation is available on the godoc website.

For a while now libvirt has relied on 3rd parties to provide Go language bindings. The one most people use was first created by Alex Zorin and then taken over by Kyle Kelly. There’s been a lot of excellent work put into these bindings, however, the API coverage largely stops at what was available in libvirt 1.2.2, with the exception of a few APIs from libvirt 1.2.14 which have to be enabled via Go build tags. Libvirt is now working on version 3.0.0 and there have been many APIs added in that time, not to mention enums and other constants. Comparing the current libvirt-go API coverage against what the libvirt C library exposes reveals 163 missing functions (out of 476 total), 367 missing enum constants (out of 847 total) and 165 missing macro constants (out of 200 total). IOW while there is alot already implemented, there was still a very long way to go.

Initially I intended to contribute patches to address the missing API coverage to the existing libvirt-go bindings. In looking at the code though I had some concerns about the way some of the APIs had been exposed to Go. In the libvirt C library there are a set of APIs which accept or return a “virTypedParameterPtr” array, for cases where we need APIs to be easily extensible to handle additions of an arbitrary extra data fields in the future. The way these APIs work is one of the most ugly and unpleasant parts of the C API and thus in language bindings we never expose the virTypedParameter concept directly, but instead map it into a more suitable language specific data structure. In Perl and Python this meant mapping them to hash tables, which gives application developers a language friendly way to interact with the APIs. Unfortunately the current Go API bindings have exposed the virTypedParameter concept directly to Go and since Go does not support unions, the result is arguably even more unpleasant in Go than it already is in C. The second concern is with the way events are exposed to Go – in the C layer we have different callbacks that are needed for each event type, but have one method for registering callbacks, requiring an ugly type cast. This was again exposed directly in Go, meaning that the Go compiler can’t do strong type checking on the callback registration, instead only doing a runtime check at time of event dispatch. There were some other minor concerns about the Go API mapping, such as fact that it needlessly exposed the “vir” prefix on all methods & constants despite already being in a “libvirt” package namespace, returning of a struct instead of pointer to a struct for objects. Understandably the current maintainer had a desire to keep API compatibility going forward, so the decision was made to fork the existing libvirt-go codebase. This allowed us to take advantage of all the work put in so far, while fixing the design problems, and also extending them to have 100% API coverage. The idea is that applications can then decide to opt-in to the new Go binding at a point in time where they’re ready to adapt their code to the API changes.

For users of the existing libvirt Go binding, converting to the new official libvirt Go binding requires a little bit of work, but nothing too serious and will simplify the code if using any of the typed parameter methods. The changes are roughly as follows:

  • The “VIR_” prefix is dropped from all constants. eg libvirt.VIR_DOMAIN_METADATA_DESCRIPTION because libvirt.DOMAIN_METADATA_DESCRIPTION
  • The “vir” prefix is dropped from all types. eg libvirt.virDomain becomes libvirt.Domain
  • Methods returning objects now return a pointer eg “* Domain” instead of “Domain”, allowing us to return the usual “nil” constant on error, instead of a struct with no underlying libvirt connection
  • The domain events DomainEventRegister method has been replaced by a separate method for each event type. eg DomainEventLifecycleRegister, DomainEventRebootRegister, etc giving compile time type checking of callbacks
  • The domain events API now accepts a single callback, instead of taking a pair of callbacks – the caller can create an anonymous function to invoke multiple things if required.
  • Methods accepting or returning typed parameters now have a formal struct defined to expose all the parameters in a manner that allows direct access without type casts and enables normal Go compile time type checking. eg the Domain.GetBlockIOTune method returns a DomainBlockIoTuneParameters struct
  • It is no longer necessary to use Go compiler build tags to access functionality in different libvirt versions. Through the magic of conditional compilation, the binding will transparently build against every libvirt version from 1.2.0 through 3.0.0
  • The binding can find libvirt via pkg-config making it easy to compile against a libvirt installed to a non-standard location by simply setting “PKG_CONFIG_PATH”
  • There is 100% coverage of all APIs [1], constants and macros, verified by the libvirt CI system, so that it always keeps up with GIT master of the Libvirt C library.
  • The error callback concept is removed from the binding as this is deprecated by libvirt due to not being thread safe. It was also redundant since every method already directly returns an error object.
  • There are now explicit types defined for all enums and methods which take flags or enums now use these types instead of “uint32”, again allowing stronger compiler type checking

With the exception of the typed parameter changes adapting existing apps should be a largely boring mechanical conversion to just adapt to the renames.

Again, without the effort put in by Alex Zorin and Kyle Kelly & other community contributors, creation of these new libvirt-go bindings would have taken at least 4-5 weeks instead of the 2 weeks effort put into this. So there’s a huge debt owed to all the people who previously contributed to libvirt Go bindings. I hope that having these new bindings with guaranteed 100% API coverage will be of benefit to the Go community going forward.

[1] At time of writing this is a slight lie, as i’ve not quite finished the virStream and virEvent callback method bindings, but this will be done shortly.