Versioning in the libvirt library

Posted: January 13th, 2011 | Filed under: libvirt, Virt Tools

The libvirt library uses a number of approaches to versioning to cover various usage scenarios:

  • Software release versions
  • Package manager versions (RPM)
  • pkgconfig version
  • libtool version
  • ELF library soname/version
  • ELF symbol version

The goal of libvirt is to never change the library ABI. In other words, once an API is added, a struct declared, a macro defined, etc, it can never be changed. If an API is found to be flawed, the existing API may be annotated as deprecated, but it will never be removed. Instead, a new alternative API is added.

Each new software release has an associated 3 component version number, eg 0.6.3, 0.8.0, 0.8.1. There is no strict policy on the significance of each component. At time of writing, the major component has never been changed. The minor component is changed when significant new functionality is added to the library (usually a major collection of new APIs). The micro component is changed in each release and reset to zero when the minor component changes. So from an application developer’s point of view, if they want to use a particular API they look for the software release version that introduced it.
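As a minimal sketch of that lookup, release versions can be compared component by component. The helper names below are hypothetical and not part of libvirt. The integer packing (major * 1,000,000 + minor * 1,000 + micro) mirrors how libvirt's own virGetVersion call and LIBVIR_VERSION_NUMBER macro report versions, a detail not covered in the text above.

```python
def parse_version(text):
    """Split a release version such as '0.8.3' into a tuple of ints."""
    major, minor, micro = (int(part) for part in text.split("."))
    return (major, minor, micro)


def at_least(installed, required):
    """Return True if the installed release is the required one or newer."""
    return parse_version(installed) >= parse_version(required)


def pack_version(text):
    """Pack a version into one integer: major*1000000 + minor*1000 + micro."""
    major, minor, micro = parse_version(text)
    return major * 1000000 + minor * 1000 + micro


print(at_least("0.8.1", "0.8.3"))   # False
print(at_least("0.10.0", "0.8.3"))  # True, compared numerically, not as text
print(pack_version("0.8.3"))        # 8003
```

Comparing tuples of integers rather than raw strings matters once a component reaches two digits, since "0.10.0" sorts before "0.8.3" as text.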

The text that follows often uses libvirt as an example. The principles / effects described apply to any ELF library and are not specific to libvirt.

Simple compile time version checking

Applications building against libvirt will make use of the pkgconfig tool to probe for the existence of libvirt. The software release version is used, unchanged, as the pkgconfig version number. Thus if an application knows the API it wants was introduced in version 0.8.3, it will typically perform a compile time check like

$ pkg-config --atleast-version=0.8.3 libvirt \
&& echo "Found" || echo "Missing"

Or in autoconf macros

PKG_CHECK_MODULES([LIBVIRT], [libvirt >= 0.8.3])

This will result in compiler/linker flags that look something like

-lvirt

Or

-I/home/berrange/usr/include -L/home/berrange/usr/lib -lvirt

When the compiler links the application to libvirt, it will take the libvirt ELF soname and embed it in the application. The libvirt ELF soname is simply “libvirt.so.0” and since libvirt policy is to maintain ABI, it will never change.

libtool also includes library versioning magic that is intended to allow for version checks that are independent of the software release version. Few libraries ever fully make use of this capability since it is poorly understood, easy to get wrong and doesn’t integrate with the rest of the toolchain / OS distro. libvirt sets the libtool version based on the package release version, so it won’t be discussed further.

Simple runtime version checking

When an application is launched the ELF loader will attempt to resolve the linked libraries and verify their current soname matches that embedded in the application. Since libvirt will never break ABI, the soname check will always succeed. It should be clear that the soname check is a very weak versioning scheme. The application may have been linked against version 0.8.3, but the installed version it is run against could be 0.6.0 and this mismatch would not be caught by the soname check.
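The weakness can be illustrated with a toy model of the soname check. This is only a sketch: the dictionary fields and the function name are hypothetical, and a real ELF loader does considerably more work than a string comparison.

```python
def soname_check(app_needed, installed_library):
    """Model the loader's soname check: only the soname string is compared."""
    return installed_library["soname"] in app_needed


# The application was linked against libvirt 0.8.3.
app_needed = ["libvirt.so.0"]

new_lib = {"release": "0.8.3", "soname": "libvirt.so.0"}
old_lib = {"release": "0.6.0", "soname": "libvirt.so.0"}

# Both checks pass, because the release version never enters the comparison.
print(soname_check(app_needed, new_lib))  # True
print(soname_check(app_needed, old_lib))  # True
```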

Installation version checking

The software release version is also used unchanged for the package manager version field. In RPM packages there is an extra ‘release’ component appended to the version for extra fine checking.

Thus the package manager can be used to provide a stronger runtime version guarantee, by checking the full version number at installation time. The aforementioned example application would typically want to include a dependency

Requires: libvirt >= 0.8.3

The RPM package manager, however, also has support for automatic dependency extraction. For this it will extract the soname from any ELF libraries the application links against and secretly add statements of the form

Requires: libvirt.so.0()(64bit)

The libvirt RPM itself also gains secret statements of the form

Provides: libvirt.so.0()(64bit)

Since this is based on the soname though, these automatic dependencies are a very weak versioning scheme.

Symbol versioning

Most linkers will provide library developers with a way to control which symbols are exported for use by applications. This may be done via annotations in the source file, or more often via a standalone “linker script” which simply whitelists symbols to be exported.

The GNU ELF linker has support for doing more than simply whitelisting symbols. It allows symbols to be grouped together and a version number attached to each group. A group can also be annotated as being a child of another group. In this manner you can create trees of versioned symbols, although most common libraries will only use one path without branching.

The libvirt library uses the software release version to group symbols which were added in that release. The top levels of the libvirt tree are logically arranged like:

+------------------------+
| LIBVIRT_0.0.3          |
+------------------------+
| virConnectClose        |
| virConnectOpen         |
| virConnectListDomains  |
| ....                   |
+------------------------+
|
V
+------------------------+
| LIBVIRT_0.0.5          |
+------------------------+
| virDomainGetUUID       |
| virConnectLookupByUUID |
+------------------------+
|
V
+------------------------+
| LIBVIRT_0.1.0          |
+------------------------+
| virInitialize          |
| virNodeGetInfo         |
| virDomainReboot        |
| ....                   |
+------------------------+
|
V
....
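One way to picture this structure is as a chain of nodes, each naming its parent and the symbols it introduced. The sketch below models only the three nodes shown in the diagram. The data layout and the helper names are illustrative assumptions, not libvirt's actual linker script.

```python
# Each version node records its parent node and the symbols it introduced.
# Only the symbols visible in the diagram are listed here.
VERSION_TREE = {
    "LIBVIRT_0.0.3": {
        "parent": None,
        "symbols": ["virConnectClose", "virConnectOpen", "virConnectListDomains"],
    },
    "LIBVIRT_0.0.5": {
        "parent": "LIBVIRT_0.0.3",
        "symbols": ["virDomainGetUUID", "virConnectLookupByUUID"],
    },
    "LIBVIRT_0.1.0": {
        "parent": "LIBVIRT_0.0.5",
        "symbols": ["virInitialize", "virNodeGetInfo", "virDomainReboot"],
    },
}


def introduced_in(symbol):
    """Return the version node that introduced a symbol, or None."""
    for node, info in VERSION_TREE.items():
        if symbol in info["symbols"]:
            return node
    return None


def ancestry(node):
    """Walk from a node up to the root, returning the chain of node names."""
    chain = []
    while node is not None:
        chain.append(node)
        node = VERSION_TREE[node]["parent"]
    return chain


print(introduced_in("virDomainGetUUID"))  # LIBVIRT_0.0.5
print(ancestry("LIBVIRT_0.1.0"))
# ['LIBVIRT_0.1.0', 'LIBVIRT_0.0.5', 'LIBVIRT_0.0.3']
```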

Runtime version checking with version symbols

When a library is built normally, the ELF headers will contain a list of plain symbol names:

$ eu-readelf -s /usr/lib64/libvirt.so | grep 'GLOBAL DEFAULT' | awk '{print $8}'
virConnectOpen
virConnectClose
virDomainGetUUID
....

When a library is built with versioned symbols though, the name is mangled to include a version string:

$ eu-readelf -s /usr/lib64/libvirt.so | grep 'GLOBAL DEFAULT' | awk '{print $8}'
virConnectOpen@@LIBVIRT_0.0.3
virConnectClose@@LIBVIRT_0.0.3
virDomainGetUUID@@LIBVIRT_0.0.5
....

Similarly when an application is built against a normal library, the ELF headers will contain a list of plain symbol names that it uses:

$ eu-readelf -s /usr/bin/virt-viewer  | grep 'GLOBAL DEFAULT' | awk '{print $8}'
virConnectOpenAuth
virDomainFree
virDomainGetID

And as expected, when built against a library with versioned symbols, the application ELF headers will contain symbols mangled to include a version string:

$ eu-readelf -s /usr/bin/virt-viewer  | grep 'GLOBAL DEFAULT' | awk '{print $8}'
virConnectOpenAuth@LIBVIRT_0.4.0
virDomainFree@LIBVIRT_0.0.3
virDomainGetID@LIBVIRT_0.0.3
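The mangled names are simple to take apart. As a sketch (the split_symbol helper is hypothetical), a name can be split at the first '@'. The '@@' form seen in the library output marks the default version of a symbol, which is a GNU toolchain convention rather than something stated above.

```python
def split_symbol(mangled):
    """Split 'name@VERSION' or 'name@@VERSION' into (name, version, is_default).

    Unversioned names come back with version None.
    """
    if "@" not in mangled:
        return (mangled, None, False)
    name, _, rest = mangled.partition("@")
    # After the first '@', a second '@' means this is the default version.
    is_default = rest.startswith("@")
    version = rest.lstrip("@")
    return (name, version, is_default)


print(split_symbol("virConnectOpen@@LIBVIRT_0.0.3"))
# ('virConnectOpen', 'LIBVIRT_0.0.3', True)
print(split_symbol("virConnectOpenAuth@LIBVIRT_0.4.0"))
# ('virConnectOpenAuth', 'LIBVIRT_0.4.0', False)
print(split_symbol("virDomainFree"))
# ('virDomainFree', None, False)
```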

When the application is launched, the linker is now able to perform strict version checking. For each symbol ‘FOO’:

- If application listed an unversioned 'FOO'
  - If library has an unversioned 'FOO'
    => Success
  - Else library has a versioned 'FOO'
    => Success
- Else application listed a versioned 'FOO'
  - If library has an unversioned 'FOO'
    => Fail
  - Else library has a versioned 'FOO'
    - If versions of 'FOO' match
      => Success
    - Else versions don't match
      => Fail
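The rules above can be written as a small function. This is a sketch of the decision tree exactly as listed, not of the real loader's internals. The function name and the use of None for "unversioned" are assumptions made for illustration.

```python
def resolve(app_version, lib_version):
    """Apply the rules above to one symbol.

    Each argument is the symbol's version string, or None if that side
    carries the symbol unversioned. Returns True when resolution succeeds.
    """
    if app_version is None:
        # An unversioned reference binds to whatever the library offers.
        return True
    if lib_version is None:
        # A versioned reference cannot be satisfied by an unversioned symbol.
        return False
    # Both sides are versioned, so the version strings must match.
    return app_version == lib_version


print(resolve(None, None))                        # True
print(resolve(None, "LIBVIRT_0.0.3"))             # True
print(resolve("LIBVIRT_0.0.3", None))             # False
print(resolve("LIBVIRT_0.0.3", "LIBVIRT_0.0.3"))  # True
print(resolve("LIBVIRT_0.5.0", "LIBVIRT_0.4.0"))  # False
```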

Notice that an application with unversioned symbols will work with a versioned library. A versioned application will not work with an unversioned library. This allows versioned symbols to be introduced to an existing unversioned library without causing an ABI compatibility problem.

With an application and library both using versioned symbols, the compile time requirement the developer declared to pkgconfig with “libvirt >= 0.8.3” is now effectively encoded in the binary via symbol versioning. Thus a fairly strict check can be performed at runtime to ensure the correct library is installed.
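To make the "effectively encoded" point concrete, the minimum release an application needs is simply the newest version named by any symbol it references. The helper below is a hypothetical sketch that assumes every version string has the plain LIBVIRT_x.y.z form.

```python
def minimum_release(symbol_versions):
    """Return the newest release named by any symbol version string."""
    def as_tuple(version):
        # 'LIBVIRT_0.4.0' -> (0, 4, 0)
        return tuple(int(p) for p in version.split("_", 1)[1].split("."))

    newest = max(symbol_versions, key=as_tuple)
    return newest.split("_", 1)[1]


# Versions taken from the virt-viewer symbols listed earlier.
used = ["LIBVIRT_0.4.0", "LIBVIRT_0.0.3", "LIBVIRT_0.0.3"]
print(minimum_release(used))  # 0.4.0
```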

Installation version checking with versioned symbols

Remember that the RPM automatic dependencies extractor will find the ELF soname in libraries/applications. When using versioned symbols, it can go one step further and find the symbol version strings too.

The libvirt RPM will thus automatically get extra versioned dependencies

$ rpm -q libvirt-client --provides | grep libvirt
libvirt.so.0()(64bit)
libvirt.so.0(LIBVIRT_0.0.3)(64bit)
libvirt.so.0(LIBVIRT_0.0.5)(64bit)
libvirt.so.0(LIBVIRT_0.1.0)(64bit)
libvirt.so.0(LIBVIRT_0.1.1)(64bit)
libvirt.so.0(LIBVIRT_0.1.4)(64bit)
libvirt.so.0(LIBVIRT_0.1.5)(64bit)

An application will also automatically get extra data based on the versions of the symbols it links against

$ rpm -q virt-viewer --requires | grep libvirt
libvirt.so.0()(64bit)
libvirt.so.0(LIBVIRT_0.0.3)(64bit)
libvirt.so.0(LIBVIRT_0.0.5)(64bit)
libvirt.so.0(LIBVIRT_0.1.0)(64bit)
libvirt.so.0(LIBVIRT_0.4.0)(64bit)
libvirt.so.0(LIBVIRT_0.5.0)(64bit)

This example shows that this virt-viewer cannot run against this libvirt, since the libvirt library does not provide the LIBVIRT_0.4.0 and LIBVIRT_0.5.0 symbol versions.
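The comparison RPM makes here amounts to a set difference between the two listings. The sketch below copies the capability strings from the output above. The unsatisfied helper is a hypothetical name, and real RPM dependency solving involves far more than this.

```python
provides = {
    "libvirt.so.0()(64bit)",
    "libvirt.so.0(LIBVIRT_0.0.3)(64bit)",
    "libvirt.so.0(LIBVIRT_0.0.5)(64bit)",
    "libvirt.so.0(LIBVIRT_0.1.0)(64bit)",
    "libvirt.so.0(LIBVIRT_0.1.1)(64bit)",
    "libvirt.so.0(LIBVIRT_0.1.4)(64bit)",
    "libvirt.so.0(LIBVIRT_0.1.5)(64bit)",
}

requires = {
    "libvirt.so.0()(64bit)",
    "libvirt.so.0(LIBVIRT_0.0.3)(64bit)",
    "libvirt.so.0(LIBVIRT_0.0.5)(64bit)",
    "libvirt.so.0(LIBVIRT_0.1.0)(64bit)",
    "libvirt.so.0(LIBVIRT_0.4.0)(64bit)",
    "libvirt.so.0(LIBVIRT_0.5.0)(64bit)",
}


def unsatisfied(requires, provides):
    """Return the required capabilities that no installed package provides."""
    return sorted(set(requires) - set(provides))


print(unsatisfied(requires, provides))
# ['libvirt.so.0(LIBVIRT_0.4.0)(64bit)', 'libvirt.so.0(LIBVIRT_0.5.0)(64bit)']
```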

Thus, there is now rarely any need for an application developer to manually list dependencies against the libvirt library. ie they can remove any line like

Requires: libvirt >= 0.8.3

RPM will determine the minimum version automatically. This is good, because application developers will often forget to update these manually stated deps. The only possible exception to this is where the application needs to avoid an implementation bug in a particular version of libvirt and so requests a version that is newer than that indicated by the symbol versions. In practice this isn’t very useful for libvirt, since most libvirt drivers have a client server model and the RPM dependency only validates the client, not the server.

The perils of backporting APIs with versioned symbols

It is fairly common when maintaining software in Linux distributions to want to backport code from a newer release, rather than rebase the entire package. Most maintainers will only attempt this for bug fixes or security fixes, but sometimes there is demand to backport new features, and even new APIs. From the descriptions shown above it should be clear that there are a number of serious downsides associated with backporting new APIs.

Impact on compile time pkgconfig checks

Application developers look at the version where a symbol was introduced and encode that in a pkgconfig check at compile time. If an OS distribution backports a symbol to an earlier version, the application developer now has to create OS-distro specific checks for the presence of the backported API. This is a retrograde step for the application developer, because the primary reason for using pkgconfig is that it is OS-distro *independent*.

Next consider what happens to the libvirt.so symbols when an API is backported. There are three possible approaches: keep the original symbol version, adopt the symbol version of the release being backported to, or invent a new version. All approaches ultimately fail.

Impact of maintaining the new version with a backport

Consider backporting an API from 0.5.0, to a 0.4.0 release. If the original symbol version is maintained the runtime checks by the ELF library loader will still succeed normally. There is an impact on the install time package manager checks though. Backporting a single API from 0.5.0 release will result in the libvirt-client library gaining

Provides: libvirt.so.0(LIBVIRT_0.5.0)(64bit)

This happens even if there are many other APIs in 0.5.0 that are not backported. An application which uses an API from the 0.5.0 release will gain

Requires: libvirt.so.0(LIBVIRT_0.5.0)(64bit)

This application can now be installed against either 0.4.0 + backported API, or 0.5.0. RPM is unable to correctly check requirements at application install time, because it cannot distinguish the two scenarios. The API backport has broken the RPM versioning.
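A toy model shows why the two scenarios are indistinguishable to RPM. In the sketch below, 'virOtherNewApi' is a made-up placeholder for any 0.5.0 API that was not backported, and the helper functions are hypothetical simplifications of what RPM and the ELF loader actually do.

```python
def rpm_capabilities(exports):
    """Capabilities RPM would derive: one per symbol version in use."""
    return {version for (_symbol, version) in exports}


# Upstream 0.5.0 exports both APIs under the LIBVIRT_0.5.0 node.
# 'virOtherNewApi' is a placeholder name, not a real libvirt API.
upstream_050 = {
    ("virEventRegisterImpl", "LIBVIRT_0.5.0"),
    ("virOtherNewApi", "LIBVIRT_0.5.0"),
}
# A distro 0.4.0 build with only one of those APIs backported.
backport_040 = {("virEventRegisterImpl", "LIBVIRT_0.5.0")}

# The application uses the API that was NOT backported.
app_needs = {("virOtherNewApi", "LIBVIRT_0.5.0")}


def rpm_ok(app, lib):
    """Install time check: RPM only sees version node names."""
    return rpm_capabilities(app) <= rpm_capabilities(lib)


def loader_ok(app, lib):
    """Run time check: every (symbol, version) pair must exist."""
    return app <= lib


print(rpm_ok(app_needs, upstream_050), loader_ok(app_needs, upstream_050))
# True True
print(rpm_ok(app_needs, backport_040), loader_ok(app_needs, backport_040))
# True False
```

In this model RPM accepts the install on the backported build, yet the symbol the application needs is absent at run time.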

Impact of adopting the older version with a backport

The other option for backporting is thus to adopt the version of the release being backported to. ie change the backported API’s symbol version string from

virEventRegisterImpl@LIBVIRT_0.5.0

To

virEventRegisterImpl@LIBVIRT_0.4.0

The problem here is that an application which was built against a normal libvirt binary will have a requirement for

virEventRegisterImpl@LIBVIRT_0.5.0

Attempting to run this application against the libvirt.so with the backported API will fail, because the linker will see that “LIBVIRT_0.4.0”  is not the same as “LIBVIRT_0.5.0”.

All libvirt applications now have to be specially compiled for this OS distro’s libvirt to pick up correct symbol versions. This doesn’t solve the core problem, it merely reverses it. The application binary is now incompatible with a normal libvirt, making it impossible to test it against upstream libvirt releases without rebuilding the application too.
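Both directions of this incompatibility can be seen in a small model. The can_bind helper is a hypothetical simplification which assumes a versioned reference binds only to an export with the identical name and version.

```python
def can_bind(reference, library_exports):
    """A versioned reference binds only to an identical name and version."""
    return reference in library_exports


upstream_lib = {("virEventRegisterImpl", "LIBVIRT_0.5.0")}
distro_lib = {("virEventRegisterImpl", "LIBVIRT_0.4.0")}  # re-versioned backport

# Application built against a normal (upstream) libvirt.
upstream_app = ("virEventRegisterImpl", "LIBVIRT_0.5.0")
# Application specially built against the distro's libvirt.
distro_app = ("virEventRegisterImpl", "LIBVIRT_0.4.0")

print(can_bind(upstream_app, upstream_lib))  # True
print(can_bind(upstream_app, distro_lib))    # False
print(can_bind(distro_app, distro_lib))      # True
print(can_bind(distro_app, upstream_lib))    # False
```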

In addition the RPM version checking has once again been broken. An application that needs the backported symbol will have

Requires: libvirt.so.0(LIBVIRT_0.4.0)(64bit)

There is no way for RPM to validate whether this means the plain ‘0.4.0’ release, or the 0.4.0 + backport release.

Changing the symbol version when doing the backport has in effect changed the ABI of the library in this particular OS distro.

What is worse is that this changed symbol version has to be preserved in every single future release of the OS distro, ie applications from RHEL-5 expect to be able to run against any libvirt from RHEL-6, RHEL-7, etc without recompilation.

So changing the symbol version has all the downsides of not changing the symbol version, with an additional permanent maintenance burden of patching versions for every future OS release.

Impact of inventing a new version with a backport

For libraries which only target the Linux ELF loader and no other OS platform there is one other possible trick. The ELF linker script does not require a single linear progression of versioned symbol groups. The tree can have arbitrary branches, provided it remains a DAG. A symbol can also appear in multiple places at once.

This could hypothetically be leveraged during a backport of an API, by introducing a branch in the version tree at which the symbol appears. So instead of the symbol being either

virEventRegisterImpl@LIBVIRT_0.5.0
virEventRegisterImpl@LIBVIRT_0.4.0

It could be (eg based off the RPM release where it was backported).

virEventRegisterImpl@LIBVIRT_0.4.0_1

The RPM will thus gain

Provides: libvirt.so.0(LIBVIRT_0.4.0_1)(64bit)

And applications built against this backport will gain

Requires: libvirt.so.0(LIBVIRT_0.4.0_1)(64bit)

As compared to the first option of maintaining the original symbol version LIBVIRT_0.5.0, the RPM version checking will continue to operate correctly.

This approach, however, still has all the permanent package maintenance problems of the second approach, ie the patch which adds a ‘LIBVIRT_0.4.0_1’ version has to be maintained for every single future release of the OS distro, even once the backport of the actual code has gone. It also means that applications built against the backported library won’t work with an upstream build and vice versa.
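The upstream incompatibility can be sketched with the same capability strings RPM generates. The provides sets below are deliberately trimmed to the versions relevant to this example, and the helper names are hypothetical.

```python
def capability(version):
    """Build an RPM style capability string for a libvirt symbol version."""
    return "libvirt.so.0(%s)(64bit)" % version


# Trimmed provides lists: only the nodes relevant to this example.
distro_provides = {capability(v) for v in ("LIBVIRT_0.4.0", "LIBVIRT_0.4.0_1")}
upstream_provides = {capability(v) for v in ("LIBVIRT_0.4.0", "LIBVIRT_0.5.0")}

# The same application source, built in two different environments.
app_built_on_distro = {capability("LIBVIRT_0.4.0_1")}
app_built_on_upstream = {capability("LIBVIRT_0.5.0")}


def installable(requires, provides):
    """True when every required capability is provided."""
    return requires <= provides


print(installable(app_built_on_distro, distro_provides))      # True
print(installable(app_built_on_distro, upstream_provides))    # False
print(installable(app_built_on_upstream, distro_provides))    # False
print(installable(app_built_on_upstream, upstream_provides))  # True
```

The check stays accurate in every case, but only because each binary is now tied to the environment it was built in.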

Conclusion for backporting symbols

It has been demonstrated that whatever approach is taken, backporting an API in a library with versioned symbols will either break RPM version checking, cause incompatibilities in ELF loader version checking, or both. It also requires applications to write OS distro specific checks. For a library that intends to maintain permanent API/ABI stability for application developers this is unacceptable.

The conclusion is clear: never backport APIs from *any* library, especially not one that makes any use of symbol versioning.

If a new API is required, then the entire package should be rebased to the version which includes the desired API.