Adventures in migrating from static website + Blogger SFTP to WordPress

Posted: February 14th, 2010 | Filed under: Uncategorized | 2 Comments »

For many years now my main website has consisted of a set of statically generated webpages providing the overall structure. A couple of areas, notably my main blog, were then dynamically generated using Blogger. The reason I started using Blogger was that it has the ability to publish posts directly to my webserver using SSH/SFTP, thus allowing the dynamic parts of the site to seamlessly integrate with the static parts. Then a couple of weeks ago, Blogger announced that they were discontinuing support for SFTP publishing on March 26th. Needless to say, this rather ruined my website publishing architecture. After thinking about things for a couple of weeks though, I decided this decision of Blogger’s is a blessing in disguise, because the way I managed my website was completely outdated & needed to be brought into the modern world.

What I in fact needed was a very simple content management system that allowed publishing a small number of ‘static’ pages on the site, with the majority of the content being blog postings. Categorization, tagging & external links would be desirable too. Of course it also had to be open source software, capable of running on both my Debian Lenny webserver & Fedora laptop. As many people are no doubt aware, this is exactly what WordPress provides. As a proof of concept I downloaded the latest WordPress, tried out the install process on my laptop & generally got a feel for its admin capabilities. It all looked perfect, so the decision was made.

Exporting content from Blogger

Over the years that I’ve been using Blogger, I’ve written a few hundred postings, many of them worthless trash, but a fair number have really useful & frequently visited content. My recent series of articles on libvirt features have been particularly popular. It is absolutely non-negotiable that all existing links to these postings continue to work & don’t end up broken. So the first step of the migration was to figure out how to export the content from Blogger into WordPress. The first thing I tried was WordPress’ own built-in import tool that can allegedly talk directly to Blogger and pull down all the postings & comments. The first problem I found with this is that it only works if your content is hosted on Blogger, i.e. if you were using SFTP publishing it always reports ‘0 posts’. I temporarily updated my blog settings to turn off SFTP and it at least detected all the posts at that point. I started the import process & it imported 3 posts and 70 comments, then gave up with no indication of what was wrong. I tried again, and the same thing happened. Searching the WordPress forums, it seems many people have hit this problem over the past 2 years with no reliable solution yet available.

Then I investigated whether Blogger had its own export capabilities. It does. It can export all your blog posts and comments in a single XML file. Unfortunately there is no apparent standard XML schema for blog import/export, so there didn’t seem to be much use for this export capability & I didn’t fancy writing my own XSL transform to convert it to WordPress’ native XML import schema. The nice thing about Blogger and WordPress being so widely used on the web is that if you have a problem, the chances are that someone else has had the same problem already. In fact so many people have had this problem that someone has already written a tool to solve it: Google Blog Converters.

I downloaded it, fed it the Blogger exports and it generated some nice looking WordPress XML files. A closer look revealed one tiny flaw – it had unescaped a whole bunch of HTML tags in blog posts where I had been including snippets of example XML or HTML inside <pre> tags. Fortunately the code is all Python and it was easy to find the bogus line of code “content = unescape(text)” and replace it with just “content = text“. After that the files imported into WordPress perfectly, preserving all formatting and comments.
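
The problem is easy to reproduce in isolation: the Blogger export already stores escaped markup for anything displayed inside <pre> tags, so unescaping it a second time turns the displayed examples back into live HTML. A tiny Python illustration of the effect (the sample string is made up, and this uses the generic library function rather than the converter’s own code):

from xml.sax.saxutils import unescape

# What a <pre> block containing an XML example looks like in the export
text = "&lt;domain type='kvm'&gt;&lt;name&gt;demo&lt;/name&gt;&lt;/domain&gt;"

print(unescape(text))  # '<domain ...>' - the browser now treats it as markup
print(text)            # leaving it escaped keeps the example visible as text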

Setting up URL redirects

Even though WordPress has a nice friendly URL scheme for articles based on their title, it is very slightly different from the scheme Blogger used for URLs. I was also merging several separate Blogger feeds into one, since WordPress has a nice categorization capability. It was thus inevitable that the URLs for all existing posts would have to change. The solution to this problem was pretty straightforward. Apache’s mod_rewrite engine can be told to load external files containing arbitrary key, value mappings, and then reference these maps in rewrite rules. It was a simple, albeit slightly tedious, process to write a map file that contained the old Blogger URL as the key and the new WordPress URL as the value. As an example, a tiny part of the map I created looks like this

/diary/2008/04/presentation-is-everything /posts/2008/04/21/presentation-is-everything
/diary/2008/06/red-hat-summit-2008 /posts/2008/06/18/red-hat-summit-2008
/diary/2009/12/using-qcow2-disk-encryption-with /posts/2009/12/02/using-qcow2-disk-encryption-with-libvirt-in-fedora-12

Making use of this map just requires two rules in the httpd.conf file: one to load the map and the other to add a match for it. Those rules look like this

RewriteMap blogger txt:/etc/apache2/blogger-rewrite.txt
RewriteRule ^/personal(/diary/.*) ${blogger:$1} [L,R=permanent]
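
Before retiring the old pages it is worth sanity checking the map by replaying each old URL and confirming a permanent redirect to the expected new location. A small Python sketch of such a check (the hostname is hypothetical; the /personal prefix matches the RewriteRule above):

import http.client

conn = http.client.HTTPConnection("www.example.org")

with open("blogger-rewrite.txt") as mapfile:
    for line in mapfile:
        old, new = line.split()
        conn.request("HEAD", "/personal" + old)
        resp = conn.getresponse()
        resp.read()  # drain the response so the connection can be reused
        ok = resp.status == 301 and resp.getheader("Location", "").endswith(new)
        print("OK" if ok else "FAIL", old)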

In summary, while the migration process from Blogger to WordPress was not entirely smooth, it went a lot better than I expected it to. Any web user following an old link to a post on my site now gets a permanent redirect to the new location, so no important links were broken during the migration. The new site is so much more flexible than the old one & the WordPress UI is very much nicer to use. Blogger’s UI is rather dated & not really on a par with the standard of Google’s other popular apps like GMail.

Controlling guest CPU & NUMA affinity in libvirt with QEMU, KVM & Xen

Posted: February 12th, 2010 | Filed under: libvirt, Virt Tools | 16 Comments »

When provisioning new guests with libvirt, the standard policy for affinity between the guest and host CPUs / NUMA nodes is to have no policy at all. In other words the guest will follow whatever the hypervisor’s own default policy is, which is usually to run the guest on whatever host CPU is available. There are times when an explicit policy may be better; in particular, to make the most of a NUMA architecture it is usually desirable to lock a guest to a particular NUMA node so that its memory allocations are always local to the node it is running on, avoiding the cross-node memory transfers which have lower bandwidth. As of this writing, libvirt supports this capability for QEMU, KVM and Xen guests. Even on a non-NUMA system some form of explicit placement across the host’s sockets, cores & hyperthreads may be desired.

Querying host CPU / NUMA topology

The first step in deciding what policy to apply is to figure out the host’s topology. The virsh nodeinfo command provides information about how many sockets, cores & hyperthreads there are on a host.

# virsh nodeinfo
CPU model:           x86_64
CPU(s):              8
CPU frequency:       1000 MHz
CPU socket(s):       2
Core(s) per socket:  4
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         8179176 kB

There are a total of 8 CPUs, in 2 sockets, each with 4 cores.

More interesting though is the NUMA topology. This can be significantly more complex, so the data is provided in a structured XML document, as part of the virsh capabilities output

# virsh capabilities
<capabilities>

  <host>
    <cpu>
      <arch>x86_64</arch>
    </cpu>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='4'>
            <cpu id='0'/>
            <cpu id='1'/>
            <cpu id='2'/>
            <cpu id='3'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='4'>
            <cpu id='4'/>
            <cpu id='5'/>
            <cpu id='6'/>
            <cpu id='7'/>
          </cpus>
        </cell>
      </cells>
    </topology>
    <secmodel>
      <model>selinux</model>
      <doi>0</doi>
    </secmodel>
  </host>

 ...removed remaining XML...

</capabilities>

This tells us that there are two NUMA nodes (aka cells), each containing 4 logical CPUs. Since we know there are two sockets, we can infer that each socket is in a separate node, not that this really matters for what we need later. If we’re intending to run a guest with 4 virtual CPUs, we can see that it will be desirable to lock the guest to physical CPUs 0-3, or 4-7, to avoid non-local memory accesses. If our guest workload required 8 virtual CPUs, since each NUMA node only has 4 physical CPUs, better utilization may be obtained by running a pair of 4 CPU guests & splitting the work between them, rather than using a single 8 CPU guest.
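
The same topology can also be read programmatically by fetching and parsing the capabilities XML. A minimal sketch using the libvirt Python bindings (assuming they are installed; the qemu:///system connection URI is my assumption):

import libvirt
import xml.etree.ElementTree as ET

conn = libvirt.openReadOnly("qemu:///system")
caps = ET.fromstring(conn.getCapabilities())

# Walk <host>/<topology>/<cells>/<cell> to map each NUMA cell to its CPU ids
for cell in caps.findall("./host/topology/cells/cell"):
    cpus = [cpu.get("id") for cpu in cell.findall("./cpus/cpu")]
    print("cell %s: CPUs %s" % (cell.get("id"), ",".join(cpus)))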

Deciding which NUMA node to run the guest on

Locking a guest to a particular NUMA node is rather pointless if that node does not have sufficient free memory for the guest’s local memory allocations. Indeed, it would be very detrimental to utilization. The next step is to ask libvirt what the free memory is on each node, using the virsh freecell command

# virsh freecell 0
0: 2203620 kB

# virsh freecell 1
1: 3354784 kB

If our guest needs to have 3 GB of RAM allocated, then clearly it needs to be run on NUMA node (cell) 1, rather than node 0, since the latter only has 2.2 GB available.

Locking the guest to a NUMA node or physical CPU set

We have now decided to run the guest on NUMA node 1, and referring back to the capabilities data about NUMA topology, we see this node has physical CPUs 4-7. When creating the guest XML we can now specify this as the CPU mask for the guest. Where the guest virtual CPU count is specified

<vcpus>4</vcpus>

we can now add the mask

<vcpus cpuset='4-7'>4</vcpus>

As mentioned earlier, this works for QEMU, KVM and Xen guests. In the QEMU/KVM case, libvirt will use the sched_setaffinity call at guest startup, while in the Xen case libvirt will instruct XenD to make an equivalent hypercall.
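
To make the mechanism concrete, the underlying primitive is easy to see from Python on Linux. This is just a standalone illustration of sched_setaffinity pinning the calling process to host CPUs 4-7, not anything libvirt itself runs:

import os

# Restrict the calling process to host CPUs 4-7, i.e. the second NUMA node
# in the example topology above (Linux only, Python 3.3+)
os.sched_setaffinity(0, {4, 5, 6, 7})
print(os.sched_getaffinity(0))   # -> {4, 5, 6, 7}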

Automatic placement using virt-install

This walkthrough illustrated the concepts in terms of virsh commands. If writing a management application using libvirt, you would of course use the equivalent APIs for looking up this data, virNodeGetInfo, virConnectGetCapabilities and virNodeGetCellsFreeMemory. The virt-install provisioning tool has done exactly this and provides a simple way to automatically apply a ‘best fit’ NUMA policy when installing guests. Quoting its manual page

   --cpuset=CPUSET

   Set which physical cpus the guest can use. "CPUSET" is a comma separated
   list of numbers, which can also be specified in ranges. Example:

     0,2,3,5     : Use processors 0,2,3 and 5
     1-3,5,6-8   : Use processors 1,2,3,5,6,7 and 8

   If the value ’auto’ is passed, virt-install attempts to automatically
   determine an optimal cpu pinning using NUMA data, if available.

So if you have a NUMA machine and use virt-install, simply always add --cpuset=auto whenever provisioning a new guest.
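
For a management application doing the placement itself, the whole decision can be scripted with those APIs. A minimal sketch with the libvirt Python bindings (the qemu:///system URI and the ‘most free memory wins’ policy are my assumptions, roughly mirroring the manual walkthrough above):

import libvirt
import xml.etree.ElementTree as ET

conn = libvirt.openReadOnly("qemu:///system")

# virNodeGetInfo equivalent: overall host CPU topology
model, memory, cpus, mhz, nodes, sockets, cores, threads = conn.getInfo()

# virConnectGetCapabilities equivalent: NUMA cells and their CPU ids
cells = ET.fromstring(conn.getCapabilities()).findall("./host/topology/cells/cell")

# virNodeGetCellsFreeMemory equivalent: free memory reported for each cell
free = conn.getCellsFreeMemory(0, len(cells))

# Pick the cell with the most free memory and emit a cpuset for the guest XML
best = max(range(len(cells)), key=lambda i: free[i])
cpuset = ",".join(c.get("id") for c in cells[best].findall("./cpus/cpu"))
print("place guest on cell %d, <vcpus cpuset='%s'>" % (best, cpuset))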

Fine tuning CPU affinity at runtime

The scheme outlined above is focused on the initial guest placement at boot time. There may be times when it becomes necessary to fine-tune the CPU affinity at runtime. libvirt/virsh can cope with this need too, via the vcpuinfo and vcpupin commands. First, the virsh vcpuinfo command gives you the latest data about where each virtual CPU is running. In this example, rhel5xen is a guest on a Fedora KVM host which I used for RHEL5 Xen package maintenance work. It has 4 virtual CPUs and is being allowed to run on any host CPU

# virsh vcpuinfo rhel5xen
VCPU:           0
CPU:            3
State:          running
CPU time:       0.5s
CPU Affinity:   yyyyyyyy

VCPU:           1
CPU:            1
State:          running
CPU Affinity:   yyyyyyyy

VCPU:           2
CPU:            1
State:          running
CPU Affinity:   yyyyyyyy

VCPU:           3
CPU:            2
State:          running
CPU Affinity:   yyyyyyyy

Now let’s say I want to lock each of these virtual CPUs to a separate host CPU in the 2nd NUMA node.

# virsh vcpupin rhel5xen 0 4

# virsh vcpupin rhel5xen 1 5

# virsh vcpupin rhel5xen 2 6

# virsh vcpupin rhel5xen 3 7
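
The same pinning can be driven through the libvirt API instead of virsh. A minimal sketch with the Python bindings (assuming they are installed; the cpumap is a tuple of booleans with one entry per host CPU, 8 on this machine):

import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("rhel5xen")

# Pin vCPU 0 to host CPU 4, vCPU 1 to 5, and so on, as in the virsh commands
for vcpu, host_cpu in enumerate([4, 5, 6, 7]):
    cpumap = tuple(cpu == host_cpu for cpu in range(8))
    dom.pinVcpu(vcpu, cpumap)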

The vcpuinfo command can be used again to confirm the placement

# virsh vcpuinfo rhel5xen
VCPU:           0
CPU:            4
State:          running
CPU time:       32.2s
CPU Affinity:   ----y---

VCPU:           1
CPU:            5
State:          running
CPU time:       16.9s
CPU Affinity:   -----y--

VCPU:           2
CPU:            6
State:          running
CPU time:       11.9s
CPU Affinity:   ------y-

VCPU:           3
CPU:            7
State:          running
CPU time:       14.6s
CPU Affinity:   -------y

And just to prove I’m not faking it all, here’s the KVM process running on the host and its /proc status

# grep pid /var/run/libvirt/qemu/rhel5xen.xml
<domstatus state='running' pid='4907'>

# grep Cpus_allowed_list /proc/4907/task/*/status
/proc/4907/task/4916/status:Cpus_allowed_list: 4
/proc/4907/task/4917/status:Cpus_allowed_list: 5
/proc/4907/task/4918/status:Cpus_allowed_list: 6
/proc/4907/task/4919/status:Cpus_allowed_list: 7

Future work

The approach outlined above relies on the fact that the kernel will always try to allocate memory from the NUMA node that matches the one the guest CPUs are executing on. While this is sufficient in the simple case, there are some pitfalls along the way. Between the time the guest is started & memory is allocated, RAM from the NUMA node in question may have been used up, causing the OS to fall back to allocating from another node. For this reason, if placing guests on NUMA nodes, it is crucial that all guests running on the host have fixed placement, with none allowed to float free. In some weird and wonderful NUMA topologies (hello Itanium !) there can be NUMA nodes which have only CPUs, or only RAM. To cope with these it will be necessary to extend libvirt to allow an explicit memory allocation node to be listed in the guest configuration.

Visualizing libvirt development history using gource and code swarm

Posted: January 16th, 2010 | Filed under: libvirt, Virt Tools | Tags: | 3 Comments »

Michael DeHaan yesterday posted an example using gource to visualize Cobbler development history. Development on Cobbler started in April 2006, making it a similar vintage to libvirt whose development started in November 2005. So I thought it would be interesting to produce a visualization of libvirt development as a comparison.

Head over to the YouTube page for this video if it doesn’t show the option to watch in highdef in this embedded viewer. HD makes it much easier to make out the names.

Until July last year, libvirt was using CVS for source control. Among the great many disadvantages of CVS is that it does not track author attribution at all, so the first 3 & 1/2 years show an inaccurately small contributor base. Watching the video, it is clear when the switch to GIT happened, as the number of authors explodes. Even with the inaccuracies from the CVS history, it is clear from the video just how much development of libvirt has been expanding over the past 4 years, particularly with the expansion to cover VirtualBox and VMWare ESX server as hypervisor targets. This video was generated on Fedora 12 using

 # gource -s 0.07 --auto-skip-seconds 0.1 \
          --file-idle-time 500 --disable-progress \
          --output-framerate 25 --highlight-all-users \
          -1280x720 --stop-at-end --output-ppm-stream - \
 | ffmpeg -y -b 15000K -r 17 -f image2pipe -vcodec ppm \
          -i - -vcodec mpeg4 libvirt-2010-01-15-alt.mp4

gource isn’t the only source code visualization application around. Last year a project called code swarm came along too. It has a rather different & simpler physics model than gource, not showing the directory structure explicitly. As a comparison I produced a visualization of libvirt using code_swarm too:

Head over to the YouTube page for this video if it doesn’t show the option to watch in highdef in this embedded viewer. HD makes it much easier to make out the names.

In this video the libvirt files are coloured into four groups: source code, test cases, documentation and i18n data (i.e. translated .po files). Each coloured dot represents a file, and each developer exerts a gravitational pull on files they have modified. For the years in which libvirt used CVS there were just a handful of developers who committed changes. This results in a visualization where developers have largely overlapping spheres of influence on files. In the last 6 months with GIT, changes have correct author attribution, so the visualization spreads out, more accurately reflecting who is changing what. In the end, I think I rather prefer gource’s results because it has a less abstract view of the source tree and better illustrates the rate of change over time.

Finally, can anyone recommend a reliable online video hosting service that’s using HTML5 + Ogg Theora yet ? I can easily encode these videos in Ogg Theora, but don’t want to host the 200 MB files on my own webserver since it doesn’t have the bandwidth to cope.

Using GObject Introspection + Gjs to provide a JavaScript plugin engine

Posted: January 10th, 2010 | Filed under: Entangle | 8 Comments »

In writing the Capa photo capture application, one of the things I wanted to support was some form of plugin engine to allow 3rd parties to easily extend its functionality. The core application code itself is designed to have a formal separation of backend and frontend logic. The backend is focused on providing the core object model & operations, typically wrapping external libraries like HAL, libgphoto and lcms in GObject classes, with no use of GTK allowed here. The primary frontend builds on this backend to produce a GTK based user interface. The intention is also to build another frontend that provides a GIMP plugin.

Back to the question of plugins for the main frontend. If the goal is to allow people to easily write extensions, a plugin engine based on writing C code is not really very desirable. Firefox uses JavaScript for its plugin engine and this has been hugely successful in lowering the bar for contributors. Wouldn’t it be nice if any GTK application could provide a JavaScript plugin engine ? Yes, indeed and thanks to the recent development of GObject introspection this is incredibly easy.

GObject introspection provides a means to query the GObject type system and discover all classes, interfaces, methods, properties, signals, all data types associated with their parameters and any calling conventions. This is an incredibly powerful capability with far reaching implications, the most important being that you will never again have to write a language binding for any GObject based library. There is enough metadata available in the GObject introspection system to provide language bindings in a 100% automated fashion. Notice I said “provide”, rather than “generate”, because if targeting a dynamic language (Perl, Python, JavaScript) it won’t even be necessary to auto-generate code ahead of time – everything can and will happen at runtime based on the introspection data. Say goodbye to hand written language bindings. Say goodbye to Swig. Say goodbye to any other home grown code generators.

Adding support for introspection

That’s the sales pitch, how about the reality ? The Capa code is based on GObject and was thus ready & willing to be introspected. The first step in adding introspection support is to add some m4 magic to the configure.ac to look for the introspection tools & library. This is simple boilerplate code that will be identical for every application using GObject + autoconf

GOBJECT_INTROSPECTION_REQUIRED=0.6.2
AC_SUBST(GOBJECT_INTROSPECTION_REQUIRED)

AC_ARG_ENABLE([introspection],
        AS_HELP_STRING([--enable-introspection], [enable GObject introspection]),
        [], [enable_introspection=check])

if test "x$enable_introspection" != "xno" ; then
        PKG_CHECK_MODULES([GOBJECT_INTROSPECTION],
                          [gobject-introspection-1.0 >= $GOBJECT_INTROSPECTION_REQUIRED],
                          [enable_introspection=yes],
                          [
                             if test "x$enable_introspection" = "xcheck"; then
                               enable_introspection=no
                             else
                               AC_MSG_ERROR([gobject-introspection is not available])
                             fi
                          ])
        if test "x$enable_introspection" = "xyes" ; then
          AC_DEFINE([WITH_GOBJECT_INTROSPECTION], [1], [enable GObject introspection support])
          AC_SUBST(GOBJECT_INTROSPECTION_CFLAGS)
          AC_SUBST(GOBJECT_INTROSPECTION_LIBS)
          AC_SUBST([G_IR_SCANNER], [$($PKG_CONFIG --variable=g_ir_scanner gobject-introspection-1.0)])
          AC_SUBST([G_IR_COMPILER], [$($PKG_CONFIG --variable=g_ir_compiler gobject-introspection-1.0)])
        fi
fi
AM_CONDITIONAL([WITH_GOBJECT_INTROSPECTION], [test "x$enable_introspection" = "xyes"])

The next step is to add Makefile.am rules to extract the introspection data. This is a two step process: the first step runs g-ir-scanner across all the source code and the actual compiled binary / library to generate a .gir file. This is an XML representation of the introspection data. The second step runs g-ir-compiler to turn the XML data into a machine usable binary format so it can be efficiently accessed. When running g-ir-scanner on a binary, as opposed to a library, it is necessary for that binary to support an extra command line flag called --introspect-dump. I added this code to the main.c source file to support that

#if WITH_GOBJECT_INTROSPECTION
    static gchar *introspect = NULL;
#endif

    static const GOptionEntry entries[] = {
        ...snip other options...
#if WITH_GOBJECT_INTROSPECTION
        { "introspect-dump", 'i', 0, G_OPTION_ARG_STRING, &introspect;, "Dump introspection data", NULL },
#endif
        { NULL, 0, 0, 0, NULL, NULL, NULL },
    };

    ...parse command line args...

#if WITH_GOBJECT_INTROSPECTION
    if (introspect) {
        g_irepository_dump(introspect, NULL);
        return 0;
    }
#endif

Back to the Makefile.am rules. g-ir-scanner has quite a few arguments you need to set. The --include args provide the names of introspection metadata files for any libraries depended on. The -I args provide the CPP include paths to the application’s header files. The --pkg args provide the names of any pkg-config files that code builds against. There are a few others too which I won’t cover – they’re all in the man page. The upshot is that the Makefile.am gained rules

if WITH_GOBJECT_INTROSPECTION
Capa-0.1.gir: capa $(G_IR_SCANNER) Makefile.am
        $(G_IR_SCANNER) -v \
                --namespace Capa \
                --nsversion 0.1 \
                --include GObject-2.0 \
                --include Gtk-2.0 \
                --include GPhoto-2.0 \
                --program=$(builddir)/capa \
                --add-include-path=$(srcdir) \
                --add-include-path=$(builddir) \
                --output $@ \
                -I$(srcdir)/backend \
                -I$(srcdir)/frontend \
                --verbose \
                --pkg=glib-2.0 \
                --pkg=gthread-2.0 \
                --pkg=gdk-pixbuf-2.0 \
                --pkg=gobject-2.0 \
                --pkg=gtk+-2.0 \
                --pkg=libgphoto2 \
                --pkg=libglade-2.0 \
                --pkg=hal \
                --pkg=dbus-glib-1 \
                $(libcapa_backend_la_SOURCES:%=$(srcdir)/%) \
                $(libcapa_frontend_la_SOURCES:%=$(srcdir)/%) \
                $(capa_SOURCES:%=$(srcdir)/%)

girdir = $(datadir)/gir-1.0
gir_DATA = Capa-0.1.gir

typelibsdir = $(libdir)/girepository-1.0
typelibs_DATA = Capa-0.1.typelib

%.typelib: %.gir
        g-ir-compiler \
                --includedir=$(srcdir) \
                --includedir=$(builddir) \
                -o $@ $<

CLEANFILES += Capa-0.1.gir $(typelibs_DATA)

endif # WITH_GOBJECT_INTROSPECTION

After making those changes & rebuilding, it is wise to check the .gir file, since g-ir-scanner doesn't always get everything correct. It may be necessary to provide annotations in the source files to help it out. For example, it got object ownership wrong on some getters, requiring annotations on the return values such as

/**
 * capa_app_get_plugin_manager: Retrieve the plugin manager
 *
 * Returns: (transfer none): the plugin manager
 */

The final step was to add rules to the RPM specfile, which are fairly self-explanatory

%define with_introspection 0

%if 0%{?fedora} >= 12
%define with_introspection 1
%endif
%if 0%{?rhel} >= 6
%define with_introspection 1
%endif

%if %{with_introspection}
BuildRequires: gobject-introspection-devel
BuildRequires: gir-repository-devel
%endif


%prep
....
%if %{with_introspection}
%define introspection_arg --enable-introspection
%else
%define introspection_arg --disable-introspection
%endif

%configure %{introspection_arg}

%files
....
%if %{with_introspection}
%{_datadir}/gir-1.0/Capa-0.1.gir
%{_libdir}/girepository-1.0/Capa-0.1.typelib
%endif

That is all. The entire API is now accessible from Perl, JavaScript, Python without ever having written a line of code for those languages. It is also possible to generate a .jar file to make it accessible from Java.
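
As a quick illustration of what that means in practice, once the typelib is installed the whole API can be poked at from a Python prompt via PyGObject's introspection support. This is a sketch only: the Capa.App class name is my guess at how the CapaApp GObject would map into the namespace, not something taken from the Capa source.

# Requires PyGObject with introspection support and the Capa-0.1.typelib
# installed somewhere on the girepository search path
from gi.repository import Capa

print(dir(Capa))    # every introspected class and function in the API
app = Capa.App()    # hypothetical: assumes CapaApp maps to Capa.App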

Adding support for a JavaScript plugin engine

Since the API is now accessible from JavaScript, adding a JavaScript plugin engine ought to be easy at this point. There are in fact 2 competing JavaScript engines supporting GObject introspection: Gjs and Seed. Seed looks more advanced, documented & polished, but Gjs is what's currently in Fedora, so I used that. Again the first step was checking for it in configure.ac

AC_ARG_WITH([javascript],
      AS_HELP_STRING([--with-javascript],[enable JavaScript plugins]),
      [], [with_javascript=check])

if test "x$with_javascript" != "xno" ; then
  if test "x$enable_introspection" = "xno" ; then
    if test "x$with_javascript" = "xyes"; then
      AC_MSG_ERROR([gobject-introspection is required for javascript plugins])
    fi
  fi

  PKG_CHECK_MODULES(GJS, gjs-1.0 >= $GJS_REQUIRED)
  AC_SUBST(GJS_CFLAGS)
  AC_SUBST(GJS_LIBS)

  PKG_CHECK_MODULES(GJS_GI, gjs-gi-1.0 >= $GJS_REQUIRED)
  AC_SUBST(GJS_GI_CFLAGS)
  AC_SUBST(GJS_GI_LIBS)

  with_javascript=yes
  AC_DEFINE([WITH_JAVASCRIPT], [1], [enable JavaScript plugins])
fi
AM_CONDITIONAL([WITH_JAVASCRIPT], [test "x$with_javascript" = "xyes"])

I won't go into any details on the way Capa scans for plugins (it looks for a main.js under each directory in $HOME/.local/share/capa/plugins/), merely illustrate how to execute a plugin once it has been located. The important object in the Gjs API is GjsContext, providing the execution context for the JavaScript code. It is possible to have multiple contexts, so each plugin is independent and potentially able to be sandboxed. The JavaScript file to be invoked is main.js in the plugin's base directory. The first step is to set up the context's search path to point to the plugin base directory:

void runplugin(const gchar *plugindir) {
    const gchar *searchpath[2];
    GjsContext *context;
    int status = 0;

    searchpath[0] = plugindir;
    searchpath[1] = NULL;

    context = gjs_context_new_with_search_path((gchar **)searchpath);

The context is now ready to execute some javascript code. The Capa plugin system expects the main.js file to contain a method called activate. To start the plugin, we can thus simply evaluate const Main = imports.main; Main.activate();

   const gchar *script = "const Main = imports.main; Main.activate();";

   gjs_context_eval(context,
                    script,
                    -1,
                    "main.js",
                    &status,
                    NULL);

   if (status != 0) {
     fprintf(stderr, "Loading plugin failed\n");
   }

Presto, you now have a javascript plugin running, having written no JavaScript at any point in the process. There is one slight issue in this though - how does the plugin get access to the application instance ? One way would be to provide a static method in your API to get hold of the application's main object, but I really wanted to pass the object into the plugin's activate method. This is where I hit Gjs's limitations - there appears to be no official API to set any global variable except for ARGV. After much poking around in the Gjs code though I discovered an exported method, which wasn't in the header files

JSContext* gjs_context_get_context(GjsContext *js_context);

And decided to (temporarily) abuse that until a better way could be found. I have an object instance of the CapaApp class which I wanted to pass into the activate method. The first step was to set this in the global namespace of the script being evaluated. Gjs comes with an API for converting a GObject instance into a JSObject instance which the runtime needs. Thus I wrote a simple helper

static void set_global(GjsContext *context,
                       const char *name,
                       GObject *value)
{
    JSContext *jscontext;
    JSObject *jsglobal;
    JSObject *jsvalue;

    jscontext = gjs_context_get_context(context);
    jsglobal = JS_GetGlobalObject(jscontext);
    JS_EnterLocalRootScope(jscontext);
    jsvalue = gjs_object_from_g_object(jscontext, value);
    JS_DefineProperty(jscontext, jsglobal,
                      name, OBJECT_TO_JSVAL(jsvalue),
                      NULL, NULL,
                      JSPROP_READONLY | JSPROP_PERMANENT);
    JS_LeaveLocalRootScope(jscontext);
}

There was one little surprise in this though. The gjs_object_from_g_object method will only succeed if the current Gjs context has the introspection data for that object loaded. So it was necessary to import my application's introspection data by eval'ing const Capa = imports.gi.Capa. That done, it was now possible to pass variables into the plugin. The complete revised plugin loading code looks like

void runplugin(CapaApp *application, const gchar *plugindir) {
    const gchar *script = "const Main = imports.main; Main.activate(app);";
    const gchar *searchpath[2];
    GjsContext *context;
    int status = 0;

    searchpath[0] = plugindir;
    searchpath[1] = NULL;

    context = gjs_context_new_with_search_path((gchar **)searchpath);

    /* Load the app's introspection data so gjs_object_from_g_object can
     * wrap the CapaApp instance */
    gjs_context_eval(context,
                     "const Capa = imports.gi.Capa",
                     -1,
                     "dummy.js",
                     &status,
                     NULL);

    /* Expose the application object to the plugin as a global named "app" */
    set_global(context, "app", G_OBJECT(application));

    gjs_context_eval(context,
                     script,
                     -1,
                     "main.js",
                     &status,
                     NULL);

    if (status != 0) {
      fprintf(stderr, "Loading plugin failed\n");
    }
}

This code is slightly simplified, omitting error handling, for purposes of this blog post, but the real thing is not much harder. Looking at the code again, there is really very little (if anything) about the code which is specific to my application. It would be quite easy to pull out the code which finds & loads plugins into a library (eg "libgplugin"). This would make it possible for any existing GTK applications to be retrofitted with support plugins simply by generating introspection data for their internal APIs, and then instantiating a "PluginManager" object instance.

In summary, GObject Introspection is an incredibly compelling addition to GLib. With a mere handful of additions to configure.ac and Makefile.am, it completely solves the "language bindings" problem for you. I'd go as far as to say that this is the single most compelling reason to write any new C libraries using GLib/GObject. Furthermore, if there are existing C libraries not using GObject, then providing a GObject wrapper for them should be a top priority. Don't ever write or auto-generate a language binding again. Writing GTK applications either entirely in JavaScript, or in a mix of C + JavaScript plugins, is also a really nice development, avoiding the issue of "clashing runtime environments" seen when using Python + GTK. The Gjs/Seed/GObject developers deserve warm praise for these great enhancements.

Following gphoto SVN development with GIT

Posted: January 10th, 2010 | Filed under: Entangle | No Comments »

Since I started developing the Capa photo capture application, I’ve been following development of gphoto much more closely. Unfortunately gphoto is using Subversion for source control. There are many things wrong with Subversion in comparison to modern SCM systems like Mercurial or GIT. In this particular case though, the main problem is speed, or lack thereof. gphoto uses SourceForge as its hosting service and sf.net subversion servers are slower than you can possibly imagine. As an example, run ‘svn log’ to browse changes and you’ll be waiting 30 seconds for it to even start to give you an answer. Then run ‘svn diff’ to look at the contents of a change and you’ll be waiting another 30 seconds or more. Totally unacceptable. Once you’ve used a distributed SCM system like Mercurial or GIT, you cease to have tolerance for any operation which takes longer than 2-3 seconds.

Fortunately, GIT has the ability to check out directly from an SVN repository. The gphoto SVN repository actually contains many separate sub-projects and I didn’t want to import them all into my local GIT repository. This meant I couldn’t make use of the branch / tag tracking support directly and had to do things the long way. The good news is that the long way has already been blogged about and it isn’t hard.

There were two projects I was interested in getting: libgphoto (the main library) & gphoto (the command line frontend), and I wanted each to end up in its own GIT repository. For both, I wanted the trunk and the 2.4.x branch. Starting with gphoto, since it has much less history, the first step was to clone the trunk

# git svn clone https://gphoto.svn.sourceforge.net/svnroot/gphoto/trunk/gphoto2 gphoto2

This takes a fairly long time because it pulls down every single SVN changeset in the repository. Once that’s complete though, the .git/config contains

[svn-remote "svn"]
        url = https://gphoto.svn.sourceforge.net/svnroot/gphoto/trunk/gphoto2
        fetch = :refs/remotes/git-svn

And the local ‘master’ branch is connected to the ‘git-svn’ remote.

$ git branch -a
* master
  remotes/git-svn

Anytime further changes are made in the SVN repository, those can be pulled down to the local GIT repository using git svn fetch git-svn. At this point it is possible to add in the branches. Simply edit the .git/config file and add another ‘svn-remote’ entry, this time pointing at the branch path.

[svn-remote "svn24"]
        url = https://gphoto.svn.sourceforge.net/svnroot/gphoto/branches/libgphoto2-2_4/gphoto2
        fetch = :refs/remotes/git-svn-2.4

And then pull down all the changes for that branch, and create a local branch for this

# git svn fetch svn24
# git checkout -b v2.4 git-svn-2.4

This leaves a local branch ‘v2.4’ and a remote branch ‘git-svn-2.4’

$ git branch -a
  master
* v2.4
  remotes/git-svn
  remotes/git-svn-2.4

That takes care of the gphoto2 frontend command line app codebase. It is then a simple matter to repeat the same thing, substituting libgphoto2 into the SVN paths, to check out the library codebase, though this takes a little longer because it has much, much more history. This little upfront pain to clone the SVN repo to GIT will be paid back many hundreds of times over thanks to the speed that GIT brings to SCM operations.

The moral of the story is simple: Don’t ever choose subversion. If you have the choice, use GIT. If you don’t have the choice, then mirror SVN to GIT anyway.

Edit: One thing I forgot to mention is that after setting up all branches, run a git gc on the repo. This will dramatically reduce the disk usage & speed up GIT operations further

$ du -h -c -s .
45M .
45M total
$ git gc
Counting objects: 3695, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3663/3663), done.
Writing objects: 100% (3695/3695), done.
Total 3695 (delta 3081), reused 0 (delta 0)
$ du -h -c -s .
5.0M .
5.0M total

Going from 45 MB to 5 MB is quite impressive !