Advanced integration testing with Pax Exam Karaf

From my first steps in programming, I realized the need of integration testing.

I remember setting up a macro recorder in order to automate the tests of my first swing applications. The result was a disaster, but I think the idea was in the right direction even though the mean was wrong.

Since then I experimented with a lot of tools for unit testing, which I think that are awesome, but I always felt that they were not enough for testing my "actual product".

The last couple of years I have been working with integration projects and also with OSGi and there the challenge of testing the "actual product" is bigger. The scenarios that need to be tested sometimes involve more than one VMs and of course testing OSGi is always a challenge.

In this post I am going to blog about Pax Exam Karaf, which is an integration testing framework for Apache Karaf and also the answer to all my prayers. Kudos to Andreas Pieber for creating it.

Tools for integration tests in OSGi
I have been using Pax Exam for a while now for integration tests in a lot of project I have been working on (e.g. Apache Camel). Even though its a great tool it does not provide the "feel" that the tests run inside Apache Karaf but rather inside the underlying OSGi container.

A tool which is not a testing framework itself, but is frequently used for OSGi testing is PojoSR. It actually allows you to use the OSGi service layer without using the module layer (if you prefer its an OSGi service registry that run without an OSGi container). So its sometimes used for testing OSGi services etc. A great example is the work of Guillaume Nodet (colleague of mine at  FuseSource and yes I know that he doesn't need introduction) for testing Apache Camel routes that use the OSGi blueprint based on PojoSR. You can read more about it in Guillaume's post. Very powerful but yet it is not inside Karaf (it tests the routes in blueprint, but it doesn't test OSGi stuff).

Arqullian is an other effort that among others allow you to run OSGi integration tests but I haven't used that myself.

Pax Exam Karaf
Each of the tools mentioned in the previous section are great and they all have their uses. However, none of the above gives me the "feel" that the tests do run inside Apache Karaf.

Pax Exam Karaf is a Pax Exam based framework which allows you to run your tests inside Apache Karaf. To be more accurate it allows you to run your tests inside any karaf based container. So it can also be Apache ServiceMix, FuseESB or even a custom runtime that you have created yourself on top of Apache Karaf.

As of Karaf 3.0.0 (it's not yet released) it will be shipped as part of Karaf tooling. Till then you can enhoy it at the paxexam-karaf github repository.

The benefits it offers over traditional approaches is that it allows you to use all Karaf goodness inside your tests:
  • Features Concept
  • Deployers
  • Admin Service
  • Dynamic configuration
  • more
Setting it up
Detailed installation instructions can be found at the paxexam-karaf wiki.

The basic idea is that your unit test is added inside a bundle called probe and this bundle is deployed inside the container. The container can be customized, with custom configuration, custom features installation etc so that it fits the "actual" environment.

Starting a custom container
To setup the container of your choiece you can just modify the configuration method. Here is an example that uses Apache ServiceMix as a target container:

Using the Karaf shell
Since one of Karaf's awsome features is its shell, being able to use shell commands inside an integration test is really important.

The first thing that needs to be addressed in order to use the shell inside the probe, is that the probe will import all the required packages. The probe by default uses a dynamic import on all packages, however this will exclude all packages exported with provisional status. To understand what this means you can read more about the provisional osgi api policy. In our case we need to customize our probe and allow it to import such packages this can be done by adding the following method to our test:

Now in order to execute commands you need to get the Command processor from the OSGi service registry and use it to create a session. Below is a method that allows to do in order to execute multiple commands under the same session (this is useful when using stateful commands like config). Moreover this method allows you to set a timeout on the command execution.

Distributed integration tests in Karaf
Working full time on open source integration, my needs often spawn across the boundaries of single JVM. A simplistic example is having one machine sending http requests to an other, having multiple machines exchanging messages through a message broker, or even having gird. The question is how do you test these cases automatically.

Karaf provides the admin service, which allows you to create and start instances of Karaf that run inside a separate jvm. Using it from inside the integration test you are able to start multiple instances of Karaf and deploy to each instances the features/bundles that are required for your testing scenario.

Test Scenario:
Let's assume that you want test a scenario where you have on jvm that acts as a message broker, one jvm that acts as a camel message producer and one jvm that acts as a camel message consumer. Also let's assume that you don't run vanilla karaf, but FuseESB (which is an Enterprise version of ServiceMix, which is powered by Karaf). You can use the admin service from inside your integration test just like this.

You might wonder now how are you supposed to assert that the consumer actually consumed the messages that it was supposed to. There are a lot of things that could be done here:

  • Connect to consumer node and use camel:route-info
  • Connect to the consumer and check the logs (if its supposed to log something)
  • Get the jmx stats of the broker

You can literally do whatever you like.

So here's how it would like if we go with using the camel commands:

Pointers to working examples
I usually provide sources for the things I blog. In this case I'll prefer to point you to existing OSS projects that I have been working on and are using paxeaxam-karaf for integration testing.

Fuse Fabric integration tests: Contains test that check the installation of features, availability of services, distributed OSGi services, even the creation and use of agents in the cloud.

Jclouds-Karaf integration tests: Contains simple tests that verifies that features are installed correctly.

Apache Karaf Cellar integration tests: Contains distributed tests, with quite a few Hazelcast examples.


Installing services in the cloud with jclouds

I spent the whole today stuck on an issue with jclouds and I thought that it would be a good idea to blog about it so that others don't have to spend so many hours on it.

The issue
My target was to write an integration test, that will start a node on Amazon EC2 and install a service that would be used for the integration test. So I created a script that performed a curl to download the tarball of the service unpack the service and run the service. So far so good. The problem I encountered was that my invocations on the method runScriptOnNode (a jclouds method for invoking scripts on remote nodes) timed out after waiting for 10 minutes. However, the script only needed 1 minute and was successfully executed.

Diving into jclouds run script methods
After spending some time to make sure that no network issues, like firewalls and such where involved, I decided to examine in depth how the runScriptOnNode method works.

Jclouds uses an initialization scripts, which installs the target script to the node and invokes it. The initialization script keeps track of the targets script pid and is able to tell if the target script has completed its execution. So the runScirptOnNode will block for as long as the initialization scripts replies that the target script is running.

Where's the catch?
The initialization script keeps track of the target scripts PID by executing findPid which is ps and grep using a pattern which matches the execution path. That's not a problem by itself, but if you install your service and run it inside the same folder, then initialization script will get confused and won't be able to tell when if the target script finished its execution. As a result the runScirptOnNode method will block till it times out.

The figure above displays a setup that can have problems. In this setup the init script will query the status of the target script by performing a ps and using the jc1234 to filter out processes. However, if a new process is started under that folder (by the target script), say folder service, then the init script will not be able to properly detect when target script finished. That's because the findPid will now return the pid of the service.

Lessons learned
Never start a service inside the same folder where the target script is executed, make sure you unpack and run your service from inside an other folder. Even better use a framework for installing the service (e.g. Apache Whirr) for installing mainstream service and only put your fingers on it if you really have to.