I have to admit that pulling source code from GitHub has become the mainstream development mode, and virtualization has become the mainstream deployment mode, but many times it is still desirable to have a piece of software delivered in the form of a binary executable or library. There are many reasons one might want to do that: one might not want to give away the source code; the client wants the source code but doesn't have the capability of building it, or simply does not want to spend the human labor to build it. With tools like Maven, software versioning is more or less under control in the Java world. But the C/C++ world is not as lucky, and with all the GitHub code that usually does not even carry a version number, building a C/C++ code repository with its dependencies is not for everybody.

Virtualization helps to contain all the dependencies, and it is a good solution for bigger software components like web services. One just delivers a VM image and everything is taken care of. But the use cases for C/C++ are usually small performance-critical components that must be tightly integrated with code written in other languages, and the overhead of virtualization is usually too high.

But the Linux world is notoriously heterogeneous. We are living in a world with Linux kernel 3.x, Ubuntu 14.x and RHEL 7.x, but almost all the companies and university labs whose machines I have had a chance to log into are still using CentOS 5.x for production and research (RHEL 5 was first released in 2007); once a cluster is set up, it is virtually impossible to upgrade the operating system version. On the other hand, you also want your software to run on the newest systems available to today's startup companies, on everything in between, and hopefully on future systems as well.

Generic portability across Linux versions and distributions cannot be achieved with C/C++; that's why Java was invented. But if one just wants to deliver a single program or library file that contains all the functionality, then, thanks to the backward compatibility of the Linux kernel, this is usually achievable by linking almost all the libraries statically into the program.
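For a program, a mostly static link can be as simple as naming the static archives of the third-party dependencies directly on the command line, while leaving only the core system libraries dynamic. The file and library names below are made up for illustration:

    # link third-party dependencies from their .a archives;
    # keep only libc, libm and libpthread dynamic
    g++ -o myprog main.o \
        -static-libgcc -static-libstdc++ \
        /usr/local/lib/libboost_system.a /usr/local/lib/libjpeg.a \
        -lpthread -lm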

The library case is more interesting. We want everything to be contained in the library, including all the libraries we depend on. But we cannot provide a static library, because that way we would have to expose all the dependencies, and it would certainly cause version conflicts when the client tries to link against the library. So the solution is to build an (almost) statically linked shared library, plus maybe a very small piece of interfacing code. The KGraph library for similarity search is provided in this form.
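As a sketch (with made-up file and library names, not the actual KGraph build), the shared library is produced by compiling the small interfacing layer with -fPIC and folding the statically built dependencies into the .so; the linker can be told not to re-export the symbols of those archives, so the client only sees the intended interface:

    # compile the interfacing layer as position-independent code
    g++ -fPIC -c interface.cpp -o interface.o
    # pull the statically built (PIC) dependencies into a single .so,
    # hiding their symbols from the library's users
    g++ -shared -o libfoo.so interface.o \
        /usr/local/lib/libboost_system.a /usr/local/lib/libopencv_core.a \
        -Wl,--exclude-libs,ALL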

Static linking is not the common practice in the Linux world. All software packages are distributed as shared libraries, and if one chooses to build something from source code, shared libraries are produced by default. Fortunately, most software packages use the automake system, and static linking has always been an option that can be easily enabled by adding "--enable-static --disable-shared" when running the configure script. The "--disable-shared" part is important, because without it a shared library will also be produced, and the default behavior of gcc is to link against the shared library. One can force gcc to link statically by adding "-static", but some system APIs won't work as expected (getaddrinfo will lose the ability to resolve hostnames; update: solved with c-ares). With "--enable-static" alone, the build system will assume the library is to be used statically and will produce non-relocatable machine code, which we won't be able to use to produce a shared library. The solution is to export "CFLAGS=-fPIC" and "CXXFLAGS=-fPIC" before running the configure script. These two easy fixes work for most packages, and the remaining ones have to be handled case by case.
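Putting the two fixes together, the recipe for a typical automake-based package looks like this (the install prefix is arbitrary):

    export CFLAGS=-fPIC
    export CXXFLAGS=-fPIC
    ./configure --enable-static --disable-shared --prefix=/usr/local
    make
    make install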

It would be misleading to end this blog leaving a novice reader believing that static linking is the way to go for everybody. Actually there are strong arguments against static linking (basically one can gain more with dynamic linking). But there are languages like golang which favor static linking. And for people out there working by themselves, like me, who do not have a lot of human labor at their disposal and would rather spend time on algorithms than on software packaging, static linking does come in handy.

About the companion box:

This box contains a development environment geared towards computation-intensive data processing applications without a GUI, like machine learning, image/audio processing and such. It is based on CentOS 5.6 with devtools 2.1 (gcc-4.8). I've also installed many libraries using the above method, including Boost, Poco, OpenCV, libav and many others. OpenCV and libav have been tailored to remove GUI and device related stuff (including media playback), because such functionality relies on components that are hard to make portable.
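The tailoring amounts to switching off the GUI and device modules at configure time. The exact option names depend on the OpenCV and libav versions, so treat the following as an illustrative sketch rather than the precise flags used for the box:

    # OpenCV: static libraries, no GUI toolkit or media playback backends
    cmake -DBUILD_SHARED_LIBS=OFF -DCMAKE_CXX_FLAGS=-fPIC \
          -DWITH_GTK=OFF -DWITH_FFMPEG=OFF ..
    # libav: static only, position-independent, no device I/O
    ./configure --enable-static --disable-shared --enable-pic --disable-avdevice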


© 2017 Wei Dong