Building Trust in Linux Distributions
Opinion: When running a Linux system, a user relies on the creator of the Linux distribution to provide them with a stable, fast, secure and bug-free experience. But given the experience of recent weeks, it may be worth considering how that user makes sure that's what they get.
Distributions are a link in the Linux chain which users tend to place a lot of trust in. The reasonable assumptions are that a Linux distribution creator will keep the packages within the distribution up to date and secure, that the packages have been carefully selected and tested, and in the case of commercial Linux distributions, that bugs will be tracked and handled. But what happens when these assumptions aren't fulfilled?
Perl developers who use Red Hat Enterprise Linux recently got a shock when they benchmarked the Perl version supplied with Red Hat. Vipul Ved Prakash was involved in scaling up at a new startup and had just deployed a new 150-core system for their processing needs. But the expected performance boost did not appear. Vipul found that a Perl function named "bless" was running slowly, very slowly. The "bless" function is pretty fundamental, and the overall impact on reading files made the Red Hat Perl one hundred times slower. This was not a new bug, but one that had been reported to Red Hat in 2007. The solution was simple: get the Perl source and build his own version of Perl. After installing it, Vipul reported speed-ups of well over one hundred times on their systems.
In some ways, this shows the strength of open source – Got a problem? Go to the latest source, compile it, see if the bug is fixed, if it is then you just fixed your own system. But it also shows the problem of assuming that the distribution you are using, even if it is current and patched, is correct. If you've paid for support for your Linux distribution, you need to know that the problem is going to be fixed in a timely fashion.
Back with the Red Hat Perl problem, the issue became more convoluted as people analysed what had happened. According to Nicholas Clark, the problem stems from Red Hat seeming "to have an aggressive policy of incorporating pre-release changes in their released production code". This wouldn't be a problem if Red Hat had good communications with the Perl development team, but according to Clark, they don't. So while the original problem was fixed in the main Perl line before release, Red Hat weren't involved in reporting it and never integrated the fix. Clark also advises building from source and gives another reason: most distribution packagers compile Perl with ithreads support. This isn't a Perl default, and if your Perl code does not use ithreads, having that support compiled in can result in a 10 per cent slowdown, according to Clark.
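For readers who want to see how their own distribution built Perl, the check can be sketched in a few lines of Python. This is only a rough illustration, not anything Clark or Red Hat published: it assumes a "perl" binary is on the PATH, relies on Perl's own "perl -V:useithreads" query, and the helper names parse_perl_config and perl_has_ithreads are made up for this example.

```python
import re
import shutil
import subprocess


def parse_perl_config(output: str) -> dict:
    """Parse `perl -V:name` output, e.g. "useithreads='define';"."""
    return dict(re.findall(r"(\w+)='([^']*)';", output))


def perl_has_ithreads(perl: str = "perl"):
    """Return True/False for ithreads support, or None if perl is not found.

    Assumption: `perl` names a binary on the PATH (hypothetical default).
    """
    if shutil.which(perl) is None:
        return None
    result = subprocess.run(
        [perl, "-V:useithreads"], capture_output=True, text=True, check=True
    )
    # Perl reports 'define' when the interpreter was built with ithreads.
    return parse_perl_config(result.stdout).get("useithreads") == "define"


if __name__ == "__main__":
    print("ithreads compiled in:", perl_has_ithreads())
```

A result of True only says that ithreads support was compiled in. Whether that costs anything noticeable on a given workload is something to measure rather than assume.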
This isn't only a Red Hat issue; it is one that covers all Linux distributions. I've lost count of the number of times I've heard someone say "Oh yes, the version that is shipped with [insert distribution name] is broken, just build it yourself." The problem is that decisions like whether a patch is integrated or what flags should be used to compile packages are made by the distribution packager, and often those policy decisions are not as transparent as they should be. Customers of distributors have a right to more information. Open source software should be built and packaged by distributors in as open a way as it is developed. If a distributor is actually distributing a variant of a program out of step, however slightly, with the main line source of that program, they should say up front that this is what they do and when they do it. Distributors should be expected to engage directly with the maintainers of any packages they include and ensure that they track the released versions as their primary package. Where a distributor needs to maintain their own version of the code for backward compatibility with existing customers, that version should be separate from the distribution.
Now a distributor isn't going to be able to list every package and how they work with the developers of that package. Even if they did, the resulting document would be a dense, impenetrable mass. What is needed is a simple statement, a packager's manifesto, which sets out the principles behind how they package applications and gives a standard for them to be held to. Something like:
- We will track the main line releases of packages we incorporate
- We will communicate bugs found by our users to package maintainers and cooperate with them to resolve issues
- We will make available references to how our packages are built
- We will always detail where our distributions differ from mainstream packages
With that kind of promise from distributors, Linux distributions can avoid the micro-fragmentation that can occur and maybe quell that feeling from some quarters that the only open source you can really depend on is the open source code you have compiled yourself.