Kernel Log: Coming in 2.6.38 (Part 5) - Architecture, infrastructure and virtualisation
by Thorsten Leemhuis
In certain situations, a small patch for 2.6.38 perceivably improves the response time of desktop environments. Transparent Huge Pages, on the other hand, simplify memory management, which is particularly relevant in terms of virtualisation and server software.
On Tuesday last week, Linus Torvalds issued the seventh release candidate of Linux 2.6.38. Among other things, Torvalds mentioned a fix for a flaw in Btrfs that can potentially cause data loss in certain situations – again, there was no indication of a final release date for 2.6.38.
Nevertheless, the next version in the main development branch is approaching completion, and the Kernel Log will therefore gradually conclude its "Coming in 2.6.38" mini-series. This penultimate part discusses the advancements in the kernel's architecture code, infrastructure and virtualisation. Part 1 and part 3 discussed the changes to the graphics hardware and network communication code; part 2 and part 4 revolved around file systems and storage. This week, the series will be concluded with an article on the kernel's audio, USB and video hardware drivers.
At the end of last year and the beginning of this year, a modification that became known as the "wonder patch" or "auto-group scheduling feature" attracted a relatively large amount of attention because it improves the interactivity of desktop environments and applications in certain situations. Instead of magically generating new CPU resources, however, the patch automatically groups a session's processes in a cgroup (control group) – which causes the available processor time to be redistributed when the CPU is working to capacity, and potentially makes the desktop more responsive if the CPU-heavy processes aren't in the same group as the desktop environment.
Details about this functionality are available in a previous Kernel Log and in an article on LWN.net. Scheduler maintainer Ingo Molnar and Linus Torvalds seemed quite taken by the patch and specifically mentioned it in the main Git pull request for the scheduler code as well as in the announcement of 2.6.38-rc1. Of course, the modification produces the most discernible results in a scenario that is particularly familiar to kernel hackers: using the desktop interface while a terminal such as Xterm performs a CPU-heavy task, such as compiling a kernel. The auto-group feature has no significant effect in situations where two desktop applications are competing for processor time.
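The redistribution described above can be illustrated with a little arithmetic. The following Python sketch is a deliberately simplified model, not the scheduler's actual algorithm: it assumes a single, fully loaded CPU, equal weights for all groups and tasks, and tasks that are runnable all the time. The function name cpu_shares and the group names are made up for the illustration.

```python
from fractions import Fraction


def cpu_shares(groups):
    """Return the CPU share of ONE task in each group.

    `groups` maps a (hypothetical) group name to its number of runnable
    tasks. Simplified model: the CPU is split evenly between all groups
    that have runnable tasks, then evenly between the tasks of a group.
    """
    active = {name: count for name, count in groups.items() if count > 0}
    shares = {}
    for name, count in active.items():
        group_share = Fraction(1, len(active))  # equal split between groups
        shares[name] = group_share / count      # equal split inside a group
    return shares


# Without grouping: one desktop task competes with eight compiler jobs.
flat = cpu_shares({"all": 1 + 8})
print(flat["all"])  # 1/9

# With grouping: the terminal's session gets its own group.
grouped = cpu_shares({"desktop": 1, "terminal": 8})
print(grouped["desktop"], grouped["terminal"])  # 1/2 1/16

# Two desktop applications in the same group: grouping changes nothing.
print(cpu_shares({"desktop": 2})["desktop"])  # 1/2
```

In this model, the desktop task's share rises from one ninth to one half as soon as the compiler jobs sit in a group of their own, while two applications that share a group still split the CPU exactly as before.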
The function is enabled via the "CONFIG_SCHED_AUTOGROUP" option, which is switched off in the kernel's default configuration; some distributions may not even enable it because Systemd, the alternative to SysV-Init and Upstart, offers a similar but more flexible option.
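Whether a running kernel offers the feature can be checked from user space. The sketch below assumes that a kernel built with CONFIG_SCHED_AUTOGROUP exposes the switch as /proc/sys/kernel/sched_autogroup_enabled; the helper name autogroup_state is only illustrative.

```python
from pathlib import Path

# Sysctl file that is expected to exist only on kernels built with
# CONFIG_SCHED_AUTOGROUP (assumption stated in the text above).
AUTOGROUP_SYSCTL = Path("/proc/sys/kernel/sched_autogroup_enabled")


def autogroup_state(path=AUTOGROUP_SYSCTL):
    """Return True if auto-grouping is on, False if it is off,
    and None if the file cannot be read (feature not available)."""
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    return text.strip() == "1"


state = autogroup_state()
if state is None:
    print("auto-group scheduling is not available on this kernel")
else:
    print("auto-group scheduling is", "enabled" if state else "disabled")
```

On kernels that offer the file, writing 0 or 1 to it as root should switch the behaviour at runtime.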
The new Transparent Huge Pages (THP) allow the kernel to be more flexible when handling large memory pages; in some situations, they even enable the kernel to use them automatically. In the long term, this should make the complicated and rather inflexible integration via hugetlbfs redundant. Transparent Huge Pages enhance performance especially with large databases and in virtualisation because they simplify the kernel's and processor's memory management, and because they allow various functional components of modern CPUs to be utilised that would otherwise remain idle.
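A rough sense of why larger pages simplify memory management can be had by counting how many pages are needed to map a given amount of memory. The following sketch assumes the x86-64 page sizes of 4 KiB and 2 MiB and also reads the current THP mode from /sys/kernel/mm/transparent_hugepage/enabled, where the active setting is expected to appear in square brackets. The function names are illustrative, and the figures say nothing about actual performance.

```python
import re
from pathlib import Path

# sysfs file assumed to list the modes, e.g. "[always] madvise never".
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")

SMALL_PAGE = 4 * 1024          # 4 KiB, regular x86-64 page (assumption)
HUGE_PAGE = 2 * 1024 * 1024    # 2 MiB, x86-64 huge page (assumption)


def parse_thp_mode(text):
    """Return the bracketed (active) mode, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def thp_mode(path=THP_ENABLED):
    """Return the active THP mode, or None if the file is unreadable."""
    try:
        return parse_thp_mode(Path(path).read_text())
    except OSError:
        return None


def pages_needed(size_bytes, page_bytes):
    """Number of pages needed to cover size_bytes (rounded up)."""
    return -(-size_bytes // page_bytes)


one_gib = 1 << 30
print(pages_needed(one_gib, SMALL_PAGE))  # 262144
print(pages_needed(one_gib, HUGE_PAGE))   # 512
print("THP mode on this system:", thp_mode())
```

Mapping 1 GiB therefore takes 512 huge pages instead of 262,144 regular ones, which means far fewer entries for the kernel and the processor to manage.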
THP was mainly promoted by Red Hat developer Andrea Arcangeli; when introducing Red Hat Enterprise Linux 6 last November, Red Hat had already used earlier versions of the code and highlighted THP as one of the most important improvements. Details about the technology and its advantages are available in the relevant documentation and in the article "Transparent huge pages in 2.6.38" on LWN.net. Several benchmarks run by VM hacker Mel Gorman just before the inclusion of the THP patches compare THP with hugetlbfs and with a kernel that doesn't use any large pages. Arcangeli provides background information and a description of the technology's advantages in the slides and video recording of a presentation he gave at the KVM Forum 2010.
THP found its way into 2.6.38 via the MM tree maintained by Andrew Morton, bypassing the Linux-Next development branch in which the subsystem maintainers coordinate their work for the version that will follow the one currently in development. This caused some friction because several bugs, which were fixed at very short notice, only came to light after THP had been added to the main development branch. In his response, Morton said that he is already planning to prepare the MM kernel for integration into Linux-Next, which has already replaced the previously very important MM kernel in many areas.
After integrating Xen Dom0 support in 2.6.37, the developers have now added some basic support for Xen backend drivers to the kernel. The backend drivers themselves, a prerequisite for running a meaningful Linux Dom0, couldn't be completed in time and will probably follow in 2.6.39. The "userspace grant access device driver" now allows Xen guests to share memory pages with others. The kernel's HVM X2APIC support is said to speed up interrupt handling.
The KVM developers wanted to add "asynchronous page faults" to their hypervisor, but Linus Torvalds didn't like some aspects of their patches, so the developers decided to make further modifications. Consequently, the remaining major KVM advancements are support for Transparent Huge Pages and for the "flush-by-asid" virtualisation extensions of AMD's Bulldozer processor family, which is expected in mid-2011.