Kernel Log: Coming in 3.3 (Part 3) – Architecture and infrastructure
by Thorsten Leemhuis
A long line-up of changes to the code for memory management will fix problems and improve performance in some areas. The kernel now supports the Large Physical Address Extension (LPAE) and boots on x86 EFI systems even without a boot loader.
A number of changes in Linux 3.3 are intended to take care of various problems that can, in certain situations, cause systems to temporarily stall when the kernel is writing large amounts of data to a slow disk (such as a USB flash drive). Linux 3.2 already fixed some of these types of issues where the writeback code was to blame – now the changes are targeting the memory compaction code and fixing problems related to transparent hugepages (THP).
In November, LWN.net detailed some of the history in its article "Huge pages, slow drives, and long delays"; using an approach developed after the article was published, developers managed to avoid the deactivation of "synchronous compaction" mentioned there. Details and test results are provided in a commit message almost four hundred lines long; the commit itself removes only three lines but is just one of many changes to the compaction code (1, 2, 3, 4, 5, 6, and others).
A further change to the memory management code helps prevent another source of problems that also arises when writing to slow data volumes and is related to the writeback infrastructure. With the NTFS-3G FUSE driver, it can also improve write performance by about fifty per cent in some special circumstances, as test results in the commit message show.
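On a running system, the transparent hugepage behaviour discussed above is controlled through sysfs. The following Python sketch reads the `enabled` and `defrag` files under `/sys/kernel/mm/transparent_hugepage` and reports the selected mode, which the kernel marks with square brackets. The function names are illustrative assumptions and are not taken from the kernel or from the commits mentioned here.

```python
from pathlib import Path

# Usual sysfs location of the THP controls on kernels built with THP support.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def active_choice(text):
    """Return the bracketed word from a line such as 'always [madvise] never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def thp_settings(base=THP_DIR):
    """Read the 'enabled' and 'defrag' modes; an unreadable file yields None."""
    settings = {}
    for name in ("enabled", "defrag"):
        try:
            settings[name] = active_choice((Path(base) / name).read_text())
        except OSError:
            # File missing (no THP support) or not readable by this user.
            settings[name] = None
    return settings
```

Calling `thp_settings()` without arguments inspects the live system; passing another directory makes the helper easy to try out against sample files.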
Current state of development
Over the weekend Linus Torvalds released the sixth pre-release of Linux 3.3. In the announcement he wrote "it really is all small fixes and cleanups. In fact, it's been calm enough that this might be the last -rc, but we'll see how the upcoming week goes. If it stays calm (and hopefully even calms down some more), there doesn't seem to be any major reason to drag out the release cycle any more." So it's quite possible that Linux 3.3 will be released this weekend, or one week later if Torvalds decides to do one more RC.
Kernel developers have included some changes in Linux 3.3 categorised as "memory control group naturalisation patches", which should significantly reduce the management overhead of the "memory" cgroup controller (1, 2, 3, and others). This controller regulates RAM use, and some distributions enable it by default. In May 2011, LWN.net summarised some of the background information for an earlier development version in "Integrating memory control groups".
In Linux 3.3, the control group's memory controller offers basic functions for limiting the amount of RAM that the kernel can allocate to itself for managing and running processes (1, 2); the TCP buffer size controller mentioned in the first part of the Kernel Log series uses these functions to limit the amount of RAM that the buffers used for TCP communication are allowed to consume. Another change fixes a performance problem in the readahead code that showed up when a program requested data in large units from a fast data volume (such as an SSD). The problem was an undesired side effect of changes in Linux 2.6.39; LWN.net provides further information in the article "What happened to disk performance in 2.6.39".
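As a rough illustration of how such limits are applied from user space, the memory controller exposes control files in each group's directory. The Python sketch below writes a byte value into one of them. It assumes the cgroup v1 file names `memory.limit_in_bytes` and `memory.kmem.tcp.limit_in_bytes`; the helper names are hypothetical, and the location of the group directory depends on where the distribution mounts the controller.

```python
from pathlib import Path

# Control files of the cgroup v1 memory controller (assumed names, see above).
RAM_LIMIT = "memory.limit_in_bytes"
TCP_LIMIT = "memory.kmem.tcp.limit_in_bytes"


def set_limit(group_dir, control_file, limit_bytes):
    """Write a byte limit into one control file of a memory cgroup."""
    if limit_bytes < 0:
        raise ValueError("limit must not be negative")
    target = Path(group_dir) / control_file
    target.write_text(str(limit_bytes))
    return target


def get_limit(group_dir, control_file):
    """Read a limit back as an integer number of bytes."""
    return int((Path(group_dir) / control_file).read_text().strip())
```

With the controller mounted at, say, `/sys/fs/cgroup/memory` and a hypothetical group named `demo`, running `set_limit("/sys/fs/cgroup/memory/demo", TCP_LIMIT, 16 * 1024 * 1024)` as root would cap the TCP buffer memory of that group's processes at 16 MiB.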
Among other changes, the KVM code can now provide a virtual performance monitoring unit (PMU) that guest systems can use for performance analyses and other tracing tasks (1, 2, and others). Discard support in the Xen code now includes an optional feature for permanently removing data in freed-up areas of storage devices ("discard support with secure erasing support"); a git-pull request from Konrad Rzeszutek Wilk, who works for Oracle, mentions a few other changes specific to Xen. The balloon, blk, console and net Virtio drivers now include everything needed for ACPI S4, allowing guests that use these data exchange interfaces to hibernate.