Kernel Log: Coming in 3.5 (Part 3) - Architecture
by Thorsten Leemhuis
With the help of uprobes, performance monitoring tools can now monitor userspace software. The ongoing overhaul of the ARM code is showing tangible success.
After years of development, dozens of aborted attempts and some major restructuring, the code for uprobes (userspace probes) has finally been merged into the kernel (1, 2, 3, 4, 5 and others). This allows the kernel to insert breakpoints into the code of userspace programs at runtime, though for now the feature can only be used via the kernel's perf events subsystem. Tracing software that makes use of uprobes, such as the kernel's perf tool and version 1.8 of systemtap, can add tracepoints to userspace software and thus monitor the runtime behaviour of the kernel and programs at the same time.
Until now, perf could only observe processes within the kernel, while systemtap needed utrace, which never found its way into the official kernel, to monitor userspace software. Uprobes is itself derived from utrace, though they parted ways long ago. Instructions for using uprobes in practice can be found in the uprobes merge commit. Background information can be found in an article on LWN.net, which explains that the kernel, despite the new analytical options, is not even close to offering the functionality required to create a tracing solution comparable to Solaris' dtrace. This would require various extensions to code in and around uprobes, some of which are currently on the drawing board but not yet in development.
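As a rough illustration of what defining such a probe can look like, the following Python sketch builds a probe definition in the `p[:[GRP/]EVENT] PATH:OFFSET` form that the kernel's uprobe tracer documentation describes for the `uprobe_events` file, as far as I recall it. The debugfs path, the binary, the offset and the event name are assumptions chosen for illustration, and the helper functions are hypothetical; the instructions in the merge commit remain the authoritative reference.

```python
from pathlib import Path

# Assumed debugfs mount point; it may differ on a given system.
TRACING_DIR = Path("/sys/kernel/debug/tracing")


def uprobe_definition(binary: str, offset: int, event: str = "", group: str = "") -> str:
    """Build one definition line of the form p[:[GRP/]EVENT] PATH:OFFSET."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if group and not event:
        raise ValueError("a group requires an event name")
    name = ""
    if event:
        name = ":" + (f"{group}/{event}" if group else event)
    # The offset is written in hexadecimal, relative to the start of the binary.
    return f"p{name} {binary}:{offset:#x}"


def register_uprobe(definition: str, tracing_dir: Path = TRACING_DIR) -> None:
    """Append a definition to uprobe_events (hypothetical helper).

    On a real system this needs root privileges and a kernel built
    with uprobe tracing support.
    """
    with open(tracing_dir / "uprobe_events", "a") as events:
        events.write(definition + "\n")


if __name__ == "__main__":
    # Hypothetical binary, offset and event name, for illustration only.
    print(uprobe_definition("/bin/zsh", 0x46420, event="zfree_entry"))
```

The sketch only formats and writes the definition; enabling the event and reading the resulting trace are separate steps that it does not cover.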
Current state of development
On Saturday night Linus Torvalds released the sixth pre-release of Linux 3.5. In the announcement mail he mentioned that there have been noticeably fewer commits in this RC than in -rc5, and said, "I think we're getting closer to a final release."
Some files for the "trace-cmd" ftrace tool have made their way into the kernel's tools directory. They lay the groundwork for a libtraceevent library, which is intended to form a common foundation for tracing tools, including the kernel's perf tool, which already uses this library in Linux 3.5. As Ingo Molnar notes in his main git pull request for the perf subsystem, userland programs such as powertop should also be able to use this library. The pull request also mentions a number of other changes in this area, such as improvements in the way assembler reports generated using perf-report are displayed. In addition, the kernel now supports precise event sampling using Instruction-Based Sampling (IBS), which recent AMD processor cores offer for performance monitoring.
The fundamental overhaul of the ARM code, which has been in progress for more than a year, has continued apace in the 3.5 kernel. Some of the more significant changes are discussed in the two threads with the main git pull requests for Linux 3.5 (1, 2). In one of the mails, Olof Johansson proudly notes that code for more ARM platforms is being adapted to obtain the hardware information required for booting from device trees. This work is starting to pay off: according to Johansson, some ARM platforms now gain Linux support simply through the addition of device tree files, which are completely independent of the kernel source code. Background information on the ARM overhaul and efforts to create a kernel binary able to boot on many different ARM platforms can be found in the recent LWN.net article "LinuxCon Japan: One zImage to rule them all".
- A collection of patches cleans up the x86 code somewhat and causes the exception table to be sorted at kernel build-time, which is supposed to slightly speed up booting (1, 2).
- Major changes for KVM include a new API for injecting Message Signalled Interrupt (MSI) messages into guest systems.
- Important changes to the Xen code include support for perf.
- An overhaul of the process scheduler should improve its behaviour in NUMA systems. At the same time, several outdated functions for energy-saving process scheduling, which, according to the developer, did more harm than good, have been removed.
- Documentation added to the kernel explains the use of the EFI boot stub, which allows EFI firmware to start the kernel directly. This function was merged into the kernel in Linux 3.3 and is used by the recently released bootloader gummiboot.
- Major changes to the error detection and correction (EDAC) code mean that, in future, the kernel will give more precise information on where a memory error has occurred on modern processors.
- The new "skew_tick" kernel parameter allows users to offset the periodic timer tick on each CPU so that the ticks do not all fire at the same moment. This can be useful for jitter reduction on HPC and RT workloads, but it increases power consumption.
- Support for the ancient Micro Channel Architecture (MCA), which has been obsolete for at least a decade, has finally been removed. Support for ARMv3 and the ixp23xx and ixp2000 (Intel XScale) platforms has also been removed.
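
To illustrate why the order of the exception table mentioned in the list above matters: the table maps addresses of instructions that may fault to fixup code, and lookups use a binary search, which only works on sorted data. The Python sketch below is a simplified model under that assumption, not the kernel's implementation; the addresses and function names are hypothetical. Sorting once ahead of time (here `sort_table`, standing in for the build step) means no sorting work is left for boot.

```python
import bisect
from typing import List, Optional, Tuple

# One entry: (address of an instruction that may fault, address of its fixup code).
Entry = Tuple[int, int]


def sort_table(entries: List[Entry]) -> List[Entry]:
    """Sort entries by faulting address; in this model, done once at build time."""
    return sorted(entries)


def find_fixup(table: List[Entry], fault_addr: int) -> Optional[int]:
    """Binary-search a sorted table for the fixup belonging to fault_addr."""
    # (fault_addr, 0) sorts at or before any real entry for that address,
    # because fixup addresses are non-negative.
    index = bisect.bisect_left(table, (fault_addr, 0))
    if index < len(table) and table[index][0] == fault_addr:
        return table[index][1]
    return None  # no entry: the fault is not a known, recoverable one


if __name__ == "__main__":
    # Hypothetical addresses, deliberately unsorted as a linker might emit them.
    unsorted_entries = [(0x3000, 0x9300), (0x1000, 0x9100), (0x2000, 0x9200)]
    table = sort_table(unsorted_entries)
    print(find_fixup(table, 0x2000))
    print(find_fixup(table, 0x2500))
```

Each lookup takes a logarithmic number of comparisons, but only if the table is already sorted, which is why the sorting has to happen somewhere before the first lookup.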