Kernel Log - Coming in 2.6.31 - Part 4: Tracing, architecture, virtualisation
by Thorsten Leemhuis
New performance counters allow developers to take a detailed look at the runtime behaviour of program code to target specific areas for optimisation. The recently introduced tracing infrastructure has been further modified and improved. Other changes affect the architecture, the memory subsystem, and various virtualisation solutions.
Last weekend, Linus Torvalds released the seventh release candidate of Linux 2.6.31. In his release email, Torvalds highlights some of the corrections; two days earlier, the list of known new problems contained 29 unsolved problems.
Unlike the week before, Torvalds didn't make a statement about the final release of 2.6.31; it is likely to take another week or two. The Kernel Log continues its discussion of the major new features of 2.6.31 by looking at the areas of tracing, architecture, memory management and virtualisation.
Detailed road test
Operating partly in the kernel and partly in userspace, the "performance counters" code can be used for retrieving the performance data offered by modern processors. This performance data quantifies various CPU processes that have an impact on CPU performance, providing an in-depth analysis and allowing developers to optimise the code segments relevant to processing speed down to the last detail for each CPU.
Occasionally abbreviated to "perf_counter", performance counters should not be confused with Perfmon ("hardware-based performance monitoring interface for Linux"), which has offered similar functionality for years. The kernel hackers didn't include Perfmon in the Linux source code because they were dissatisfied with some of its properties. This is probably one of the reasons why Ingo Molnar and several other developers pursued the idea of performance counters, which use a slightly different approach. The chief developer of Perfmon repeatedly criticised some aspects of the performance counter concept when the counters were first introduced in late 2008 and during the months that followed; when the developers began to integrate the more than 600 required patches into the kernel sources, he added further criticism, which was discussed in great detail. The article "Duelling performance monitors", published on LWN.net in late 2008, provides some details concerning this competitive situation.
Use of the performance counters is described in the more recent article "Perfcounters added to the mainline" on LWN.net. Further background information can also be found in the documentation. The relevant userspace tools were integrated directly into the kernel sources in a newly established tools/ subdirectory after a lengthy debate, during which Linus Torvalds expressed his opinion in no uncertain terms.
For other articles on 2.6.31 and links to the rest of the "Coming in 2.6.31" series, see The H's Kernel Log - 2.6.31 Tracking page.
A different kind of analysis
Ingo Molnar has introduced not only the performance counters, but also numerous other changes affecting the tracing area he maintains. In his git-pull request, he suggests that the main development phase of trace points is likely to be completed in the near future. He also highlights improved filters, internal restructuring measures and performance optimisation. Some of the possibilities offered by the tracing infrastructure of current kernels are explained in Sony developer Tim Bird's presentation "Measuring Function Duration with Ftrace", which we already mentioned in the article "The nitty gritty - The proceedings of the Linux Symposium 2009 ....".
The kernel hackers also integrated a profiling feature that uses the GCC Coverage Testing Tool (gcov). From now on, kernel developers and testers can use Kmemleak (documentation) to track down memory leaks. However, the tool misinterprets certain situations and constructions, so the results it produces shouldn't be trusted implicitly. Another new tool, Kmemcheck (documentation), detects the use of non-initialised memory areas.
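As a rough illustration of how Kmemleak is typically driven from userspace, the following minimal Python sketch triggers a scan and splits the resulting report into individual entries. It assumes a kernel built with Kmemleak support, debugfs mounted at /sys/kernel/debug, and root privileges. The helper names `trigger_scan` and `read_reports` are hypothetical, and the assumption that each entry begins with "unreferenced object" reflects commonly seen output rather than a guaranteed format.

```python
from pathlib import Path

# Assumed default location; requires debugfs mounted at /sys/kernel/debug.
KMEMLEAK_PATH = Path("/sys/kernel/debug/kmemleak")


def trigger_scan(path: Path = KMEMLEAK_PATH) -> None:
    """Ask Kmemleak to run a memory scan by writing 'scan' to its file."""
    path.write_text("scan\n")


def read_reports(path: Path = KMEMLEAK_PATH) -> list[str]:
    """Read the Kmemleak file and split it into one string per suspected leak.

    Assumption: every entry starts with a line beginning with
    'unreferenced object'; following lines belong to that entry.
    """
    reports: list[str] = []
    current: list[str] = []
    for line in path.read_text().splitlines():
        if line.startswith("unreferenced object"):
            if current:
                reports.append("\n".join(current))
            current = [line]
        elif current:
            current.append(line)
    if current:
        reports.append("\n".join(current))
    return reports


# Example use on a suitably configured system (run as root):
#     trigger_scan()
#     for report in read_reports():
#         print(report, end="\n\n")
```

Because the tool can report false positives, each entry returned this way is best treated as a hint for further inspection and not as a confirmed leak.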
In brief
The changes described above are only the most significant of those recently made by kernel hackers in terms of the Linux architecture and infrastructure. Here is a short overview of further changes:
Architecture code:
- From 2.6.31, Linux will address 2^46 instead of 2^44 bytes of RAM on x86-64 CPUs. However, this currently affects very few people, as systems offering 16 to 64 Tbytes of RAM are still rather rare.
- Hibernation (suspend-to-disk/software suspend/swsusp) and power-saving techniques are important not just for notebooks, but also for mainframes. This is demonstrated by several of Hans-Joachim Picht's patches, now incorporated into the main development branch, which provide hibernation support for IBM's S390 architecture (now System z). Additional changes introduced by Michael Holzheu further improve the use of several power-saving techniques for S390 systems.
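The x86-64 memory limits mentioned above follow directly from the number of address bits. The short Python sketch below works through the arithmetic, taking "Tbytes" to mean binary terabytes (2^40 bytes); the function names are purely illustrative.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of bytes that can be addressed with the given number of bits."""
    return 1 << address_bits


def to_tbytes(num_bytes: int) -> int:
    """Convert bytes to binary terabytes (1 Tbyte = 2**40 bytes)."""
    return num_bytes // (1 << 40)


old_limit = addressable_bytes(44)  # limit before 2.6.31
new_limit = addressable_bytes(46)  # limit from 2.6.31

print(to_tbytes(old_limit))    # 16
print(to_tbytes(new_limit))    # 64
print(new_limit // old_limit)  # 4
```

Two additional address bits therefore quadruple the addressable RAM, from 16 to 64 Tbytes.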
Memory management:
- A number of patches submitted by Mel Gorman are designed to optimise the page allocator responsible for memory allocation; a few benchmark results can be found in the commit comment.
Virtualisation:
- The KVM developers have reworked the interrupt code and reportedly improved SMP performance.
- The Xen code in the kernel now also provides some details via the /sys/hypervisor directory. A new addition to the kernel is the evtchn Xen driver for sending and receiving events on "event channels". The integration of Dom0 support, on the other hand, seems to have slipped away into the far future again: there has been no further discussion of the topic on LKML since the discussions just before and after the release of Linux 2.6.30.
- After things went quiet around the Lguest "Simple x86 Hypervisor", this time around developers have made several further substantial changes; additions include PAE support and improved multi-threading.
Minor gems
Many additional minor, but by no means insignificant, changes can be found in the list below. Like many of the references in the text above, the links lead to the relevant commits in the web front end of the main Linux development branch, where the commit comments and the patches themselves provide extensive further information on the respective changes.
Architecture Support
...ARM:
- Add core support for ARMv6/v7 big-endian
- ARM: 5522/1: PalmLD: IDE support
- ARM: add support for the EET board, based on the i.MX31 pcm037 module
- ARM: GTA02/FreeRunner: Add machine definition
- ARM: Kirkwood: add Marvell 88F6281 GTW GE board support
- ARM: MINI2440: Add machine support
- ARM: MX35: Add PCM043 board support
- ARM: OMAP3: Add omap3 EVM defconfig
- ARM: OMAP3: Add omap3 EVM support
- ARM: OMAP3: Add support for OMAP3 Zoom2 board
- ARM: OMAP4: Add defconfig for 4430 SDP
- ARM: OMAP4: Add support for 4430 SDP
- ARM: pxa: add basic support for HP iPAQ hx4700 PDAs
- ARM: pxa/em-x270: add ability to control GPS and GPRS power
- ARM: pxa/hx4700: add Maxim 1587A voltage regulator
- ARM: pxa/mioa701: add Maxim 1586 voltage regulator
- ARM: pxa/mioa701: add V3 gain configuration for Maxim 1586 voltage regulator
- ARM: pxa/palm: Add Palm27x aSoC driver to PalmTE2
- ARM: pxa: Stargate 2 board support
- ARM: pxa/treo680: initial support
- ARM: remove arch-imx
- ARM: S3C64XX: Add S3C6400 SDHCI setup support
- ARM: S3C64XX: Initial support for PM (suspend to RAM)
- Atmark Armadillo 500 board support.
- davinci: add SRAM allocator
- davinci: DM355: add base SoC and board support
- davinci: DM644x: add support for SFFSDR board
- davinci: DM646x: add base SoC and board support
- davinci: INTC: add support for TI cp_intc
- i.MX31: Add ethernet support to i.MX31 Litekit board.
- imx: re-work of PWM, add i.MX21 support
- IXP4xx: support for Goramo MultiLink router platform.
- MAINTAINERS: add entry for Mitac Mio A701 board
- MAINTAINERS: add entry for Palm Treo680
- MX27: Add basic support to MX27PDK
- MX35: Add basic support for MX35PDK board
- mxc: Add i.MX27LITE board support
- MXC: mx21ads base support
- OMAP2/3: PM: push core PM code from linux-omap
- omap iommu: simple virtual address space management
...Power:
- 83xx: add support for the kmeter1 board.
- powerpc/4xx: Sequoia: Enable NAND support
- powerpc/85xx: Add dts files for X-ES MPC85xx boards
- powerpc/85xx: Add eSDHC support for MPC8569E-MDS boards
- powerpc/85xx: Add MPC8569MDS board support
- powerpc/85xx: Add P2020DS board support
- powerpc/85xx: Add PCI IDs for MPC8569 family processors
- powerpc/85xx: Add platform support for X-ES MPC85xx boards
- powerpc/85xx: Add STMicro M25P40 serial flash support for MPC8569E-MDS
- powerpc: Add support for swiotlb on 32-bit
- powerpc: Introduce CONFIG_PPC_BOOK3S
- powerpc: Make the NR_CPUS max 8192
- powerpc/virtex: Add support for Xilinx PCI host bridge
- powerpc/virtex: Add Xilinx ML510 reference design support
...x86:
- amd-iommu: implement suspend/resume
- docs, x86: add nox2apic back to kernel-parameters.txt
- x86: Add cpu features MOVBE and POPCNT
- x86: add extension fields for bootloader type and version
- x86: Add quirk for Intel DG45ID board to avoid low memory corruption
- x86: Add quirk to make Apple MacBook5,2 use reboot=pci
- x86: Add reboot quirk for every 5 series MacBook/Pro
- x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB
- x86/docs: add description for cache_disable sysfs interface
- x86: document new bzImage fields
- x86, ds: support Core i7
- x86: implement percpu_alloc kernel parameter
- x86, intr-remap: add option to disable interrupt remapping
- x86: make CONFIG_RELOCATABLE the default
- x86, mce: add basic error injection infrastructure
- x86, mce: Add boot options for corrected errors
- x86, mce: add machine check exception count in /proc/interrupts
- x86, mce: add MCE poll count to /proc/interrupts
- x86, mce: add table driven machine check grading
- x86, mce: deprecate old 32bit machine check code
- x86, mce: document new 32bit mcelog requirement in Documentation/Changes
- x86, mce: enable MCE_AMD for 32bit NEW_MCE
- x86, mce: enable MCE_INTEL for 32bit new MCE
- x86, mce: improve documentation
- x86, mce: use 64bit machine check code on 32bit
- x86, mce: therm_throt: Don't log redundant normality
- x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs
- x86, mce: Don't initialize MCEs on unknown CPUs
- x86: MSR: add methods for writing of an MSR on several CPUs
- x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround
- x86/PCI: add description for check_enable_amd_mmconf boot parameter
- x86: smarten /proc/interrupts output for new counters
- x86, setup: "glove box" BIOS calls -- infrastructure
- x86, setup: "glove box" BIOS interrupts in the APM code
- x86, setup: "glove box" BIOS interrupts in the core boot code
- x86, setup: "glove box" BIOS interrupts in the EDD code
- x86, setup: "glove box" BIOS interrupts in the MCA code
- x86, setup: "glove box" BIOS interrupts in the video code
...Various:
- Add Fenghua Yu as temporary co-maintainer for ia64
- amd64_edac: add DRAM address type conversion facilities
- asm-generic: add a generic unistd.h
- asm-generic: add generic ABI headers
- avr32: Add support for Mediama RMTx add-on board for ATNGW100
- Blackfin: add support for bzip2/lzma compressed kernel images
- Blackfin: initial support for ftrace
- Blackfin: initial support for ftrace grapher
- DMA: TXx9 Soc DMA Controller driver
- MIPS: Add Cavium OCTEON PCI support.
- MIPS: Add hibernation support
- MIPS: Add support for Texas Instruments AR7 System-on-a-Chip
- MIPS: Allow suspend and hibernation again on uniprocessor kernels.
- MIPS: CMP: activate CMP support
- S390: ftrace: add dynamic ftrace support
- S390: ftrace: add function graph tracer support
- S390: ftrace: add function trace mcount test support
- S390: ftrace: add system call tracer support
- S390: pm: memory hotplug power management callbacks
- S390: secure computing arch backend
- sh: Add ms7724se (SH7724) board support
- sh: Add support for SH7724 (SH-Mobile R2R) CPU subtype.
- sh: SH7786 SMP support.
- sparc64: Add proper dynamic ftrace support.
- sparc64: Use new dynamic per-cpu allocator.
MM
- memcg: add file-based RSS accounting
- memcg: add interface to reset limits
- memcg: fix behavior under memory.limit equals to memsw.limit
- mm: add swap cache interface for swap reference
- mm: introduce PageHuge() for testing huge/gigantic pages
- mm: Pass virtual address to __p{te,ud,md}_free_tlb()
- mm: remove CONFIG_UNEVICTABLE_LRU config option
- mm, x86: remove MEMORY_HOTPLUG_RESERVE related code
- oom: move oom_adj value from task_struct to mm_struct
- pagemap: add page-types tool
- pagemap: document 9 more exported page flags
- pagemap: document clarifications
- vmscan: make mapped executable pages the first class citizen
- vmscan: properly account for the number of page cache pages zone_reclaim() can reclaim
- x86: add hooks for kmemcheck
Tracing
- blktrace: add trace/ to /sys/block/sda
- blktrace: support per-partition tracing
- blktrace: support per-partition tracing for ftrace plugin
- ftrace: add kernel command line function filtering
- function-graph: add option to calculate graph time or not
- function-graph: add stack frame test
- function-graph: disable when both x86_32 and optimize for size are configured
- gcov: fix documentation
- kernel: constructor support
- oprofile: add support for Core i7 and Atom
- oprofile: introduce module_param oprofile.cpu_type
- ring-buffer: add benchmark and tester
- ring-buffer: add total count in ring-buffer-benchmark
- tracing: add average time in function to function profiler
- tracing: Add documentation for the power tracer
- tracing: add function profiler
- tracing: adding function timings to function profiler
- tracing: add irq tracepoint documentation
- tracing: add new tracepoints docbook
- tracing: Document the event tracing system
- tracing/events: Documentation updates
- tracing/filters: support for filters of dynamic sized arrays
- tracing: make the function profiler per cpu
Virtualisation
- KVM: Add VT-x machine check support
- KVM: Enable MSI-X for KVM assigned device
- KVM: Fix cpuid feature misreporting
- KVM: SVM: Add NMI injection support
- lguest: add support for indirect ring entries
- lguest: Add support for kvm_hypercall4()
- xen: add "capabilities" file
(trk)