Kernel Log: Coming in 2.6.32 (Part 6): Infrastructure
by Thorsten Leemhuis
Devtmpfs, aka 'devfs 2.0' to its detractors, should allow the Linux kernel to start faster and run without udev. Support has been added for ACPI 4.0 and there are two new make targets which generate kernel configurations attuned to the running system. Changes to the power management subsystem increase data throughput and allow better use of runtime power saving features on modern I/O devices.
On releasing Linux 2.6.32-rc8, Linus Torvalds hinted that kernel version 2.6.32 could be released by the end of this month. The Kernel Log therefore brings its reporting on the changes in kernel 2.6.32 to an end, dedicating the sixth article in the 'What's coming in 2.6.32' series to changes in and around the kernel infrastructure. The first four articles in the series dealt with changes in the networking subsystem, in graphics hardware support, in storage hardware and file systems, and in other drivers, while the fifth article looked at architecture code, memory management, virtualisation and tracing.
ACPI, PCI and PM
Various patches add support for ACPI 4.0 to the Linux kernel. They are accompanied by a driver for ACPI 4.0-compliant power meters and an ACPI Processor Aggregator Device Driver. The latter makes individual CPUs idle when instructed to do so via ACPI 4.0, in order to temporarily reduce power consumption in the event, for example, of an electrical emergency or impending overheating. This allows a system to continue running, albeit with reduced performance, rather than switching itself off. Its merger followed a long discussion in which Torvalds spoke out in favour of the driver; details can be found in an article on LWN.net.
VGA arbitration patches have now been integrated into the PCI subsystem. On systems with multiple graphics cards running X server version 1.7 or later, this code ensures that each graphics card receives the correct X server VGA command, offering a big increase in flexibility when setting up multi-seat environments. Details can be found in the kernel documentation and in two old blog entries by Dave Airlie and Tiago Vignatti. PCI subsystem maintainer Jesse Barnes lists several further major changes within the PCI subsystem in his main git pull request. These include enhancements for PCI ASPM (Active State Power Management) and improved reset options for PCIe devices; the changes implementing these enhancements are listed at the end of this article.
Several major changes to the power management code create the basis for making better use of the runtime power saving features on modern I/O devices; details can be found in this LWN.net article. An enhancement to the Cpuidle framework, also explained on LWN.net, should increase I/O throughput, in some cases significantly, on larger servers in particular. Further changes to the power management code are listed in Rafael J. Wysocki's main git pull request.
Scheduling and security
There have been numerous enhancements to the kernel scheduler, which is responsible for allocating processor time to different applications (1, 2). After a fork, the kernel no longer executes the child process first. There is also a new SCHED_RESET_ON_FORK flag: if set, the kernel resets the priority of child processes to the default. Background information on these two changes can be found in two LWN.net articles (1, 2).
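How a program might ask for the new flag can be sketched roughly as follows. This is a minimal illustration in Python rather than anything taken from the kernel patches: it assumes a Linux system where Python's os module exposes the scheduler calls, and the helper names (with_reset_on_fork, base_policy, request_reset_on_fork) are made up for this example. The flag is simply ORed into the scheduling policy value that is passed to the kernel.

```python
import os

# Flag value as defined in <linux/sched.h>; used as a fallback in case
# this Python build does not expose the constant itself.
SCHED_RESET_ON_FORK = getattr(os, "SCHED_RESET_ON_FORK", 0x40000000)


def with_reset_on_fork(policy):
    """Return the policy value with the reset-on-fork flag ORed in."""
    return policy | SCHED_RESET_ON_FORK


def base_policy(policy):
    """Strip the flag again to recover the plain scheduling policy."""
    return policy & ~SCHED_RESET_ON_FORK


def request_reset_on_fork(pid=0):
    """Keep the task's current policy and priority, but add the flag.

    pid 0 means the calling process. Linux only; the call may fail
    with PermissionError depending on the policy and privileges.
    """
    current = os.sched_getscheduler(pid)
    param = os.sched_getparam(pid)
    os.sched_setscheduler(pid, with_reset_on_fork(base_policy(current)), param)
```

A process that calls request_reset_on_fork() keeps its own policy and priority; the flag only affects the children it forks afterwards.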
Changes to the process scheduler also eliminate two weaknesses found in speed comparisons made using Con Kolivas' recently published and independently developed BFS ("Brain Fuck Scheduler"). One of these problems resulted in x264 encoding being significantly slowed on multi-core systems. One of the x264 developers explains the background to this problem on a blog and also takes the opportunity to make some recommendations on improving collaboration between developers.
Sysfs now supports security labels, allowing security frameworks such as SELinux to monitor access to the virtual file system. Crypto subsystem maintainer Herbert Xu has summarised some of the more significant cryptography-related changes in his main git pull request for 2.6.32.
A second shot at devfs
Following lengthy discussions over the summer, devtmpfs, aka 'devfs 2.0' to its detractors, has now made it into the kernel. It allows the kernel itself to create and mount a RAM-based device file system when booting. This can speed up boot time and makes it possible to boot without using an initrd populated with udev or similar.
These are just some of the benefits of devtmpfs, which has come under heavy fire from some quarters. Torvalds, however, liked the concept, in particular the fact that the kernel is now able to carry out the whole boot process autonomously. How it works is explained in an old LWN.net article. Directly after it was merged, it became apparent that udev would still be required after booting, in order (among other things) to set privileges for /dev/null and /dev/zero. A hastily added patch now allows the kernel to take care of this too.
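Whether a running system actually has a devtmpfs mounted can be checked from user space. The following Python sketch is only an illustration under stated assumptions: it relies on the usual /proc/mounts layout (device, mount point, file system type, ...), and the function names are invented for this example.

```python
def devtmpfs_mount_points(mounts_text):
    """Return the mount points whose file system type is devtmpfs.

    mounts_text is expected in /proc/mounts format:
    device, mount point, file system type, options, ...
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "devtmpfs":
            points.append(fields[1])
    return points


def read_mounts(path="/proc/mounts"):
    """Read the kernel's mount table (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On a Linux system, devtmpfs_mount_points(read_mounts()) returns an empty list if no devtmpfs is mounted.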
Suitably configured
Kernel testers can now use the new 'localmodconfig' make target to create, with relatively little effort, a kernel configuration that is attuned to the distribution and hardware being used and does not compile any unneeded modules. The target takes the configuration file of the currently running kernel and deactivates all modules that are not loaded at the time it is run.
As a consequence, drivers for hardware which is not connected when make is called, such as USB devices, may be omitted. For testers, however, it has the potential to save a considerable amount of time when building a kernel, as explained in a git pull request by Steven Rostedt, the kernel hacker responsible for this change. Rostedt also explains how it works and describes the 'localyesconfig' make target, which generates a kernel configuration in which all modules loaded at the time it is called are compiled into the kernel.
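The idea behind the two targets can be sketched in a few lines of Python. This is a simplified illustration, not the script the kernel actually uses: the function name streamline and the hand-written mapping from configuration symbols to module names are assumptions made for this example, whereas the real tooling has to work out that mapping from the kernel sources itself.

```python
def streamline(config_lines, loaded_modules, symbol_to_module, builtin=False):
    """Trim a kernel .config down to the modules that are loaded.

    config_lines     -- lines of an existing .config
    loaded_modules   -- set of module names, e.g. taken from lsmod
    symbol_to_module -- hypothetical map: CONFIG symbol -> module name
    builtin          -- if True, loaded modules become '=y' instead of
                        staying '=m' (the 'localyesconfig' idea)
    """
    result = []
    for raw in config_lines:
        line = raw.rstrip("\n")
        if line.startswith("CONFIG_") and line.endswith("=m"):
            symbol = line[:-2]
            module = symbol_to_module.get(symbol)
            if module is not None and module not in loaded_modules:
                # Not loaded right now: switch the option off.
                result.append("# %s is not set" % symbol)
                continue
            if module is not None and builtin:
                # Loaded: compile it into the kernel image.
                result.append("%s=y" % symbol)
                continue
        # Built-in options, comments and unknown symbols stay untouched.
        result.append(line)
    return result
```

Symbols missing from the mapping are left alone in this sketch, which errs on the side of keeping a driver rather than dropping it.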
Minor Gems
Many further minor, but by no means insignificant, changes can be found in the list below. Like many of the references in the text above, the links point to the relevant commits in the web front end of the Git branch at kernel.org that Linus Torvalds uses for maintaining the kernel sources. There, the commit comments and the patches themselves provide extensive further information on the respective changes.
ACPI
- ACPI: add AC/DC notifier
- ACPI button: provide lid status functions
- ACPICA: ACPI 4.0 : Add new return package type, restructure module.
- ACPICA: ACPI 4.0: iASL/Disassembler - IPMI keyword support.
- ACPICA: ACPI 4.0: Interpreter support for IPMI.
- ACPICA: ACPI 4: Add validation for new predefined names.
- ACPICA: Add 64-bit support to acpi_read and acpi_write
- ACPICA: Add support for module-level executable AML code
- ACPICA: Major update for acpi_get_object_info external interface
- ACPICA: Update _OSI with new Windows OS strings
- ACPICA: Windows compatibility: autoexecute root _INI method
- ACPI: Make ACPI processor proc I/F depend on the ACPI_PROCFS
- ACPI, PCI: Change PREFIX to "PCI" from "ACPI" in mmconfig-shared.c
- led: document sysfs interface
- Revert "ACPI: Attach the ACPI device to the ACPI handle as early as possible"
- SFI: add capability to parse ACPI tables
- SFI: add platform-independent core support
- SFI: create linux/sfi.h
- thermal: sysfs-api.txt - document passive attribute for thermal zones
- video/backlight: document sysfs interface
- video/lcd: document sysfs interface
Crypto
- crypto: ahash - Convert to new style algorithms
- crypto: api - Add new style spawn support
- crypto: cryptd - Add support to access underlaying shash
- crypto: gcm - Use GHASH digest algorithm
- crypto: mv_cesa - Add support for Orion5X crypto engine
- crypto: padlock - Switch sha to shash
- crypto: sha256_generic - Add export/import support
- crypto: shash - Add spawn support
- crypto: talitos - add support for 36 bit addressing
- crypto: vmac - New hash algorithm for intel_txt support
LSM, SELinux, ...
- Add audit messages on type boundary violations
- CRED: Add some configurable debugging [try #6]
- KEYS: Add a keyctl to install a process's session keyring on its parent [try #6]
- KEYS: Add garbage collection for dead, revoked and expired keys. [try #6]
- lsm: Add hooks to the TUN driver
- LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information.
- security: introducing security_request_module
- selinux: Support for the new TUN LSM hooks
PCI
- agp: Add generic support for graphics dma remapping
- genirq: Support nested threaded irq handling
- intel_agp: Use PCI DMA API correctly on chipsets new enough to have IOMMU
- intel-iommu: Kill DMAR_BROKEN_GFX_WA option.
- intel-iommu: Support PCIe hot-plug
- MAINTAINTERS: remove hotplug driver entries
- PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support
- PCI ASPM: support L1 only
- PCI ASPM: support partial aspm enablement
- PCI ASPM: support per direction l0s management
- PCI: Document pci_ids.h addition policy.
- PCI hotplug: add pci_configure_slot()
- PCI hotplug: add support for 5.0G link speed
- PCI: support for PCI Express fundamental reset
- spi: add SPI driver for most known i.MX SoCs
- spi: add spi_ppc4xx driver
- spi: add support for device table matching
- spi: Freescale STMP driver
- spi: McSPI off-mode support
- spi: remove i.MX SPI driver
- x86/amd-iommu: Support higher level PTEs in iommu_page_unmap
PM
- CPUFREQ: Introduce global, not per core: /sys/devices/system/cpu/cpufreq
- CPUFREQ: ondemand - Use global sysfs dir for tuning settings
- CPUFREQ: Powernow-k8: Enable more than 2 low P-states
- CPUFREQ: update Doc for cpuinfo_cur_freq and scaling_cur_freq
- cpuidle: menu governor: reduce latency on exit
- NOHZ: update idle state also when NOHZ is inactive
- PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
- PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2)
- PM/Hibernate: Rework shrinking of memory
- tracing, x86, cpuidle: Move the end point of a C state in the power tracer
Process Scheduler
- sched: Add come comments to the sched features
- sched: Add new wakeup preemption mode: WAKEUP_RUNNING
- sched: Add SCHED_RESET_ON_FORK functionality for nice < 0 tasks
- sched: Add wait, sleep and iowait accounting tracepoints
- sched: Clean up SCHED_RESET_ON_FORK
- sched: Disable NEW_FAIR_SLEEPERS for now
- sched: Ensure that a child can't gain time over it's parent after fork()
- sched: Implement a gentler fair-sleepers feature
- sched: Introduce SCHED_RESET_ON_FORK scheduling policy flag
- sched: Merge select_task_rq_fair() and sched_balance_self()
- sched: Turn off child_runs_first
Various other changes
- cgroups: add a read-only "procs" file similar to "tasks" that shows only unique tgids
- cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time
- cgroups: support named cgroups hierarchies
- checkpatch: make -f alias --file, add --help, more verbose help message
- connector: Provide the sender's credentials to the callback
- docs: update patch size in SubmittingPatches
- Documentation: ABI: document /sys/devices/system/cpu/
- Documentation: ABI: rename sysfs-devices-cache_disable properly
- Documentation: ABI: /sys/devices/system/cpu/cpuidle/
- Documentation: ABI: /sys/devices/system/cpu/cpu#/node
- Documentation: ABI: /sys/devices/system/cpu/cpu#/ topology files
- Documentation: ABI: /sys/devices/system/cpu/ topology files
- Driver core: Add support for compatibility classes
- exec: let do_coredump() limit the number of concurrent dumps to pipes
- futex: Add memory barrier commentary to futex_wait_queue_me()
- futex: Make function kernel-doc commentary consistent
- genirq: Add buslock support
- genirq: Add oneshot support
- kbuild: add static to prototypes
- kbuild: introduce ld-option
- kbuild: rename ld-option to cc-ldoption
- kbuild: set -fconserve-stack option for gcc 4.5
- kbuild: use INSTALLKERNEL to select customized installkernel script
- kconfig: add streamline_config.pl to scripts
- kconfig: add symbol value to help find the real depend
- kconfig: make local .config default for streamline_config
- kconfig: make localmodconfig to run streamline_config.pl
- kconfig: test for /boot/config-uname after /proc/config.gz in localconfig
- kernel-doc: allow multi-line declaration purpose descriptions
- kernel hacking: move STRIP_ASM_SYMS from General
- kmemleak: add clear command support
- kmemleak: Dump object information on request
- memcg: add comments explaining memory barriers
- memcg: improve resource counter scalability
- memcg: remove the overhead associated with the root cgroup
- memcg: show swap usage in stat file
- memory controller: soft limit documentation
- memory controller: soft limit interface
- memory controller: soft limit organize cgroups
- memory controller: soft limit reclaim on contention
- param: allow whitespace as kernel parameter separator
- percpu: build first chunk allocators selectively
- percpu: implement optional weak percpu definitions
- percpu: introduce pcpu_alloc_info and pcpu_group_info
- percpu: use dynamic percpu allocator as the default percpu allocator
- printk: add printk_delay to make messages readable for some scenarios
- rcu: Add synchronize_sched_expedited() rcutorture doc + updates
- rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down
- rcu: Merge preemptable-RCU functionality into hierarchical RCU
- rcu: Remove Classic RCU
- rcu: Remove CONFIG_PREEMPT_RCU
- rcu: Renamings to increase RCU clarity
- scripts/get_maintainer.pl: add --git-blame
- scripts/get_maintainer.pl: add .mailmap use, shell and email cleanups
- scripts/get_maintainer.pl: add maintainers in order listed in matched section
- scripts/get_maintainer.pl: add patch/file search for keywords
- scripts/get_maintainer.pl: add --pattern-depth
- scripts/get_maintainer.pl: add --remove-duplicates
- scripts/get_maintainer.pl: add sections in pattern match depth order
- time: add function to convert between calendar time and broken-down time for universal use
- time: Introduce CLOCK_REALTIME_COARSE
- timekeeping: Add timekeeper read_clock helper functions
- timekeeping: Add xtime_shift and ntp_error_shift to struct timekeeper
- timekeeping: Increase granularity of read_persistent_clock()
- timekeeping: Introduce struct timekeeper
- Update flex_arrays.txt
Latecomers
The previous five parts of the Kernel Log mini series have already given a detailed overview of the many changes in the Linux kernel. All of the major enhancements have already been mentioned there, as they entered the main development branch during the first phase of the development cycle. But some small, usually not quite as important changes that fall into the "minor gems" category entered the kernel later; for completeness, we have included these changes in the following list.
Graphics
- drm: Add the basic check for the detailed timing in EDID
- drm/radeon/kms: add debugfs for power management for AtomBIOS devices
- drm/radeon/kms: add quirk for acer 5102
- drm/radeon/kms: add quirk for hp dc5750
- drm/radeon/kms: add support for msi
- drm/radeon/kms/atom: add support for AdjustDisplayPll
- drm/radeon/kms/atom: add support for spread spectrum (v2)
- drm/radeon/kms: fix support for original r100
- drm/radeon/kms: initial mode validation support
Network
- ath5k: add LED definition for BenQ Joybook R55v
- ath5k: add LED support for HP Compaq CQ60
- e1000e: config PHY via software after resets
- e100: e100_phy_init() isolates selected PHY, causes 10 second boot delay
- iwlwifi: Use RTS/CTS as the preferred protection mechanism for 6000 series
- mlx4_core: Add a new supported 40 GigE device ID
- netdev: usb: dm9601.c can drive a device not supported yet, add support for it
For other articles on 2.6.32 and links to the rest of the "Coming in 2.6.32" series, see The H's Kernel Log - 2.6.32 Tracking page. (thl/c't).
(crve)