Kernel Log: Coming in 2.6.35 (Part 2) - File systems and storage
by Thorsten Leemhuis
Direct I/O and improved out-of-space handling for Btrfs, optimisations for XFS and OCFS2, major restructuring measures for the Libata driver and extended RAID migration options are among the most important changes in Linux 2.6.35.
In the early hours of Tuesday morning, Linus Torvalds released the fifth release candidate of Linux 2.6.35. As in RC3 and RC4, the developers made fewer and generally more conservative changes than they did at the same stage in previous kernels. However, one commit stands out because it is 5 megabytes in size and removes 200,000 lines of text from the standard ARM system configuration files. This is the first consequence of a discussion in which Linus Torvalds complained that these files are bulky and only moderately helpful. The configuration files for Power systems are also likely to be streamlined soon; this may be followed by further changes that create standard configuration files dynamically and with less overhead. Background information about this topic can be found in a recent article on LWN.net.
The Kernel Log takes the release of RC5 as an opportunity to continue its "Coming in 2.6.35" mini series and describe the improvements in file systems and storage infrastructure as well as storage drivers. Part 1 of our mini series described the changes in the graphics hardware area. Further articles in the coming weeks will discuss the kernel's network support, USB drivers, FireWire, V4L/DVB, etc., as well as architecture code and infrastructure.
From 2.6.35, the Btrfs file system will support Direct I/O, which bypasses the kernel's page cache when accessing files and is relevant, for instance, for database software that does its own caching. In 2.6.32, the developers already made numerous changes to the Btrfs code to prevent problems that occurred when systems approached their maximum storage capacity; however, the problems persisted under certain relatively rare conditions. The current patches (such as 1, 2, 3, 4) are intended to finally solve these problems. As before, the kernel enables write barriers by default in Ext4, which improves data safety at the cost of performance, but disables them by default in Ext3; however, some modifications to the module parameters and documentation are designed to reduce the confusion created by this historical inconsistency.
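To make the Direct I/O idea more concrete, the following is a minimal sketch of how a userspace program on Linux might request it through the `O_DIRECT` open flag. It is not taken from the Btrfs patches; the function name `read_direct` and the 4096-byte alignment are assumptions made for illustration, and the real alignment requirement depends on the device and file system. Because some file systems reject `O_DIRECT`, the sketch falls back to ordinary buffered reads.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; the actual requirement varies by device and file system


def read_direct(path, length=BLOCK):
    """Read up to `length` bytes from the start of `path`, using O_DIRECT if possible."""
    # O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned.
    buf = mmap.mmap(-1, length)
    try:
        # First try Direct I/O, then fall back to a normal buffered open.
        for flags in (os.O_RDONLY | os.O_DIRECT, os.O_RDONLY):
            try:
                fd = os.open(path, flags)
            except OSError:
                continue  # e.g. the file system rejects O_DIRECT
            try:
                n = os.readv(fd, [buf])  # read straight into the aligned buffer
                return buf[:n]
            except OSError:
                continue  # some file systems only fail at read time
            finally:
                os.close(fd)
        raise OSError("could not read " + path)
    finally:
        buf.close()
```

A database-style application would typically keep such aligned buffers in its own cache instead of relying on the kernel's, which is why this access mode matters for that kind of software.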
SquashFS can now handle extended attributes, which allows it to be used with SELinux. The kernel's FUSE code can now use splice to exchange data with userspace, which eliminates one in-memory copy operation and enhances the performance of the "File System in Userspace" (1, 2, 3). XFS now offers the initially experimental delayed logging feature, which can be enabled via a mount option and, according to its documentation and the XFS status update for May 2010, considerably reduces the I/O bandwidth required for logging.
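The splice mechanism mentioned above can be illustrated outside of FUSE with a short sketch. It uses the `splice(2)` system call as exposed by Python's `os.splice` (available on Linux with Python 3.10 or later) to move data from a file into a pipe inside the kernel. The helper name `splice_file_to_pipe` is hypothetical, and the sketch does not reproduce the FUSE device protocol; it only shows the primitive that saves the extra copy.

```python
import os


def splice_file_to_pipe(path, count=4096):
    """Move up to `count` bytes from a file into a pipe with splice(2)."""
    read_end, write_end = os.pipe()
    src = os.open(path, os.O_RDONLY)
    try:
        # The kernel moves the data into the pipe without passing it
        # through a userspace buffer; one end of a splice must be a pipe.
        moved = os.splice(src, write_end, count)
        # Reading the pipe here copies the data out again, purely so the
        # result can be inspected; a real consumer would splice it onwards.
        return os.read(read_end, moved) if moved else b""
    finally:
        for fd in (src, read_end, write_end):
            os.close(fd)
```

Keeping `count` below the pipe capacity (commonly 64 KiB by default) avoids partial transfers in this simplified form.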
Information about further file system changes can be found via the links in the "minor gems" section at the end of this article and in the Git-Pull requests for the VFS (1, 2) as well as for Btrfs, Ceph and Nilfs2.
In his main Git-Pull requests (1, 2), Jeff Garzik points out that the developers have considerably restructured various aspects of the Libata subsystem. For instance, most of the AHCI driver code has been moved to the new libahci, which is the basis of the generic AHCI driver and of system-specific drivers such as the new ahci_platform driver suitable for system-on-a-chip (SoC) components. Various further restructuring measures (see also the "minor gems" section at the end of this article) were introduced to separate the Bus-Mastering DMA code from the SFF code and improve the structure of the driver code. Asynchronous Notification (AN), which reports media changes in ATAPI devices, is now disabled by default and can be enabled via the new "atapi_an" driver parameter.
The InfiniBand subsystem now offers the RDMA/iw_cxgb4 iWARP driver for Chelsio's T4 series of network adaptors. Another new addition is the ib_qib driver for PCIe InfiniBand adaptors by QLogic; this driver also takes over the support of QLogic's QLE adaptors from the ib_ipath driver. Further background information about current and future kernel development plans in the InfiniBand area can be found in Roland Dreier's main Git-Pull request.
Located in the SCSI subsystem, QLogic's qla2xxx driver can now also communicate with ISP82XX series FCoE adaptors. The SCSI code now contains numerous trace points to simplify troubleshooting and performance optimisation. The maintainer of the SCSI subsystem lists further changes in the emails he sent to submit his collection of changes (1, 2).
The kernel now includes an MTD driver for the Denali NAND controller used in Intel's Moorestown platform. The "Blkio controller cgroup interface" introduced in 2.6.33 has been considerably extended to include new options for measuring data throughput and breaking it down by application or group (1, 2, 3, 4, 5, 6, 7).
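As a rough illustration of how such per-group statistics might be consumed, the sketch below parses text in the per-device `major:minor operation value` layout used by cgroup v1 blkio statistics files such as `blkio.io_service_bytes`. The function name `parse_blkio_stats` is hypothetical and any numbers fed to it in an example are invented; on a real system the text would be read from the control group's directory.

```python
from collections import defaultdict


def parse_blkio_stats(text):
    """Turn 'major:minor operation value' lines into {device: {operation: value}}."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            # Skip blank lines and the trailing grand total ("Total N").
            continue
        device, operation, value = parts
        stats[device][operation] = int(value)
    return dict(stats)
```

Calling the function once per control group and comparing the "Read" and "Write" values is one simple way to see which group of applications generates the most traffic on a given device.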
The developers have also updated and extended the discard support, which is relevant for SSDs. Users no longer need to enable laptop mode globally and can now activate it individually for each storage device.
The kernel's RAID10 support is no longer classified as experimental. The MD subsystem's code now offers numerous further conversion options: Raid0->Raid5, Raid0->Raid10, Raid5->Raid4, Raid4->Raid0 as well as Raid5->Raid0 and Raid10->Raid0.
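The conversion options listed above can be written down as a small lookup table, which is handy when checking whether a planned migration is among them. This is only a sketch: the names `NEW_CONVERSIONS`, `can_convert` and `targets_from` are hypothetical, and the table covers just the conversions named in this article, not everything the MD subsystem supports.

```python
# Direct level conversions named in the article, as (source, target) pairs.
NEW_CONVERSIONS = {
    ("raid0", "raid5"),
    ("raid0", "raid10"),
    ("raid5", "raid4"),
    ("raid4", "raid0"),
    ("raid5", "raid0"),
    ("raid10", "raid0"),
}


def can_convert(source, target):
    """Return True if the article lists a direct source -> target conversion."""
    return (source.lower(), target.lower()) in NEW_CONVERSIONS


def targets_from(source):
    """Return the sorted list of levels the article names as targets for `source`."""
    return sorted(t for s, t in NEW_CONVERSIONS if s == source.lower())
```

For example, `can_convert("Raid0", "Raid5")` is true, while `can_convert("Raid5", "Raid10")` is false because that pair is not in the list.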