Kernel Log: Coming in 3.2 (Part 2) - Filesystems
by Thorsten Leemhuis
A number of changes to Btrfs mean that filesystem structures will not be as prone to corruption in case of system crashes. By bundling blocks into clusters, Ext4 should be a lot faster in some scenarios. CIFS also promises speed gains.
Last week, just in time for the US Thanksgiving holiday, Linus Torvalds released the third pre-release version of Linux 3.2. The commits since RC2 have, according to Torvalds, been mostly small and reasonable, though there is "certainly more churn" than he would have liked.
Torvalds is likely to release the fourth pre-release version at the end of this week or the start of next, with the final release expected in early to mid-January. In light of the ongoing development of Linux 3.2, the Kernel Log is continuing its "Coming in 3.2" series by describing new features in the area of filesystems. The first article in the series looked at changes to network drivers and network infrastructure; articles on architecture code, infrastructure and drivers for other hardware will follow over the next few weeks.
The Ext4 filesystem now supports big allocation blocks. The technique, known as bigalloc, bundles the 4K blocks used to store data into clusters of up to 1 MB. This reduces administrative overhead when saving large files and should significantly improve performance in some scenarios. It is, however, more wasteful in its use of storage space, since each file occupies the whole of at least one cluster. Bigalloc filesystems can be used with the recently released version 1.42 of E2fsprogs; they are created by using the usual mkfs.ext4 command and defining the required cluster size using the new '-C' argument.
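The space trade-off can be estimated with some simple arithmetic. The following Python sketch is only an illustration, not Ext4 code: the function names are made up for this example, and it assumes that a file's data always occupies a whole number of clusters and that metadata can be ignored.

```python
def allocated_bytes(file_size: int, cluster_size: int) -> int:
    """Return the space a file's data occupies when it is allocated
    in whole clusters (hypothetical helper, metadata ignored)."""
    if file_size <= 0:
        # Assumption for this sketch: an empty file needs no data cluster.
        return 0
    # Round up to the next multiple of the cluster size.
    clusters = -(-file_size // cluster_size)
    return clusters * cluster_size


def wasted_bytes(file_size: int, cluster_size: int) -> int:
    """Return the allocated space the file's data does not use."""
    return allocated_bytes(file_size, cluster_size) - file_size


KIB = 1024
MIB = 1024 * KIB

# A 1 KiB file with 4 KiB blocks compared with 1 MiB clusters.
small_4k = wasted_bytes(1 * KIB, 4 * KIB)
small_1m = wasted_bytes(1 * KIB, 1 * MIB)

# A file one byte larger than 10 MiB needs an eleventh 1 MiB cluster.
large_1m = allocated_bytes(10 * MIB + 1, 1 * MIB)
```

Under these assumptions, a 1 KiB file leaves 3 KiB unused with 4K blocks but almost a full megabyte unused with 1 MB clusters, which is why bigalloc is of more interest for filesystems that mainly hold large files.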
For 3.2, Ext filesystem developer Theodore 'tytso' Ts'o has removed an old algorithm for allocating memory blocks (Ext3, Ext4). He has also made a number of changes which should cause Ext4 to react better if users specify mutually incompatible mount options. In a recent presentation at LinuxCon Europe 2011, Ts'o noted that Ext4 is stable with a standard configuration (e.g. standard mount options with or without journal), but that problems were possible or even likely with unusual mount options and mount option combinations (e.g. saving the journal on another drive) as these had barely been tested. Extensive integrity testing with all possible combinations was, he noted, practically impossible. He therefore wants to work towards reducing the range of options.
The slides for his presentation have yet to be posted on the conference web site. The web site does, however, have some PDFs containing background information on the recent developments in Ext4 and Btrfs discussed in the presentations "The Ongoing Evolution of Ext4: New Features and Performance Enhancements" by Lukas Czerner and "Quo vadis Linux File Systems: An Operations Point of View on Ext4 and Btrfs" by Udo Seidel.
The developers have added readahead functions to the still experimental Btrfs filesystem. Scrub support, introduced in Linux 3.0, has been tweaked and should be a little faster; when it finds damaged blocks, the scrub function now reports the name of the file to which the affected space was allocated. Btrfs will now automatically adjust the allocation tables without an explicit scrub run in situations where it encounters a read error, but can fulfil the read request by using a second copy of the data.
If the root node, a key component of Btrfs, is damaged, the new "-o recovery" mount option can be used to instruct the filesystem to use an alternative root node. The kernel will then try an older root node and mount an older filesystem state, allowing the user to rescue whatever data can be accessed using this method.
Changes to internal log functions intended to improve Btrfs performance have unexpectedly been omitted from 3.2 in response to last-minute problems. Chris Mason explains the problem in his git pull request, in which he also talks about some of the other enhancements merged into 3.2. A second request brought a number of further changes into RC3; there Mason specifically mentions fixes for problems that could lead to corruption in case of crashes or power failures under certain conditions.