Kernel Log: Coming in 3.0 (Part 2) - Filesystems
by Thorsten Leemhuis
The kernel hackers have optimised the Btrfs code and extended it to include new functions; a substantially improved tool for testing the experimental filesystem is soon to be released. Ext4 now supports the "punch hole" technology for deallocating unused storage areas within a file.
Early this week, Linus Torvalds released a new pre-release (RC) for Linux 3.0. In the release mail for the fourth RC, he mentions some more extensive changes to the DRM subsystem. Among those changes are patches that improve support for the graphics core of AMD's recently launched Llano; that support was itself only added between RC2 and RC3.
The Kernel Log is taking the development progress of Linux 3.0 as an opportunity to continue its "Coming in 3.0" mini-series with a description of the advancements in terms of filesystems. Part 1 of the mini-series discussed the changes to the network drivers and infrastructure; over the coming weeks, further articles will discuss the kernel's graphics drivers, architecture code, storage support, infrastructure and other hardware drivers.
A large number of changes have been made to the still experimental Btrfs filesystem; in his Git-Pull request, main developer and Oracle employee Chris Mason said that the changes are probably the biggest he has ever sent. Between RC1 and RC2, the developer submitted several even more comprehensive changes – however, Torvalds sharply criticised this and demanded that, in future, the Btrfs developers refrain from submitting such major changes after the merge window has been closed.
These and previous Git-Pull requests also demonstrate how many different companies are now contributing to Btrfs. For instance, Fujitsu developer Miao Xie submitted new "Delayed Inode Items Operation" patches; in his commit comment, the developer says that the technology improves performance by about 15 per cent when creating files, and by about 20 per cent when deleting them.
Arne Jansen has submitted code to provide scrub support – one of the new source code files created in this context lists Strato as the copyright owner. The code allows users to instruct the kernel to read all data from the storage medium and run through all the checksums to ensure data integrity; if it finds a flaw, the filesystem will attempt to substitute an intact copy of the block if such a copy is available.
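The scrub idea described above can be illustrated with a small Python sketch. It is a toy model rather than the kernel code: the names `checksum` and `scrub` are made up for this example, each "mirror" is simply a list of blocks held in memory, and zlib's CRC-32 stands in for whatever checksum the filesystem really stores.

```python
import zlib


def checksum(data: bytes) -> int:
    # Stand-in checksum for the example (assumption): zlib's CRC-32.
    return zlib.crc32(data)


def scrub(mirrors, expected):
    """Check every block on every mirror against its stored checksum.

    mirrors[m][i] is the copy of block i held on mirror m;
    expected[i] is the checksum recorded for block i.
    Returns (repaired, unrecoverable): a list of (mirror, block) pairs that
    were overwritten with an intact copy, and a list of block numbers for
    which no intact copy could be found.
    """
    repaired, unrecoverable = [], []
    for i, want in enumerate(expected):
        # Look for any copy of this block whose checksum still matches.
        good = next((m[i] for m in mirrors if checksum(m[i]) == want), None)
        for index, mirror in enumerate(mirrors):
            if checksum(mirror[i]) == want:
                continue  # this copy is fine
            if good is None:
                if i not in unrecoverable:
                    unrecoverable.append(i)
            else:
                mirror[i] = good  # substitute the intact copy
                repaired.append((index, i))
    return repaired, unrecoverable


if __name__ == "__main__":
    blocks = [b"alpha", b"beta", b"gamma"]
    sums = [checksum(b) for b in blocks]
    first, second = list(blocks), list(blocks)
    first[1] = b"bXta"  # simulate a corrupted block on the first mirror
    print(scrub([first, second], sums))
```

In this model, a block that is damaged on one mirror is repaired from the other; if only a single copy exists and it fails the check, the block can merely be reported.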
The developers have revised the code for managing inodes to prevent inode numbers from running out after a large number of files have been created and deleted on 32-bit systems; in this context, the filesystem has been given a cache for free inode numbers which currently needs to be enabled via a mount option, as it is still causing problems. The driving force behind these improvements is Fujitsu developer Li Zefan.
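Why a cache of free inode numbers helps can be shown with a minimal Python sketch. The class `InodeNumberAllocator` and its parameters are hypothetical and do not mirror the Btrfs implementation; the sketch only assumes that numbers are handed out from a bounded range and that released numbers are remembered for reuse.

```python
class InodeNumberAllocator:
    """Toy allocator: hands out numbers up to a limit and recycles freed ones."""

    def __init__(self, first=1, limit=2**32 - 1):
        self.next_unused = first  # lowest number never handed out so far
        self.limit = limit        # highest number that may be handed out
        self.free = []            # cache of numbers released by deleted files

    def allocate(self):
        # Prefer a recycled number so the counter does not creep towards the limit.
        if self.free:
            return self.free.pop()
        if self.next_unused > self.limit:
            raise RuntimeError("inode numbers exhausted")
        number = self.next_unused
        self.next_unused += 1
        return number

    def release(self, number):
        # Remember the number of a deleted file for later reuse.
        self.free.append(number)


if __name__ == "__main__":
    alloc = InodeNumberAllocator(first=1, limit=3)
    numbers = [alloc.allocate() for _ in range(3)]
    alloc.release(numbers[1])
    print(alloc.allocate())  # reuses the released number instead of failing
```

Without the `free` list, an allocator like this one would run out after `limit` allocations, no matter how many files had been deleted in the meantime.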
As Btrfs uses Copy on Write (COW), it can quickly fragment with certain access patterns; the new "autodefrag" mount option instructs the filesystem to detect "small random writes" in existing files and queue the files for automatic defragmentation. The commit comment and Git-Pull request say that the option is suitable for small files such as SQLite or Berkeley DB databases, but that it can't yet handle larger databases or the hard disk images of virtualisation solutions.
The change was introduced by Mason himself. In one of the Git-Pull requests, the developer points out various performance optimisations reportedly introduced by Red Hat developer Josef Bacik; Mason also said that SUSE developer Dave Sterba has cleaned up the code in numerous places.
In a discussion about using Btrfs as the default filesystem for Fedora 16, Bacik recently mentioned that an improved tool for controlling and troubleshooting Btrfs filesystems is "almost ready"; although it has been in preparation for months, the tool's release has reportedly been delayed because of very thorough testing. The currently available btrfsck/fsck.btrfs only offers basic functionality.