In association with heise online

12 March 2009, 15:13

Kernel Log: What's new in 2.6.29 - Part 5: Filesystems Btrfs, SquashFS, Ext4 without journaling

If we are to believe the statements made by Linus Torvalds when presenting the seventh release candidate of 2.6.29, and assuming release candidates come weekly, it will be at least another week or two before Linux kernel 2.6.29 becomes available. The Kernel Log will, therefore, continue its report about the new features scheduled for 2.6.29 with what's new in terms of file systems.

Butter file system

As already mentioned during the merge window of 2.6.29, the kernel developers have integrated the experimental Btrfs file system. This doesn't imply that Btrfs is now complete – rather, it means that the kernel hackers intend to develop and mature it within the Linux kernel framework, as they did with Ext4. Ext4 was integrated into Linux 2.6.19 in autumn 2006 and recently completed its main development phase in Linux 2.6.28.

Btrfs – short for B-tree FS, but generally referred to as butter FS – is a "copy-on-write" file system originally created by kernel developer Chris Mason, who works at Oracle and had previously spent some time handling ReiserFS for Suse. After the first announcement of Btrfs on the Linux Kernel Mailing List (LKML) in June 2007, Mason quickly found support from other developers – the comments to the commit were authored by developers employed, for example, by HP, Intel, Novell/Suse and Red Hat.

As Theodore Ts'o (also known as Ted Tso or tytso), the developer of Ext4 (and of Ext2 and Ext3), revealed in an email several months ago, a number of key Linux file system developers had already agreed at a meeting in autumn 2007 that Btrfs should become the "next generation file system for Linux". Ts'o says that he and other developers will nevertheless continue to develop Ext4: its tried and tested Ext3 basis and its more advanced state of development make it suitable as a bridge until Btrfs has matured enough to earn the trust of enterprise users.

The Btrfs wiki offers an overview of the most important features of this file system, which was specifically designed for Linux from scratch:

  • Extent-based file storage (maximum file size 2^64 bytes)
  • Space efficient packing of small files and indexed directories to minimise storage requirements
  • Dynamic inode allocation
  • Writable snapshots
  • Subvolumes
  • Checksums on data and metadata
  • Compression
  • Integrated multiple device support for combining several devices into one volume with several RAID algorithms
  • Online file system check and defragmentation
  • Very fast offline file system check
  • Efficient incremental backup and file system mirroring
  • Optional SSD optimisation
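
To make "copy-on-write", writable snapshots and checksums a little more concrete, here is a minimal toy sketch in Python. It is not the Btrfs on-disk format or algorithm: the class name ToyCowVolume and all of its methods are hypothetical, and zlib.crc32 merely stands in for whatever checksum a real file system would use. The point is only that a write never overwrites an existing block, so a snapshot can share all data blocks with its origin and copy nothing but the mapping.

```python
import zlib


class ToyCowVolume:
    """Toy copy-on-write block store (illustration only, not Btrfs)."""

    def __init__(self, blocks=None, mapping=None):
        # Shared, append-only list of (data, checksum) tuples.
        self._blocks = [] if blocks is None else blocks
        # Private mapping: logical block number -> index into _blocks.
        self._map = {} if mapping is None else dict(mapping)

    def write(self, logical, data):
        # Never overwrite in place: append a new block, then repoint
        # the logical block number at it.
        self._blocks.append((data, zlib.crc32(data)))
        self._map[logical] = len(self._blocks) - 1

    def read(self, logical):
        data, checksum = self._blocks[self._map[logical]]
        # Verify the stored checksum on every read.
        if zlib.crc32(data) != checksum:
            raise IOError("checksum mismatch in logical block %d" % logical)
        return data

    def snapshot(self):
        # A snapshot copies only the mapping; the data blocks are shared.
        # The snapshot is itself writable.
        return ToyCowVolume(self._blocks, self._map)
```

In this sketch, writing to the original volume after taking a snapshot leaves the snapshot's view unchanged, and writing to the snapshot does not affect the original.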

The development timeline explains a few things that are still on the developers' to-do list, while the changelog offers a good overview of their achievements so far. The file system's structure and operation are explained in another wiki document, and frequently asked questions are answered in the FAQ. The "on disk" layout of the file system underwent several modifications during the development of Btrfs, making it necessary to reformat when switching to a new kernel. The developers of Btrfs plan to spare users this hassle from now on – although further changes to the on-disk format can't be ruled out completely, of course.

Btrfs was integrated with its entire development history, which adds up to a total of more than 900 minor and major commits in the Linux source code management system. After its integration in early January this year, the kernel hackers extended the file system to include new features like the support of SELinux. Further changes for 2.6.30 are already being prepared – some of them designed to further improve performance.

While some of the Linux distributions are now considering Ext4 as their standard file system, it will probably be quite some time before the same is true for Btrfs. The Fedora developers, however, have already extended their development branch installer and kernel to include Btrfs support.

Squash file system

While the integration of the experimental Btrfs is initially unlikely to affect the majority of users and distribution developers, the addition of SquashFS should have a more immediate impact. SquashFS is a compressed read-only file system that various Linux distributions already use on their installation and live media (USB, CD or DVD) to minimise storage requirements. For the same reason, SquashFS is often used in the embedded area as an alternative to Cramfs. The kernel documentation for SquashFS offers detailed explanations of the differences between Cramfs and SquashFS and discusses the new file system's operation.
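
The basic idea behind a compressed read-only image can be sketched in a few lines of Python. This is a toy model and not the SquashFS on-disk format: the function names and the 4096-byte block size are assumptions made for illustration. The data is compressed block by block when the image is built, and a read decompresses only the blocks that cover the requested range.

```python
import zlib

BLOCK_SIZE = 4096  # arbitrary value chosen for this toy example


def build_image(data, block_size=BLOCK_SIZE):
    """Compress data block by block and return the compressed blocks."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def read_range(image, offset, length, block_size=BLOCK_SIZE):
    """Return length bytes starting at offset, decompressing only the
    blocks that overlap the requested range."""
    if length <= 0:
        return b""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    chunk = b"".join(zlib.decompress(block)
                     for block in image[first:last + 1])
    start = offset - first * block_size
    return chunk[start:start + length]
```

Because the image is never modified after it has been built, no free-space management or in-place update logic is needed, which is part of what makes such a format attractive for live media and embedded devices.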

Several times in the past few years, the developers of SquashFS have tried to get their file system integrated into the Linux kernel, but they didn't manage to comply with the kernel hackers' high quality standards. Although they have worked to improve the criticised code segments in the now integrated version 4.0, the integration of SquashFS still came as quite a surprise. Following a long discussion about the pros and cons, as well as several problems in the current code, Linus Torvalds said that it wouldn't make sense not to integrate it if everyone uses it anyway ("if this is really in use by everybody, then not merging it is kind of pointless."); shortly afterwards, he merged the SquashFS patches into the main development branch.

Even more about file systems

There are also a considerable number of changes to the kernel's long-standing file systems. The kernel is now capable of a temporary file system freeze (1, 2, 3, LWN article) – this is, for example, relevant for container virtualisation and for backup solutions. eCryptfs can now encrypt file names (for example 1, 2, 3, 4), and the developers incorporated numerous major changes to the B-tree algorithm in XFS. They also made major changes to the OCFS2 cluster file system, which now supports ACLs, security attributes, quotas and metadata checksums.
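
As a rough illustration of how a backup tool might use the freeze feature, the following Python sketch wraps the FIFREEZE and FITHAW ioctls from <linux/fs.h> in a context manager. The helper names _iowr and frozen are hypothetical, and the request numbers are computed with the generic ioctl encoding used on x86, ARM and most other architectures (a few, such as PowerPC and MIPS, use a different bit layout). Calling frozen() requires root privileges, and freezing the file system that holds the running program can hang the system, so treat this as a sketch rather than a ready-made tool.

```python
import contextlib
import fcntl
import os

# Generic Linux ioctl encoding: 8 bits number, 8 bits type,
# 14 bits argument size, 2 bits direction.
_IOC_NRBITS, _IOC_TYPEBITS, _IOC_SIZEBITS = 8, 8, 14
_IOC_WRITE, _IOC_READ = 1, 2


def _iowr(type_char, nr, size):
    """Build a read/write ioctl request number (hypothetical helper)."""
    direction = _IOC_READ | _IOC_WRITE
    return ((direction << (_IOC_NRBITS + _IOC_TYPEBITS + _IOC_SIZEBITS))
            | (size << (_IOC_NRBITS + _IOC_TYPEBITS))
            | (ord(type_char) << _IOC_NRBITS)
            | nr)


# <linux/fs.h>: FIFREEZE is _IOWR('X', 119, int), FITHAW is _IOWR('X', 120, int).
FIFREEZE = _iowr("X", 119, 4)
FITHAW = _iowr("X", 120, 4)


@contextlib.contextmanager
def frozen(mount_point):
    """Freeze the file system mounted at mount_point for the duration
    of the with block, then thaw it again. Needs root privileges."""
    fd = os.open(mount_point, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FIFREEZE, 0)
        try:
            yield
        finally:
            # Always thaw, even if the body of the with block fails.
            fcntl.ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)
```

A backup script could then take a block-level snapshot inside "with frozen('/mnt/data'):", where /mnt/data is only a placeholder path.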

The developers also improved, tidied up and corrected Ext4 in many minor ways – some of the changes have even been integrated into the 2.6.27 and 2.6.28 stable kernel series. In addition, the Ext4 developers adapted the documentation for activating write barriers, which had recently sparked prolonged discussions (see also the related LWN article). Several changes were made to the Fsync algorithm to marginally improve performance.

Thanks to several changes introduced by Google developers, Ext4 file systems can now be run without journaling to further improve their speed – until now, some users still stuck with Ext2 to avoid the journaling overhead. A recent blog entry by Theodore Ts'o contains a few test results detailing the impact journaling has on performance and some thoughts about using Ext file systems on SSDs. Those who are interested in Ext4 can find a lot of additional background information in a recently published article on IBM's developerWorks.
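
Whether an Ext2/3/4 file system has a journal is recorded as the has_journal feature bit in its superblock. The following Python sketch, with hypothetical function names, checks that bit in a raw superblock. It assumes the usual Ext layout: the superblock starts 1024 bytes into the device, the magic number 0xEF53 sits at offset 0x38 and the compatible feature flags at offset 0x5C, with has_journal being bit 0x0004.

```python
import struct

SUPERBLOCK_OFFSET = 1024       # superblock starts 1024 bytes into the device
SUPERBLOCK_SIZE = 1024
MAGIC_OFFSET = 0x38            # s_magic, 16 bit little-endian
FEATURE_COMPAT_OFFSET = 0x5C   # s_feature_compat, 32 bit little-endian
EXT_MAGIC = 0xEF53
FEATURE_COMPAT_HAS_JOURNAL = 0x0004


def has_journal(superblock):
    """Return True if the has_journal bit is set in a raw superblock."""
    (magic,) = struct.unpack_from("<H", superblock, MAGIC_OFFSET)
    if magic != EXT_MAGIC:
        raise ValueError("not an Ext2/3/4 superblock")
    (compat,) = struct.unpack_from("<I", superblock, FEATURE_COMPAT_OFFSET)
    return bool(compat & FEATURE_COMPAT_HAS_JOURNAL)


def read_superblock(path):
    """Read the raw superblock from a device or image file."""
    with open(path, "rb") as device:
        device.seek(SUPERBLOCK_OFFSET)
        return device.read(SUPERBLOCK_SIZE)
```

For example, has_journal(read_superblock(path)) would return False for an Ext4 file system created without a journal; reading a block device this way normally requires root privileges.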

The support of online defragmentation ("online defrag") in Ext4 has not yet been integrated in 2.6.29; at the end of January, a new version was released which also incorporates changes to some of the previously criticised code segments (see also the related LWN article).

Many other changes

As well as the changes we've already discussed, 2.6.29 also brings many other notable file system changes to the Linux kernel:

Btrfs

  • There are too many commits to list, but these Git pull requests offer an overview (1, 2, 3, 4, 5, 6), as does the Git web interface at kernel.org.

CIFS

Ext[234]

Fuse

OCFS2

SquashFS

UBIFS

XFS

VFS, other file systems

Further background and information about developments in the Linux kernel and its environment can also be found in previous issues of the Kernel Log at The H Open Source:

Older Kernel Logs can be found in the archives or by using the search function at The H Open Source. (thl/c't)

(djwm)

 

