In association with heise online

17 July 2009, 08:55

The Btrfs file system

by Dr. Oliver Diedrich

Btrfs, the designated "next generation file system" for Linux, offers a range of features that are not available in other Linux file systems – and it's nearly ready for production use.

Tux juggling file systems

If the numerous articles published on the topic in the past few months are to be believed, Btrfs is the file system of the future for Linux, and the file system developers agree: Btrfs is to be the "next generation file system" for Linux. The general consensus (not so much among developers as among Btrfs supporters in general) is that Btrfs is the ZFS for Linux (according to Linux Magazine, for example). This is debatable at present, since ZFS, designed by Sun Microsystems for the Solaris operating system, is already in production use while Btrfs is still highly experimental, but the two file systems do have a lot in common. With its integrated volume management, checksums for data integrity, Copy on Write and snapshots, Btrfs offers a range of features unrivalled by any of the Linux file systems currently in production use.

Btrfs, which is called "ButterFS" by some people and "BetterFS" by others, is actually short for B-Tree File System, and is so named because the file system manages its data and metadata in tree structures. Masterminded by Oracle developer Chris Mason, the file system has been a part of the Linux kernel since Linux 2.6.29. However, this doesn't mean that it is stable, let alone suitable for production use. The Btrfs page clearly points out that not even the file system's on-disk data formats have been finalised so far.

In an interview with Amanda McPherson of the Linux Foundation, Mason outlines the next development steps for Btrfs. According to Mason, most of the performance bottlenecks have been resolved in the recent first release candidate of the 2.6.31 kernel (which makes the performance comparisons between Btrfs and other Linux file systems recently published, for example by Phoronix, largely obsolete). Mason says that, with kernel version 2.6.32, the developers plan to have Btrfs ready for serious testing by early adopters.

It is, therefore, time to take a look at this next generation file system for Linux.

Fit for the future

As a 64-bit file system, Btrfs addresses up to 16 exabytes (16,384 petabytes) – both in terms of the maximum volume size and the maximum file size. This is considerably more than Ext4 can address (1024 PBytes volume size / 16 TBytes file size), matches Sun's ZFS and offers plenty of reserves for years to come. As a reference point, the Large Hadron Collider (LHC) at CERN, probably the world's largest producer of data at present, has about 20 PBytes of storage available – in a grid distributed across eleven data centres in Europe, North America and Asia.
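
The figures above follow from simple arithmetic on a 64-bit byte offset. The short sketch below reproduces them; it assumes binary units (1 PByte = 2^50 bytes), which is what the 16,384 figure implies, and the constant names are made up for this illustration.

```python
# Illustrative arithmetic only. Binary prefixes are an assumption:
# 1 PByte = 2**50 bytes, 1 EByte = 2**60 bytes.
PBYTE = 2**50
EBYTE = 2**60

btrfs_limit = 2**64                 # bytes addressable with a 64-bit offset
ext4_volume_limit = 1024 * PBYTE    # Ext4 volume limit quoted in the text
lhc_storage = 20 * PBYTE            # approximate LHC grid storage quoted in the text

print(btrfs_limit // EBYTE)              # 16    (exabytes)
print(btrfs_limit // PBYTE)              # 16384 (petabytes)
print(btrfs_limit // ext4_volume_limit)  # 16    (times the Ext4 volume limit)
print(btrfs_limit // lhc_storage)        # 819   (LHC-sized stores that would fit)
```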

A number of other Btrfs concepts are similar to those of ZFS:

  • Btrfs stores metadata and data blocks in two tree structures – one for the directory and file names, and another for the data blocks.
  • Data blocks are addressed via extents instead of block lists, which increases performance especially with large files (for more details about extents refer to the article The Ext4 Linux file system). Btrfs can store small files directly in the leaves of the tree, avoiding the overhead that would be created with small file extents.
  • Btrfs uses Copy on Write: Modified data is written into new data blocks to preserve the old data. Only once the data has been written is the tree that references it updated – until then, the file entry points to the old data blocks. Copy on Write can be disabled using the nodatacow mount option.
  • Btrfs therefore operates in a kind of "data=ordered" mode, which ensures that the metadata is only modified once all the relevant data has been written to disk. This makes it impossible for old data previously stored in a file's allocated data blocks to appear in apparently modified files after a system crash or power failure.
  • Copy on Write allows writeable snapshots to be created.
  • RAID functionality is an integral feature, although only RAID-0, RAID-1 and RAID-10 are currently available; RAID-5 and RAID-6, with one and two redundant disks respectively, are still missing. Devices can be added and removed during operation.
  • Several file systems can be nested within one volume (one partition) as subvolumes – this gives Btrfs a kind of Logical Volume Manager.
  • Checksums verify metadata and data blocks, which allows flawed data to be detected and – if a RAID-1 or RAID-10 has been configured – corrected using the mirror. This behaviour can be disabled via the nodatasum mount option.
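
The Copy on Write and snapshot behaviour described in the list above can be illustrated with a toy model. The following is only a sketch of the principle, not Btrfs's actual data structures or on-disk layout: plain dictionaries stand in for the trees, blocks are never reclaimed, and all class and method names are hypothetical.

```python
# Toy model of Copy on Write; NOT how Btrfs is implemented.
# All names below are hypothetical and exist only for this illustration.

class ToyCowVolume:
    def __init__(self):
        self.blocks = {}               # block id -> bytes; never overwritten
        self.next_id = 0
        self.roots = {"default": {}}   # subvolume -> {file name: block id}

    def _write_block(self, data):
        """Always allocate a fresh block; existing blocks stay untouched."""
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        return block_id

    def write(self, subvol, name, data):
        # Step 1: write the modified data into a new block.
        new_id = self._write_block(data)
        # Step 2: only now switch the pointer. A crash before this point
        # would leave the file entry pointing at the old, intact block.
        new_root = dict(self.roots[subvol])
        new_root[name] = new_id
        self.roots[subvol] = new_root

    def read(self, subvol, name):
        return self.blocks[self.roots[subvol][name]]

    def snapshot(self, source, target):
        # A snapshot copies only the pointers, not the data blocks.
        self.roots[target] = dict(self.roots[source])


vol = ToyCowVolume()
vol.write("default", "notes.txt", b"version 1")
vol.snapshot("default", "snap")
vol.write("default", "notes.txt", b"version 2")    # lands in a new block
vol.write("snap", "notes.txt", b"snapshot edit")   # snapshots are writeable

print(vol.read("default", "notes.txt"))   # b'version 2'
print(vol.read("snap", "notes.txt"))      # b'snapshot edit'
print(len(vol.blocks))                    # 3: no block was ever overwritten
```

Because a write never touches an existing block, taking a snapshot costs no more than copying the pointers, and the snapshot can then diverge from its source independently.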

In addition, Btrfs offers various features that are expected of modern file systems:

  • Btrfs creates inodes dynamically as required; no static inode tables are written when a file system is created.
  • The file system can be grown or shrunk, and also defragmented, during operation.
  • Btrfs supports Access Control Lists (ACLs) as described in the POSIX standard.
  • Files can be compressed when they are written (mount option compress). The option of encrypting data on the fly is also planned.

