The Ext4 Linux file system
by Dr. Oliver Diedrich
Ext3, the default Linux file system for many years, is definitely starting to show its age. Modern mass storage devices are approaching its limits and block-based data management is no longer adequate for modern file sizes. High time for an update!
Appeared in c't 10/09, p. 180
In the face of rapidly rising data volumes, it is increasingly clear that Ext3, the current default Linux file system, is reaching its limits. Its maximum file system size, and thus maximum volume size, of 16 TB can already be a tight squeeze for large RAID arrays, and with Ext3's 32-bit block numbers and 4 KB data blocks there is no way around this limit. A major refurbishment is therefore due.
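The 16 TB ceiling is simple arithmetic: the number of addressable blocks multiplied by the block size. The short Python sketch below does that calculation. It assumes binary units (1 TB taken as 2^40 bytes), which is how these limits are usually quoted.

```python
# Ext3 addresses blocks with 32-bit numbers; the default block size is 4 KB.
BLOCK_NUMBER_BITS = 32
BLOCK_SIZE = 4 * 1024  # bytes

TIB = 2 ** 40  # one binary terabyte

max_blocks = 2 ** BLOCK_NUMBER_BITS         # 4,294,967,296 addressable blocks
max_volume_bytes = max_blocks * BLOCK_SIZE  # 2**32 * 2**12 = 2**44 bytes

print(max_volume_bytes // TIB, "TB")  # prints: 16 TB
```

Since both factors are fixed by the on-disk format, the only way past the limit is a larger block number, which is exactly what Ext4 introduces.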
Development of Ext4 started in 2006 with two changes to the Ext3 file system: the block number size was increased to 48 bits, and indirect block addressing, in which a file's data blocks are recorded as a long list of individual block numbers, was replaced by extents, which describe contiguous ranges of data blocks. Because these changes altered the structure of the data stored on disk, the programmers decided against merging the patches into Ext3 and instead created a new version of the file system, Ext4, based on the Ext3 code.
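To see why extents are more compact, consider a file whose blocks lie in a few contiguous runs on disk. Indirect addressing stores one block number per data block, whereas an extent records a whole run as a starting position plus a length. The Python sketch below collapses a per-block list into extents. The `Extent` type and `block_list_to_extents` function are illustrative names, not kernel code; the 32,768-block cap mirrors the limit ext4 places on a single initialized extent.

```python
from typing import List, NamedTuple

# ext4 caps a single initialized extent at 32,768 blocks (128 MB with 4 KB blocks).
MAX_EXTENT_LEN = 32768


class Extent(NamedTuple):
    """A run of contiguous blocks: where it starts in the file and on disk."""
    logical_start: int   # first block of the run, counted within the file
    physical_start: int  # first block of the run, counted on the disk
    length: int          # number of consecutive blocks in the run


def block_list_to_extents(blocks: List[int],
                          max_len: int = MAX_EXTENT_LEN) -> List[Extent]:
    """Collapse a per-block list (as indirect addressing stores it) into extents.

    blocks[i] is the on-disk block number that holds block i of the file.
    """
    extents: List[Extent] = []
    for logical, physical in enumerate(blocks):
        if extents:
            last = extents[-1]
            # Grow the current run if this block directly follows it on disk
            # and the run has not yet reached the maximum extent length.
            if physical == last.physical_start + last.length and last.length < max_len:
                extents[-1] = last._replace(length=last.length + 1)
                continue
        extents.append(Extent(logical, physical, 1))
    return extents


# A 10-block file stored in two contiguous runs: ten list entries, two extents.
file_blocks = [1000, 1001, 1002, 1003, 1004, 1005, 5000, 5001, 5002, 5003]
for extent in block_list_to_extents(file_blocks):
    print(extent)
# Extent(logical_start=0, physical_start=1000, length=6)
# Extent(logical_start=6, physical_start=5000, length=4)
```

The saving grows with file size: a large, unfragmented file that needs thousands of list entries under indirect addressing can be described by a handful of extents.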
Three years of Ext4 development have produced significant advances over Ext3, among them an increase in the maximum volume size to 1024 PB, which should be sufficient for many years to come. Extents, long implemented in other file systems such as XFS, should make managing large files more efficient. There is also a whole range of under-the-bonnet changes intended to make Ext4 perform better than Ext3.
The kernel developers merged the Ext4 code in version 2.6.19 to give it the opportunity to mature within the kernel. Ext4 was marked as experimental in versions up to and including 2.6.27, but since Linux 2.6.28 the new file system has been considered stable. Not that this rules out the odd bug or other unpleasant surprise. The latest Ubuntu release, 9.04, can already be installed on Ext4, and the forthcoming Fedora 11 release will use Ext4 as its default file system.
Ext4 works with 48-bit block numbers, while the default block size remains 4 KB. This allows file systems of up to 2^48 blocks of 4 KB each, equivalent to one exabyte (1024 PB), compared to the 16 TB maximum in Ext3. Why not go straight to 64-bit block numbers? An article by the development team offers a very pragmatic reason: 1 EB is going to be more than enough storage for a very long time; indeed, a complete e2fsck run on a file system of this size would take more than 100 years on current hardware. Long before this limit is approached, a whole other set of problems will need to be addressed, and these will require much more substantial file system changes than 64-bit block numbers. In addition, 48-bit block numbers fit better into the old Ext3 data structures.
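The same multiplication with 48-bit block numbers yields the new limit. The sketch below also illustrates one plausible reading of why 48 bits suit the old layout: a 48-bit value splits into a 32-bit low half, the width Ext3 already used, and a 16-bit high half that can occupy otherwise spare space. Ext4's on-disk extent records do store a physical start block this way (fields `ee_start_lo` and `ee_start_hi`); the helper functions here are hypothetical and only show the bit arithmetic.

```python
from typing import Tuple

BLOCK_SIZE = 4 * 1024   # default block size, unchanged from Ext3
BLOCK_NUMBER_BITS = 48  # Ext4 block numbers

PIB = 2 ** 50  # one binary petabyte
EIB = 2 ** 60  # one binary exabyte

max_volume_bytes = 2 ** BLOCK_NUMBER_BITS * BLOCK_SIZE  # 2**48 * 2**12 = 2**60 bytes
print(max_volume_bytes // PIB, "PB")  # prints: 1024 PB
print(max_volume_bytes // EIB, "EB")  # prints: 1 EB


def split_block_number(block: int) -> Tuple[int, int]:
    """Split a 48-bit block number into (16-bit high half, 32-bit low half)."""
    if not 0 <= block < 2 ** 48:
        raise ValueError("block number does not fit in 48 bits")
    return block >> 32, block & 0xFFFFFFFF


def join_block_number(hi: int, lo: int) -> int:
    """Reassemble a 48-bit block number from its high and low halves."""
    return (hi << 32) | lo
```

Any block number below 2^32 has a high half of zero, so its low half looks exactly like an Ext3 block number.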
According to Ext4 lead developer Ted Ts'o, extending block numbers to 64 bits shouldn't be too big a deal and may even be tackled during ongoing Ext4 development. Indeed, some structures, such as superblocks, block group descriptors and the new JBD2 journaling layer (developed in tandem with Ext4), are already set up for 64-bit block numbers.
The i_blocks value in the inode, which records the number of blocks occupied by a file and is 32 bits long in Ext3, has been adapted to the larger block numbers in two ways. Firstly, it can now count in units of the file system's block size, generally 4 KB, rather than in 512-byte hard drive sectors as in Ext3. A flag in the inode indicates how the value should be interpreted, which is very important when upgrading from Ext3 to Ext4, since old Ext3 inodes that count in sectors may still be present.
Secondly, two previously unused bytes in the inode now store the high 16 bits of what has become a 48-bit i_blocks value. The file system feature huge_file indicates that the file system uses these 48-bit counts and that inodes can count in file system blocks. Despite 48-bit block numbers, individual files cannot at present be larger than 16 TB, as the current extents structure does not allow larger files to be managed (on which more below).
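The two adaptations of i_blocks can be combined into a single calculation: join the low 32 bits and the high 16 bits into one count, then multiply by either the sector size or the block size depending on the inode's flag. The Python sketch below is a simplification under stated assumptions: the parameter names are hypothetical (loosely modeled on the kernel's `i_blocks_lo` and `i_blocks_high` fields), and the per-inode flag is passed as a plain boolean instead of being decoded from the inode's flag word.

```python
SECTOR_SIZE = 512  # unit used by Ext3's i_blocks
TIB = 2 ** 40      # one binary terabyte


def inode_bytes_used(i_blocks_lo: int, i_blocks_high: int,
                     counts_fs_blocks: bool, block_size: int = 4096) -> int:
    """Return the space a file occupies, computed from simplified inode fields.

    i_blocks_lo:      low 32 bits of the count (the old Ext3 field)
    i_blocks_high:    high 16 bits, kept in previously unused inode bytes
    counts_fs_blocks: stand-in for the per-inode flag; True means the count
                      is in file system blocks, False means 512-byte sectors
    """
    count = (i_blocks_high << 32) | i_blocks_lo  # 48-bit block count
    unit = block_size if counts_fs_blocks else SECTOR_SIZE
    return count * unit


# Ext3-style inode: a 32-bit count of 512-byte sectors tops out just below 2 TB.
print(round(inode_bytes_used(0xFFFFFFFF, 0, False) / TIB, 3))  # prints: 2.0
```

That upper bound is consistent with Ext3's 2 TB maximum file size; counting in 4 KB blocks with 48 bits removes i_blocks as the bottleneck, leaving the extents structure as the limiting factor.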
Of the other file systems in the kernel, the main competitor to Ext4 is XFS – IBM's JFS has to date failed to find many fans within the Linux community and Reiser4 has still not been integrated into the kernel. In expounding the advantages over XFS, Ext4 developers cite the leaner code base (around 30,000 lines totalling 900 KB, compared to 100,000 lines totalling 3.2 MB for XFS), the ability to convert an Ext3 file system to Ext4 and the large proportion of code imported from the mature and extremely well-tested Ext3.
Linux file systems
| File system | Maximum file size | Maximum file system size |
|---|---|---|
| Ext4 | 16 TB | 1024 PB |
| Ext3 | 2 TB | 16 TB |
| JFS | 4 PB | 32 PB |
| ReiserFS 3 | 8 TB | 16 TB |
| XFS | 8192 PB | 8192 PB |
| ZFS (Solaris) | 1384 PB | 16384 PB |