Possible data loss in Ext4
A bug report posted in the bug tracker for the next version of Ubuntu, 9.04 (Jaunty Jackalope), describes a massive data loss problem when using Ext4, the future standard file system for Linux, which is available as an option when installing Ubuntu 9.04. The report describes a crash that occurred shortly after the KDE 4 desktop files had been loaded and that resulted in the loss of all of the data that had been created, including many KDE configuration files.
In a reply, Ext4 developer Ted Ts'o explains the background of the problem. Like other modern file systems, such as XFS, Ext4 implements delayed allocation: new data is not written to disk immediately, and writing it out can take up to 60 seconds. This increases performance and allows the layout of the data on the hard drive platter to be optimised.
The KDE and GNOME desktop applications often read and write a large number of small files (for example, the configuration files for your personal settings). If the system crashes, there may not have been enough time for the data to be allocated and written to the hard drive; under Ext4, the files may then be truncated. This is a consequence of delayed allocation. When a new file is created, the change is noted in the journal, but the file's data isn't written to the disk for anything between 45 and 150 seconds. Only then does the file system catch up, allocating space for the file and writing the data. The exact technical details (the critical system calls are ftruncate() and rename()) can be found in the Ext4 developer's answer to the bug report.
Ts'o describes a workaround that tries to accurately identify this case and avoid the delayed allocation, but points out that other modern file systems, such as XFS and the new Btrfs, are also affected by this problem. The workaround patches will not be included in the coming 2.6.29 kernel release, but are queued for the 2.6.30 kernel.
Ts'o says that the application should be fixed so it does not write and rewrite small files. He advises that "this is really more of an application design problem more than anything else." Programmers had become accustomed to, and dependent on, the behaviour of Ext3, which has a commit interval of 5 seconds and a default journalling mode of "data=ordered". In this mode, only metadata is written to the journal, but any associated data changes are forced out to the disk before the metadata is committed. Once Ext3 had become the default file system, developers came to rely on this behaviour.
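As an illustration of what an application-side fix might look like, the following Python sketch replaces a configuration file without relying on Ext3's ordering behaviour. It is not taken from the bug report or from any KDE or GNOME code; the function name atomic_write and the file name used in the example are hypothetical. The sketch assumes a POSIX system and uses the commonly recommended pattern of writing to a temporary file, calling fsync() on it, and only then calling rename() over the old file.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes).

    Hypothetical helper: after a crash, `path` should hold either the
    complete old content or the complete new content, not a truncated file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the final
    # rename stays within one file system. Note that mkstemp creates the
    # file with mode 0600, which may differ from the original's permissions.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the data blocks to disk before the rename, instead of
            # waiting for the file system's delayed allocation to catch up.
            os.fsync(tmp.fileno())
        # rename() semantics: the new name atomically replaces the old one.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also flush the directory entry where the platform supports it.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    atomic_write("example-settings.conf", b"theme=dark\n")
```

The fsync() call has a cost, since it forces a disk write each time, which is one reason to avoid rewriting many small files frequently in the first place.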