In association with heise online

17 August 2009, 09:48

Kernel Log – Coming in 2.6.31 - Part 3: Storage and file systems

by Thorsten Leemhuis

The experimental file system Btrfs, billed as the "next generation file system for Linux", should now be even faster. Libata drivers for IDE/PATA adaptors are pushing aside the IDE subsystem. The first components for defragmenting Ext4 file systems have been merged into the main development tree. Systems with Intel ATA chipsets now boot faster thanks to parallel hardware scanning.

Last Thursday, Linus Torvalds released the sixth pre-release version of Linux 2.6.31. As usual at this late phase of the development cycle, most of the changes from rc5 are minor. In his release email, Torvalds indicates that he expects 2.6.31 to be complete after the eighth pre-release version, probably in two to three weeks.

The Kernel Log is taking the opportunity to continue its series of reports on the major changes in Linux 2.6.31 compared to the current 2.6.30 kernel with an overview of storage and file systems. The previous two parts covered networking, and graphics, audio and video.

Btrfs upgraded

Major changes, in the form of a 350 KB patch, should significantly improve the experimental file system's performance by using 'mixed back references' in many areas. The patch does, however, involve a change in the structure of the file system on the storage media ('on disk format'). Kernel versions containing the new Btrfs code deal with the requisite conversion from old to new format automatically the first time the file system is mounted. However, Linux versions with older Btrfs code will thereafter no longer be able to mount file systems which have been modified by the new code.

This is flagged up clearly in the commit comments and Git pull request. Kernel developers usually try to avoid such situations, even with experimental file systems – the result of this change is that hardy users who choose to use Btrfs as their root file system will find themselves unable to start older kernel versions to deal with errors should the need arise. Indeed this is precisely the misfortune that befell Linus Torvalds, who was distinctly unimpressed.
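
The one-way nature of such a format change can be pictured with a small sketch. It assumes a file system that records "incompatible feature" bits on disk and mounting code that refuses any bit it does not understand. The flag name, its value and both helper functions are hypothetical and serve only as illustration; they are not taken from the Btrfs source.

```python
# Illustrative sketch only: flag name and value are hypothetical,
# not copied from the Btrfs code.
FEATURE_MIXED_BACKREF = 1 << 0  # hypothetical bit marking the new format

OLD_KERNEL = 0                      # understands no new on-disk features
NEW_KERNEL = FEATURE_MIXED_BACKREF  # understands the new format


def can_mount(disk_incompat_flags: int, supported_flags: int) -> bool:
    """Return True if every incompatible feature on disk is understood."""
    unknown = disk_incompat_flags & ~supported_flags
    return unknown == 0


def mount_and_convert(disk_incompat_flags: int, supported_flags: int) -> int:
    """Simulate a mount: refuse unknown features, otherwise upgrade the flags."""
    if not can_mount(disk_incompat_flags, supported_flags):
        raise OSError("unsupported on-disk features, refusing to mount")
    # Code that knows the new format marks the file system as converted.
    return disk_incompat_flags | supported_flags


flags = mount_and_convert(0, NEW_KERNEL)  # first mount with the new code
print(can_mount(flags, NEW_KERNEL))  # True
print(can_mount(flags, OLD_KERNEL))  # False
```

Once the converted flag is on disk, only code that knows it can mount the file system again, which is why booting an older kernel is no longer an option for a Btrfs root.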

Very late in the development cycle, the Btrfs maintainer also submitted, via a Git pull request, a number of major changes that had been in the works for some time. These should make Btrfs less memory hungry during long periods of high load. The Btrfs developers have also improved support for SSDs.

For other articles on 2.6.31 and links to the rest of the "Coming in 2.6.31" series, see The H's Kernel Log - 2.6.31 Tracking page.

Adieu IDE

Kernel developer David Miller, known for his work as maintainer of the networking subsystem and for SPARC support, has now also taken over the IDE subsystem from Bartlomiej Zolnierkiewicz. The reason for this was a clash over a bug occurring on SPARC systems, in the course of which Miller suggested that Zolnierkiewicz had introduced a number of changes to the IDE subsystem without adequate testing.

Miller has intimated that he is not planning to implement any major changes to the IDE subsystem: "I'm going to treat IDE as pure legacy." The future therefore now looks definitively to belong to the PATA drivers in the Libata subsystem, which were merged into Linux 2.6.19 in late 2006. While they may not be able to control quite as many IDE/PATA adaptors as the IDE subsystem, they are able to deal with almost all common modern adaptors.

Many developers have always viewed the Libata drivers as a replacement for the drivers in the older IDE subsystem, which has been a source of repeated strife between kernel developers for more than a decade. Following a period of almost complete inactivity, over the last eighteen months to two years Zolnierkiewicz had substantially revised the IDE subsystem and added a number of new drivers, so that instead of the anticipated lingering death, the two systems had started to look like competitors - a situation that has now been resolved.

Body search

The new Fsnotify infrastructure provides a common back end for file system notifications and replaces the separate back ends of Dnotify and Inotify, whose userspace interfaces remain available. It can be used to monitor changes to the file system, such as the creation, deletion or modification of files. The actual goal of Red Hat employee Eric Paris, who developed Fsnotify, is Fanotify, which builds on Fsnotify and gives virus and malware scanners operating in userspace a hook for checking files before they are actually accessed. Paris recently invited discussion on the concept and design of Fanotify.

The idea arose from long discussions on TALPA, which set out to achieve the same purpose, but failed to win over kernel developers. Background information can be found in the LWN.net articles "Kernel-based malware scanning", "The TALPA molehill" and "The fanotify API".
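
For readers who have not used file system notifications, the following is a minimal sketch of watching a directory from userspace through the long-standing inotify system calls. It assumes a Linux system whose C library can be reached via ctypes; the helper names `parse_events` and `watch_once` are made up for this example, and the Fanotify interface under discussion is not shown.

```python
import ctypes
import os
import struct

# Event mask bits as defined in <sys/inotify.h>
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200

# Header of struct inotify_event: int wd; uint32_t mask, cookie, len;
EVENT_HEADER = struct.Struct("iIII")


def parse_events(buf: bytes):
    """Split a buffer read from an inotify descriptor into (wd, mask, name)."""
    events, offset = [], 0
    while offset + EVENT_HEADER.size <= len(buf):
        wd, mask, _cookie, name_len = EVENT_HEADER.unpack_from(buf, offset)
        offset += EVENT_HEADER.size
        # The name field is padded with NUL bytes up to name_len.
        name = os.fsdecode(buf[offset:offset + name_len].rstrip(b"\0"))
        offset += name_len
        events.append((wd, mask, name))
    return events


def watch_once(path: str):
    """Block until something changes in `path`, then return the events."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    try:
        wd = libc.inotify_add_watch(fd, os.fsencode(path),
                                    IN_CREATE | IN_DELETE | IN_MODIFY)
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        return parse_events(os.read(fd, 4096))
    finally:
        os.close(fd)
```

Calling `watch_once("/tmp")` blocks until a file in that directory is created, deleted or modified. Note that events of this kind are only delivered after the fact, whereas the stated aim of Fanotify is to let a scanner look at a file before access is granted.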

In Brief

The changes described above are just some of the more significant of those recently undertaken by kernel hackers in the file system and storage field. A short overview of further changes:

File systems:

  • The Ext4 file system now contains code for defragmenting the file system while in use (online defrag). This is not, however, finished, as was recently emphasised elsewhere by Ext file system developer Theodore Tso (tytso). He goes on to say that further patches for this function still need to be evaluated and that there is still outstanding work to be done on the associated userspace program.
  • Support for NFS 4.1 has been extended, but further changes are still planned for 2.6.32.

Storage:

  • Thanks to a change merged prior to the handover from Zolnierkiewicz to Miller, the IDE subsystem now by default respects HPAs (Host Protected Areas) – users who deploy an HPA and are still using IDE subsystem drivers should not be surprised if their drive is a little smaller under 2.6.31.
  • The IDE/PATA driver ata_piix for Intel controllers, which forms part of the Libata subsystem, now scans for drives in parallel, halving initialisation time on the developer's Eee PC.
  • As a result of one of many changes to the block layer, the latter now exports information on I/O topology using data supplied by the SCSI subsystem, including a drive's physical sector size. This is, for example, of interest when allocating storage media with sector sizes other than 512 bytes or for optimal arrangement of data in RAID arrays. The developer behind this code explores some of the issues involved in a recently released presentation (PDF, see pages 235-238). The MD code responsible for software RAIDs is already able to make use of this topology information.
  • There have been major improvements to barrier support in the device mapper (delay, mpath, snapshot).
  • Following the merger of generic support for OSDs (Object-Based Storage Devices) and a file system based on them into 2.6.30, kernel hackers have now merged the osdblk driver, which allows OSD objects to be used as block devices.
  • An Emulex developer has contributed a nearly 340 KB patch which adds support for recent Emulex LightPulse Fibre Channel host adaptors to the lpfc (LightPulse Fibre Channel) driver; this was followed by a further update which adds support for target reset handler entry points. The same programmer is also responsible for FC (Fibre Channel) pass-thru support.
  • There's a new iSCSI driver for Broadcom's BNX2 chips: bnx2i. It can, if required, operate in conjunction with the new Cnic driver, which has previously been mentioned in the Kernel Log article on changes in the networking field.
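
The I/O topology information mentioned in the list above can be inspected from userspace. The sketch below assumes that the values appear as plain-text files under /sys/block/<device>/queue/ with the attribute names listed in `TOPOLOGY_ATTRS`; the function names and the fallback of 512 bytes are choices made for this example only.

```python
from pathlib import Path

# Queue attributes assumed to be exported under /sys/block/<dev>/queue/
TOPOLOGY_ATTRS = (
    "logical_block_size",
    "physical_block_size",
    "minimum_io_size",
    "optimal_io_size",
)


def read_topology(device: str, sysfs_root: str = "/sys/block") -> dict:
    """Return the I/O topology values (in bytes) that exist for `device`."""
    queue = Path(sysfs_root) / device / "queue"
    topology = {}
    for attr in TOPOLOGY_ATTRS:
        attr_file = queue / attr
        if attr_file.is_file():
            topology[attr] = int(attr_file.read_text().strip())
    return topology


def is_aligned(offset_bytes: int, topology: dict) -> bool:
    """Check whether a byte offset starts on a physical sector boundary."""
    physical = topology.get("physical_block_size", 512)
    return offset_bytes % physical == 0
```

On a drive that reports 4096-byte physical sectors, `is_aligned(63 * 512, read_topology("sda"))` would return False, illustrating why a partition starting at the traditional sector 63 is a poor fit for such media, while a start at 1 MiB would be aligned.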

Minor gems

Many additional minor, but by no means insignificant, changes were also merged. In the file systems area they affect Btrfs, CIFS, Ext2, Ext3 and Ext4, NFS and various other file systems; in the storage area they affect the block layer, the device mapper, Libata, MMC, MTD, SCSI and various other drivers. As with many of the changes described above, the relevant commits in the web front end of the main Linux development branch, with their commit comments and the patches themselves, provide extensive further information.

(djwm)

Permalink: http://h-online.com/-742967