03 November 2009, 12:15

ZFS with data deduplication

As noted by ZFS developer Jeff Bonwick in his blog, ZFS now offers a data deduplication mechanism (deduplication being the process of removing or avoiding duplicate copies). To deduplicate data, the file system computes a SHA256 checksum for every data block. Blocks with identical checksums are written to disk only once; reference counters keep track of how many references point to each shared block, so different files containing that block all refer to the same stored copy. This is mainly beneficial when many of the files to be stored contain identical data, for example when the images of virtual machines with identical guest operating systems are stored on the file system. The process adds relatively little bookkeeping overhead, as ZFS already generates a checksum to ensure the integrity of each data block anyway.
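The idea of checksum-keyed blocks with reference counters can be illustrated with a minimal in-memory sketch. It is not ZFS code: the class name DedupStore, its methods and the 4096-byte BLOCK_SIZE are assumptions made purely for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for this sketch


class DedupStore:
    """Toy block store: identical blocks are kept once and reference-counted."""

    def __init__(self):
        self.blocks = {}    # checksum -> block contents
        self.refcount = {}  # checksum -> number of references to that block
        self.files = {}     # file name -> ordered list of block checksums

    def write_file(self, name, data):
        if name in self.files:
            self.delete_file(name)  # overwriting releases the old references
        keys = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            key = hashlib.sha256(block).hexdigest()
            if key not in self.blocks:
                # First time this content is seen: store it exactly once.
                self.blocks[key] = block
                self.refcount[key] = 0
            self.refcount[key] += 1
            keys.append(key)
        self.files[name] = keys

    def read_file(self, name):
        return b"".join(self.blocks[key] for key in self.files[name])

    def delete_file(self, name):
        for key in self.files.pop(name):
            self.refcount[key] -= 1
            if self.refcount[key] == 0:
                # No file refers to this block any more, so free it.
                del self.blocks[key]
                del self.refcount[key]
```

Writing two identical "virtual machine images" to such a store leaves only one copy of each distinct block; the blocks are freed once the last file referring to them is deleted.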

According to Bonwick, a SHA256 hash collision (different data blocks being given the same checksum) is 50 orders of magnitude less likely than an uncorrected hardware error. Nevertheless, ZFS offers the option of comparing the data within the blocks as well as their hashes. In this case, the developer recommends using a less performance-hungry hash algorithm.
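Why a cheaper hash becomes acceptable once the block data is compared as well can be shown with another small sketch. Again, this is not how ZFS implements verification: VerifyingStore and weak_checksum are hypothetical names, and the checksum is deliberately reduced to 8 bits so that collisions are easy to provoke.

```python
import zlib


def weak_checksum(block):
    # Deliberately weak for demonstration: keep only the low 8 bits of CRC32.
    return zlib.crc32(block) & 0xFF


class VerifyingStore:
    """Toy store that treats blocks as duplicates only if their bytes match."""

    def __init__(self, checksum=weak_checksum):
        self.checksum = checksum
        self.buckets = {}  # checksum -> list of distinct blocks sharing it

    def put(self, block):
        """Store a block and return a (checksum, index) reference to it."""
        key = self.checksum(block)
        candidates = self.buckets.setdefault(key, [])
        for index, existing in enumerate(candidates):
            if existing == block:  # byte-for-byte comparison
                return key, index  # genuine duplicate: reuse the stored copy
        # Same checksum but different content (or a new checksum): store it.
        candidates.append(block)
        return key, len(candidates) - 1

    def get(self, ref):
        key, index = ref
        return self.buckets[key][index]
```

With only 256 possible checksum values, storing 300 distinct blocks is guaranteed to produce collisions, yet every block is still read back unchanged because the byte comparison catches them. The price is an extra read and compare for each checksum match.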

ZFS is a file system designed by Sun Microsystems; the name originally stood for 'Zettabyte File System'.

(djwm)


Permalink: http://h-online.com/-848638