In association with heise online

13 November 2006, 16:45

Christiane Rütten

Signed, sealed, protected

Backups on non-trusted FTP servers

Anyone storing data on an unfamiliar FTP server needs to encrypt and sign it to ensure reliable protection against prying eyes and external manipulation. duplicity is just the tool for this, and the ftplicity script from c't magazine makes working with it child's play.

Most providers supply their root server tenants with access to an FTP server for backup purposes, and that server usually has enough space for the entire contents of the hard drive. Yet FTP archive space provided in this way is not necessarily trustworthy: the user cannot be sure who has access to it, what happens to the data stored there, or whether the server is truly secured against unauthorised access. Furthermore, because plain FTP transfers data unencrypted, the data can also be intercepted on its way to the backup archive.

duplicity, a backup tool developed in the Python scripting language, helps Linux admins out of this bind: it creates signed, compressed and incremental backups encrypted with GPG. Here, incremental means that once a full backup has been completed, only the files that have been modified since are backed up. That saves storage space and transfer volume, and makes it possible to restore older versions of files. Thanks to GPG, backup data can also be transferred with confidence over unsecured network connections and stored on untrustworthy servers, since encryption keeps it safe from prying eyes and the signature reveals any manipulation.
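
To illustrate the basic idea of an incremental backup, here is a minimal Python sketch. It is not duplicity's code: duplicity works from the checksums stored in its signature files, whereas this sketch simply compares a SHA-256 digest of each whole file against the digests recorded during the previous run. The function names snapshot and changed_files are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file below root (as a relative path) to a SHA-256 digest of its contents."""
    digests: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digests[str(path.relative_to(root))] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_files(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as new, modified or deleted since the previous snapshot."""
    return {
        "new": sorted(p for p in current if p not in previous),
        "modified": sorted(p for p in current if p in previous and current[p] != previous[p]),
        "deleted": sorted(p for p in previous if p not in current),
    }
```

After the full backup, only the files reported as new or modified would need to be stored again. Recording deletions as well means that restoring a later state does not bring back files that had since been removed.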

duplicity version 0.4.1 is already part of Debian, Ubuntu (in the universe repository) and Fedora Core (in the "Extras" section). Suse users need to fetch it themselves: the current version (0.4.2) is available for download on the homepage [1] as a tar archive (and as a source RPM for Fedora). Once the tar archive has been unpacked, running the command

python setup.py install

in the source directory performs all the necessary installation steps. The development packages for the installed Python and librsync versions must already be present.

Down to business

A full backup consists of GPG-encrypted tar archives split into volumes of roughly 5MB, known as difftars (duplicity-full.*.difftar.gpg). duplicity also stores a detailed set of checksums for blocks of the stored files (a "sigtar") and a table of contents (the "manifest"), all of them encrypted.
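
How a backup stream can be cut into fixed-size volumes, while noting which volumes each file ends up in, can be sketched as follows. This is a simplified, in-memory illustration: there is no tar format and no GPG encryption, and the function split_into_volumes as well as its manifest layout (file name mapped to a list of volume numbers) are hypothetical, not duplicity's actual manifest format.

```python
VOLUME_SIZE = 5 * 1024 * 1024  # roughly 5MB per volume, as in the article


def split_into_volumes(files: dict[str, bytes], volume_size: int = VOLUME_SIZE):
    """Concatenate file contents into a stream cut into fixed-size volumes.

    Returns (volumes, manifest); the manifest maps each file name to the
    indices of the volumes that hold part of its data.
    """
    volumes: list[bytearray] = [bytearray()]
    manifest: dict[str, list[int]] = {}
    for name, data in files.items():
        touched: list[int] = []
        offset = 0
        # Loop at least once so that empty files also get a manifest entry.
        while offset < len(data) or not touched:
            current = volumes[-1]
            if len(current) >= volume_size:  # current volume is full: start a new one
                current = bytearray()
                volumes.append(current)
            chunk = data[offset:offset + volume_size - len(current)]
            current.extend(chunk)
            offset += len(chunk)
            if len(volumes) - 1 not in touched:
                touched.append(len(volumes) - 1)
        manifest[name] = touched
    return [bytes(v) for v in volumes], manifest
```

A file larger than one volume simply spans several of them, and a small file may share a volume with its neighbours; the manifest is what later tells a restore which volumes to fetch.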

To create an incremental backup, duplicity first fetches from the FTP server the sigtars of the last full backup and of all subsequent incremental backups. Using the checksums stored there, it determines which files have changed. That's not all: the rsync algorithm, which is also used by the synchronisation tool of the same name, even recognises which parts of a file have changed and stores only the difference [2]. This works much like the diff command does for text files and helps keep data volumes as small as possible. duplicity then uploads the incremental backup, split into volumes where necessary, as files named "duplicity-inc*", together with the associated sigtar and manifest. Over time, backup and signature chains build up on the FTP server, each consisting of a full backup followed by its increments.
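
The principle of storing only the changed parts of a file can be sketched in Python. In this example the old file is cut into fixed-size blocks, each described by a cheap "weak" rolling checksum and a strong hash (roughly the role a sigtar plays); the new file is then scanned byte by byte, and wherever a window matches a known block, a reference to that block is emitted instead of the data. The weak checksum follows the scheme described for rsync; SHA-256 as the strong hash and the tiny block size are choices made for this sketch and not necessarily what librsync, which duplicity uses, does. All function names are hypothetical.

```python
import hashlib

MOD = 1 << 16  # each half of the weak checksum is a 16-bit value


def weak_checksum(block: bytes) -> tuple[int, int]:
    """rsync-style weak checksum (a, b) of a block."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide the window one byte to the right without rescanning it."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def signature(old: bytes, block_size: int) -> dict[int, list[tuple[bytes, int]]]:
    """Weak and strong checksums for every full block of the old file."""
    sig: dict[int, list[tuple[bytes, int]]] = {}
    for index in range(len(old) // block_size):
        block = old[index * block_size:(index + 1) * block_size]
        a, b = weak_checksum(block)
        sig.setdefault((b << 16) | a, []).append((hashlib.sha256(block).digest(), index))
    return sig


def delta(new: bytes, sig, block_size: int) -> list[tuple[str, object]]:
    """Describe `new` as ("copy", block index) and ("literal", bytes) instructions."""
    ops: list[tuple[str, object]] = []
    literal = bytearray()
    i, checksum = 0, None
    while i + block_size <= len(new):
        window = new[i:i + block_size]
        if checksum is None:
            checksum = weak_checksum(window)
        a, b = checksum
        match = None
        for strong, index in sig.get((b << 16) | a, []):
            if hashlib.sha256(window).digest() == strong:  # confirm with the strong hash
                match = index
                break
        if match is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += block_size
            checksum = None  # next window starts after the matched block
        else:
            literal.append(new[i])
            if i + block_size < len(new):
                checksum = roll(a, b, new[i], new[i + block_size], block_size)
            i += 1
    literal += new[i:]  # trailing bytes shorter than one block
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def patch(old: bytes, ops, block_size: int) -> bytes:
    """Rebuild the new file from the old one and the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * block_size:(value + 1) * block_size] if kind == "copy" else value
    return bytes(out)
```

With a block size of 4, changing "fox" to "cat" and the final full stop to "!" in "The quick brown fox jumps over the lazy dog." yields nine block references plus the two literals b"cat " and b"dog!", so only eight bytes of new data need to be stored.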

Splitting the backup into volumes pays off when individual files are restored: duplicity then only needs to fetch those volumes from the FTP archive that are required to reconstruct the file, and the manifests record which ones those are. Using the rsync algorithm, it assembles the desired version of the file from the last full version and the subsequent differences. To do so, duplicity must be able to fit the entire file into its temporary directory. In other words, restoring from a backup requires enough temporary space to hold the largest file in the backup.
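
Restoring a single file therefore comes down to two steps, sketched below under simplifying assumptions: look up in the manifests which volumes of which backup set hold parts of the file, then replay the chain, starting from the full version and applying each increment's delta in turn. The delta format used here (("copy", offset, length) and ("literal", data) instructions), the manifest layout and the function names are all hypothetical. The sketch keeps each intermediate version in memory; duplicity uses its temporary directory for this, which is why that directory needs room for the largest file.

```python
def volumes_needed(manifests: list[dict[str, list[int]]], name: str) -> list[tuple[int, int]]:
    """List (backup set, volume) pairs that hold parts of `name`.

    manifests[0] belongs to the full backup, later entries to the increments.
    Only these volumes would have to be fetched from the FTP server.
    """
    return [(s, v) for s, manifest in enumerate(manifests) for v in manifest.get(name, [])]


def apply_delta(previous: bytes, ops) -> bytes:
    """Build the next version of a file from the previous one and a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":  # reuse bytes from the previous version
            _, offset, length = op
            out += previous[offset:offset + length]
        elif op[0] == "literal":  # new bytes stored in the increment
            out += op[1]
        else:
            raise ValueError(f"unknown delta instruction: {op[0]!r}")
    return bytes(out)


def restore(full_version: bytes, deltas) -> bytes:
    """Replay a chain: start from the full backup and apply each increment in order."""
    data = full_version
    for ops in deltas:
        data = apply_delta(data, ops)
    return data
```

To get an older version of the file, the chain is simply replayed up to the corresponding increment and no further.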

Permalink: http://h-online.com/-747191