Google releases open source code for hash functions
Google has announced the availability of two open source hash functions for producing 64- and 128-bit hash values from strings. The company states that these functions are suitable for such applications as hash tables – eg. indexing a database – but are not suitable for cryptography.
Google says that although it initially optimised these functions for use on the CPUs most commonly used inside Google, it turns out that the key features needed are common in most modern PCs and laptops. These include 64-bit registers, fast memory access for unaligned data and also instruction level parallelism – the approach taken by Google in the code involves steps with pairs of mathematical instructions that are independent, neither relying on the results of the other, and can therefore be executed in parallel.
The company points out one drawback from its approach, in that the code is more complex than might otherwise be expected, as the focus was on optimising for speed rather than simplicity. It expects that these two functions will improve on the performance of existing examples by between 30 and 100 per cent. A comment in the source code states that the 128-bit version is optimised for long strings, and is likely to be faster than the 64-bit version for strings longer than 2,000 bytes.
The source code for both 64- and 128-bit versions of CityHash is available to download from the project's Google Code page. It comes in the form of two files, one C++ and one header file, both under the terms of the MIT licence.