23 January 2013, 10:44

Apache Lucene and Solr update with new default codec

The Apache Lucene project has announced Lucene and Solr 4.1, the latest updates to the Java-based text search library and the search platform built around it. Lucene 4.1 has a new default codec, "Lucene41Codec", which is based on a previously experimental "Block" indexing format. The new codec includes the pulsing optimisation for terms that appear in only one document, and compresses stored fields efficiently so that more data fits within the I/O cache.

The new codec also writes without seeking, allowing it to work with append-only streams or storage such as HDFS. Other additions include a new AnalyzingSuggester and FuzzySuggester, a new PostingsHighlighter and a new CommonTermsQuery to speed up searches that contain very frequent terms. Further details are available in the release announcement.
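The general idea behind a query like CommonTermsQuery can be illustrated with a toy sketch: rare query terms select the candidate documents, and very frequent terms are only used to rank those candidates. This is not Lucene's implementation; the function names, the in-memory index and the max_df_ratio threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def common_terms_search(index, num_docs, query, max_df_ratio=0.5):
    """Select candidates with rare terms; use frequent terms only to rank.

    max_df_ratio is a hypothetical cut-off: a term found in more than this
    fraction of the documents is treated as "frequent".
    """
    terms = query.lower().split()
    rare = [t for t in terms if len(index.get(t, ())) / num_docs <= max_df_ratio]
    frequent = [t for t in terms if t not in rare]

    # If every term is frequent, fall back to requiring all of them.
    if not rare:
        sets = [index.get(t, set()) for t in frequent]
        return sorted(set.intersection(*sets)) if sets else []

    # Only documents containing at least one rare term are candidates.
    candidates = set().union(*(index.get(t, set()) for t in rare))

    def score(doc_id):
        # Toy score: how many query terms (rare or frequent) the document has.
        return sum(doc_id in index.get(t, ()) for t in terms)

    return sorted(candidates, key=lambda d: (-score(d), d))
```

In this sketch a query such as "the fox" never has to walk the long list of documents containing "the"; that term is only checked against the few documents that contain "fox".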

Apache Solr

Solr 4.1 is built on Lucene 4.1 and so benefits from the enhanced performance and capabilities of the new codec. The SolrCloud implementation in 4.1 has gained simple multi-tenancy support through enhanced document routing, along with performance improvements such as short-circuiting a distributed search when only a single shard of the cloud needs to be queried. Solr itself has been improved so that all its features, including replication, work with custom Directory or DirectoryFactory implementations.
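How routing-based multi-tenancy and the single-shard short circuit fit together can be shown with a minimal sketch. It assumes document ids of the form "tenant!id", a fixed shard count and a CRC32 hash; these are illustrative choices, and none of the names below are Solr APIs.

```python
import zlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(route_key):
    """Pick a shard from a stable hash of the routing key (CRC32 is assumed)."""
    return zlib.crc32(route_key.encode("utf-8")) % NUM_SHARDS


def route_document(doc_id):
    """Route 'tenant!id' by its tenant prefix, any other id by the whole id."""
    prefix, sep, _ = doc_id.partition("!")
    return shard_for(prefix if sep else doc_id)


def shards_to_query(route_key=None):
    """With a routing key one shard is enough; without one, query them all."""
    if route_key is not None:
        return [shard_for(route_key)]  # short circuit: no scatter-gather
    return list(range(NUM_SHARDS))
```

Because every document of a tenant lands on the same shard, a search restricted to that tenant can be answered by one shard without merging results from the rest of the cluster.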

Improvements to Solr's request parsing mean it works out of the box with Tomcat and JBoss, and the new Lucene PostingsHighlighter is also supported in Solr. The Solr Admin UI now supports Internet Explorer, is more readable and has a better interface for data import handling. Other enhancements and optimisations are listed in the release announcement.

Lucene 4.1.0 is available to download from Apache Mirrors, as is Solr 4.1.0. Both are licensed under the Apache 2.0 Licence.

(djwm)


Permalink: http://h-online.com/-1789767