Apache Lucene and Solr update with new default codec
The Apache Lucene project has announced Lucene and Solr 4.1, the latest updates to the Java-based text search library and the search platform built around it. Lucene 4.1 has a new default codec, "Lucene41Codec", which is based on a previously experimental "Block" indexing format. The new codec includes optimisations around pulsing (where a term only appears in one document) and efficient compressed stored fields to help keep data within the I/O cache.
The new codec also writes without seeking, allowing it to work with append-only streams or storage like HDFS. Other additions include a new FuzzySuggester, a new PostingsHighlighter and a new CommonTermsQuery to speed up searches for very frequent terms. Further details are available in the release announcement.
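To illustrate the general idea behind treating very frequent terms differently, here is a minimal sketch in plain Java. It is not the Lucene CommonTermsQuery API: the class name, the `partition` method and the `maxTermFrequency` cutoff are hypothetical names chosen for this example, and the document frequencies are made up. It only shows how query terms might be split into a rare group, which drives matching, and a common group, which can be handled more cheaply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hypothetical helper, not part of Lucene.
final class CommonTermsSketch {

    /**
     * Splits query terms into "rare" and "common" groups.
     * A term is treated as common when the fraction of documents
     * containing it exceeds maxTermFrequency (an assumed cutoff).
     */
    static Map<String, List<String>> partition(List<String> queryTerms,
                                               Map<String, Integer> docFreq,
                                               int numDocs,
                                               double maxTermFrequency) {
        List<String> rare = new ArrayList<>();
        List<String> common = new ArrayList<>();
        for (String term : queryTerms) {
            // Terms missing from the map are assumed to occur in no documents.
            int df = docFreq.getOrDefault(term, 0);
            if ((double) df / numDocs > maxTermFrequency) {
                common.add(term);
            } else {
                rare.add(term);
            }
        }
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("rare", rare);
        groups.put("common", common);
        return groups;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for a 1000-document index.
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("the", 950);
        docFreq.put("lucene", 12);
        docFreq.put("codec", 3);

        System.out.println(
            partition(Arrays.asList("the", "lucene", "codec"), docFreq, 1000, 0.1));
    }
}
```

In a split like this, a search can first collect candidates using only the rare terms and consult the common terms afterwards, which avoids walking the very long posting lists of words such as "the" for every query.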
Solr 4.1, being based on Lucene, incorporates all of these features and so benefits from the enhanced performance and capabilities of the new codec. The SolrCloud implementation in 4.1 has gained simple multi-tenancy support, based on enhanced document routing, along with performance improvements such as short-circuiting distributed searches that only need to query a single shard of the cloud. Solr itself has been improved so that all its features, including replication, work with custom Directory or DirectoryFactory implementations.
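The link between document routing, multi-tenancy and single-shard short-circuiting can be sketched in plain Java. This is not SolrCloud code. The `ShardRoutingSketch` class and its methods are hypothetical, the `tenant!docId` key format is an assumption made for the example, and `String.hashCode` stands in for whatever hash a real router would use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: a hypothetical router, not part of Solr.
final class ShardRoutingSketch {
    private final int numShards;

    ShardRoutingSketch(int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        this.numShards = numShards;
    }

    /** Returns the part of the id before '!' (assumed tenant prefix), or the whole id. */
    static String routeKey(String id) {
        int sep = id.indexOf('!');
        return sep < 0 ? id : id.substring(0, sep);
    }

    /** Maps an id to a shard so that all documents of one tenant share a shard. */
    int shardFor(String id) {
        // String.hashCode is a stand-in hash; floorMod keeps the result non-negative.
        return Math.floorMod(routeKey(id).hashCode(), numShards);
    }

    /** A tenant-scoped query targets one shard; an unscoped query targets all of them. */
    List<Integer> shardsToQuery(String tenantOrNull) {
        if (tenantOrNull != null) {
            return Collections.singletonList(shardFor(tenantOrNull + "!"));
        }
        List<Integer> all = new ArrayList<>();
        for (int shard = 0; shard < numShards; shard++) {
            all.add(shard);
        }
        return all;
    }

    /** With a single target shard, the fan-out and merge phase can be skipped. */
    static boolean needsDistributedMerge(List<Integer> shards) {
        return shards.size() > 1;
    }

    public static void main(String[] args) {
        ShardRoutingSketch router = new ShardRoutingSketch(4);
        List<Integer> targets = router.shardsToQuery("acme");
        System.out.println("shards=" + targets
            + " merge=" + needsDistributedMerge(targets));
    }
}
```

Because every document for a tenant lands on the same shard in this sketch, a query scoped to that tenant has exactly one target and can be answered like a local search.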
Improvements to Solr's request parsing mean it works out of the box with Tomcat and JBoss, and the new Lucene PostingsHighlighter is also supported in Solr. The Solr Admin UI now supports Internet Explorer, is more readable and has a better interface for data import handling. Other enhancements and optimisations are listed in the release announcement.