Apache Tika 1.2 introduces new network server
The Apache Software Foundation has announced the release of version 1.2 of Apache Tika. The metadata and structured text content extractor started its life as a sub-project of Apache Lucene and was elevated to Top-Level Project status within the foundation in 2010.
Apache Tika 1.2 includes a new network server sub-module, built on Apache CXF and implementing the Java API for RESTful Web Services (JAX-RS), as well as new support for handling XMP metadata. Newly supported file formats include KML (Keyhole Markup Language) and the XZ and Pack200 compression formats. The release also improves the extraction of data from iWork files, adds the ability to detect FITS (Flexible Image Transport System) files and provides better extraction of resources from OLE2 Office documents.
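As a rough illustration of what a RESTful extraction server makes possible, the following sketch uploads a document over HTTP and prints the extracted text using only the JDK's built-in HTTP client (Java 11 or later). The address `http://localhost:9998` and the `/tika` resource path are assumptions for illustration, not details taken from the announcement, and the class and method names are hypothetical; the server's own documentation is the place to confirm the actual endpoints.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class TikaServerClient {

    // Builds a PUT request that uploads the raw document bytes and asks for plain text back.
    // The "/tika" resource path is an assumption; verify it against the server documentation.
    static HttpRequest buildExtractRequest(URI serverBase, byte[] document) {
        return HttpRequest.newBuilder(serverBase.resolve("/tika"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TikaServerClient <file>");
            return;
        }
        // Assumed address of a locally running server instance.
        URI serverBase = URI.create("http://localhost:9998");
        byte[] document = Files.readAllBytes(Path.of(args[0]));

        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildExtractRequest(serverBase, document),
                HttpResponse.BodyHandlers.ofString());

        // Print the HTTP status followed by whatever text the server returned.
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```

Keeping the request construction in its own method means it can be checked without a running server; only `main` needs a live instance to talk to.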
The new version also brings several other bug fixes and improvements. Among them, Tika's character encoding detection has been enhanced by integrating the juniversalchardet library, which implements Mozilla's universal charset detection algorithm.
A full list of changes for this release is available from the Apache Tika web site, where the source code can also be downloaded. Apache Tika is licensed under the Apache License, Version 2.0.
- Apache Tika reaches 1.0, a report from The H.