Impala: first SQL query engine for Hadoop to reach GA status
Enterprise software company Cloudera has announced the General Availability (GA) release of Impala, its open source, interactive SQL query engine. Impala provides query and processing features for data that is stored directly in the Hadoop Distributed File System (HDFS) or in HBase. The functionality it provides for Hadoop is similar to that of the Stinger Initiative from rival company Hortonworks or that of Apache Drill. The engine is designed to support a wide range of file and data formats, so most users should be able to use it without having to make detours via proprietary formats.
Version 1.0 arrives around six months after the project was first made publicly available as a beta. In it, the developers have fixed various bugs and introduced new features, including ALTER TABLE, REFRESH for individual tables, and dynamic resource management capabilities. Impala's SQL dialect now also supports query hints, such as [SHUFFLE] or [BROADCAST], that allow users to fine-tune how a query is executed. The developers say that query hints can be specified as a temporary workaround for "expensive" and inefficient queries, which can occur, for example, when statistics are missing.
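To give a rough idea of what such a hint looks like in practice, the following Python sketch builds the text of a join query with a hint placed directly after the JOIN keyword, which is where Impala's SQL dialect is understood to expect it. The helper hinted_join and the table and column names are hypothetical and serve only as illustration; the snippet produces a query string and does not connect to an Impala server.

```python
# Hypothetical helper: builds the text of an Impala join query with a join hint.
# The two hint names are the ones mentioned in the article.
VALID_HINTS = {"SHUFFLE", "BROADCAST"}


def hinted_join(left, right, key, hint):
    """Return a SELECT that joins `left` and `right` on `key` using `hint`."""
    hint = hint.upper()
    if hint not in VALID_HINTS:
        raise ValueError(f"unsupported join hint: {hint}")
    # Assumption: the bracketed hint goes immediately after the JOIN keyword.
    return (
        f"SELECT {left}.* FROM {left} "
        f"JOIN [{hint}] {right} ON {left}.{key} = {right}.{key}"
    )


# Table and column names are made up for illustration.
query = hinted_join("orders", "customers", "customer_id", "broadcast")
print(query)
```

In a case like this, a user might try [BROADCAST] when the right-hand table is small enough to be copied to every node, and [SHUFFLE] when both tables are large. Once table statistics are available, the hint would normally be removed again.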
The Apache-licensed Impala is the first among the Hadoop-querying projects to release a completed version, which should give it an advantage when trying to attract a wider user base. Instructions for installing Impala are provided on the Cloudera web site.