Talend zooms in on big data with Open Studio 5.2
Talend has announced version 5.2 of its suite of data, application and process integration programs, an update that concentrates on big data and NoSQL connectivity. Talend believes that data quality issues are endemic, and especially so in big data systems. With that in mind, the company has introduced data profiling to the Talend Platform for Big Data, which allows users to analyse data stored in Hive databases on Hadoop "in place", without round trips to another system.
The analysis can also be scaled up, with additional servers being used to boost performance. It produces a custom graphical report on the quality of the stored data according to a set of standard tests for empty or missing values, duplicates, and the length and shape of data, plus specific tests such as email and phone number validation. The test set can be customised and extended to provide more specialised checking, while the report will assist Talend users in making use of the product's other facilities: data cleansing, enrichment, migration and synchronisation.
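To give a feel for the kind of checks such a profiling report summarises, here is a minimal sketch in plain Python. It is not Talend's implementation and does not run against Hive; it works on an in-memory list, and the names `profile_column`, `shape` and `EMAIL_RE`, as well as the deliberately loose email pattern, are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical, intentionally loose email pattern: something@something.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def shape(value):
    """Reduce a value to its pattern: digits -> '9', letters -> 'A' or 'a'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # keep punctuation and spaces as they are
    return "".join(out)


def profile_column(values):
    """Return simple quality metrics for one column of string values."""
    # Treat None and blank strings as missing.
    present = [v for v in values if v is not None and v.strip() != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        # Every repeat beyond the first occurrence counts as a duplicate.
        "duplicates": sum(c - 1 for c in counts.values() if c > 1),
        "min_length": min((len(v) for v in present), default=0),
        "max_length": max((len(v) for v in present), default=0),
        "shapes": Counter(shape(v) for v in present),
        "invalid_email": sum(1 for v in present if not EMAIL_RE.match(v)),
    }


if __name__ == "__main__":
    sample = ["ann@example.com", "", None, "ann@example.com", "bob_at_example"]
    for metric, result in profile_column(sample).items():
        print(metric, result)
```

A phone number test could be added in the same way as the email test, with another pattern and another counter; the shape counts are often enough to spot values that do not follow the dominant format.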
Talend 5.2's database connectivity has also been enhanced, with support for Cassandra, HBase and MongoDB added to the Platform for Big Data and Open Studio for Big Data. These join more than 450 existing data connectors, which cover the Hadoop Distributed File System, HCatalog, Hive, Oozie, Pig and Sqoop as well as traditional databases and other data resources. The connectors are used to build data integration, migration and synchronisation processes that bridge organisational data flows between technologies.
The update also sees support for parallel job execution on multi-core hardware in the Enterprise Data Integration product, along with a continuous integration capability based on the open source Maven build manager. The continuous integration capability is also incorporated in Talend's Enterprise ESB product. Talend Enterprise Data Quality has had its address validation and fraud detection capabilities expanded, and it now lets users validate addresses with Melissa Data. The Talend Enterprise MDM system can now use Oracle, MySQL, Derby or H2 databases as the underlying data store for its master data.
Talend's products are available to download as open source "community" editions or as subscription-only enterprise editions. Talend uses the GPLv2 licence for its open source software. A community for the open source editions is based around the TalendForge site, which includes tutorials and other resources for making use of Talend's products. The open source versions of Talend 5.2 are available to download now, with the commercial subscription editions being made available by the end of the year.