Talend's Open Studio for Big Data moving to Apache licensing
Talend has announced that its new Talend Open Studio for Big Data will be released under the Apache Licence. Open Studio for Big Data includes a number of components, such as Hadoop connectors, which were previously available only in the Enterprise edition of the software. Historically, Talend software has been licensed under the GPLv2; it is unclear how the licence change will affect the other Talend products.
The switch to the Apache Licence will allow Hortonworks, one of the companies that produces a distribution of the Apache Hadoop software, to include Talend's Open Studio for Big Data as part of a future Hadoop distribution. Talend's software includes visual configuration and development tools for working with a Hadoop cluster, a task that is otherwise predominantly command-line based.
Talend's tools can load data into Hadoop from a variety of sources and use the cluster to remove duplicates from the data set and check the quality of the data. Deduplication can, says Talend's Ciaran Dynes, have a significant impact on the economics of running Hadoop: if ten per cent of a "big data" data set is duplicated, removing the duplicates could reduce the number of nodes needed to process the data.
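As a rough illustration of the arithmetic behind that claim, the Python sketch below removes exact duplicates from a small made-up data set and compares how many nodes would be needed before and after. It is not Talend's implementation: the function names, the sample records and the per-node capacity of 50 records are all hypothetical, chosen only so that ten per cent of the records are duplicates.

```python
import math


def deduplicate(records):
    """Return records with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def nodes_needed(record_count, records_per_node):
    """Number of nodes required if each node handles a fixed number of records."""
    return math.ceil(record_count / records_per_node)


# Hypothetical data set: 1,000 records, the last 100 of which repeat earlier ones.
records = [("customer", i % 900) for i in range(1000)]
unique = deduplicate(records)

# Assumed capacity of 50 records per node, purely for illustration.
before = nodes_needed(len(records), 50)
after = nodes_needed(len(unique), 50)

print(f"{len(records)} records -> {before} nodes")
print(f"{len(unique)} records after deduplication -> {after} nodes")
```

In a real cluster the saving would depend on how the data is partitioned and on the cost of the deduplication job itself, so the figures here only show the direction of the effect.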
Further details about the Big Data support are available on the Talend site. A beta version of the commercial release is currently available as a downloadable trial. When asked about source code availability, Talend told The H that the switch to the Apache Licence is ongoing: it requires work such as replacing copyright notices and adding a new build system to the TalendForge.org project hosting site. The company hopes to deliver the Apache-licensed source in May, alongside the release of Talend 5.1.