Microsoft SQL Server connects with Hadoop
Microsoft has announced two Community Technology Previews (CTPs) of connectors for the open source MapReduce platform Hadoop. Hadoop is developed under the umbrella of the Apache Software Foundation for managing and analysing large amounts of data, so-called 'Big Data'. The connectors enable data to be transferred in both directions between Hadoop and two Microsoft products: SQL Server 2008 R2 and SQL Server Parallel Data Warehouse (PDW).
The connectors allow large amounts of both structured and unstructured data to be moved between Hadoop and Microsoft's database systems. For example, customers can analyse unstructured data in Hadoop and integrate the findings drawn from it into Microsoft's data warehouse product. The connectors use Sqoop (SQL-to-Hadoop) to handle data transfer between the Hadoop Distributed File System (HDFS) and the relational databases. The PDW connector also uses the high-performance PDW Bulk Load/Extract tool for import and export.
According to the announcement, the SQL Server connector is compatible with Denali; this is the code name for Microsoft's next version of SQL Server, which is currently in a third Community Technology Preview. Customers can request a copy of the test version of the connector for PDW from Microsoft’s Customer Support Service; for the connector for SQL Server there is already a download page.
Hadoop is a Java framework for the distributed storage and parallel processing of very large amounts of data across clusters. The basic idea of the framework goes back to Google's MapReduce technology. In March this year, The Guardian newspaper named Apache Hadoop "innovator of the year" in its "Media Guardian Innovation Awards".
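To give a rough idea of the MapReduce model that Hadoop builds on, the following is a minimal single-machine sketch in Python. It is not Hadoop code and does not use any Hadoop or connector API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. A real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["big data", "Big cluster"])` counts each word across both inputs. In a framework such as Hadoop, the grouping step is what allows the work to be split across machines, because each key can be reduced independently.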