Happenings: NoSQL Conference, Berlin
by Isabel Drost and Jan Lehnardt
More and more web applications have data storage requirements that can't be fully met using traditional relational databases. Object-oriented and document-oriented databases provide an alternative – and a field of development that has recently seen much activity.
For decades, relational databases have provided the basic structure for the persistent storage of all kinds of data in many applications. The underlying data models (schemata) with their explicit constraints (predetermined value ranges for variables) provide the required data consistency; transactions and locking ensure atomic access and make concurrent access by multiple users straightforward. In addition, SQL is a flexible query language that can express a wide variety of ad hoc questions about the data, which is especially valuable in data mining.
Source: Alexander Lang
However, a growing number of applications that need persistent data storage are prepared to trade away many of the guarantees offered by relational database management systems (RDBMS) in return for better performance. A typical CMS that needs a back end for its content, for example, will rarely require a flexible query language: its queries are determined during development and seldom altered later. More important than flexible queries is how quickly the database answers these few predetermined queries, because they will be issued thousands of times every day. Web shops with a high number of visitors, like Amazon, will gladly forgo the locking mechanisms of the DBMS and implement conflict resolution in the client code if this noticeably increases their throughput.
More than 20 years ago, the development of object-oriented databases already created an alternative data storage solution that is closer to object-oriented programming languages than to a rigid, inflexible table schema. Recently, the development of "non-relational" databases has picked up momentum. These databases usually focus on high data processing throughputs and are optimised for highly parallel, globally distributed access scenarios. Such databases tend to be schema-free: The data is stored in key-value pairs instead of predetermined tables, which means that data structures can be changed with considerably less effort.
On the 22nd of October, the first NoSQL conference about non-relational databases in Europe was held in Berlin. More than 70 developers, users and NoSQL enthusiasts including corporate representatives (Xing, Peritor, nugg.ad, StudiVZ) and academic researchers (Zuse Institute Berlin; Beuth University of Applied Sciences, Berlin) met at the newthinking store in Tucholskystraße, where six presentations outlined the current NoSQL project developments.
The first presentation was given by Monika Moser, who developed the transactional storage layer of Scalaris and is now an Erlang/Ruby/Hadoop developer at nugg.ad. She examined the topic of consistency in key-value stores. Brewer's CAP theorem states that a distributed system can guarantee at most two of the three properties Consistency, Availability and Partition Tolerance at the same time. According to Werner Vogels (Amazon), the agreement protocols that provide consistency in highly scalable systems sooner or later become bottlenecks. Moser presented Paxos, a fault-tolerant, scalable agreement protocol used in many key-value stores.
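To make the idea of an agreement protocol more concrete, here is a minimal, single-process sketch of single-decree Paxos in Python. It is not taken from the presentation: the class and function names (`Acceptor`, `propose`) are invented for illustration, messages are plain method calls, and message loss, crashes and retries are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest proposal number promised so far
    accepted_n: int = -1                  # number of the last accepted proposal
    accepted_value: Optional[str] = None  # value of the last accepted proposal

    def prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return (self.accepted_n, self.accepted_value)
        return None  # a higher or equal number was already promised

    def accept(self, n: int, value: str) -> bool:
        """Phase 2: accept unless a higher number has been promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one proposal round; return the chosen value or None on failure."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: collect promises from the acceptors.
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None

    # If some acceptor already accepted a value, the proposer must adopt
    # the one with the highest proposal number instead of its own.
    _, previous_value = max(promises, key=lambda p: p[0])
    if previous_value is not None:
        value = previous_value

    # Phase 2: ask the acceptors to accept the (possibly adopted) value.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

With three acceptors, a first proposal numbered 1 for the value "A" is chosen. A later proposal numbered 2 for "B" also returns "A", because the proposer has to adopt the value that was already accepted. This is how the protocol keeps all participants agreed on a single value.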
Next, Mathias Meyer (Chief Cloud Officer at Peritor Berlin) introduced Redis, a key-value store that dispenses with ACID guarantees in order to focus on speed and throughput. Redis uses a text-based protocol. The database supports basic data types like strings, lists and sets as well as atomic operations on their content. Data is kept in memory, but can also be written to a dump file for persistence. These dumps are triggered by configurable rules that combine elapsed time with the number of changes made to the database.
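As an illustration of what a text-based protocol looks like on the wire, the following Python sketch encodes a command in the framing documented for current Redis releases (RESP: an array of length-prefixed bulk strings). Whether the version shown at the conference used exactly this framing is an assumption, and the helper name `encode_command` is made up for this example.

```python
def encode_command(*args: str) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings."""
    # "*<count>" announces how many arguments follow.
    parts = [b"*" + str(len(args)).encode("ascii") + b"\r\n"]
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument is sent as "$<byte length>", then the bytes themselves.
        parts.append(b"$" + str(len(data)).encode("ascii") + b"\r\n")
        parts.append(data + b"\r\n")
    return b"".join(parts)


if __name__ == "__main__":
    # Prints the raw bytes a client would send for: SET greeting hello
    print(encode_command("SET", "greeting", "hello"))
```

Because every element is prefixed with its length, a client or server can parse the stream without scanning for delimiters inside values, while the messages remain readable in a terminal.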
Redis supports a master-slave configuration, which spreads read load across several servers; distributing the data itself across machines (horizontal scaling) is possible via additional libraries. According to Mathias Meyer, Redis is mainly suitable for caching, for worker queues and for storing statistical data. Redis is unsuitable for scenarios in which the data does not fit into main memory.
After a short break, Jan Lehnardt presented CouchDB, a database which is classified as document-oriented and was recently released as a beta version. CouchDB stores documents in JSON format. Data access is handled via a RESTful HTTP API. CouchDB is a schema-free database: this makes it easy to add extra fields to documents and to efficiently store documents in which not every field contains data. Documents are queried through a Map/Reduce API modelled on the map and reduce functions of functional programming languages, as also used by Hadoop and Google.
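The Map/Reduce query model can be imitated in a few lines of Python. The sketch below is not CouchDB's API (CouchDB views are typically written in JavaScript and run inside the database); the sample documents and the names `map_tags`, `reduce_count` and `run_view` are invented for illustration. It also shows the schema-free aspect: one document simply has no "tags" field.

```python
from collections import defaultdict

# Schema-free documents: the comment has no "tags" field at all.
docs = [
    {"_id": "a", "type": "post", "author": "alice", "tags": ["nosql", "couchdb"]},
    {"_id": "b", "type": "post", "author": "bob", "tags": ["nosql"]},
    {"_id": "c", "type": "comment", "author": "alice"},
]


def map_tags(doc):
    """Map step: emit one (tag, 1) pair per tag found in a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_count(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def run_view(documents, map_fn, reduce_fn):
    """Apply map_fn to every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(grouped.items())}


if __name__ == "__main__":
    print(run_view(docs, map_tags, reduce_count))
```

Since the map function looks at one document at a time and the reduce function only sees the values for one key, both steps can in principle be run incrementally or in parallel.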
CouchDB implements optimistic locking: A user wanting to edit a document receives both the document and the document's revision number at the time of the query. This revision number must be returned when writing the new document. The database uses this information to ensure that data which was edited simultaneously is not accidentally overwritten. This mechanism usually performs much better than traditional locking procedures, and in the worst case it is as slow as a pessimistic locking implementation that locks the document before it can be edited. Inspired by Lotus Notes, CouchDB's replication feature allows developers to build both highly available, distributed cluster solutions and distributed offline web applications that give users full control over their personal data. CouchDB is the structural basis of implementations like Ubuntu One, an online storage facility for Ubuntu users.
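The revision check described above can be sketched with a small in-memory store in Python. This is a simplified model, not CouchDB's implementation: the names `DocumentStore` and `Conflict` are hypothetical, and revisions are plain integers here, whereas CouchDB uses its own revision identifiers.

```python
class Conflict(Exception):
    """Raised when a write is based on an outdated revision."""


class DocumentStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        """Return the current revision together with a copy of the document."""
        revision, body = self._docs[doc_id]
        return revision, dict(body)

    def put(self, doc_id, body, revision=None):
        """Write a document; the caller must pass the revision it read."""
        current = self._docs.get(doc_id)
        current_revision = current[0] if current else None
        if revision != current_revision:
            # Someone else wrote in the meantime: reject instead of overwriting.
            raise Conflict(f"{doc_id}: expected revision {current_revision}, got {revision}")
        new_revision = (current_revision or 0) + 1
        self._docs[doc_id] = (new_revision, dict(body))
        return new_revision


if __name__ == "__main__":
    store = DocumentStore()
    store.put("invoice", {"total": 10})
    rev_a, doc_a = store.get("invoice")   # first editor reads
    rev_b, doc_b = store.get("invoice")   # second editor reads the same revision
    store.put("invoice", {"total": 12}, rev_a)
    try:
        store.put("invoice", {"total": 15}, rev_b)  # stale revision
    except Conflict as error:
        print("conflict:", error)
```

No lock is held between reading and writing; the cost of a conflict is only paid by the writer who lost the race, who then has to re-read the document and retry.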
Martin Scholl (of German IT specialist global infinity) provided an overview of the Riak database, a distributed key-value store with an HTTP/REST API. Documents are usually stored in JSON format and organised according to keys and buckets (document namespaces). Riak supports Map/Reduce for data queries and processing. Map/Reduce jobs are processed in parallel and can be interlinked. Riak also supports data distribution across clusters consisting of multiple physical machines. The implementation of the distribution feature is similar to that of Amazon Dynamo.
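Dynamo-style data distribution is usually explained in terms of consistent hashing, which the following Python sketch illustrates. It is a generic textbook version, not Riak's actual partitioning code: the class `HashRing`, the node names and the number of virtual nodes are assumptions chosen for the example.

```python
import bisect
import hashlib


class HashRing:
    """Map (bucket, key) pairs to nodes using consistent hashing."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node owns several points ("virtual nodes") on the ring,
        # which evens out the share of keys per node.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")

    def node_for(self, bucket, key):
        """Return the node owning the first ring point after the key's hash."""
        point = self._hash(f"{bucket}/{key}")
        index = bisect.bisect_right(self._points, point) % len(self._ring)
        return self._ring[index][1]


if __name__ == "__main__":
    ring = HashRing(["node1", "node2", "node3"])
    print(ring.node_for("users", "alice"))
```

The useful property is that adding a machine only moves the keys that now fall to the new machine; all other keys stay where they were, so a cluster can grow without reshuffling most of its data.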
The conference was closed by Stefan Edlich, a professor at Berlin's Beuth University of Applied Sciences, who presented an overview of the current object-oriented databases and their variety of uses. After the official presentations, the conference participants retired to Cafe Aufsturz, where developers and users clarified many a question while chatting over a drink.
Those who wish to learn more about high performance data storage are cordially invited to attend the ApacheCon US conference held in Oakland from the 2nd to the 8th of November. In addition to training sessions and presentations about topics like CouchDB, Hadoop, Solr and HBase, the conference will also host the next NoSQL meetup.
Videos and slides from the presentations are now available on the nosqlberlin.de web site.