In association with heise online

Integration with Java

Clojure is not a port of a standard Lisp dialect to the JVM – it has been designed from the ground up to make integration with the underlying Java libraries and other Java code easy. A Clojure program is able to create objects, call instance and class methods, and even create new classes that implement Java interfaces as required. Accessing existing Java libraries is not just permitted in Clojure, it is idiomatic. Conversely, Clojure code is easily embedded in Java programs.
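
The interop features listed above can be sketched in a few lines. This is a minimal illustration rather than anything taken from the article: the names `names` and `longest-first` are made up for the example, and `reify` is assumed to be available (Clojure 1.2 or later).

```clojure
(import '(java.util ArrayList Collections Comparator))

;; Create a Java object and call instance methods on it.
(def names (ArrayList.))
(.add names "bob")
(.add names "charles")
(.add names "alice")

;; Call a static (class) method.
(Collections/sort names)
(vec names)
;; » ["alice" "bob" "charles"]

;; Implement a Java interface on the fly (longest-first is a hypothetical name).
(def longest-first
  (reify Comparator
    (compare [_ a b]
      ;; clojure.core/compare, applied to the string lengths in reverse order
      (compare (count b) (count a)))))

(Collections/sort names longest-first)
(vec names)
;; » ["charles" "alice" "bob"]
```

The Java collection is mutated in place here, which is exactly what the Java API prescribes; `vec` copies its contents into an immutable Clojure vector.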

In combination with JDBC (Java Database Connectivity), servlet engines (and other HTTP APIs), Eclipse SWT and so on, Clojure becomes an extremely practical language. This is supplemented by what is now a comprehensive range of libraries, some implemented entirely in Clojure, others as wrappers around existing Java libraries.

Clojure is compiled to JVM bytecode, which needs the Clojure runtime library when it is executed. The clojure.jar archive is approximately 1.5 MB in size. As well as the Clojure compiler and a REPL (read-eval-print loop), it includes the persistent data structures and the extensive standard library. Thanks to Maven and Ant integration and IDE support in Emacs, Eclipse, NetBeans and IntelliJ IDEA, there is no impediment to using it within existing Java projects.

Parallel processing

Clojure's support for parallel processing on multi-core platforms makes it particularly interesting; indeed, this is the feature that is causing developers to sit up and take notice. The popular object-oriented programming (OOP) model was conceived at a time when multi-threading was simply not an issue; support for parallel processing in the form of locking strategies is consequently poorly integrated and feels 'bolted on'. Clojure, by contrast, was designed for multi-core, multi-threaded environments from the outset. This is expressed in particular in the explicit separation of identity and value, which are closely coupled in the objects of the OOP model.

The concept of immutability as a design precept is also popular in the Java world – in Clojure it is the default setting. All Clojure values are immutable. Functions never change values; instead they create new values. An example is the function reverse, which reverses any sequence (e.g. a list, vector or string):

(reverse '(1 2 3 4 5)) 
»  (5 4 3 2 1) 

The returned result is, as expected, a list, but the initial object remains unchanged:

(def v1 '(1 2 3 4 5)) 
»  #'clojure.core/v1 
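
The listing above only shows the definition of v1. A plausible continuation, given here as a sketch and not as the article's original listing, calls reverse on v1 and then inspects v1 again:

```clojure
(def v1 '(1 2 3 4 5))

(reverse v1)
;; » (5 4 3 2 1)

;; v1 still refers to the original, unchanged list.
v1
;; » (1 2 3 4 5)

;; reverse also accepts vectors and strings; the result is always a sequence.
(reverse [1 2 3])
;; » (3 2 1)
(reverse "abc")
;; » (\c \b \a)
```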

The same mechanism comes into play when lists, vectors or other data structures contain not five, but several million elements. From a logical point of view, the value is never changed; instead, a modified copy is generated. This would be incredibly slow and impractical if it were really implemented by copying, but in fact the old and new data structures share common elements. This is known as structural sharing, and data structures built this way are called persistent data structures.
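
Structural sharing is easiest to observe with a list. The following sketch uses the hypothetical names `original` and `extended`; it relies on `identical?`, which tests whether two expressions evaluate to the very same object.

```clojure
(def original '(2 3 4))

;; conj on a list prepends; logically the result is a brand new list ...
(def extended (conj original 1))

extended
;; » (1 2 3 4)
original
;; » (2 3 4)

;; ... but the tail of the new list is the same object as the old list.
(identical? (rest extended) original)
;; » true
```

Vectors and maps share structure in a comparable way through their internal tree nodes, although that is not as directly visible from the REPL.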

These data structures are at the heart of Clojure and ensure that immutability is implemented consistently. This in turn means that parallel threads in general do not need to be coordinated. If data never changes, there is no need to lock access to it. However, this applies to values only. But what if we aggregate information processed by multiple threads, for example in the form of a list in which elements are inserted in parallel? The solution lies in the fact that, although values never change, references do. Change is achieved by having a reference point to the new, modified value, instead of the old, original value. Clojure in turn offers a range of different synchronisation and isolation mechanisms for references.

Atoms and agents

What sound like components of an old cold war thriller are in fact mechanisms for coordinating parallel processing within Clojure. An atom is a reference which can only be changed atomically. It coordinates access from multiple, parallel threads. An atom is generated using the atom function and its initial value is modified using the swap! function:

(defn add-element [collection element] (conj collection element))
(def my-atom (atom []))
(swap! my-atom add-element 1)
(swap! my-atom add-element 2)
(swap! my-atom add-element 3)
»  [1 2 3] 

The swap! function is a higher-order function, meaning that it takes a function as an argument: the atom comes first, the function second. It calls this function with the current value of the atom and any further arguments passed to it, and it does so atomically. The value can still be read in parallel while this is going on. This allows multiple parallel threads to generate new values from the data structure to which the reference points, and these values can then in turn be assigned to the reference. The deref function (abbreviated to @) dereferences the atom, returning its current value.
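
To see the atomic behaviour under real contention, several threads can update one atom at the same time. This is a minimal sketch; `counter` and `workers` are names invented for the example.

```clojure
(def counter (atom 0))

;; Ten threads, each incrementing the shared counter 1000 times.
(def workers
  (doall
    (repeatedly 10
      #(Thread. (fn [] (dotimes [_ 1000] (swap! counter inc)))))))

(doseq [t workers] (.start t))
(doseq [t workers] (.join t))   ; wait for all threads to finish

@counter
;; » 10000
```

No update is lost, even though no explicit lock appears anywhere in the code.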

Agents are used to carry out asynchronous processing. Different threads can send messages to the same agent, which are then processed in a separate thread (using a ThreadPoolExecutor from the java.util.concurrent package). The following example first defines a helper function that waits for a specified time by calling the static method sleep of the Java class java.lang.Thread. The string following the function name is a documentation string to help the developer:

(defn sleep
  "Sleeps for the number of milliseconds defined by time"
  [time]
  (Thread/sleep time))

Elements can be added using the purpose-built function conj. This procedure is artificially slowed down by sleep:

(defn slow-add-element
  "Slowly add element to collection"
  [collection element]
  (sleep 100)
  (conj collection element))

This is where the agent comes in:

(def my-agent (agent []))
(send my-agent slow-add-element 1)
(send my-agent slow-add-element 2)
(send my-agent slow-add-element 3)
(sleep 150)
@my-agent
»  [1] 
(sleep 250)
@my-agent
»  [1 2 3] 
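
Waiting with sleep makes the result depend on timing. As an alternative sketch, the core function await blocks until the actions sent to an agent so far have been processed; the agent name `log` is made up for this example.

```clojure
(def log (agent []))

;; Both sends return immediately; the actions are queued and run in order.
(send log conj :first)
(send log conj :second)

;; Block until all actions sent so far from this thread have been processed.
(await log)

@log
;; » [:first :second]

;; In a standalone script, call (shutdown-agents) at the end so the JVM can exit.
```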

Software transactional memory

Atoms and agents are generally the right choice when you want to coordinate access to individual data structures – atoms for synchronous, agents for asynchronous access. The final significant synchronisation option relates to coordinated parallel access to multiple references. To this end, Clojure contains an implementation of software transactional memory (STM). As with database transactions, this allows changes to be made such that they are atomic (applied completely or not at all), consistent, and isolated (visible to others only on completion). They are not durable, however, so of the ACID properties all but the last are implemented. The use of STM is controversial, though the controversy primarily surrounds attempts to introduce such mechanisms into existing programming languages (e.g. C or C++). The situation with Clojure is a little different, as immutable values mean that concurrency is designed in from the start.

Ref objects are used for Clojure STM. They are created and changed in much the same way as atoms, but within the framework of a transaction defined by the dosync macro. The simplest way of illustrating this is via the traditional example of a transfer from one account to another. First, another helper function is defined:

(defn run-thread-fn
  "Runs function f in a new thread"
  [f]
  (.start (new Thread f)))

run-thread-fn makes use of the fact that all Clojure functions implement the Runnable interface. Three simple functions for managing an account are now defined (assoc 'changes' a value within a map, which is to say it returns a new map containing the new value):

(defn make-account [balance owner]
  {:balance balance, :owner owner})
(defn withdraw [account amount]
  (assoc account :balance (- (account :balance) amount)))
(defn deposit [account amount]
  (assoc account :balance (+ (account :balance) amount)))

Looked at from an OOP perspective, one might expect the definition of an 'account' class containing methods. In Clojure and other Lisp variants, the focus is instead on functions that operate on simple data structures. Many things that would be implemented as classes in Java, for example, are normally represented as maps, vectors and other Clojure data structures.
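
Because an account is just a map, the account functions and the general-purpose map functions can be mixed freely. The sketch below repeats two of the definitions so that it runs on its own; `before` and `after` are hypothetical names.

```clojure
(defn make-account [balance owner]
  {:balance balance, :owner owner})
(defn deposit [account amount]
  (assoc account :balance (+ (account :balance) amount)))

(def before (make-account 1000 "alice"))
(def after (deposit before 250))

(:balance after)
;; » 1250

;; The original account value is untouched.
(:balance before)
;; » 1000

;; The same change expressed with the generic core function update-in.
(= after (update-in before [:balance] + 250))
;; » true
```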

Using the make-account function, we now create a number of accounts, each of which is wrapped in a ref object:

(defn init-accounts []
  (def acc1 (ref (make-account 1000 "alice")))
  (def acc2 (ref (make-account 1000 "bob")))
  (def acc3 (ref (make-account 1000 "charles"))))

Much like with atoms, changes to refs can only be made using a special function. For ref objects this is called alter:

(defn transfer [from to amount]
  (dosync
    (alter from withdraw amount)
    (alter to deposit amount)))

The function we have defined transfers a sum of money from one account to another and is transactionally sound. Parallel threads only 'see' the changes at the end of the transaction. If an exception is thrown, the transaction is rolled back. Where there are competing writes from two transactions, correct serialisation is ensured (and, if need be, one of the transactions is retried):

(init-accounts) 
»  #'clojure.core/acc3 
(run-thread-fn #(transfer acc1 acc2 100))
(run-thread-fn #(transfer acc3 acc1 400))
(list acc1 acc2 acc3)
»  (#<Ref {:balance 1300, :owner "alice"}>
#<Ref {:balance 1100, :owner "bob"}>
#<Ref {:balance 600, :owner "charles"}>)
(run-thread-fn #(transfer acc3 acc2 500))
(run-thread-fn #(transfer acc2 acc1 200))
(list acc1 acc2 acc3)
»  (#<Ref {:balance 1500, :owner "alice"}>
#<Ref {:balance 1400, :owner "bob"}>
#<Ref {:balance 100, :owner "charles"}>)
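
The rollback behaviour can be made visible with a small variation. This is a sketch under stated assumptions: `checking`, `savings` and `checked-transfer` are hypothetical names, the refs hold plain numbers instead of account maps, and the overdraft check is added purely for illustration.

```clojure
(def checking (ref 100))
(def savings (ref 0))

(defn checked-transfer
  "Hypothetical helper: moves amount between two refs holding numbers and
  aborts the transaction if the source would go negative."
  [from to amount]
  (dosync
    (alter to + amount)
    (alter from - amount)
    (when (neg? @from)
      (throw (IllegalArgumentException. "insufficient funds")))))

(checked-transfer checking savings 30)
[@checking @savings]
;; » [70 30]

;; The deposit into savings has already happened inside the transaction
;; when the exception is thrown, yet nothing is committed.
(try
  (checked-transfer checking savings 500)
  (catch IllegalArgumentException e :rolled-back))
;; » :rolled-back

[@checking @savings]
;; » [70 30]
```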
