In association with heise online

05 July 2012, 17:02

Clojure - A pragmatic Lisp dialect for the JVM

by Stefan Tilkov

One could get the impression that programming languages for the Java Virtual Machine (JVM) are a dime a dozen. JRuby, Groovy, Scala, Jython and even good old Java are all vying for the attention of programmers. While many thought that Scala was destined to emerge as the "winner", another language, Clojure, has proved a successful competitor, at least in terms of publicity.

Clojure is a Lisp dialect that was specifically developed for the JVM and stands out because it supports the development of applications for multi-core platforms. Most people's notion of Lisp is of a language with countless brackets and no practical relevance whatsoever, used mainly to torture IT students past and present. Lisp has had this negative reputation for several decades, and the rather fragmented, and occasionally far from humble, Lisp community has so far not managed to change this. Clojure is different mainly because of one particular aspect: it is practical. Therefore, overcoming any initial reluctance to grapple with the syntax is worthwhile: there is a good reason for this syntax, and the author of this article, at least, has now not only become used to it, but also considers it the most elegant variant possible.

Data structures and syntax

In addition to literals for numbers, strings and regular expressions, Clojure's most important supported data types include lists, vectors, sets and associative arrays (maps). Lists are the most important elements in all Lisp languages, and Clojure doesn't differ from its predecessors in this respect (although vectors also play a major role, as will be demonstrated below). Lists are represented as bracketed expressions, may contain elements of any type, and can be nested:

'(1 2 3)
'(1 "a string" 2.45)
'(1 2 ("a" :b "c") ("d" "e" (:x "y")))

Vectors provide efficient access to items by index and are enclosed in square brackets. Maps are enclosed in {...}, and sets in #{...}:

[1 2 3]
[1 2 [4 5 6] ["a" "b"]]
{1 "One", 2 "Two"}
#{1 2 3 4}
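As a rough illustration of how these four collection types are typically read, the following sketch uses a handful of core functions (first, nth, get, contains?). The names numbers, labels and ids are made up for this example and do not appear elsewhere in the article:

```clojure
;; Illustrative values only; `numbers`, `labels` and `ids` are made-up names.
(def numbers [10 20 30])
(def labels {1 "One", 2 "Two"})
(def ids #{1 2 3 4})

(first '(1 2 3))     ; lists: access from the front        => 1
(nth numbers 1)      ; vectors: access by (0-based) index  => 20
(get labels 2)       ; maps: look up a value by its key    => "Two"
(contains? ids 5)    ; sets: membership test               => false
```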

Clojure

Clojure 1.0 was released in May 2009, and the current stable version 1.4 became available in April 2012.

What all releases have in common is that they have mainly added extensions: apart from changes to some of the namespaces, new versions have generally had no significant impact on existing code. Clojure 1.2 mainly offered performance optimisations and extensions that aim at fully implementing Clojure in Clojure at some point in the future.

When giving a presentation, Rich Hickey, the father of Clojure, likes to follow the slide on data types with one called "Syntax" and then say: "You've just seen it!" – and, indeed, the language contains almost no other elements. Clojure code is contained within the data structures, particularly in lists and vectors – ultimately, code and data are one and the same. This particular characteristic plays an important role when automating recurring patterns: Lisp doesn't require an external mechanism such as a code generator for this task, as it includes the necessary components as part of the language itself – more on this later.

As Clojure is a functional programming language, its most important code construct is the function call. For this purpose, a Lisp-based language interprets the first element in each list as a function (or, to be more precise, as something that evaluates to a function), and the remaining elements as its parameters. For example, the str function creates a string from its parameters. A function call looks like the following (the result is displayed after the » symbol on the next line):

(str "a" "b" "c") 
»  "abc"

To suppress the evaluation of a list expression, programmers use the "quote" special form or its abbreviation, ' (a single quotation mark):

'(str "a" "b" "c") 
»  (str "a" "b" "c") 
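A small sketch may make the "code is data" idea more tangible: a quoted call is an ordinary list that can be inspected like any other list and, if desired, handed to eval afterwards. The name form is chosen only for this example:

```clojure
;; `form` is a made-up name; it holds the unevaluated call as plain data.
(def form '(str "a" "b" "c"))

(list? form)    ; => true
(first form)    ; => str    (the symbol, not yet the function)
(count form)    ; => 4      (one symbol plus three strings)
(eval form)     ; => "abc"  (evaluating the list performs the call)
```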

Function names may contain almost any characters, many functions accept a variable number of arguments, and calls can be nested as required. This eliminates the need for special operator rules. The following example demonstrates how the '*', '+' and '-' functions are used:

(str "The result is: " (* (+ 2 4) (- 10 8))) 
»  "The result is: 12"
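To illustrate the point about variable numbers of arguments, here are a few more calls, using only functions from Clojure's core library. Because the function always comes first and the parentheses make the grouping explicit, no precedence rules are needed:

```clojure
(+ 1 2 3 4)             ; + accepts any number of arguments  => 10
(* 2 3 4)               ; => 24
(< 1 2 3)               ; true if the arguments are strictly increasing => true
(str "a" 1 :b)          ; str also accepts non-string arguments => "a1:b"
(- 10 (* 2 (+ 1 2)))    ; nesting replaces operator precedence => 4
```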

The syntax of function definitions demonstrates the close relationship between Clojure's data structures and its code: a function definition is a list that consists of the defn keyword, a name and a vector that contains the parameters, followed by the implementation itself (which is another list that is also governed by the rules mentioned above):

(defn multiply [a b] (* a b))
(multiply 3 6)
»  18 

Strictly speaking, defn isn't actually a keyword: It is a macro, another peculiarity of Lisp-based languages. The multiply function can also be defined by assigning a function to a variable:

(def multiply (fn [a b] (* a b))) 

Although almost all of Lisp's language constructs are functions themselves, there is a limit, a small language core that consists of what are called "special forms". Among these special forms are def and fn: the former defines a new variable, the latter a new (anonymous) function.
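The two special forms can also be tried out separately. In the following sketch, factor and triple are made-up names used only for illustration:

```clojure
;; def binds a name to any value, not only to functions.
(def factor 3)

;; fn creates a function without naming it; it can be called on the spot ...
((fn [a b] (* a b)) 3 6)             ; => 18

;; ... or bound to a name with def.
(def triple (fn [v] (* factor v)))
(triple 6)                           ; => 18
```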

As suggested by its name, defn combines the def and fn steps – you define a function and then name it all in one go. It comes as no surprise that the developer of a programming language would incorporate such a structure; however, what's special about Lisp dialects such as Clojure is that such simplifications can be created not only by modifying the language core, but also via the language components. This is where macros come in: a macro is a code segment that writes code. Even defn is implemented in this way. The result of the macro can be displayed via the macroexpand function:

(macroexpand '(defn multiply [a b] (* a b))) 
»  (def multiply (clojure.core/fn ([a b] (* a b))))

The clojure.core/ element in front of fn designates the namespace in which the fn is defined. In other programming languages, this kind of automation would require an additional code generator, but in Lisp this mechanism is part of the language. A (very simplified) version of the defn macro could be defined as follows:

(defmacro defn2 [name & fdecl] (list 'def name (cons 'fn fdecl)))

The list function generates a new list of individual elements, and cons adds an element to the beginning of a list – therefore, the macro uses its arguments to create a new data structure which contains the code it is designed to generate. (In practice, programmers use syntax-quote, a kind of templating mechanism.) A macro behaves in a similar way to a function, but it is evaluated during compilation, not while the program is being executed. To check the result we can use macroexpand as before:

(macroexpand '(defn2 multiply [a b] (* a b))) 
»  (def multiply (fn [a b] (* a b))) 
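For comparison, this is roughly what the same simplified macro might look like when written with syntax-quote. The backquote starts a template, ~ inserts a single value and ~@ splices a sequence into the surrounding list. The names defn3 and multiply3 are hypothetical and exist only in this sketch:

```clojure
;; defn3 is a hypothetical name; it mirrors the simplified macro above.
(defmacro defn3 [name & fdecl]
  `(def ~name (fn ~@fdecl)))

(macroexpand '(defn3 multiply [a b] (* a b)))
;; => (def multiply (clojure.core/fn [a b] (* a b)))

(defn3 multiply3 [a b] (* a b))
(multiply3 3 6)    ; => 18
```

Note that syntax-quote resolves fn to its namespace-qualified name, which is why the expansion shows clojure.core/fn, just like the expansion of the real defn.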

The relationships between macros, functions and special forms can also be demonstrated using control structures. For aspects such as a condition, other languages require a separate keyword; in Clojure and other Lisp-based dialects, a condition looks just like a function call. This can be illustrated by the following simple function:

(defn equals3 [v]
  (if (== v 3)
    (str v " is equal to 3")
    (str v " is not equal to 3")))
(equals3 3)
»  "3 is equal to 3"
(equals3 4)
»  "4 is not equal to 3"

Macros are among the most interesting characteristics of Lisp-based and Lisp-inspired languages. If equivalent components are unavailable (for example in Java), developers tend to use external code generators.
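As a sketch of how a new control structure can be built on top of the if special form, consider a hypothetical unless macro (it is not part of clojure.core and is defined here purely for illustration). It could not be written as a function, because a function's arguments are all evaluated before the call, whereas the macro merely rearranges the code it is given:

```clojure
;; `unless` is a hypothetical macro written for this sketch.
(defmacro unless [test & body]
  `(if ~test nil (do ~@body)))

(unless (== 4 3) "4 is not 3")        ; => "4 is not 3"
(unless (== 3 3) "never evaluated")   ; => nil

(macroexpand '(unless (== 4 3) "4 is not 3"))
;; => (if (== 4 3) nil (do "4 is not 3"))

;; The built-in when macro is defined along similar lines:
(macroexpand '(when true 1 2))
;; => (if true (do 1 2))
```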

Another essential part of understanding the special characteristics of Lisp is the concept of higher-order functions (HOFs): functions that take other functions as parameters or return them as results. An example of such a function is map. This function transforms a list into a new list by applying a function, passed as a parameter, to each element:

(defn double-number [v] (* 2 v))
(map double-number '(1 2 3 4 5))
»  (2 4 6 8 10) 
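A few further examples of higher-order functions from the core library may help here: map with an anonymous function, filter and reduce. The multiplier function at the end is a made-up helper that shows the other direction, a function that returns a function:

```clojure
(map (fn [v] (* 2 v)) '(1 2 3 4 5))    ; anonymous function as argument => (2 4 6 8 10)
(filter even? '(1 2 3 4 5))            ; keep elements satisfying a predicate => (2 4)
(reduce + '(1 2 3 4 5))                ; combine all elements with a function => 15

;; `multiplier` is a hypothetical helper that returns a new function.
(defn multiplier [n]
  (fn [v] (* n v)))

(map (multiplier 3) '(1 2 3))          ; => (3 6 9)
```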

A slightly more complex function is shown below. It returns "FizzBuzz" for numbers that can be divided by 3 and 5, "Fizz" for numbers that are divisible by 3, "Buzz" for numbers that are divisible by 5, and the number itself for any other number:

(defn fizzbuzz [n]
  (cond
    (and (= (rem n 3) 0) (= (rem n 5) 0)) "FizzBuzz"
    (= (rem n 3) 0) "Fizz"
    (= (rem n 5) 0) "Buzz"
    :else n))

The fizzbuzz example can be used to demonstrate another Clojure mechanism: infinite sequences whose elements are only computed when they are actually needed ("lazy" sequences). For example, (iterate inc 1) returns a sequence that starts at 1 and produces each further element by applying the inc (increment) function to the previous one.

The result is the sequence of natural numbers. As it wouldn't be a good idea to try to process the whole sequence, the take function helps by returning only the first n elements. Combining this with map allows us to generate an elegant fizzbuzz sequence for the numbers from 1 to 25:

(map fizzbuzz (take 25 (iterate inc 1))) 
»  (1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "FizzBuzz"
    16 17 "Fizz" 19 "Buzz" "Fizz" 22 23 "Fizz" "Buzz")
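Since map and filter are lazy as well, they can be applied directly to an infinite sequence, as long as take limits what is finally consumed. The following sketch uses only core functions; the squaring function is an arbitrary example:

```clojure
;; map over an infinite sequence: only the five requested elements are computed.
(take 5 (map (fn [n] (* n n)) (iterate inc 1)))    ; => (1 4 9 16 25)

;; filter is lazy too, so this terminates although the input never ends.
(take 3 (filter even? (iterate inc 1)))            ; => (2 4 6)

;; For a bounded run of numbers, range is a common alternative to take + iterate.
(range 1 6)                                        ; => (1 2 3 4 5)
```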

