Initial Explorations: Clojure
Clojure is a dialect of Lisp that is worth studying for its impressive range of features, its design, and the mindset that one that spends enough time in the community develops. We shall go over several small snippets of code to introduce and illustrate the language and some of its features.
It’ll be more fun to follow along if you have a Clojure REPL to play with.
Steps:
-
Install Leiningen
lein new <project-name>
cd <project-name>
-
`lein repl :headless :port 4005
-
Install CIDER, Paredit and clojure-mode
usingM-x package-install
and configure them.You can then do
M-x cider-connect
and then enter “127.0.0.1” for
the host and the same port number as above. <project-name>/src/<project-name>/core.clj
is where you can save your code.
Let’s start out by trying to write a function called length
which
gives us the count of the elements of a list. It’s the classic
recursive definition of length - “The length of an empty list is 0,
and the length of a non-empty list is one more than the length of its
tail”.
(defn length [l]
(if (empty? l)
0
(inc (length (rest l)))))
Any person who has attempted to read SICP will immediately know at
least one way to improve this code. A list of length n
will require
n
recursive calls plus the initial call, so therefore n + 1
stack
frames will have to be created. This is all very wasteful. We can
rewrite this in a tail recursive manner so that only one stack frame
will be used by keeping track of the count.
(defn length [l]
(letfn [(length* [count l]
(if (empty? l)
count
(recur (inc count) (rest l))))]
(length* 0 l)))
Okay, this looks significantly more complex. Let’s rewrite this in pseudo-Python to see what’s happening.
def length(l):
def length_helper(count, l):
if l.empty():
return count
else
return length_helper(count + 1, l.rest())
return length_helper(0, l)
This is an approximation of what the Clojure code does. We use recur
to take advantage of the Tail Call Optimization (TCO). It is possible
to run out of stack space in some situations if recur
is not used.
We use the variable count
to keep track of the state of the
computation between function calls. In other situations, to keep track
of more things, more arguments can be added.
A more familiar way of implementing length
would be something like
this:
def length(l):
count = 0
for elt in l:
count++
return count
What’s happening here is we are taking a region of storage called
count
(i.e. the variable count
) and repeatedly overwriting its
value. Mutation is discouraged in functional languages. Notice that in
the first two Clojure versions, we are not mutating anything. However,
this change of values of count
can be simulated by the additional
function argument trick. In the above Python code, the variable count
holds various values at different points in time, but in the
functional version, there are multiple invocations of the function
with various argument values. The data is repeatedly being passed from
one function call to another, but never modified in place.
We have to think about data flowing through abstractions that transform it (sometimes, gradually) into the solution. Contrast this with the imperative version where we use the same storage area and repeatedly keep overwriting it’s value.
FP Basics
map
Whenever we have a collection of data and we want to transform it to
another collection of the same size, we use map
. Here the word “map”
is used in the mathematical sense of “mapping” one value to
another. Say we have a collection of the first ten natural numbers. We
now map a squaring function over this collection to get a collection
of the squares of the first ten natural numbers. Every number in the
first collection has been mapped to it’s corresponding square in the
second collection.
explosure.core> (defn square [x]
(* x x))
#'explosure.core/square
explosure.core> (range 1 11)
(1 2 3 4 5 6 7 8 9 10)
explosure.core> (map square (range 1 11))
(1 4 9 16 25 36 49 64 81 100)
Normally we’d achieve this by looping over the collection, applying the squaring function on each element and then appending this value to a result collection.
result = []
for elt in range(1, 11):
result.append(square(elt))
Conceptually, we think of elt taking the value of each of the elements
from 1
to 10
in each iteration and then the operation being
performed on it. This is a common pattern - iterating over some
sequence of values.
It is simpler to think of all the iteration values as one collection and it is this collection that is mapped to another one by a function. For example, if you have a collection of word strings that need to be capitalised, don’t think of creating some iterative loop where a capitalisation function is called on each of these words, and the capitalised word appened to some accumulating result collection. Rather, think that you have a collection of words with you and you’ll create a mapping from this collection of words to another collection of capitalised words.
So, “perform X operation on every element in the collection” becomes “transform collection by X” or alternatively, “map X over the collection”.
lambdas
Lambdas are anonymous functions, and they are extensively used in Clojure code. Quite simply, we use lambdas where we need to use a function, and we can’t be bothered creating a named function. You can think of it as the body of the function definiton without the name.
;; Two ways of creating a lambda that squares its input
explosure.core> (map (fn [x] (* x x)) (range 1 11))
(1 4 9 16 25 36 49 64 81 100)
explosure.core> (map #(* % %) (range 1 11))
(1 4 9 16 25 36 49 64 81 100)
explosure.core> (macroexpand-1 '#(* % %))
(fn* [p1__1264#] (* p1__1264# p1__1264#))
;; p1__1264# is created by gensym, so the above piece of code isn't
;; too dissimilar from
(fn [x] (* x x))
The #
is a reader macro that transforms “(* % %)” into a valid
Clojure form similar to the explicit (fn [x] (* x x)) form that we
just saw. A simple way to remember its syntax is to take the body
of the lambda we need to write, in our case (* x x)
, and in place
of the variables, use %
. For functions with multiple arguments,
we can distinguish between them by adding a number like “%1” or
“%2” where the number designates the position of the argument.
explosure.core> ((fn [a b] (+ a b)) 3 4)
7
explosure.core> (#(+ %1 %2) 3 4)
7
The syntactic shortcut for creating such anonymous functions can help with readability when you don’t want to draw too much attention to the function, but sometimes it is also possible for it to hinder readability when the function is too complex. Discernment and good taste are essential to use it well.
One aspect of Clojure’s lambdas that I like is that they can be named as well. That’s right! An anonymous function that has a name! Let’s see how this could be used.
Let’s use the pedagogically most popular function to illustrate recursion - the factorial!.
(defn factorial [n]
(if (= n 0)
1
(* n (factorial (dec n)))))
(fn [n]
(if (= n 0)
1
(* n (??? (dec n)))))
;; ^-- we don't have a name to recursively call
The very essence of recursion is the self-referential definition. Let’s say we wanted to define factorial only using lambdas. If we don’t have a name to recursively call, how do we recurse?
There is an easy, convenient answer and it is by using named lambdas.
explosure.core> (fn factorial [n]
(if (= n 0)
1
(* n (factorial (dec n)))))
#<core$eval1370$factorial__1371 explosure.core$eval1370$factorial__1371@7fe85830>
explosure.core> ((fn factorial [n]
(if (= n 0)
1
(* n (factorial (dec n))))) 10)
3628800