Clojure, the REPL and test-driven development

I’ve been using Clojure for nearly a year now, and something strange has been happening… I still think unit-testing is extremely important, but for some reason I don’t seem to be writing the same number of tests any more. I’m ashamed to say it, but there it is. And it gets stranger – this new lower test count doesn’t seem to matter.

It seems to me that my Clojure code works right the first time more often than my Ruby or Java code ever did. And I seem to find fewer defects in the Clojure code over time, too.

This is not just a fanboy speaking, though I am a huge fan of Clojure. I think the reason I’m observing this is an important characteristic of the language. Instead of just talking about it, let me first walk you through an example.

This is something I had to do recently – we wanted to build a kind of reverse index for an HBase table. The row ids of this table are time-stamps. The idea was that this “reverse index” would allow us to answer the question of what the first time-stamp for a given day was. In other words, we needed to convert a list of time-stamps into a lookup from each day to the first time-stamp of that day. E.g.

Input:


["112323123" "1231231231" "123123123" "1231231123" ....]
 

Output:


{"2009-07-01" "123123123"
 "2009-07-02" "123131213"
 "2009-07-03" "123123122"}
 

(Note: I plucked the numbers out of the air, they aren’t accurate. But the idea is that the input is a long stream of timestamps, and possibly hundreds could correspond to each day.)

So I get started… thinking to myself – I know how to convert a timestamp to a day. Since I already had a function, day-for-time-stamp, that does this, it’s easy to write a function that returns a hash-map of the day vs. the timestamp –


(defn day-vs-timestamp [time-stamp]
  {(day-for-time-stamp time-stamp) time-stamp})
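The post does not show day-for-time-stamp itself. As a rough sketch only, here is one way it could look, assuming the time-stamps are strings of epoch milliseconds and that days are counted in UTC (both are assumptions made for illustration, not details from the original code):

```clojure
(import '(java.time Instant ZoneOffset))

;; Hypothetical stand-in for the post's day-for-time-stamp.
;; Assumes the time-stamp is a string of epoch milliseconds and days are UTC.
(defn day-for-time-stamp [time-stamp]
  (-> (Long/parseLong time-stamp)
      (Instant/ofEpochMilli)
      (.atOffset ZoneOffset/UTC)
      (.toLocalDate)
      (str)))

(defn day-vs-timestamp [time-stamp]
  {(day-for-time-stamp time-stamp) time-stamp})

(day-vs-timestamp "1246406400000")
;; => {"2009-07-01" "1246406400000"}
```

If the real row ids use a different encoding, only day-for-time-stamp would need to change; day-vs-timestamp stays the same.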

So now, all I have to do is map the above function across the input. This gives me a list of hash-maps, each with one key-value pair. To ensure that I’m doing this in order of oldest first, I sort the input as well. Inside of a let form, all of this looks like –


(let [all-pairs (map day-vs-timestamp (sort input-list))]

Now, I have this list of hashes, each with one key (the day) and one corresponding value (the time-stamp itself). I want to combine these into one single hash-map which would be the final answer. But I have to deal with the issue of duplicate keys – when I find a duplicate key, I want to keep the first value associated with the key since it would be the oldest.

Clojure has a merge-with function which does just this – it accepts a function of two arguments (the value already in the result and the incoming value for a duplicate key), and the value that function returns is used in the merged hash-map.


(apply merge-with #(first [%1 %2]) all-pairs)
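To see the duplicate-key handling in isolation, here is a small sketch with made-up data. The name keep-first and the values are illustrative only; the function behaves the same as the #(first [%1 %2]) form above.

```clojure
;; Illustrative data only: one single-entry map per time-stamp, oldest first.
(def all-pairs
  [{"2009-07-01" "100"}
   {"2009-07-01" "150"}
   {"2009-07-02" "200"}
   {"2009-07-02" "250"}])

;; On a duplicate key, merge-with calls the function with
;; (value-already-in-result, incoming-value).
;; Returning the first argument keeps the oldest value.
(defn keep-first [existing _incoming]
  existing)

(apply merge-with keep-first all-pairs)
;; => {"2009-07-01" "100", "2009-07-02" "200"}
```

This only gives the oldest value per day because the input was sorted before being turned into pairs.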

That’s basically it.

Combining everything –


(defn day-vs-timestamp [time-stamp]
  {(day-for-time-stamp time-stamp) time-stamp})

(defn lookup-table [input-timestamps]
  ;; sort first so that the oldest time-stamp for each day is seen first
  (let [all-pairs (map day-vs-timestamp (sort input-timestamps))]
    (apply merge-with #(first [%1 %2]) all-pairs)))

When I write code like this, I often ask myself: what exactly should I test? I end up writing a few happy path tests that prove my code works. And then a couple of tests that cover edge cases and negative paths. And I sometimes do it test first.
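For a sense of what such tests might look like, here is a minimal clojure.test sketch. To keep it runnable on its own it swaps in a hypothetical day-for-time-stamp that assumes ISO-8601 strings (the first 10 characters are the day); that is an assumption for illustration, not the format the real table uses.

```clojure
(require '[clojure.test :refer [deftest is testing]])

;; Hypothetical stand-in: assumes ISO-8601 time-stamps such as
;; "2009-07-01T08:00:00Z", whose first 10 characters are the day.
(defn day-for-time-stamp [time-stamp]
  (subs time-stamp 0 10))

(defn day-vs-timestamp [time-stamp]
  {(day-for-time-stamp time-stamp) time-stamp})

(defn lookup-table [input-timestamps]
  (let [all-pairs (map day-vs-timestamp (sort input-timestamps))]
    (apply merge-with #(first [%1 %2]) all-pairs)))

(deftest lookup-table-test
  (testing "happy path: the earliest time-stamp wins for each day"
    (is (= {"2009-07-01" "2009-07-01T08:00:00Z"
            "2009-07-02" "2009-07-02T00:30:00Z"}
           (lookup-table ["2009-07-02T09:00:00Z"
                          "2009-07-01T17:45:00Z"
                          "2009-07-02T00:30:00Z"
                          "2009-07-01T08:00:00Z"]))))
  (testing "edge case: empty input gives nil, not an empty map"
    (is (nil? (lookup-table [])))))
```

The empty-input case is the kind of thing a test pins down: merge-with called with no maps returns nil, which callers may or may not expect.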

But the REPL has spoilt me. What I used TDD for when coding with Ruby (and still do), I often do at the REPL. I build tiny functions that work – these are often single lines of code. Then I combine these into other functions, often no more than two lines of code each, sometimes three. And it all just works – leaving me wondering what to cover with tests.

The main reason I still write tests is for regression – if something breaks in the future, I catch it quickly. However, the other thing – the test *driven* design aspect of TDD – has been somewhat replaced by the REPL. And it’s very much more dynamic than a set of static tests. It really brings out the rapid in rapid application development – especially when combined with Emacs and SLIME.

One main difference with Clojure vs. Ruby (say) is that Clojure is functional (I use very little of Clojure’s constructs for state). And in the functional world, I just don’t have to worry about state (obviously), and this tremendously simplifies code. I think in terms of map, filter, reduce, some, every, merge, etc. and the actual logic is in tiny functions used from within these other higher level constructs. The idea of first-class functions is also key – I can build up the business logic by writing small functions that do a tiny thing each – and combine them using higher-order functions.
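As a small illustration of that style, here is a sketch that counts time-stamps per day by combining tiny functions with filter, map and reduce. The function names and the ISO-8601 string format are assumptions made for the example.

```clojure
;; Tiny single-purpose functions (illustrative names; assumes ISO-8601 strings).
(defn day [time-stamp]
  (subs time-stamp 0 10))

(defn in-july-2009? [time-stamp]
  (= "2009-07" (subs time-stamp 0 7)))

(defn count-for-day [counts d]
  (update counts d (fnil inc 0)))

;; The business logic is just the small pieces combined with
;; higher-order functions; no mutable state anywhere.
(defn july-counts-per-day [time-stamps]
  (->> time-stamps
       (filter in-july-2009?)
       (map day)
       (reduce count-for-day {})))

(july-counts-per-day ["2009-07-01T08:00:00Z"
                      "2009-06-30T23:59:00Z"
                      "2009-07-01T17:45:00Z"
                      "2009-07-02T00:30:00Z"])
;; => {"2009-07-01" 2, "2009-07-02" 1}
```

Each piece can be tried at the REPL on its own before being combined.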

This is one reason why we’re so productive with Clojure. We’ve moved to Clojure for 90% of our work. That said, we still use Ruby for parts of our code-base, and it’s still my favorite imperative language 🙂

14 thoughts on “Clojure, the REPL and test-driven development”

  1. I’ve had a similar experience coding in Clojure. In a nicely refactored, functional codebase, I’m finding very few bugs with my unit tests. But I still love having the safety net of (run-tests 'my-pkg) before I check in. And the tests also serve as good documentation.

    I also love the convenience of using partial maps during unit testing (if your function-under-test only relies on fields x and y but not z of a map, you can pass just {:x 1 :y 2} to your function).
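    A minimal sketch of that idea (the function name and fields are made up for illustration):

```clojure
;; Hypothetical function under test: it only reads :x and :y from the map.
(defn area [{:keys [x y]}]
  (* x y))

;; In a test, a partial map is enough...
(area {:x 1 :y 2})
;; => 2

;; ...and any extra fields in the real data are simply ignored.
(area {:x 1 :y 2 :z 99})
;; => 2
```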

  2. I have been learning Clojure for a brief while now, and my experience is similar to yours. I hardly tend to write unit tests anymore with Clojure, though I would make it a point to write functional, integration and acceptance tests.

  3. I’m coming from exactly the opposite direction. I’ve been using Common Lisp for several years, using the REPL as one of my main development aids, using ilisp first and then slime once it became the next hot way to go. So, for me it was and is natural to employ the interactive shell that Python and Ruby provide to do what I’ve been doing with Lisp. I actually started writing unit tests in Ruby because I wanted *others* to be able to use them as a means of documentation and of validating that their changes don’t break anything.

    The point I want to make is that there is no conflict between using the REPL and unit tests, quite to the contrary. You can easily write small functions for testing your interactively developed code using the REPL, but once you have your functions working you can just as easily drop them into your code base as unit tests. That’s a huge benefit of interactive development that you just don’t have with those edit-compile-debug languages.

  4. ; Personally I find it simpler to think about building the map up:
    ; assuming a pseudo timestamp where 25 means day 2, hour 5

    (defn day [timestamp]
      (quot timestamp 10))

    (defn day-table [timestamps]
      (reduce #(assoc %1 (day %2) %2)
              {}
              (sort > timestamps)))

    (day-table [10 15 13 21 24 25 36 33])

    user=> (day-table [10 15 13 21 24 25 36 33])
    {1 10, 2 21, 3 33}

    • To comment on testing specifically, I have to agree that Clojure has the allure of ‘it just works’ to the point I don’t like to think about testing. What I found though was that interactively I do a fair bit of testing and diagnostics (never get things right first time) which were being lost. Saving those sanity checks can be really handy later down the line.

      I like to think that a minimal amount of good tests is ideal for both developing and verifying. So to me the important question is: does it show how to use the function, or does it show a case that needs to be handled? If I write some code to achieve something I will at least have run it in some context at some time; I want to capture that. If I find a bug, I would prefer to keep that information rather than discard it.

      My work-flow to that end is roughly like this:
      1) open a file and write some high level wishful thinking declarations
      2) start writing some functions to support it
      2.5) copy the function definitions to REPL as I create them
      3) open another file and write some basic tests to check the function works
      3.5) copy these to the REPL
      4) iterate between the two buffers building the tests up to an example and the program toward an end goal. These eventually become similar activities and I know I’m done.
      5) wrap the secondary file up in deftest
      Now when I come back in a month, I can just run my tests and be happy! I guess the point of my comment is simply that saving your REPL buffers can provide some value with little effort.
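      As a sketch of what step 5 might end up looking like, here are the REPL checks for the day-table example above wrapped in a deftest. The particular checks are illustrative, not ones from the original comment.

```clojure
(require '[clojure.test :refer [deftest is]])

(defn day [timestamp]
  (quot timestamp 10))

(defn day-table [timestamps]
  (reduce #(assoc %1 (day %2) %2)
          {}
          (sort > timestamps)))

;; Checks that were first typed at the REPL, now saved as a regression test.
(deftest day-table-test
  (is (= 2 (day 25)))
  (is (= {1 10, 2 21, 3 33}
         (day-table [10 15 13 21 24 25 36 33])))
  (is (= {} (day-table []))))
```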

      That’s just what works for me at present, and I’m happy to be educated to better approaches.

      • That’s very interesting. I think a lot of newcomers, both to Clojure and the REPL, will find that useful.

  5. It would be nice if there was a way to capture the interactive tests and then easily edit them into unit tests for later regression and continuous integration testing.

    In any case you’ll still need a good test suite for said regression and continuous integration tests so we can still make changes over time without fear of breaking things.

    • Yup, that setup took nearly a full minute. Thanks Charlie. I upgraded my main laptop distro to Linux Mint Debian Edition (XFCE) recently, so I had to set up Leiningen once again. The Debian (testing) repos have an older version of Leiningen/Clojure, so I like this manual method anyhow. In fact, you could cut it down to about 3 seconds with the following one-liner, for most Linux desktop peeps (just run the command as a local user from the directory you’d like to hold the lein script):

      wget --output-document lein https://github.com/technomancy/leiningen/raw/stable/bin/lein && chmod +x lein && sudo ln -s "$PWD/lein" /usr/local/bin/

      The /usr/local/bin/ directory is already on the PATH for most (all?) distros, so I just throw a symlink there pointing to wherever the lein script wants to live.

      Cheers, Jamie

  6. Ahh but whatever happened to “TDD is about *design* not testing” bit of agile dogma?

    Ok that wasn’t a serious question and not aimed at Amit – I’ve been hearing this from so called “agile” folks for some time now and couldn’t resist, but snark aside,

    TDD has always had a dual purpose – (1) as a design method – I’ve long been dubious of this and (2) as a way of building up a regression suite.

    (2) is still valuable, though you don’t really need the “DD” part of TDD to get the benefits. You can just save your REPL tests, whether written first, last or in the middle, or while bug fixing (as Amit points out).

    That said, I hereby take this opportunity to spit on TDD as a design method. Referential Transparency helps in avoiding the “DD” part of TDD, but even in OO/imperative designs, I’d rather substitute thinking for TDD. I’ve had to fix too many code bases “designed” through TDD.

    Wonder of wonders, a lot of people do good design by thinking (deeply) about the issue at hand without resorting to crutches like TDD. Rich Hickey is a great example.

    • “That said, I hereby take this opportunity spit on TDD as a design method………….. without resorting to crutches like TDD………” -> Shows you know jack about TDD.

  7. I certainly like your web site; however, you have to check the spelling on quite a few of your posts.

    Many of them are rife with spelling problems, and I find it very troublesome to tell the reality; however, I will certainly come back again.
