Why Java programmers have an advantage when learning Clojure

Cross-posted from Zolo Labs.

There is a spectrum of productivity when it comes to programming languages. I don’t really care to argue how much more productive dynamic languages are… but for those who buy that premise and want to learn a hyper-productive language, Clojure is a good choice. And for someone with a Java background, Clojure becomes the best choice. Here’s why:

  • Knowing Java – obviously useful: class-paths, class loaders, constructors, methods, static methods, standard libraries, jar files, etc. etc.
  • Understanding of the JVM – heap, garbage collection, perm-gen space, debugging, profiling, performance tuning, etc.
  • The Java library ecosystem – what logging framework to use? what web-server? database drivers? And on and on….
  • The Maven situation – sometimes you have to know what’s going on underneath lein
  • Understanding of how to structure large code-bases – Clojure codebases also grow
  • OO Analysis and Design – similar to figuring out what functions go where
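
To make the first bullet concrete, here is a small sketch (not from the original post) of how Java constructors, instance methods, and static methods look when called from Clojure. The classes used are chosen purely for illustration; any Java programmer will recognise what each line maps to.

```clojure
(import '(java.util ArrayList Collections))

;; Constructor call, i.e. new ArrayList()
(def names (ArrayList.))

;; Instance method calls, i.e. names.add("java")
(.add names "java")
(.add names "clojure")

;; Static method call, i.e. Collections.sort(names)
(Collections/sort names)

;; Instance method on a String, i.e. "clojure".toUpperCase()
(def shouted (.toUpperCase (first names)))

;; Static method from java.lang, i.e. Math.max(3, 7)
(def biggest (Math/max 3 7))
```

Knowing what `ArrayList`, `Collections`, and `Math` already do is exactly the kind of head start the list above is about.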

I’m sure there’s a lot more here, and I’ll elaborate on a few of these in future blog posts.

I’ve not used Java itself in a fairly long time (we’re using Clojure for Zolodeck). Even so, I’m getting a bit tired of some folks looking down on Java devs, when I’ve seen so many Clojure programmers struggle from not understanding the Java landscape.

So, hey Java devs! There are so many good reasons to learn Clojure: it’s a modern LISP with a full macro system, it’s a functional programming language, it has concurrency semantics, and it sits on the JVM with access to all those libraries. It makes a lot of sense for you to look at it. And if you’re already looking at something more dynamic than Java itself (say Groovy, or JRuby, or something similar), why not just take that extra step to something truly amazing? Especially when you have such an incredible advantage (your knowledge of the Java ecosystem) on your side already?

Clojure: understanding dynamic vars and laziness

I spent a frustrating day figuring out why something wasn’t working as expected in a particular situation, even though it was working just fine elsewhere. It turned out the problem was my flawed understanding of how Clojure’s dynamic vars behaved with respect to laziness. Thanks to Chouser and Chousuke on the #clojure channel in IRC, I think I have a slightly better grasp of the thing.

Update: Here’s a summary for those who don’t want to read the whole thing. What was happening is this – my assumption was that dynamic vars get captured in lazy objects – just like locals get captured in closures. They don’t – so if you later rebind the vars, when the lazy objects are realized, the vars will evaluate according to the latest binding. To baaad results 🙂
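
Here is a minimal sketch of that behaviour. It assumes Clojure 1.3 or later, where a var has to be declared `^:dynamic` before `binding` can rebind it (the transcript below predates that requirement). The names `*db*` and `read-rows` are made up for illustration and are not from my actual code.

```clojure
;; The closure example from the chat: the fn holds a reference to the var,
;; not to the value the var had when the fn was created.
(def ^:dynamic var-a 2)

(def add (binding [var-a 5] (fn [n] (+ var-a n))))

(def closure-result (add 3)) ; called outside the binding, so 2 + 3 = 5, not 8

;; The same thing with laziness. *db* is a hypothetical config var.
(def ^:dynamic *db* :default)

(defn read-rows []
  ;; map is lazy: *db* is only dereferenced when an element is realized
  (map (fn [id] [*db* id]) [1 2 3]))

;; The lazy seq is created under one binding but not realized there...
(def rows (binding [*db* :source] (read-rows)))

;; ...and realized later under another binding, which is the one it sees.
(def realized-later
  (binding [*db* :target]
    (doall rows)))
;; realized-later is ([:target 1] [:target 2] [:target 3])
```

Nothing in `rows` remembers that `*db*` was `:source` when the seq was created, which is exactly the assumption that bit me.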

Here’s the chat transcript if you want to read the whole thing – it might help:

[10:24am] erohtar: chouser: u there?

[10:24am] Chouser: yep

[10:24am] erohtar: chouser: so i was thinking about our conversation yesterday about vars

[10:24am] erohtar: chouser: so vars in clojure arent the same as dynamically scoped variables in common lisp?

[10:25am] Chouser: I don’t know CL well enough to say for sure, but my understanding is that they’re pretty much equivalent.

[10:26am] Chouser: a possible difference being how threads interact with them — not sure what CL does there.

[10:26am] erohtar: chouser: the thing is… in common lisp, if i bind var-a to x, anything down the call stack will see x…

[10:27am] erohtar: chouser: in clojure, it doesnt seem to behave like a closure… (laziness, during realization) is that correct? if var-a was bound to y elsewhere, realization of the old lazy stuff would now see y

[10:28am] erohtar: chouser: is that right?

[10:28am] Chouser: CL doesn’t do lazy seqs, right? so lets ignore that for now

[10:29am] erohtar: chouser: ok…

[10:29am] Chouser: let’s just look at how a closure works.

[10:30am] erohtar: chouser: i think i understand how a closure works, but when a closure is going to be evaluated lazily… what happens to a var that has since been rebound?

[10:31am] Chouser: (def var-a 2), (let [add (binding [var-a 5] (fn [n] (+ var-a n)))] (add 3))

[10:31am] Chouser: ok, here’s a little example that I would expect to work the same if translated to CL

[10:31am] Chouser: var-a has a root binding of 2.

[10:31am] erohtar: chouser: that should eval to 8, yes?

[10:31am] Chouser: no

[10:32am] erohtar: chouser: hmmm

[10:32am] Chouser: within the dynamic scope of the binding, a closure is created that refers to var-a

[10:32am] erohtar: chouser: ok… and that is 5?

[10:33am] Chousuke: nah, it’s var-a

[10:33am] erohtar: oh

[10:33am] Chouser: at the moment the closure is created, var-a has a thread-local binding of 5, but that doesn’t really matter because nothing is asking for var-a’s *value*, just grabbing var-a itself.

[10:33am] erohtar: i see… ok…

[10:34am] Chouser: so that closure is returned to outside the binding, and named ‘add’

[10:34am] Chousuke: I guess it’s a bit surprising though.

[10:34am] Chouser: and what does the closure have bundled inside? it knows it wants one arg n, and it has references to the vars var-a and +

[10:35am] erohtar: and it dynamically uses the value 2, for var-a when it needs it?

[10:35am] Chouser: right

[10:35am] erohtar: arent closures supposed to capture their environment?

[10:35am] Chouser: their lexical environment, yes.

[10:35am] erohtar: not the values?

[10:36am] Chouser: which is why this works: (def var-a 2), (let [m 7, add (binding [var-a 5] (fn [] (+ var-a m)))] (add))

[10:37am] Chouser: hm, no, that’s not quite right, since m is still available when ‘add’ is called. But anyway, it wouldn’t have to be.

[10:38am] erohtar: thinking about this

[10:38am] Chouser: if you think about how dynamic vars are used in clojure.core, you’ll see it must be this way.

[10:38am] Chouser: let’s look at *in*

[10:38am] Chouser: er, *out*

[10:38am] Chousuke: ,(let [x 1 add (fn [] (+ 4 x))] [(binding [x 2] (add)) (binding [+ -] (add))])

[10:38am] clojurebot: java.lang.Exception: Unable to resolve var: x in this context

[10:38am] Chousuke: hmm

[10:39am] erohtar: ok – talking about *out* – is that the std-out ?

[10:39am] Chouser: somewhere there’s something like (def *out* System/out)

[10:39am] Chouser: right

[10:39am] Chousuke: ,(let [x 1] (let [add (fn [] (+ 4 x))] [(binding [x 2] (add)) (binding [+ -] (add))]))

[10:39am] erohtar: right

[10:39am] clojurebot: java.lang.Exception: Unable to resolve var: x in this context

[10:39am] Chouser: that’s the root binding of *out*

[10:39am] erohtar: ok, with u so far

[10:39am] Chousuke: hmm, I guess binding let-bound variables does not work

[10:40am] Chouser: then you’ve got functions like prn that use *out*

[10:40am] erohtar: yes… i think they need to be vars

[10:40am] erohtar: right

[10:40am] kotarak: let-bound locals are not Vars

[10:40am] Chousuke: hmm

[10:40am] Chousuke: ,(let [x 1] #'x)

[10:40am] clojurebot: java.lang.Exception: Unable to resolve var: x in this context

[10:41am] Chouser: we can pretend prn is defined something like (defn prn [& args] (.write *out* args))

[10:41am] erohtar: when u create a lazy something, that uses prn, what happens?

[10:41am] Chousuke: indeed.

[10:41am] Chouser: now, if prn took the value of the *out* when it was defined, there would be no way to get prn to write to any other stream — it would always use the root binding, since that’s what was in play when prn was defined.

[10:42am] Chouser: but that’s not how it works.

[10:42am] Chouser: prn has a reference to the Var *out*, not its value.

[10:42am] erohtar: so something like – (map #(prn %) [1 2 3])

[10:42am] erohtar: chouser: i completely agree

[10:42am] Chouser: so instead, prn will use the dynamic value of *out* when you call it.

[10:42am] erohtar: chouser: hmmm

[10:42am] Chouser: ,(with-out-str (prn 5 10))

[10:42am] clojurebot: "5 10\n"

[10:43am] Chouser: got it?

[10:43am] erohtar: i do

[10:43am] erohtar: im thinking about my confusion – and trying to figure out the question to ask next

[10:44am] erohtar: so lets take my example –

[10:44am] erohtar: (map #(prn %) [1 2 3])

[10:45am] erohtar: now, somewhere else, i rebind *out* to str… and in there, if i use the above object, will it also use the new binding?

[10:45am] erohtar: it will…

[10:45am] erohtar: i think im getting it now

[10:46am] erohtar: ok – final thought –

[10:46am] Chouser: (def q (map #(prn %) [1 2 3])), (with-out-str (doall q))

[10:47am] Chouser: so yes, even though you created the lazy seq and closure outside the with-out-str, since its not computed or realized until the doall which is within the with-out-str, all those prn’s will use the binding provided by with-out-str

[10:47am] erohtar: got it…

[10:48am] erohtar: so my final thought – what if instead of *out* we’re talking about a connection to a database… bad use of vars, right?

[10:48am] Chouser: now lazy seqs can be a bit tricky because they cache their results and can effectively contain multiple closures.

[10:49am] Chouser: hmm….

[10:49am] erohtar: if i read from a db a list of objects using a var that holds database config…. and then i rebind the config to write into another database

[10:49am] Chouser: if you’re going to open a connection to a database, do a batch of work in a single thread on it, and then close the connection — then that would be a fine use.

[10:50am] erohtar: since the initial read may be lazy… when it realizes… it will ‘read from the wrong place’?

[10:50am] Chouser: right

[10:50am] Chouser: basically you want to be very careful about passing closures (and therefore lazy seqs) past ‘binding’ boundaries.

[10:50am] erohtar: chouser: on that thought – what i did was, i read from a db, and to write into the other, i created agents, binding the new db config in each

[10:51am] erohtar: chouser: it didnt work…

[10:51am] erohtar: chouser: ok… im beginning to understand …

[10:51am] erohtar: i need to rethink what im doing

[10:52am] erohtar: and avoid using vars for some of this stuff, since im doing a lot of agent related stuff

[10:52am] Chouser: yes, I’d recommend trying to have the behavior of as much of your code as possible depend only on the args passed directly in.

[10:53am] erohtar: chouser: this whole mess started when i wanted to avoid passing db-config around everywhere

[10:53am] Chouser: this is better for testing, reasoning about behavior, easier to work with threads, closures, laziness, etc.

[10:53am] Chouser: heh, yeah.

[10:53am] erohtar: yup – i hear you

[10:53am] Chouser: well, another thing that may be helpful — just guessing since I haven’t seen your code…

[10:53am] erohtar: thanks a ton for your time – ur incredibly helpful

[10:54am] erohtar: ok?

[10:54am] Chouser: would be to try to keep as much of your code purely functional as possible.

[10:54am] erohtar: yes – i understand

[10:54am] Chouser: so avoid writing a function that takes some args, does some computation, and then writes to a db. Instead, have a fn that takes args, does computation, returns result.

[10:55am] Chouser: then perhaps you don’t have to pass in db config at all, since whoever calls this fn can do the db write itself.

[10:55am] erohtar: i think it mostly is… except for this stuff… where i ended up inadvertently depending on “global” db-config

[10:55am] erohtar: well, this is a persistence layer

[10:56am] erohtar: i take an object in, break it up etc., and put it into hbase

[10:56am] erohtar: the code is open-source… would u like to see?

[10:56am] Chouser: hm, sure. I may need to ramp up on hbase soon.

[10:56am] erohtar: http://github.com/amitrathore/capjure/blob/b939a7038ebb4aed0d068d6e86291cc881ecf72b/src/org/rathore/amit/capjure.clj

[10:57am] erohtar: its kind a messy – learning clojure and hbase while doing this…

[10:57am] Chouser:

[10:57am] Chousuke: erohtar: first thought: use some newlines.

[10:58am] danlarkin: Noooooooooooo -jure

[10:58am] erohtar: danlarkin: haha – u’ve told me that already

[10:58am] Chouser: danlarkin: hey, at least he’s got a name

[10:58am] Chousuke: erohtar: one newline after each def/defn would make it a lot nicer

[10:58am] danlarkin: erohtar, Chouser:

[10:58am] erohtar: ok – more newlines, check

[10:59am] Chousuke: well, the def group is fine I guess.

[10:59am] Chousuke: but (defn foo) (defn bar) without an empty line in the middle is a bit weird

[11:00am] erohtar: ok – those were sort of just quick utility functions… but i’ll put newlines

[11:00am] Chousuke: newlines never hurt anyone

[11:00am] Chouser: erohtar: this is what you get for posting code. unsolicited advice.

[11:01am] erohtar: haha

[11:01am] erohtar: its fine – maybe i’ll learn something too

[11:01am] Chouser: but while we’re at it, I much prefer to have the code arranged so that ‘declare’ is only needed in mutually-recursive situations.

[11:01am] Chouser: though others may disagree with me

[11:02am] erohtar: right – i struggled with that one – cause i dont like to have to order the functions in “reverse”

[11:02am] • Chouser nods

[11:02am] erohtar: see – capjure-insert is the only one that most people will call

[11:03am] Chousuke: erohtar: you could add more newlines and some kind of “headlines” for sections of code

[11:03am] erohtar: u bind the config stuff – *hbase-master* and *primary-keys-config* – and then u call capjure-insert with the params

[11:03am] Chousuke: so people can easily skip the helper function blocks

[11:03am] erohtar: most of the remaining functions are functional

[11:03am] erohtar: only the ones dealing with hbase have (obvious) side-effects

[11:04am] erohtar: and it works fine… and its in production too… ran into issues trying to use TWO hbase configs (to move data between them)

[11:04am] Chousuke: that’s a lot of functions for one namespace too, though

[11:05am] Chouser: I really don’t think more namespaces would be better.

[11:05am] Chousuke: if possible maybe it’d make sense to put the “side-effecting” functions (that deal with hbase) in a separate file or namespace?

[11:05am] erohtar: cause lazy lists of data from one hbase, seem to be getting messed up when i rebind the config to start writing into the other hbase

[11:06am] erohtar: do u know what i mean? (trying to get discussion back to lazy closures and bindings )

[11:06am] Chousuke: ah

[11:06am] Chousuke: right, I can see why that happens

[11:06am] Chousuke: the laziness gets evaluated in the context of the new hbase.

[11:06am] erohtar: well, im only starting to see it thanks to u guys…

[11:06am] erohtar: yea

[11:06am] erohtar: drove me crazy

[11:06am] erohtar: and i did all this only so i wouldnt need to pass the db config around all the functions

[11:06am] erohtar: like u said, there are a lot of them…

[11:07am] Chousuke: which means you will have to move away from using global vars for config and instead create something you explicitly pass around to the functions.

[11:07am] erohtar: so now what… ? pass the config around everywhere?

[11:08am] Chousuke: maybe only to a few core functions.

[11:08am] erohtar: yea

[11:08am] Chousuke: which then would bind the dynamic vars.

[11:09am] erohtar: well, if i pass it in, i dont need dynamic vars, rite

[11:09am] Chouser: or don’t use lazy seqs

[11:09am] Chouser: use doall or stuff them in a vector first

[11:09am] erohtar: hmmm

[11:09am] cp2 is now known as c|p.

[11:09am] erohtar: that might be one solution

[11:09am] Chousuke: right, if they are side-effecting that might be better.

[11:09am] Chousuke: though maybe not memory-efficient

[11:10am] erohtar: yea

[11:10am] erohtar: i think i will get rid of the dynamic vars

[11:10am] Chousuke: it’s a shame though

[11:10am] erohtar: what is the pattern for this kind of ‘static’ config being passed around?

[11:10am] Chousuke: they make the code much easier to write

[11:10am] Chousuke: I wonder if there’s a better solution

[11:11am] Chousuke: some way to “bind” the current config to when the lazy-seq is first created

[11:11am] erohtar: in typical OO, u can create an instance variable to hold the config – i wonder if in clojure u can create something like that… and since its immutable, it should be fine, rite?

[11:11am] Chouser: yes

[11:11am] Chousuke: maybe do (let [config *global*] (return-some-lazy-seq-with config))?

[11:11am] erohtar: but there is no way to do that, rite?

[11:11am] Chousuke: would that fix the value?

[11:12am] erohtar: yea

[11:12am] Chouser: stuff all your config into a map, and pass that single thing into functions that need to use it

[11:12am] erohtar: i thought of doing something like that… but other functions couldnt access those values…

[11:12am] erohtar: i thought of simulating an ‘immutable object oriented system’ –

[11:12am] brianh joined the chat room.

[11:12am] erohtar: like how u do in javascript… but immutable

[11:13am] Chousuke: I did in clojurebot so that I pass a map of config stuff around with every function

[11:13am] erohtar: return a closure with the vars bound with a let… and then closures for all the other functions also

[11:13am] Chousuke: it’s a bit tedious though

[11:13am] erohtar: i dunno if im being clear

[11:13am] erohtar: very tedious

[11:14am] erohtar: macros can help

[11:14am] erohtar: essentially what id be doing is creating an immutable OO system

[11:14am] erohtar: something like that

[11:14am] erohtar: does that make sense?

[11:14am] Chousuke: erohtar: but your problem is only with lazy seqs right?

[11:14am] erohtar: lazy seqs and dynamic vars

[11:14am] Chousuke: erohtar: don’t you only need to fixate the config when returning a seq

[11:14am] erohtar: the interplay

[11:14am] erohtar: yes

[11:15am] Chouser: erohtar: clojure.zip does something like that

[11:15am] erohtar: chouser: like what?

[11:15am] Chouser: each node is a object with a several functions attached as metadata

[11:16am] erohtar: chouser: ah, i see

[11:16am] erohtar: got it

[11:16am] Chouser: most of the api fns are like (children my-node), where children calls fns from my-node’s metadata, passing in my-node

[11:16am] erohtar: i see –

[11:17am] erohtar: thats rather cool – so u can basically have my-node equivalent contain the config data

[11:17am] Chouser: sure

[11:18am] erohtar: alright folks, thanks a lot for this conversation – im going to experiment and see how to get this done with the least amount of work

[11:18am] Chouser: the only reason zip nodes have fns in metadata is so it can be polymorphic.

[11:18am] erohtar: yes- makes sense

[11:18am] erohtar: (then everyone can tell me how my code sucks )

[11:18am] Chouser: if you’re going to only have one kind of db config (seems likely) then you don’t need the metadata piece.

[11:19am] erohtar: yes – just need to take care of the fact that the config will be rebound… so just closures that capture the config, and the functions that use it

[11:19am] erohtar: something like that
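
The three ways out that came up in the chat can be sketched roughly as follows. This is only an illustration under assumptions, not the capjure code: `*db-config*` and the `read-*` functions are hypothetical names, and the snippet assumes Clojure 1.3 or later, where the var must be marked `^:dynamic`.

```clojure
;; Hypothetical dynamic var standing in for the db config.
(def ^:dynamic *db-config* {:db :default})

(defn read-lazy
  "The problematic version: *db-config* is looked up at realization time."
  [ids]
  (map (fn [id] {:id id :from (:db *db-config*)}) ids))

;; Option 1 (Chousuke): fix the value in a local when the seq is created.
(defn read-captured [ids]
  (let [config *db-config*] ; dereferenced now, then closed over lexically
    (map (fn [id] {:id id :from (:db config)}) ids)))

;; Option 2 (Chouser): realize the seq before it leaves the binding.
(defn read-eager [ids]
  (doall (read-lazy ids)))

;; Option 3 (Chouser): no dynamic var at all, pass the config map in.
(defn read-with [config ids]
  (map (fn [id] {:id id :from (:db config)}) ids))

(def captured (binding [*db-config* {:db :source}] (read-captured [1 2])))
(def eager    (binding [*db-config* {:db :source}] (read-eager [1 2])))
(def explicit (read-with {:db :source} [1 2]))

;; Rebinding afterwards no longer affects any of the three.
(def results
  (binding [*db-config* {:db :target}]
    (mapv #(mapv :from %) [captured eager explicit])))
;; results is [[:source :source] [:source :source] [:source :source]]
```

Option 2 gives up laziness (and memory, as Chousuke pointed out), while options 1 and 3 keep the seq lazy but make the config part of the closure or the argument list.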

P.S. erohtar is my handle on IRC… it’s the reverse of my last name. In case you were wondering.

The first meetup of the Bay Area Clojure User Group

We held the first meeting of the BACUG (ugh, that’s a tedious acronym) last week, on Thursday the 5th of February. I was expecting about 6-8 people to show up, but about 12 came, which was quite a nice turnout!

After introductions and such, Chris Turner presented his unit-testing library for clojure called Clojure Spec. We talked about testing in general, and because he uses a lot of macro-writing macros in his code, we talked about macros in general and the difficulty in debugging macros. It was a good discussion – though I did throw in (tongue-in-cheek) that we’re using Test Is at Cinch. Thanks Chris, it was a good presentation.

After that talk, I gave an introduction to how we’re presently using Clojure in our startup. Since we’re using HBase as our persistent data-store, and processing that data using a bunch of clojure processes, we talked about data-modeling for HBase-like systems (I will write about my implementation of that data-model in another post).

An interesting thread came up – led by Joe Mikhail who works at Google and obviously does a lot of Map/Reduce/BigTable stuff – around how at least in the beginning, multi-threaded clojure processes can be used in place of Hadoop when processing HBase data. And he did mention that using systems like Terracotta, one can scale up such solutions. We’re going to look into that next week.

The remainder of the meeting went by in a buzz of talking about different languages, technologies, and general geek topics. The one thing of interest here was the point that software transactional memory is no panacea (surprise!) to the whole concurrency thing. Here’s an ACM article (there are several actually, just keep turning the pages) that throws some light on the issue. Here’s another.

Overall, it was a very good meeting. We’re hoping that the next one will be attended by more people, and especially by some folks who have used other Lisps in the past, since we’re all curious about what such folks think about Clojure. Indeed, we want to learn from their experience in using idiomatic Lisp.

I’m thinking that we’ll do another Clojure meetup in about 6 weeks or so… join up, and stay tuned!

P.S. – A shout of thanks to my employer Runa for hosting this meeting.

Startup School 2008

I attended this year’s Startup School – and all eight of the talks were really awesome. I got to hang out with a ton of people who had either already started their companies, or were looking to do so. The energy and the buzz were fantastic. Most importantly, however, I got to see three of my heroes –

Paul Graham – talking about the idea of being a benevolent startup.

Jeff Bezos – talking about Amazon Web Services as a way forward for startups.

Peter Norvig – talking about extracting data from the web, and leveraging it in startups.

The Entrepreneurial Thought Leaders

I’m a book lover – I own nearly a thousand books now, and I even read many of them. I think that since there’s just so much to do and learn, and so little time, books are a fantastic way to know about things we might never get a chance to actually experience. Television can also be educational but I dislike it because it takes too much time to get through things – you can read at a much faster pace.

However, when you’re sitting on a train while commuting, or just driving someplace, podcasts can be absolutely fantastic. I love that the Economist has audio editions of their magazine (absolutely true to the printed form, and very high quality, btw). I haven’t missed an issue for nearly a whole year.

I want to share another great podcast resource – from The Entrepreneurial Thought Leaders seminar series at Stanford. I’m hugely thankful to my good friend Adrian Wible for telling me about this and his persistence in asking me to listen to them. Almost each and every one of them is like listening to a precis of some really important and interesting business book. Most are by very successful entrepreneurs, or venture capitalists, or professors from Stanford. Brilliant material.

They’re also available from iTunes, and I highly recommend them.