I spent a frustrating day figuring out why something wasn’t working as expected in a particular situation, even though it was working just fine elsewhere. It turned out the problem was my flawed understanding of how Clojure’s dynamic vars behave with respect to laziness. Thanks to Chouser and Chousuke on the #clojure channel in IRC, I think I have a slightly better grasp of the thing.
Update: Here’s a summary for those who don’t want to read the whole thing. What was happening is this – my assumption was that dynamic vars get captured in lazy objects, just like locals get captured in closures. They don’t – so if you later rebind the vars, then when the lazy objects are realized, the vars take their value from whatever binding is in effect at that moment, not the one that was in effect when the lazy objects were created. To baaad results 🙂
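A minimal sketch of the behaviour described in the summary, assuming Clojure 1.3 or later, where a var has to be marked `^:dynamic` before `binding` can rebind it (the transcript below predates that rule). The names `*greeting*`, `greet-all`, `greetings` and `greetings-2` are hypothetical, used only for illustration.

```clojure
;; Hypothetical dynamic var. Clojure 1.3+ requires ^:dynamic before a var
;; can be rebound with binding; the transcript predates that rule.
(def ^:dynamic *greeting* "hello")

(defn greet-all
  "Returns a LAZY seq: nothing reads *greeting* until it is realized."
  [names]
  (map (fn [n] (str *greeting* ", " n)) names))

;; Created while *greeting* is bound to "hi", but realized after that
;; binding has been popped, so the root value "hello" is used.
(def greetings
  (binding [*greeting* "hi"]
    (greet-all ["ann" "bob"])))

(doall greetings)
;; => ("hello, ann" "hello, bob")

;; Realized inside a different binding, it uses that binding instead.
(def greetings-2
  (binding [*greeting* "hi"]
    (greet-all ["cy"])))

(binding [*greeting* "yo"]
  (doall greetings-2))
;; => ("yo, cy")
```

In neither case does the "hi" binding that was active at creation time show up in the result.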
Here’s the chat transcript if you want to read the whole thing – it might help:
[10:24am] erohtar: chouser: u there?
[10:24am] Chouser: yep
[10:24am] erohtar: chouser: so i was thinking about our conversation yesterday about vars
[10:24am] erohtar: chouser: so vars in clojure arent the same as dynamically scoped variables in common lisp?
[10:25am] Chouser: I don’t know CL well enough to say for sure, but my understanding is that they’re pretty much equivalent.
[10:26am] Chouser: a possible difference being how threads interact with them — not sure what CL does there.
[10:26am] erohtar: chouser: the thing is… in common lisp, if i bind var-a to x, any thing down the call stack will see x…
[10:27am] erohtar: chouser: in clojure, it doesnt seem to behave like a closure… (laziness, during realization) is that correct? if var-a was bound to y elsewhere, realization of the old lazy stuff would now see y
[10:28am] erohtar: chouser: is that right?
[10:28am] Chouser: CL doesn’t do lazy seqs, right? so lets ignore that for now
[10:29am] erohtar: chouser: ok…
[10:29am] Chouser: let’s just look at how a closure works.
[10:30am] erohtar: chouser: i think i understand how a closure works, but when a closure is going to be evaluated lazily… what happens to a var that has since been rebound?
[10:31am] Chouser: (def var-a 2), (let [add (binding [var-a 5] (fn [n] (+ var-a n)))] (add 3))
[10:31am] Chouser: ok, here’s a little example that I would expect to work the same if translated to CL
[10:31am] Chouser: var-a has a root binding of 2.
[10:31am] erohtar: chouser: that should eval to 8, yes?
[10:31am] Chouser: no
[10:32am] erohtar: chouser: hmmm
[10:32am] Chouser: within the dynamic scope of the binding, a closure is created that refers to var-a
[10:32am] erohtar: chouser: ok… and that is 5?
[10:33am] Chousuke: nah, it’s var-a
[10:33am] erohtar: oh
[10:33am] Chouser: at the moment the closure is created, var-a has a thread-local binding of 5, but that doesn’t really matter because nothing is asking for var-a’s *value*, just grabbing var-a itself.
[10:33am] erohtar: i see… ok…
[10:34am] Chouser: so that closure is returned to outside the binding, and named ‘add’
[10:34am] Chousuke: I guess it’s a bit surprising though.
[10:34am] Chouser: and what does the closure have bundled inside? it knows it wants one arg n, and it has references to the vars var-a and +
[10:35am] erohtar: and it dynamically uses the value 2, for var-a when it needs it?
[10:35am] Chouser: right
[10:35am] erohtar: arent closures supposed to capture their environment?
[10:35am] Chouser: their lexical environment, yes.
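Chouser’s first example, updated so it runs on current Clojure: the var is renamed `*var-a*` (the usual earmuff convention) and marked `^:dynamic`, which Clojure 1.3+ requires for `binding`. The closure holds a reference to the var, and the var is only dereferenced when `add` is called, by which time the `binding` has already exited.

```clojure
;; Chouser's var-a, renamed with earmuffs and marked ^:dynamic so that
;; binding works on Clojure 1.3+.
(def ^:dynamic *var-a* 2)

;; The fn is created while *var-a* is bound to 5, but it captures the
;; var itself, not the value 5.
(def add
  (binding [*var-a* 5]
    (fn [n] (+ *var-a* n))))

(add 3)                         ;; => 5, not 8: the root value 2 is used
(binding [*var-a* 100] (add 3)) ;; => 103: the value in effect at call time
```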
[10:35am] erohtar: not the values?
[10:36am] Chouser: which is why this works: (def var-a 2), (let [m 7, add (binding [var-a 5] (fn [] (+ var-a m)))] (add))
[10:37am] Chouser: hm, no, that’s not quite right, since m is still available when ‘add’ is called. But anyway, it wouldn’t have to be.
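To illustrate Chouser’s point that `m` would not have to still be in scope, here is a sketch using a hypothetical helper `make-adder`: by the time `add` is called, the function that introduced the local `m` has returned. The closure keeps the local’s value, while `*var-a*` is still resolved at call time (again marked `^:dynamic` for Clojure 1.3+).

```clojure
(def ^:dynamic *var-a* 2)

(defn make-adder
  "Returns a closure over the local m. m is part of the closure's
  lexical environment, so its value survives after make-adder returns."
  [m]
  (fn [] (+ *var-a* m)))

(def add (binding [*var-a* 5] (make-adder 7)))

(add)                        ;; => 9: root *var-a* (2) + captured m (7)
(binding [*var-a* 10] (add)) ;; => 17: m is still 7, *var-a* is looked up now
```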
[10:38am] erohtar: thinking about this
[10:38am] Chouser: if you think about how dynamic vars are used in clojure.core, you’ll see it must be this way.
[10:38am] Chouser: let’s look at *in*
[10:38am] Chouser: er, *out*
[10:38am] Chousuke: ,(let [x 1 add (fn [] (+ 4 x))] [(binding [x 2] (add)) (binding [+ -] (add))])
[10:38am] clojurebot: java.lang.Exception: Unable to resolve var: x in this context
[10:38am] Chousuke: hmm
[10:39am] erohtar: ok – talking about *out* – is that the std-out ?
[10:39am] Chouser: somewhere there’s something like (def *out* System/out)
[10:39am] Chouser: right
[10:39am] Chousuke: ,(let [x 1] (let [add (fn [] (+ 4 x))] [(binding [x 2] (add)) (binding [+ -] (add))]))
[10:39am] erohtar: right
[10:39am] clojurebot: java.lang.Exception: Unable to resolve var: x in this context
[10:39am] Chouser: that’s the root binding of *out*
[10:39am] erohtar: ok, with u so far
[10:39am] Chousuke: hmm, I guess binding let-bound variables does not work
[10:40am] Chouser: then you’ve got functions like prn that use *out*
[10:40am] erohtar: yes… i think they need to be vars
[10:40am] erohtar: right
[10:40am] kotarak: let-bound locals are not Vars
[10:40am] Chousuke: hmm
[10:40am] Chousuke: ,(let [x 1] #'x)
[10:40am] clojurebot: java.lang.Exception: Unable to resolve var: x in this context
[10:41am] Chouser: we can pretend prn is defined something like (defn prn [& args] (.write *out* args))
[10:41am] erohtar: when u create a lazy something, that uses prn, what happens?
[10:41am] Chousuke: indeed.
[10:41am] Chouser: now, if prn took the value of the *out* when it was defined, there would be no way to get prn to write to any other stream — it would always use the root binding, since that’s what was in play when prn was defined.
[10:42am] Chouser: but that’s not how it works.
[10:42am] Chouser: prn has a reference to the Var *out*, not its value.
[10:42am] erohtar: so something like – (map #(prn %) [1 2 3])
[10:42am] erohtar: chouser: i completely agree
[10:42am] Chouser: so instead, prn will use the dynamic value of *out* when you call it.
[10:42am] erohtar: chouser: hmmm
[10:42am] Chouser: ,(with-out-str (prn 5 10))
[10:42am] clojurebot: "5 10\n"
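To make the contrast concrete, this sketch sets `prn`, which looks up the var `*out*` every time it runs, against a hypothetical `prn-snapshot` that freezes whatever value `*out*` held when it was defined. `prn-snapshot` and `snapshot-out` are not part of clojure.core; they only show what prn would be like if it captured the value instead of the var.

```clojure
;; Hypothetical: grab the *value* of *out* once, at load time.
(def snapshot-out *out*)

(defn prn-snapshot
  "Like prn, but always writes to the stream *out* held at load time."
  [& args]
  (binding [*out* snapshot-out]
    (apply prn args)))

;; prn consults the var at call time, so with-out-str captures its output.
(with-out-str (prn 5 10))          ;; => "5 10\n" on Unix-like systems

;; prn-snapshot ignores the surrounding binding: the output goes to the
;; original stream, and with-out-str collects nothing.
(with-out-str (prn-snapshot 5 10)) ;; => ""
```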
[10:43am] Chouser: got it?
[10:43am] erohtar: i do
[10:43am] erohtar: im thinking about my confusion – and trying to figure out the question to ask next
[10:44am] erohtar: so lets take my example –
[10:44am] erohtar: (map #(prn %) [1 2 3])
[10:45am] erohtar: now, somewhere else, i rebind *out* to str… and in there, if i use the above object, will it also use the new binding?
[10:45am] erohtar: it will…
[10:45am] erohtar: i think im getting it now
[10:46am] erohtar: ok – final thought –
[10:46am] Chouser: (def q (map #(prn %) [1 2 3])), (with-out-str (doall q))
[10:47am] Chouser: so yes, even though you created the lazy seq and closure outside the with-out-str, since it’s not computed or realized until the doall, which is within the with-out-str, all those prn’s will use the binding provided by with-out-str
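Chouser’s two-liner, expanded into a runnable sketch (the name `captured` is added only so the result can be inspected): nothing prints when `q` is defined, and all three `prn` calls run later, inside the `binding` of `*out*` that `with-out-str` sets up.

```clojure
;; Build a lazy seq of side-effecting prn calls. Nothing prints yet.
(def q (map #(prn %) [1 2 3]))

;; Realize it inside with-out-str: each prn runs now, under the binding
;; of *out* that with-out-str establishes, so the output is captured.
(def captured (with-out-str (doall q)))

captured ;; => "1\n2\n3\n" on Unix-like systems
```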
[10:47am] erohtar: got it…
[10:48am] erohtar: so my final thought – what if instead of *out* we’re talking about a connection to a database… bad use of vars, right?
[10:48am] Chouser: now lazy seqs can be a bit tricky because they cache their results and can effectively contain multiple closures.
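The caching means the side effects happen only once: whichever binding is in effect at the first realization wins, and later bindings never see them. A sketch, with `first-pass` and `second-pass` as illustrative names:

```clojure
(def q (map #(prn %) [1 2 3]))

;; First realization: prn runs and its output is captured.
(def first-pass (with-out-str (doall q)))

;; Second realization: the results (three nils, since prn returns nil)
;; are cached in the seq, so prn is not called again.
(def second-pass (with-out-str (doall q)))

first-pass  ;; => "1\n2\n3\n" on Unix-like systems
second-pass ;; => ""
```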
[10:49am] Chouser: hmm….
[10:49am] erohtar: if i read from a db a list of objects using a var that holds database config…. and then i rebind the config to write into another database
[10:49am] Chouser: if you’re going to open a connection to a database, do a batch of work in a single thread on it, and then close the connection — then that would be a fine use.
[10:50am] erohtar: since the initial read may be lazy… when it realizes… it will ‘read from the wrong place’?
[10:50am] Chouser: right
[10:50am] Chouser: basically you want to be very careful about passing closures (and therefore lazy seqs) past ‘binding’ boundaries.
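The database scenario can be reproduced without a real database. In this sketch, `databases`, `*db-config*`, `read-row` and `read-rows` are hypothetical in-memory stand-ins, not capjure’s API. `read-rows` returns a lazy seq whose elements look up `*db-config*` only when realized, and that seq crosses a `binding` boundary before it is realized.

```clojure
;; Hypothetical in-memory "databases", keyed by name.
(def databases {:source ["a" "b" "c"]
                :target ["x" "y" "z"]})

(def ^:dynamic *db-config* {:db :source})

(defn read-row
  "Reads row i from whichever database *db-config* names right now."
  [i]
  (get-in databases [(:db *db-config*) i]))

(defn read-rows
  "Lazily reads rows 0..n-1; each row is fetched only when realized."
  [n]
  (map read-row (range n)))

;; Intended to read from :source...
(def rows (read-rows 3))

;; ...but the seq is first realized while the :target config is bound,
;; so every read-row call sees the :target config.
(def copied
  (binding [*db-config* {:db :target}]
    (doall rows)))

copied ;; => ("x" "y" "z"), not ("a" "b" "c")
```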
[10:50am] erohtar: chouser: on that thought – what i did was, i read from a db, and to write into the other, i created agents, binding the new db config in each
[10:51am] erohtar: chouser: it didnt work…
[10:51am] erohtar: chouser: ok… im beginning to understand …
[10:51am] erohtar: i need to rethink what im doing
[10:52am] erohtar: and avoid using vars for some of this stuff, since im doing a lot of agent related stuff
[10:52am] Chouser: yes, I’d recommend trying to have the behavior of as much of your code as possible depend only on the args passed directly in.
[10:53am] erohtar: chouser: this whole mess started when i wanted to avoid passing db-config around everywhere
[10:53am] Chouser: this is better for testing, reasoning about behavior, easier to work with threads, closures, laziness, etc.
[10:53am] Chouser: heh, yeah.
[10:53am] erohtar: yup – i hear you
[10:53am] Chouser: well, another thing that may be helpful — just guessing since I haven’t seen your code…
[10:53am] erohtar: thanks a ton for your time – ur incredibly helpful
[10:54am] erohtar: ok?
[10:54am] Chouser: would be to try to keep as much of your code purely functional as possible.
[10:54am] erohtar: yes – i understand
[10:54am] Chouser: so avoid writing a function that takes some args, does some computation, and then writes to a db. Instead, have a fn that takes args, does computation, returns result.
[10:55am] Chouser: then perhaps you don’t have to pass in db config at all — whoever calls this fn can do the db write itself.
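Chouser’s suggestion, sketched with hypothetical names: `object->cells` is a pure function that only computes what should be written, and `save!` is the single side-effecting function, which takes the config as an explicit argument. An atom stands in for HBase here; none of this is capjure’s actual code.

```clojure
(defn object->cells
  "Pure: turns a map into [column value] pairs. No config, no I/O."
  [obj]
  (for [[k v] (sort-by key obj)]
    [(name k) (str v)]))

;; Hypothetical stand-in for a database: an atom holding rows per db.
(def store (atom {}))

(defn save!
  "Side-effecting: writes cells to the database named by db-config.
  The config is an explicit argument, so no binding is involved."
  [db-config row-key cells]
  (swap! store assoc-in [(:db db-config) row-key] (vec cells)))

;; The caller decides where the result of the pure computation goes.
(save! {:db :target} "row-1" (object->cells {:name "widget" :qty 3}))

@store ;; => {:target {"row-1" [["name" "widget"] ["qty" "3"]]}}
```

Because `object->cells` touches neither config nor I/O, it can be tested and realized lazily anywhere without caring which database is current.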
[10:55am] erohtar: i think it mostly is… except for this stuff… where i ended up inadvertently depending on “global” db-config
[10:55am] erohtar: well, this is a persistence layer
[10:56am] erohtar: i take an object in, break it up etc., and put it into hbase
[10:56am] erohtar: the code is open-source… would u like to see?
[10:56am] Chouser: hm, sure. I may need to ramp up on hbase soon.
[10:56am] erohtar: http://github.com/amitrathore/capjure/blob/b939a7038ebb4aed0d068d6e86291cc881ecf72b/src/org/rathore/amit/capjure.clj
[10:57am] erohtar: its kind a messy – learning clojure and hbase while doing this…
[10:57am] Chousuke: erohtar: first thought: use some newlines.
[10:58am] danlarkin: Noooooooooooo -jure
[10:58am] erohtar: danlarkin: haha – u’ve told me that already
[10:58am] Chouser: danlarkin: hey, at least he’s got a name
[10:58am] Chousuke: erohtar: one newline after each def/defn would make it a lot nicer
[10:58am] danlarkin: erohtar, Chouser:
[10:58am] erohtar: ok – more newlines, check
[10:59am] Chousuke: well, the def group is fine I guess.
[10:59am] Chousuke: but (defn foo) (defn bar) without an empty line in the middle is a bit weird
[11:00am] erohtar: ok – those were sort of just quick utility functions… but i’ll put newlines
[11:00am] Chousuke: newlines never hurt anyone
[11:00am] Chouser: erohtar: this is what you get for posting code. unsolicited advice.
[11:01am] erohtar: haha
[11:01am] erohtar: its fine – maybe i’ll learn something too
[11:01am] Chouser: but while we’re at it — I much prefer to have the code arranged so that ‘declare’ is only needed in mutually-recursive situations.
[11:01am] Chouser: though others may disagree with me
[11:02am] erohtar: right – i struggled with that one – cause i dont like to have to order the functions in “reverse”
[11:02am] • Chouser nods
[11:02am] erohtar: see – capjure-insert is the only one that most people will call
[11:03am] Chousuke: erohtar: you could add more newlines and some kind of “headlines” for sections of code
[11:03am] erohtar: u bind the config stuff – *hbase-master* and *primary-keys-config* – and then u call capjure-insert with the params
[11:03am] Chousuke: so people can easily skip the helper function blocks
[11:03am] erohtar: most of the remaining functions are functional
[11:03am] erohtar: only the ones dealing with hbase have (obvious) side-effects
[11:04am] erohtar: and it works fine… and its in production too… ran into issues trying to use TWO hbase configs (to move data between them)
[11:04am] Chousuke: that’s a lot of functions for one namespace too, though
[11:05am] Chouser: I really don’t think more namespaces would be better.
[11:05am] Chousuke: if possible maybe it’d make sense to put the “side-effecting” functions (that deal with hbase) in a separate file or namespace?
[11:05am] erohtar: cause lazy lists of data from one hbase, seem to be getting messed up when i rebind the config to start writing into the other hbase
[11:06am] erohtar: do u know what i mean? (trying to get discussion back to lazy closures and bindings )
[11:06am] Chousuke: ah
[11:06am] Chousuke: right, I can see why that happens
[11:06am] Chousuke: the laziness gets evaluated in the context of the new hbase.
[11:06am] erohtar: well, im only starting to see it thanks to u guys…
[11:06am] erohtar: yea
[11:06am] erohtar: drove me crazy
[11:06am] erohtar: and i did all this only so i wouldnt need to pass the db config around all the functions
[11:06am] erohtar: like u said, there are a lot of them…
[11:07am] Chousuke: which means you will have to move away from using global vars for config and instead create something you explicitly pass around to the functions.
[11:07am] erohtar: so now what… ? pass the config around everywhere?
[11:08am] Chousuke: maybe only to a few core functions.
[11:08am] erohtar: yea
[11:08am] Chousuke: which then would bind the dynamic vars.
[11:09am] erohtar: well, if i pass it in, i dont need dynamic vars, rite
[11:09am] Chouser: or don’t use lazy seqs
[11:09am] Chouser: use doall or stuff them in a vector first
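Chouser’s other option, forcing the seq before the binding changes, applied to the same hypothetical in-memory setup (`databases`, `*db-config*` and `read-rows` are stand-ins, not capjure’s API):

```clojure
(def databases {:source ["a" "b" "c"]
                :target ["x" "y" "z"]})

(def ^:dynamic *db-config* {:db :source})

(defn read-rows
  "Lazy: each row is looked up only when the seq is realized."
  [n]
  (map (fn [i] (get-in databases [(:db *db-config*) i])) (range n)))

;; Force every read now, while :source is still in effect.
;; (vec ...) realizes the whole seq; (doall ...) would work too.
(def rows (vec (read-rows 3)))

(def copied
  (binding [*db-config* {:db :target}]
    ;; rows is already a realized vector, so nothing in it is
    ;; re-read under the :target binding.
    (seq rows)))

copied ;; => ("a" "b" "c")
```

The trade-off Chousuke raises applies: the whole result set is held in memory at once instead of being streamed.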
[11:09am] erohtar: hmmm
[11:09am] cp2 is now known as c|p.
[11:09am] erohtar: that might be one solution
[11:09am] Chousuke: right, if they are side-effecting that might be better.
[11:09am] Chousuke: though maybe not memory-efficient
[11:10am] erohtar: yea
[11:10am] erohtar: i think i will get rid of the dynamic vars
[11:10am] Chousuke: it’s a shame though
[11:10am] erohtar: what is the pattern for this kind of ‘static’ config being passed around?
[11:10am] Chousuke: they make the code much easier to write
[11:10am] Chousuke: I wonder if there’s a better solution
[11:11am] Chousuke: some way to “bind” the current config to when the lazy-seq is first created
[11:11am] erohtar: in typical OO, u can create an instance variable to hold the config – i wonder if in clojure u can create something like that… and since its immutable, it should be fine, rite?
[11:11am] Chouser: yes
[11:11am] Chousuke: maybe do (let [config *global*] (return-some-lazy-seq-with config))?
[11:11am] erohtar: but there is no way to do that, rite?
[11:11am] Chousuke: would that fix the value?
[11:12am] erohtar: yea
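Chousuke’s idea does fix the value: reading the var into a local before building the lazy seq means the element closures capture that local lexically. A sketch using the same hypothetical in-memory setup:

```clojure
(def databases {:source ["a" "b" "c"]
                :target ["x" "y" "z"]})

(def ^:dynamic *db-config* {:db :source})

(defn read-rows
  "Still lazy, but the config is read from the var ONCE, eagerly, and
  held in a local that every element's closure captures."
  [n]
  (let [config *db-config*]
    (map (fn [i] (get-in databases [(:db config) i])) (range n))))

(def rows (read-rows 3)) ;; {:db :source} is captured here

(def copied
  (binding [*db-config* {:db :target}]
    (doall rows)))

copied ;; => ("a" "b" "c")
```

The `let` runs whenever `read-rows` itself is called, so this only helps if that call happens under the intended binding.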
[11:12am] Chouser: stuff all your config into a map, and pass that single thing into functions that need to use it
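Chouser’s alternative drops the var entirely. In this sketch every name is hypothetical: the settings live in plain maps (in capjure they would presumably carry what is now bound to *hbase-master* and *primary-keys-config*), and the functions that need them take the map as an argument, so laziness stops mattering because nothing is looked up dynamically.

```clojure
(def databases {:source ["a" "b" "c"]
                :target ["x" "y" "z"]})

;; Hypothetical config maps, passed around as ordinary values.
(def source-config {:db :source})
(def target-config {:db :target})

(defn read-rows
  "Lazy, but config is an argument, so the closure captures its value."
  [config n]
  (map (fn [i] (get-in databases [(:db config) i])) (range n)))

(defn write-rows
  "Hypothetical writer: reports what it would write and where."
  [config rows]
  {:db (:db config) :rows (vec rows)})

;; Reading with one config and writing with another cannot interfere,
;; even though the rows are still lazy when write-rows realizes them.
(write-rows target-config (read-rows source-config 3))
;; => {:db :target, :rows ["a" "b" "c"]}
```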
[11:12am] erohtar: i thought of doing something like that… but other functions couldnt access those values…
[11:12am] erohtar: i thought of simulating an ‘immutable object oriented system’ –
[11:12am] brianh joined the chat room.
[11:13am] Chousuke: I did that in clojurebot, so I pass a map of config stuff around with every function
[11:13am] erohtar: return a closure with the vars bound with a let… and then closures for all the other functions also
[11:13am] Chousuke: it’s a bit tedious though
[11:13am] erohtar: i dunno if im being clear
[11:13am] erohtar: very tedious
[11:14am] erohtar: macros can help
[11:14am] erohtar: essentially what id be doing is creating an immutable OO system
[11:14am] erohtar: something like that
[11:14am] erohtar: does that make sense?
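One way to read the "immutable OO system" idea, as a sketch: a constructor closes over a config map and returns a map of functions that share it. `make-db` and its `:read-rows`/`:describe` keys are hypothetical names, not anything from capjure.

```clojure
(def databases {:source ["a" "b" "c"]
                :target ["x" "y" "z"]})

(defn make-db
  "Returns a map of functions that all close over the same config.
  The config is immutable, so every closure always sees the same value."
  [config]
  {:read-rows (fn [n]
                (map (fn [i] (get-in databases [(:db config) i])) (range n)))
   :describe  (fn [] (str "db " (name (:db config))))})

(def source-db (make-db {:db :source}))
(def target-db (make-db {:db :target}))

((:read-rows source-db) 2) ;; => ("a" "b")
((:describe target-db))    ;; => "db target"
```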
[11:14am] Chousuke: erohtar: but your problem is only with lazy seqs right?
[11:14am] erohtar: lazy seqs and dynamic vars
[11:14am] Chousuke: erohtar: don’t you only need to fixate the config when returning a seq
[11:14am] erohtar: the interplay
[11:14am] erohtar: yes
[11:15am] Chouser: erohtar: clojure.zip does something like that
[11:15am] erohtar: chouser: like what?
[11:15am] Chouser: each node is an object with several functions attached as metadata
[11:16am] erohtar: chouser: ah, i see
[11:16am] erohtar: got it
[11:16am] Chouser: most of the api fns are like (children my-node), where children calls fns from my-node’s metadata, passing in my-node
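The `clojure.zip` design can be inspected directly. A location returned by `vector-zip` carries its functions in its metadata, and `zip/children` looks up the `:zip/children` function there and applies it to the current node. The name `loc` is illustrative.

```clojure
(require '[clojure.zip :as zip])

(def loc (zip/vector-zip [1 [2 3]]))

;; The functions live in the location's metadata, under :zip/ keys.
(contains? (meta loc) :zip/children) ;; => true

;; zip/children fetches the :zip/children fn from the metadata and
;; applies it to the current node.
(zip/children loc) ;; => (1 [2 3])
```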
[11:16am] erohtar: i see –
[11:17am] erohtar: thats rather cool – so u can basically have my-node equivalent contain the config data
[11:17am] Chouser: sure
[11:18am] erohtar: alright folks, thanks a lot for this conversation – im going to experiment and see how to get this done with the least amount of work
[11:18am] Chouser: the only reason zip nodes have fns in metadata is so it can be polymorphic.
[11:18am] erohtar: yes- makes sense
[11:18am] erohtar: (then everyone can tell me how my code sucks )
[11:18am] Chouser: if you’re going to only have one kind of db config (seems likely) then you don’t need the metadata piece.
[11:19am] erohtar: yes – just need to take care of the fact that the config will be rebound… so just closures that capture the config, and the functions that use it
[11:19am] erohtar: something like that
P.S. erohtar is my handle on IRC… it’s the reverse of my last name. In case you were wondering.