Announcing a new blog: blog.zolodeck.com. Just wrote the first post on my work with Datomic. I’ve put some of it into a project called demonic, and hopefully, you’ll find it of some use!
demonic v0.1 – utilities for Datomic
Posted by Amit Rathore on April 18, 2012
Posted in Uncategorized | Tagged: clojure, code, database, datomic, zolodeck | Leave a Comment »
calling recur from catch or finally
Posted by Amit Rathore on August 15, 2010
Clojure doesn’t perform automatic tail-call optimization, but it does support explicit tail recursion through the recur form. Let’s take a quick look at how it’s used. Consider a function that adds a list of numbers to an accumulator:
(defn add-numbers [acc numbers]
  (if (empty? numbers)
    acc
    (add-numbers (+ acc (first numbers)) (rest numbers))))
Let’s ignore all the ways this can be done without the silly implementation above. Here it is in action:
user> (add-numbers 10 (range 10))
55
And here’s the problem with it:
user> (add-numbers 10 (range 10000))
; Evaluation aborted.
No message. [Thrown class java.lang.StackOverflowError]
The reason, of course, is that being a self-recursive function that calls itself explicitly, it blows the stack. Clojure has a way to get around this, via the recur form:
(defn add-numbers [acc numbers]
  (if (empty? numbers)
    acc
    (recur (+ acc (first numbers)) (rest numbers))))
And here is proof that it works:
user> (add-numbers 10 (range 10000))
49995010
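For comparison, the same sum can be written without any explicit recursion. This is only a sketch of the kind of alternative the post sets aside; the names add-numbers-reduce and add-numbers-apply are made up for illustration, and both rely only on the core functions reduce and apply.

```clojure
;; Sum a collection onto a starting accumulator with reduce.
(defn add-numbers-reduce [acc numbers]
  (reduce + acc numbers))

;; Equivalent using apply, since + accepts any number of arguments.
(defn add-numbers-apply [acc numbers]
  (apply + acc numbers))

(add-numbers-reduce 10 (range 10000)) ; => 49995010
```

Neither version calls itself, so the size of the input does not affect stack depth.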
Now, let’s look at a case where one might want to recurse from inside a catch or finally block. A use-case is a function like connect-to-service, that must retry the connection if the service is unavailable. An easy way to implement it is to catch the exception thrown when the attempt at connecting fails, then wait a few seconds, and try again by recursing. Here’s a contrived example of a function that recurs from catch:
(defn catch-recurse [n i]
  (try
    (if (> n i)
      (/ i 0)
      n)
    (catch Exception e
      (recur n (inc i)))))
The problem, of course, is that Clojure complains:
Cannot recur from catch/finally [Thrown class java.lang.UnsupportedOperationException]
So what to do? One way is to make the call explicitly, and hope that it won’t blow the stack:
(defn catch-recurse [n i]
  (try
    (if (> n i)
      (/ i 0)
      n)
    (catch Exception e
      (catch-recurse n (inc i)))))
It could blow the stack, though, depending:
user> (catch-recurse 100 1)
100
user> (catch-recurse 10000 1)
; Evaluation aborted.
No message. [Thrown class java.lang.StackOverflowError]
As pointed out, this may blow the stack, but it may not, depending on your situation. If you know it won’t, then this may be OK. Here’s a way to avoid this situation completely, using trampoline. First, a minor change to catch-recurse:
(defn catch-recurse [n i]
  (try
    (if (> n i)
      (/ i 0)
      n)
    (catch Exception e
      #(catch-recurse n (inc i)))))
Notice that in the case of an exception, we return a thunk. Now, to use our new function:
user> (trampoline catch-recurse 100 1)
100
user> (trampoline catch-recurse 10000 1)
10000
And there you have it. The common use-case of trampoline is to handle mutually recursive functions where recur isn’t useful. It calls the function it is given and checks whether the return value is itself a function. If so, it calls that function with no arguments. It repeats the process until a non-function value is returned, and that value becomes the result of trampoline. Very useful!
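The connect-to-service use-case mentioned earlier can be sketched the same way. The names connect-with-retry and connect-fn are hypothetical, and the retry policy (a fixed wait and a maximum number of retries) is an assumption made for illustration: on failure the function sleeps, then returns a thunk, so trampoline performs the retry without growing the stack.

```clojure
;; Hypothetical helper, not from any library.
;; connect-fn is assumed to be a no-argument function that either
;; returns a connection or throws an Exception.
(defn connect-with-retry
  [connect-fn retries-left wait-ms]
  (try
    (connect-fn)
    (catch Exception e
      (if (pos? retries-left)
        (do
          (Thread/sleep (long wait-ms))
          ;; Return a thunk instead of recursing; trampoline will call it.
          #(connect-with-retry connect-fn (dec retries-left) wait-ms))
        ;; Out of retries: let the caller see the last failure.
        (throw e)))))

;; Usage: retry up to 5 times, waiting 3 seconds between attempts.
;; (trampoline connect-with-retry my-connect 5 3000)
```

One caveat with this approach: if connect-fn itself returns a function, trampoline will call it too, so such a result would need to be wrapped (in a vector or map, say) before being returned.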
Posted in Uncategorized | Tagged: clojure, code, lisp, reliability | Leave a Comment »
Announcing WorkAtRuna.com
Posted by Amit Rathore on July 24, 2010
We’ve put a bunch of content together about our work here at Runa, and about what it’s like to work here. If you’re a Clojure developer, or a DevOps extraordinaire, drop us a line!
“In 1995, Paul Graham and Robert Morris used Common Lisp to help online merchants. Now, 15 years later, we’re doing the same with Clojure.”
Read more at www.workatruna.com
Posted in Uncategorized | Tagged: clojure, dev-ops, jobs, lisp, runa | Leave a Comment »
Medusa 0.1 – a supervised thread-pool for Clojure futures
Posted by Amit Rathore on June 8, 2010
Clojure comes with two kinds of thread-pools – a bounded thread-pool for CPU-bound operations, and one for IO-bound operations that grows as needed. The bounded thread-pool is used every time an action is sent to an agent via the send function. The unbounded thread-pool is used (for instance) every time an action is sent to an agent using the send-off function. Futures also run on this unbounded thread-pool.
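As a small illustration of the two built-in pools, here is a sketch using only core functions; the names counter and answer are arbitrary.

```clojure
(def counter (agent 0))

;; send: the action runs on the fixed-size pool meant for CPU-bound work.
(send counter inc)

;; send-off: the action runs on the pool that grows as needed, so a
;; blocking call such as Thread/sleep does not tie up the fixed pool.
(send-off counter (fn [n] (Thread/sleep 100) (inc n)))

;; Block until both actions have been applied, then read the state.
(await counter)
@counter ; => 2

;; A future runs its body on the same growing pool that send-off uses.
(def answer (future (+ 1 2)))
@answer ; => 3
```

A standalone script that uses agents or futures should call (shutdown-agents) before exiting, otherwise the pool threads can keep the JVM alive.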
Sometimes, however, you might need a third option. This is the case where you don’t want an unbounded pool of threads that grows so much that the system runs out of resources trying to juggle the sheer number of threads. This might happen (say) if you were using send-off to handle incoming requests for IO-bound operations. Under normal circumstances, such a system might perform in an acceptable manner. If the request load were to spike, however, you could quickly create a larger-than-manageable number of threads.
What you need in such a case is a separate thread-pool for IO operations – one that has more threads than the thread-pool for CPU-bound operations, but is still bounded, so that it only grows to a certain size and any further requests get queued. Luckily, Clojure allows you to seamlessly use underlying Java libraries.
Medusa is a bounded, supervised thread-pool. A supervisor function runs alongside the thread-pool and it monitors the running tasks. If they take more than a specified amount of time, they are evicted. If the thread-pool is fully occupied, Medusa will queue all further tasks submitted and will run each task as soon as a thread becomes available. The Medusa thread-pool size is thrice the number of cores available to the JVM. In future versions, this number will be configurable.
Here it is in action -
(use 'org.rathore.amit.medusa.core)

(start-supervisor)

(defn new-task [id sleep-seconds]
  (println (System/currentTimeMillis) "| Starting task" id "will sleep for" sleep-seconds)
  (Thread/sleep (* 1000 sleep-seconds))
  (println (System/currentTimeMillis) "| Done task" id))

(defn run-tasks [n]
  (println "Will submit" n "jobs")
  (dotimes [i n]
    (medusa-future i #(new-task i (* 5 (inc i))))))

(run-tasks 20)
The output is -
Will submit 20 jobs
1276068494442 | Starting task 0 will sleep for 5
1276068494448 | Starting task 1 will sleep for 10
1276068494449 | Starting task 2 will sleep for 15
1276068494449 | Starting task 3 will sleep for 20
1276068494451 | Starting task 4 will sleep for 25
1276068494451 | Starting task 5 will sleep for 30
1276068499447 | Done task 0
1276068499448 | Starting task 6 will sleep for 35
1276068504448 | Done task 1
1276068504448 | Starting task 7 will sleep for 40
1276068509448 | Done task 2
1276068509448 | Starting task 8 will sleep for 45
1276068514448 | Done task 3
1276068514449 | Starting task 9 will sleep for 50
1276068519450 | Starting task 10 will sleep for 55
1276068523547 | Starting task 11 will sleep for 60
1276068523548 | Starting task 13 will sleep for 70
1276068523547 | Starting task 12 will sleep for 65
1276068523548 | Starting task 14 will sleep for 75
1276068523548 | Starting task 15 will sleep for 80
1276068523549 | Starting task 16 will sleep for 85
1276068533547 | Starting task 17 will sleep for 90
1276068533547 | Starting task 18 will sleep for 95
1276068543547 | Starting task 19 will sleep for 100
Notice that the first few tasks complete, since the pre-emption time is 20 seconds. The rest of the tasks get pre-empted out of the thread-pool by the supervisor since they take too long (simulated above by the sleeps). Since all the later tasks have been coded to take more than 20 seconds, they will all get pre-empted. The Medusa thread-pool is then ready for more tasks. This pre-emption is what allows the other tasks to start, as can be seen by looking at the timestamps of the log messages. This fulfills the requirement that we have a bounded-threadpool with supervised pre-emption of tasks that take too long.
Here’s the thread-usage when the program starts, and the supervisor has started:

Here’s the thread-usage when the tasks complete:

The semantics are still not of the standard Clojure futures – currently, Medusa “futures” only handle side-effects. A next step would be to give them the same future semantics so that they return the result of their computation – that will come in the next version.
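To make the idea concrete, here is a minimal sketch of a bounded pool whose tasks do return values, built directly on java.util.concurrent. It is not Medusa’s implementation: the names bounded-pool, bounded-future and result-or-evict are hypothetical, and the eviction here happens when the caller gives up waiting, rather than through a separate supervisor.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future
                               TimeUnit TimeoutException))

;; Three threads per core mirrors the sizing described above.
(def pool-size (* 3 (.availableProcessors (Runtime/getRuntime))))

;; A fixed-size pool queues any work submitted while all threads are busy.
(def bounded-pool (Executors/newFixedThreadPool pool-size))

(defn bounded-future
  "Submits the no-argument function f to the bounded pool and
  returns a java.util.concurrent.Future holding its result."
  [f]
  ;; The Callable hint selects the submit overload that keeps f's return value.
  (.submit ^ExecutorService bounded-pool ^Callable f))

(defn result-or-evict
  "Waits up to timeout-ms for the task. On timeout, interrupts the
  task and returns :evicted instead of a result."
  [^Future task timeout-ms]
  (try
    (.get task (long timeout-ms) TimeUnit/MILLISECONDS)
    (catch TimeoutException _
      (.cancel task true)
      :evicted)))

;; Usage:
;; (result-or-evict (bounded-future #(+ 1 2)) 1000) ; => 3
```

Since a fixed pool’s threads are not daemon threads, a program using this sketch would need to call .shutdown on the pool before exiting.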
The project is hosted on github, as usual – http://github.com/amitrathore/medusa. Click here to see the basic implementation.
Posted in Uncategorized | Tagged: clojure, code, distributed-computing, lisp, mutli-core, parallel | 2 Comments »
conjure – simple mocking and stubbing for Clojure unit-tests
Posted by Amit Rathore on January 24, 2010
Siva and I were pairing on a unit-test that involved writing something to HBase. When Siva said that mocking the call to the save-to-hbase function would make testing easier (a simple thing using JMock, he said), I decided to write a quick mocking utility for Clojure.
Then later, we realized that we wanted to go one step further. The row-id that was used as the key to the object in HBase was generated using system-time. That meant that even if we wanted to confirm that the object was indeed saved, we had no way of knowing what the row-id was. One solution to such a problem is to inject the row-id in (instead of being tightly coupled to the function that generated the row-id). Instead, I wrote a stubbing utility that makes this arbitrarily easy to do.
So here they are – mocking and stubbing – packaged up as the conjure project on github.
The set up
Imagine we had the following functions -
(defn xx [a b] 10)
(defn yy [z] 20)
(defn fn-under-test []
  (xx 1 2)
  (yy "blah"))
(defn another-fn-under-test []
  (+ (xx nil nil) (yy nil)))
Also imagine that we had to test fn-under-test and another-fn-under-test, and we didn’t want to have to deal with the xx or yy functions. Maybe they’re horrible functions that open connections to computers running Windoze or something, I dunno.
Mocking
Here’s how we might mock them out -
(deftest test-basic-mocking
  (mocking [xx yy]
    (fn-under-test))
  (verify-call-times-for xx 1)
  (verify-call-times-for yy 1)
  (verify-first-call-args-for xx 1 2)
  (verify-first-call-args-for yy "blah"))
Pretty straightforward, eh? You just use the mocking macro, specifying all the functions that need to be mocked out. Then, within the scope of mocking, you call your functions that need to be tested. The calls to the specified functions will get mocked out (they won’t occur), and you can then use things like verify-call-times-for and verify-first-call-args-for to ensure things worked as expected.
Stubbing
As mentioned in the intro to this post, sometimes your tests need to specify values to be returned by the functions being mocked out. That’s where stubbing comes in. Here’s how it works -
(deftest test-basic-stubbing
  (is (= (another-fn-under-test) 30))
  (stubbing [xx 1 yy 2]
    (is (= (another-fn-under-test) 3))))
So that’s it! Pretty simple. Note how within the scope of stubbing, xx returns 1 and yy returns 2. Now, for the implementation.
Implementation
The code is almost embarrassingly straight-forward. Take a look -
(ns org.rathore.amit.conjure.core
  (:use clojure.test))

(def call-times (atom {}))

(defn stub-fn [function-name return-value]
  (swap! call-times assoc function-name [])
  (fn [& args]
    (swap! call-times update-in [function-name] conj args)
    return-value))

(defn mock-fn [function-name]
  (stub-fn function-name nil))

(defn verify-call-times-for [fn-name number]
  (is (= number (count (@call-times fn-name)))))

(defn verify-first-call-args-for [fn-name & args]
  (is (= args (first (@call-times fn-name)))))

(defn verify-nth-call-args-for [n fn-name & args]
  (is (= args (nth (@call-times fn-name) (dec n)))))

(defn clear-calls []
  (reset! call-times {}))

(defmacro mocking [fn-names & body]
  (let [mocks (map #(list 'mock-fn %) fn-names)]
    `(binding [~@(interleave fn-names mocks)]
       ~@body)))

(defmacro stubbing [stub-forms & body]
  (let [stub-pairs (partition 2 stub-forms)
        fn-names (map first stub-pairs)
        stubs (map #(list 'stub-fn (first %) (last %)) stub-pairs)]
    `(binding [~@(interleave fn-names stubs)]
       ~@body)))
It’s just an hour or so of work, so it’s probably rough, and certainly doesn’t support more complex features of other mocking/stubbing libraries. But I thought the simplicity was enjoyable.
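A note for readers on later Clojure versions: since Clojure 1.3, binding only works on vars declared ^:dynamic, so the macros above would fail on ordinary functions. The sketch below shows one way the same stubbing idea could be expressed with with-redefs (also added in 1.3). It is an illustration only, not the conjure library’s code; call-log, recording-stub and stubbing-redefs are hypothetical names, and calls are keyed by var rather than by function value.

```clojure
(def call-log (atom {}))

(defn recording-stub
  "Returns a function that records its arguments under k in call-log
  and always returns return-value."
  [k return-value]
  (swap! call-log assoc k [])
  (fn [& args]
    (swap! call-log update-in [k] conj args)
    return-value))

(defmacro stubbing-redefs
  "Like stubbing above, but temporarily replaces each var's root
  value using with-redefs instead of binding."
  [stub-forms & body]
  (let [bindings (mapcat (fn [[fn-name return-value]]
                           [fn-name `(recording-stub (var ~fn-name) ~return-value)])
                         (partition 2 stub-forms))]
    `(with-redefs [~@bindings]
       ~@body)))

;; The functions from the set up section, repeated so this runs on its own.
(defn xx [a b] 10)
(defn yy [z] 20)
(defn another-fn-under-test []
  (+ (xx nil nil) (yy nil)))

(stubbing-redefs [xx 1 yy 2]
  (another-fn-under-test)) ; => 3
```

Unlike binding, with-redefs changes the root value for every thread while the body runs, so this variant is best kept to tests that do not run concurrently.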
Posted in Uncategorized | Tagged: clojure, code, DSL, lisp, TDD, testing | 6 Comments »
Runa is hiring developers
Posted by Amit Rathore on August 27, 2009
We, Runa (runa.com), are looking for great developers to join our small team. We’re an early stage, pre-series-A startup (presently funded with strategic investments from two large corporations) playing in the e-commerce space. We’re creating a new product in the small-to-medium online-retailing segment, and if we’re successful, it will be a very large disruption.
Techie keywords: clojure, hadoop, hbase, rabbitmq, erlang, ruby, rails, javascript, amazon EC2, unit-testing, functional-testing, selenium, agile, lean, XP
If you’re interested, email me at amit@runa.com
If you want to know more, read on!
What we do
Runa aims to provide small-to-not-so-large online retailers with tools/services that companies like amazon.com use/provide. These smaller guys can’t afford to do anything on that scale, but by using our SaaS services, they can make more money while providing customers with greater value.
The first service we’re building is what we call Dynamic Sale Price.
It’s a simple concept – it allows the online-retailer to offer a sale price for each product on his site, personalized to the individual consumer who is browsing it. By using this service, merchants are able to -
- increase conversion (get them to buy!) and
- offer consumers a special price which maximizes the merchant’s profit
This is different from “dumb-discounting” where something is marked-down, and everyone sees the same price. This service is more like airline or hotel pricing which varies from day to day, but much more dynamic and real-time. Further, it is based on broad statistical factors AND individual consumer behavior. After all, if you lower prices enough, consumers will buy. Instead, we dynamically lower prices to a point where statistically, that consumer is most likely to buy.
How we do it
Runa does this by performing statistical analysis and pattern recognition of what consumers are doing on the merchant sites. This includes browsing products on various pages, adding and removing items from carts, and purchasing or abandoning the carts. We track consumers as they browse, and collect vast quantities of this click-stream data. By mining this data and applying algorithms to determine a price point per consumer based on their behavior, we’re able to maximize both conversion (getting the consumer to buy) AND merchant profit.
We also offer the merchant comprehensive reports based on analysis of the mountains of data we collect. Since the data tracks consumer activity down to the individual product SKU level (for each individual consumer), we can provide very rich analytics. This is a tool that merchants need today, but don’t have the resources to build for themselves.
The business model
For reference, it is useful to understand the affiliate marketing space. Small-to-medium merchants (our target audience) pay affiliates up to 40% of a sale price. Yes, 40%. The average is in the 20% range.
We charge our merchants around 10% of the sales that Runa delivers. Our merchants are happy to pay it because it is performance-based, lower than what they pay affiliates, and there is zero up-front cost to the service. In fact, the above-mentioned analytics reports are free.
We’re targeting e-commerce PLATFORMS (as opposed to individual merchants); in this way, we’re able to scale up merchant-acquisition. We have 10 early-customer merchants right now, with about 100 more planned to go live in the next 2-3 months. By the end of next year, we’re targeting about 1,000 merchants and 10,000 merchants the following year. Our channel deployment model makes these goals achievable.
At something like a 5% to 10% service charge, and with a typical merchant having between 500K and 1M in sales per year, this is a VERY profitable business model. That is, of course, if we’re successful… but we’re seeing very positive signs so far.
And we haven’t even talked about all the other things on our product-roadmap!
Technology
Most of our front-end stuff (like the merchant-dashboard, reports, campaign management) is built with Ruby on Rails. Our merchant integration requires browser-side Javascript magic. All our analytics (batch-processing) and real-time pricing services are written in Clojure. We use RabbitMQ for all our messaging needs. We store data in HBase. We’re deployed on Amazon’s EC2.
We need to be extremely scalable and fast. This is for two reasons -
- The prices are offered to consumers in real-time, as they browse. There can be no delay.
- The load on our service grows quickly – each time we sign on a merchant, we’re hit with the traffic from all their customers
It is a very challenging problem, and a lot of fun to solve.
Here are a few blog postings about what we’ve been up to -
- http://s-expressions.com/2009/05/02/startup-logbook-distributed-clojure-system-in-production-v02/
- http://s-expressions.com/2009/04/12/using-messaging-for-scalability/
- http://s-expressions.com/2009/03/31/capjure-a-simple-hbase-persistence-layer/
- http://s-expressions.com/2009/01/28/startup-logbook-clojure-in-production-release-v01/
We’ve also open-sourced a few of our projects -
- swarmiji – A distributed computing system to write and run Clojure code in parallel, across CPUs
- capjure – Clojure persistence for HBase
Culture at Runa
We’re a small team, very passionate about what we do. We’re focused on delivering a ground-breaking, disruptive service that will allow merchants to really change the way they sell online. We work start-up hours, but we’re flexible and laid-back about it. We know that a healthy personal life is important for a good professional life. We work with each other to support it.
We use an agile process with a lot of influences from the Lean and Kanban world. We use Mingle to run our development process. Everything (OK, mostly everything) is covered by automated tests, so we can change things as needed.
We’re all Apple in the office – developers get a MacPro with a nice 30” screen, and a nice 17” MacBook Pro. We deploy on Ubuntu servers. Aeron chairs are cliché, yes; but, very comfy.
The environment is chilled out… you can wear shorts and sandals to work… Very flat organization, very non-bureaucratic… nice open spaces (no cubes!). Lunch is brought in on most days! Beer and snacks are always in the fridge.
We’re walking distance to the San Antonio Caltrain station (biking distance from the Mountain View Caltrain/VTA lightrail station).
What’s in it for you
- Competitive salaries, and lots of stock-options
- Cutting edge technology stack
- Fantastic business opportunity, and early-stage (= great time to join!)
- Developer #5 – means plenty of influence on foundational architecture and design
- Smart, fun people to work with
- Very comfortable, nice office environment
OK!
So, if you’re interested, email me at amit@runa.com
Posted in Uncategorized | Tagged: clojure, jobs, rails, ruby, startup, webscale | Leave a Comment »
Clojure, the REPL and test-driven development
Posted by Amit Rathore on July 28, 2009
I’ve been using Clojure for nearly a year now, and something strange has been happening… I still think unit-testing is extremely important, but for some reason I don’t seem to be writing the same number of tests any more. I’m ashamed to say it, but there it is. And it gets stranger – this new lower test count doesn’t seem to matter.
It seems to me that my Clojure code works right the first time more often than my Ruby or Java code ever did. And I seem to find fewer defects in the Clojure code over time, too.
This is not just a fanboy speaking, though I am a huge fan of Clojure. I think that the reason I’m observing this is an important characteristic of the language. Instead of just talking about it, let me first walk you through an example.
This is something I had to do recently – we wanted to build a kind of reverse index for an HBase table. The row ids of this table are time-stamps. The idea was that this “reverse index” would allow us to answer the question of what the first time-stamp for a given day was. In other words, we needed to convert a list of time-stamps into a lookup of day vs. the first time-stamp of that day. Eg.
Input:
["112323123" "1231231231" "123123123" "1231231123" ....]
Output:
{"2009-07-01" "123123123"
 "2009-07-02" "123131213"
 "2009-07-03" "123123122"}
(Note: I plucked the numbers out of the air, they aren’t accurate. But the idea is that the input is a long stream of timestamps, and possibly hundreds could correspond to each day.)
So I get started… thinking to myself – I know how to convert a timestamp to a day. From there, it’s easy to write a function that returns a hash-map of the day vs. the timestamp (since I already had a function day-for-time-stamp, it was easy) -
(defn day-vs-timestamp [time-stamp]
  {(day-for-time-stamp time-stamp) time-stamp})
So now, all I have to do is map the above function across the input. This gives me a list of hash-maps, each with one key-value pair. To ensure that I’m doing this in order of oldest first, I sort the input as well. Inside of a let form, all of this looks like –
(let [all-pairs (map day-vs-timestamp (sort input-list))]
Now, I have this list of hashes, each with one key (the day) and one corresponding value (the time-stamp itself). I want to combine these into one single hash-map which would be the final answer. But I have to deal with the issue of duplicate keys – when I find a duplicate key, I want to keep the first value associated with the key since it would be the oldest.
Clojure has a merge-with function which does just this – it accepts a function of two arguments (the two values found for a duplicate key), and the value that function returns is used in the merged hash-map.
(apply merge-with #(first [%1 %2]) all-pairs)
That’s basically it.
Combining everything -
(defn day-vs-timestamp [time-stamp]
  {(day-for-time-stamp time-stamp) time-stamp})

(defn lookup-table [input-list]
  (let [all-pairs (map day-vs-timestamp (sort input-list))]
    (apply merge-with #(first [%1 %2]) all-pairs)))
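To try this at the REPL, something has to stand in for day-for-time-stamp, which the post does not show. The sketch below assumes the time-stamps are strings of epoch milliseconds, all of the same length (so that sorting them as strings matches chronological order), and that days are computed in UTC. Those are assumptions for illustration, not details from the original code.

```clojure
(import '(java.text SimpleDateFormat)
        '(java.util Date TimeZone))

;; Assumed stand-in: parses epoch milliseconds and formats the UTC day.
(defn day-for-time-stamp [^String time-stamp]
  (let [fmt (doto (SimpleDateFormat. "yyyy-MM-dd")
              (.setTimeZone (TimeZone/getTimeZone "UTC")))]
    (.format fmt (Date. (Long/parseLong time-stamp)))))

(defn day-vs-timestamp [time-stamp]
  {(day-for-time-stamp time-stamp) time-stamp})

(defn lookup-table [input-list]
  (let [all-pairs (map day-vs-timestamp (sort input-list))]
    ;; On a duplicate day, keep the value already in the map (the oldest).
    (apply merge-with (fn [oldest _] oldest) all-pairs)))

(lookup-table ["1246410000000"    ; 2009-07-01 01:00 UTC
               "1246406400000"    ; 2009-07-01 00:00 UTC
               "1246496400000"    ; 2009-07-02 01:00 UTC
               "1246492800000"])  ; 2009-07-02 00:00 UTC
;; => {"2009-07-01" "1246406400000", "2009-07-02" "1246492800000"}
```

The merging function is written as (fn [oldest _] oldest) here, which behaves the same as #(first [%1 %2]) but names the value it keeps.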
When I write code like this – I often ask myself, what exactly should I test? I end up writing a few happy path tests that prove my code works. And then a couple of tests that test border cases and negative paths. And I sometimes do it test first.
But the REPL has spoilt me. What I used TDD for when coding with Ruby (and still do), I often do at the REPL. I build tiny functions that work – these are often single lines of code. Then I combine these into other functions, often no more than two lines of code each, sometimes three. And it all just works – leaving me wondering what to cover with tests.
The main reason I still write tests is for regression – if something breaks in the future, I catch it quickly. However, the other thing – the test *driven* design aspect of TDD – has been somewhat replaced by the REPL. And it’s very much more dynamic than a set of static tests. It really brings out the rapid in rapid application development – especially when combined with Emacs and SLIME.
One main difference with Clojure vs. Ruby (say) is that Clojure is functional (I use very little of Clojure’s constructs for state). And in the functional world, I just don’t have to worry about state (obviously), and this tremendously simplifies code. I think in terms of map, filter, reduce, some, every, merge, etc. and the actual logic is in tiny functions used from within these other higher level constructs. The idea of first-class functions is also key – I can build up the business logic by writing small functions that do a tiny thing each – and combine them using higher-order functions.
This is one reason why we’re so productive with Clojure. We’ve moved to Clojure for 90% of our work. That said, we still use Ruby for parts of our code-base, and it’s still my favorite imperative language.
Posted in Uncategorized | Tagged: clojure, code, functional, lisp, TDD, unit-testing | 10 Comments »
Capjure: a simple HBase persistence layer – updated with documentation
Posted by Amit Rathore on May 25, 2009
This is just to let folks know I’ve (finally) written up some documentation on how to use capjure. Hope it helps!
Posted in Uncategorized | Tagged: clojure, code, database, lisp, webscale | 1 Comment »
Startup logbook – v0.2 – distributed Clojure system in production
Posted by Amit Rathore on May 2, 2009
This past weekend, we pushed another major release into production. We’ve been working on several things and have made a few pushes since the last time I wrote about this – but this release has a bunch of interesting Clojure related stuff.
Long-running processes
The main thing of note is that the majority of our back-end is now written in Clojure. You might recall that our online-merchant customers send us a lot of data, and we run a ton of analytics on that data. Our initial plans involved Ruby, but as we started using Clojure, it turned out that it is very well suited for this job as well (long running, message-driven processes that crunch numbers).
The raw data sits in HBase, and every night a “master” process starts up which kicks off the processing of the previous day’s worth of data. The job of this master is only to coordinate the work (it doesn’t actually do any real work). It does this by breaking the work into chunks and dispatching messages, each of which assigns a chunk to whichever worker process picks it up. The master is single-threaded for simplicity, but failure-tolerant – it checkpoints everything in a local MySQL database, and if it crashes, it is automatically re-spawned and recovers from where it left off.

An elastic cloud of worker processes runs in anticipation of the master handing out this work. The worker processes use the MySQL database to keep track of their progress as well. The rest is rather domain-specific. We use intermediate representations of the raw data, which are also stored in HBase, before finally storing the summarized version again in HBase.
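The checkpoint-and-resume behaviour of the master can be sketched in a few lines. This is a simplified illustration, not the production code: run-master is a hypothetical name, an atom holding a set stands in for the MySQL checkpoint table, and dispatch! stands in for publishing a work message to the queue.

```clojure
;; Hypothetical sketch of a coordinator that survives restarts.
;; chunks     - a sequence of maps, each with a unique :id
;; checkpoint - an atom holding the set of chunk ids already dispatched
;; dispatch!  - a function that hands one chunk to the workers
(defn run-master [chunks checkpoint dispatch!]
  (doseq [chunk chunks
          ;; Skip anything recorded before a previous crash.
          :when (not (contains? @checkpoint (:id chunk)))]
    (dispatch! chunk)
    ;; Record progress only after the dispatch succeeded.
    (swap! checkpoint conj (:id chunk))))

;; Usage:
;; (def checkpoint (atom #{}))
;; (run-master [{:id 0} {:id 1}] checkpoint println)
```

Because progress is recorded after each dispatch, a crash between the two steps would cause that one chunk to be dispatched again on restart, so in a design like this the workers would need to tolerate a repeated message.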
Swarmiji
We use an in-house distributed-programming framework called Swarmiji to make such distributed programs very easy to write and run. Swarmiji implements a flavor of staged event-driven architecture (SEDA) to allow server processes that exhibit scalable, predictable throughput. This is especially true in the face of over-load, which we can certainly expect in our environment.
The reason I wrote this framework was that I wanted to create distributed, parallel programs which exploited large numbers of machines (like in a data-center) – without being limited by Clojure’s in-JVM-threads-based model. So each worker process in Swarmiji gets deployed as a shared-nothing JVM process.
I will write up a post introducing Swarmiji in the next few weeks – once it’s a bit more battle-tested, and I’ve added a few more features (mainly around process management).
Posted in Uncategorized | Tagged: clojure, code, distributed-systems, lisp, startup, webscale | 3 Comments »
