Startup logbook – v0.2 – distributed Clojure system in production

This past weekend, we pushed another major release into production. We’ve been working on several things and have made a few pushes since the last time I wrote about this – but this release has a bunch of interesting Clojure-related stuff.

Long-running processes

The main thing of note is that the majority of our back-end is now written in Clojure. You might recall that our online-merchant customers send us a lot of data, and we run a ton of analytics on that data. Our initial plans involved Ruby, but as we started using Clojure, it turned out to be very well suited for this job as well (long-running, message-driven processes that crunch numbers).

The raw data sits in HBase, and every night a “master” process starts up and kicks off the processing of the previous day’s data. The master’s only job is to coordinate the work; it doesn’t do any real work itself. It breaks the work into chunks and dispatches messages, each of which assigns work to whichever worker process picks it up. The master is single-threaded for simplicity, but fault-tolerant: it checkpoints everything in a local MySQL database, and if it crashes, it is automatically respawned and recovers from where it left off.
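
To make the checkpoint-and-recover idea concrete, here is a minimal sketch in Python (not our actual Clojure code). It assumes a table of already-dispatched chunks; `sqlite3` stands in for the MySQL database, and the table name, the message shape, and the `dispatch` callback are all hypothetical.

```python
import sqlite3


def open_checkpoints(path=":memory:"):
    """Open the checkpoint store (sqlite3 stands in for MySQL here)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS dispatched ("
        "day TEXT, chunk INTEGER, PRIMARY KEY (day, chunk))"
    )
    return db


def run_master(db, day, num_chunks, dispatch):
    """Dispatch every chunk of `day` that has not been checkpointed yet."""
    rows = db.execute("SELECT chunk FROM dispatched WHERE day = ?", (day,))
    done = {row[0] for row in rows}
    for chunk in range(num_chunks):
        if chunk in done:
            continue  # handed out before a crash; skip it on recovery
        dispatch({"day": day, "chunk": chunk})  # hypothetical message shape
        db.execute("INSERT INTO dispatched VALUES (?, ?)", (day, chunk))
        db.commit()  # checkpoint after every dispatch
```

Because the checkpoint is written after the dispatch, a crash between the two steps would send that chunk again on restart, so in this sketch the workers would need to tolerate an occasional duplicate message.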

clojure-in-production-v0.2.png

An elastic cloud of worker processes runs in anticipation of the master handing out this work. The worker processes also use the MySQL database to keep track of their progress. The rest is rather domain-specific. We use intermediate representations of the raw data, which are also stored in HBase, before finally storing the summarized version in HBase as well.

Swarmiji

We use an in-house distributed-programming framework called Swarmiji to make such distributed programs very easy to write and run. Swarmiji implements a flavor of staged event-driven architecture (SEDA), which lets server processes deliver scalable, predictable throughput. This matters most under overload, which we can certainly expect in our environment.
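
For readers unfamiliar with SEDA, the core building block is a stage: a bounded event queue drained by a fixed pool of threads, which rejects new events when the queue is full instead of slowing down. The sketch below is a generic Python illustration of that idea, not Swarmiji’s implementation; the `Stage` class and its method names are made up for this example.

```python
import queue
import threading


class Stage:
    """One SEDA-style stage: a bounded queue drained by a fixed thread pool."""

    def __init__(self, handler, capacity, workers):
        self.handler = handler
        self.events = queue.Queue(maxsize=capacity)
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, event):
        """Enqueue an event; return False (shed load) if the queue is full."""
        try:
            self.events.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _run(self):
        # Each pool thread loops forever, handling one event at a time.
        while True:
            event = self.events.get()
            try:
                self.handler(event)
            finally:
                self.events.task_done()

    def drain(self):
        """Block until every accepted event has been handled."""
        self.events.join()
```

The bounded queue is what keeps throughput predictable: under overload the stage keeps working at its normal rate, and callers learn immediately (via the `False` return) that they need to back off or retry.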

The reason I wrote this framework was that I wanted to create distributed, parallel programs that exploit large numbers of machines (as in a data center) – without being limited by Clojure’s in-JVM, thread-based concurrency model. So each worker process in Swarmiji gets deployed as a shared-nothing JVM process.

I will write up a post introducing Swarmiji in the next few weeks – once it’s a bit more battle-tested, and I’ve added a few more features (mainly around process management).

14 thoughts on “Startup logbook – v0.2 – distributed Clojure system in production”

  1. curious: why the full jvm process vs. e.g. clojure-based concurrency? are you really getting the most efficient use of the hardware sevaks run on?

  2. Probably not, who knows?! I have yet to profile the system.

    Meanwhile, each sevak server process uses Clojure’s built-in agent system to handle requests, so it does use what I assume you meant by Clojure-based concurrency. However, with agents alone, you can only scale by scaling your JVM (say, by using Terracotta). Swarmiji also lets you scale across machines, while giving you resiliency, because the failure of one machine doesn’t cause complete failure.

