Why Datomic?

Cross-posted from Zololabs.

Many of you know we’re using Datomic for all our storage needs for Zolodeck. It’s an extremely new database (not even version 1.0 yet), and it is not open-source. So why would we want to base our startup on something like it, especially when we have to pay for it? I’ve been asked this question a number of times, so I figured I’d blog about my reasons:

  • I’m an unabashed fan of Clojure and Rich Hickey
  • I’ve always believed that databases (and the insane number of optimization options) could be simpler
  • We get basically unlimited read scalability (by upping read throughput in Amazon DynamoDB)
  • Automatic built-in caching (no more code to talk to memcached; the DB becomes effectively local)
  • Datalog as the query language (declarative logic programming, with no explicit joins)
  • Datalog is extensible through user-defined functions
  • Full-text search (via Lucene) is built right in
  • Query engine on client-side, so no danger from long-running or computation-heavy queries
  • Immutable data – all versions of everything are retained automatically, giving a built-in audit trail
  • “As of” queries and “time-window” queries are possible
  • Minimal schema (think RDF triples, except Datomic tuples also include the notion of time)
  • Supports cardinality out of the box (has-many or has-one)
  • These reference relationships are bi-directional, so you can traverse the relationship graph in either direction
  • Transactions are first-class (they can be queried or “subscribed to”, enabling db-event-driven designs)
  • Transactions can be annotated (with custom meta-data) 
  • Elastic 
  • Write scaling without sharding (hundreds of thousands of facts (tuples) per second)
  • Supports “speculative” transactions that don’t actually persist to datastore
  • Out of the box support for in-memory version (great for unit-testing)
  • All this, and not even v1.0
  • It’s a particularly good fit with Clojure (and with Storm)

This is a long list, but perhaps begins to explain why Datomic is such an amazing step forward. Ping me with questions if you have ‘em! And as far as the last point goes, I’ve talked about our technology choices and how they fit in with each other at the Strange Loop conference last year. Here’s a video of that talk.
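To make a few of those bullets concrete, here’s a minimal sketch using the Datomic peer API from Clojure. This is illustrative only: the `:person/name` attribute and the values transacted are made up for this example, and `some-t` stands in for a real transaction time or t-value.

```clojure
(require '[datomic.api :as d])

;; In-memory database: no transactor or storage needed, great for unit tests
(def uri "datomic:mem://demo")
(d/create-database uri)
(def conn (d/connect uri))

;; Minimal schema: a single attribute, full-text indexed via Lucene
@(d/transact conn
   [{:db/id                 #db/id[:db.part/db]
     :db/ident              :person/name
     :db/valueType          :db.type/string
     :db/cardinality        :db.cardinality/one
     :db/fulltext           true
     :db.install/_attribute :db.part/db}])

;; Assert a fact
@(d/transact conn [{:db/id (d/tempid :db.part/user)
                    :person/name "Rich"}])

;; Datalog query; note that it runs in this peer process, not on a server
(d/q '[:find ?e
       :in $ ?name
       :where [?e :person/name ?name]]
     (d/db conn) "Rich")

;; "As of" query: the same db value, rewound to an earlier point in time
;; (some-t would be a real t or transaction instant)
(d/q '[:find ?e :where [?e :person/name]]
     (d/as-of (d/db conn) some-t))

;; Speculative transaction: applies tx-data to a db value in memory,
;; returning a report map, without persisting anything to storage
(d/with (d/db conn)
        [{:db/id (d/tempid :db.part/user) :person/name "Alice"}])
```

The `d/with` call returns a map whose `:db-after` can be queried like any other database value, which is handy for what-if scenarios and tests.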

5 thoughts on “Why Datomic?”

  1. It does look very interesting, one of the three distributed and consistent databases I know of (the others are Google Spanner and Hyperdex).

    However, I have a few concerns:
    - it’s on the JVM, that’s less than ideal
    - queries appear to require fetching a lot of data, so frequent queries on fresh data are likely expensive
    - writes appear to be a significant bottleneck
    - there still doesn’t seem to be a good way to destroy data.

    Also, my biggest concern is that it’s closed source. It’s hard to put up with that when there are so many good open source versions.

  2. Lucian – addressing your concerns:
    “it’s on the JVM, that’s less than ideal” – why is that less than ideal?

    “queries appear to require fetching a lot of data, so frequent queries on fresh data are likely expensive” – in answer to the first part of your statement, it depends on the data, but many use cases would allow the data to fit entirely in a peer’s cache. A peer’s cold start would take a performance hit, but after that the data is local to the peer. To get new data, a peer only needs to transfer the difference of whatever is relevant to a query, or you can program a peer to subscribe to new data as it’s accumulated.
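    The “subscribe to new data” option mentioned above can be sketched with Datomic’s transaction report queue (hedged: `conn` is assumed to be an existing peer connection):

    ```clojure
    (require '[datomic.api :as d])

    ;; Returns a java.util.concurrent.BlockingQueue onto which Datomic
    ;; places a report for every transaction as it commits
    (def report-queue (d/tx-report-queue conn))

    ;; Blocks until the next transaction arrives; the report map contains
    ;; :db-before, :db-after, and :tx-data (the datoms added/retracted)
    (let [report (.take report-queue)]
      (println (:tx-data report)))
    ```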

    “writes appear to be a significant bottleneck” – mostly, no. Because the writer (the Transactor) is dedicated to the task, it’s not actually possible to max out the system with writes under most situations. If you feel you need infinite write scaling then Datomic wouldn’t be your solution, but I would encourage you to actually evaluate whether the ongoing novelty of your data could saturate I/O. The above post cites the ability to write hundreds of thousands of tuples per second. How many use cases require more throughput than that, outside of infrequent bulk imports of data?

    “there still doesn’t seem to be a good way to destroy data” – other people could give better answers to this but I can tell you what I plan to do for a project I’ve been working on. I’m using datomic to store parsed syslog data and each day’s worth of data will be stored in its own database. With datalog you can run your queries across multiple databases. When I’m ready to expire data I’ll just stop querying older databases and then delete the data.
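    That multi-database approach can be sketched roughly like this (illustrative only: the URIs and the `:log/host` attribute are invented, and each database is assumed to already exist):

    ```clojure
    (require '[datomic.api :as d])

    ;; One database value per day's worth of log data
    (def db-today     (d/db (d/connect "datomic:dev://localhost:4334/logs-day2")))
    (def db-yesterday (d/db (d/connect "datomic:dev://localhost:4334/logs-day1")))

    ;; A single Datalog query joining across both databases:
    ;; find hosts that logged entries on both days
    (d/q '[:find ?host
           :in $a $b
           :where [$a ?e1 :log/host ?host]
                  [$b ?e2 :log/host ?host]]
         db-today db-yesterday)
    ```

    Expiring old data then means dropping a whole database (`d/delete-database`) rather than hunting down individual datoms.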

  3. The JVM has only ever annoyed me. I try to stay away from it if I can, especially since I primarily use Python. It’s also the reason I’ve used Clojure(Script) less.

    I have several use-cases where peers couldn’t possibly store all data locally. Also, having to wait a long time on the first query for the cache to get populated is not acceptable, I want my queries to have predictable latency. Perhaps copying the db to peers as part of the bootstrap step would remove the initial latency, but I’m still not sure what to do about replicating a very large database to every single worker.

    The transactor may be fast, but there are at least two network hops’ worth of latency (writer to transactor, transactor notification to reader). That is less than ideal, but perhaps not a problem in practice, as you say.

    Being able to (rarely) entirely destroy data is necessary for legal reasons. Users might wish to have all of their data removed, which means it would have to be hunted in all local caches, or something to that effect.

  4. I’m evaluating Datomic as the primary storage layer in an application that I am writing. Are there any resources out there that you would recommend to give a good overview? I am intrigued that it has full-text search baked in via Lucene, since full text search in the storage layer itself is definitely something that would be useful in my app.
