Build one to throw away, you will anyhow. — Fred Brooks
Fred Brooks said this decades ago, but it is probably just as true today. How many times have you and your team gotten a few months into a project only to realize all the design mistakes you made? Ask any engineer, and they’ll tell you they would build it right the second time.
This is just reality: it is the nature of discovering the complexity of the domain, the technology, the usage patterns, or whatever else you didn’t know about when you started.
On the other hand, here is what Joel Spolsky says about rewriting software:
It is the single worst strategic mistake that any software company can make — Joel Spolsky
So, what gives? The answer, IMHO, comes down to two things:
1. Understand and internalize the idea of the strangler application
2. Architect your system in a way that supports strangling it later
In essence, because rewriting the entire system from scratch is a bad idea, the system must be built so that its components can be swapped out as they are rewritten (or perhaps heavily refactored).
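To make the swapping idea concrete, here is a minimal sketch of a strangler-style facade. It is not how Runa's system is built; the class, the operation names, and the handlers are all hypothetical, and the only point is that callers talk to a stable front while the implementation behind each operation is replaced one piece at a time.

```python
from typing import Callable, Dict

# A handler takes a request payload and returns a response payload.
Handler = Callable[[dict], dict]


class StranglerFacade:
    """Routes each named operation to whichever implementation currently owns it."""

    def __init__(self, legacy: Dict[str, Handler]) -> None:
        # Start with every operation served by the legacy implementation.
        self._routes: Dict[str, Handler] = dict(legacy)

    def replace(self, operation: str, handler: Handler) -> None:
        # Move a single operation over to a rewritten component.
        # Callers keep using handle() and never see the swap.
        if operation not in self._routes:
            raise KeyError(f"unknown operation: {operation}")
        self._routes[operation] = handler

    def handle(self, operation: str, request: dict) -> dict:
        return self._routes[operation](request)


# Hypothetical legacy and rewritten components, for illustration only.
def legacy_price(request: dict) -> dict:
    return {"price": request["base"], "source": "legacy"}


def legacy_checkout(request: dict) -> dict:
    return {"ok": True, "source": "legacy"}


def rewritten_price(request: dict) -> dict:
    return {"price": request["base"], "source": "rewrite"}


facade = StranglerFacade({"price": legacy_price, "checkout": legacy_checkout})
facade.replace("price", rewritten_price)  # strangle one piece, leave the rest alone

price_response = facade.handle("price", {"base": 10})
checkout_response = facade.handle("checkout", {})
```

After the `replace` call, pricing is served by the rewrite while checkout still goes to the legacy code, which is the whole trick: the old system shrinks one operation at a time instead of being rewritten in one go.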
The architecture must draw from an approach called concurrent set-based engineering (CSBE), in which several candidate designs are developed in parallel and the weaker ones are eliminated over time. Indeed, sometimes a logical component will have more than one implementation. At Runa, two components of our system actually have two implementations each. And in each case, they’re both running in production, in parallel.
The way we accomplish this is through very loose coupling. Additionally, because we take a very release-driven approach to our software process, our architecture evolves according to our current needs, and we refactor and extend things as new requirements are prioritized. At all times, despite our super-short release cycles, our goal is to have a working version of the system in production. Whenever our pipeline tells us that a piece of the existing design may not work in the long term, we start work on the replacements: more than one, and in parallel.
We run them in what I’ve been calling shadow-mode: a replacement is not quite part of the official system, but it is running in order to prove some design hypothesis. Once everyone involved is satisfied with the results, we pick the most suitable sub-system and decommission the other contenders (including the old one). At Runa, we achieve much of our inter-component loose coupling via messaging (our current choice is RabbitMQ).
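Here is a rough sketch of what shadow-mode could look like in code. It is an assumption-laden illustration, not Runa's implementation: an in-memory dispatcher stands in for the message broker (with a real broker, the fan-out to each contender would happen there), and `ShadowDispatcher` and the handler functions are hypothetical names. The official component answers the caller, while each contender sees the same message and has its result recorded for comparison.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], dict]


class ShadowDispatcher:
    """Sends each message to the official handler and to every shadow contender.

    Only the official result is returned. Shadow results are recorded for
    later comparison, and shadow failures never reach the caller.
    """

    def __init__(self, official: Handler, shadows: Dict[str, Handler]) -> None:
        self._official = official
        self._shadows = shadows
        self.agreements: Dict[str, int] = {name: 0 for name in shadows}
        # Each entry: (contender name, message, official result, shadow result or error)
        self.mismatches: List[Tuple[str, dict, dict, object]] = []

    def dispatch(self, message: dict) -> dict:
        result = self._official(message)
        for name, shadow in self._shadows.items():
            try:
                # Give each shadow its own shallow copy of the message.
                shadow_result = shadow(dict(message))
            except Exception as exc:
                # A broken contender must not affect the official path.
                self.mismatches.append((name, message, result, repr(exc)))
                continue
            if shadow_result == result:
                self.agreements[name] += 1
            else:
                self.mismatches.append((name, message, result, shadow_result))
        return result


# Hypothetical components: the current one and two contenders.
def current_total(message: dict) -> dict:
    return {"total": sum(message["items"])}


def contender_a(message: dict) -> dict:
    return {"total": sum(message["items"])}


def contender_b(message: dict) -> dict:
    # Deliberately wrong whenever there is more than one item.
    return {"total": max(message["items"])}


dispatcher = ShadowDispatcher(current_total, {"a": contender_a, "b": contender_b})
results = [dispatcher.dispatch({"items": items}) for items in ([1, 2], [5], [3, 4, 5])]
```

The recorded agreements and mismatches are the evidence for the design hypothesis: a contender that keeps matching the official results under real traffic is a candidate for promotion, and one that does not can be dropped without production ever having depended on it.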
To summarize: we design everything with one overarching goal in mind, namely that the thing will be thrown away someday and replaced with another. As I said before, this enforces a few things:
1. Loose coupling
2. Clear interfaces between components
3. Good automated system testing!
About that last point: because we have many moving parts, functional testing becomes even more important. We currently use Selenium for true functional testing (Runa is a web-based service) and a variety of home-grown tools for custom systems testing. Not only do automated system tests tell us that the collaborating set of components is working right, they also allow us to change things with impunity, knowing that we’ll find out if things break.
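One way to picture this is a single set of checks that every implementation of a component has to pass. The sketch below is not our Selenium suite or one of our home-grown tools; it is a small stand-in with hypothetical names (`contract_failures`, the three `*_total` functions) that shows the idea of testing the interface rather than any one implementation.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], dict]


def contract_failures(handler: Handler) -> List[str]:
    """Runs the shared checks against one implementation and describes any failures."""
    cases = [
        ({"items": []}, {"total": 0}),
        ({"items": [2, 3]}, {"total": 5}),
    ]
    failures = []
    for request, expected in cases:
        actual = handler(request)
        if actual != expected:
            failures.append(f"{request!r}: expected {expected!r}, got {actual!r}")
    return failures


# Hypothetical implementations of the same component.
def old_total(request: dict) -> dict:
    total = 0
    for item in request["items"]:
        total += item
    return {"total": total}


def new_total(request: dict) -> dict:
    return {"total": sum(request["items"])}


def broken_total(request: dict) -> dict:
    # Counts items instead of adding them up.
    return {"total": len(request["items"])}


implementations = {"old": old_total, "new": new_total, "broken": broken_total}
report = {name: contract_failures(impl) for name, impl in implementations.items()}
```

Because the checks only know about the interface, a replacement can be dropped in and judged by the same yardstick as the component it is meant to replace.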
This thinking is what I’ve been jokingly calling Design For Throwability, and it’s been working rather nicely. It’s essentially a design philosophy that embraces CSBE, and it is especially useful for small startups, where everything is changing quickly almost by definition.