Note: This is an updated version of an article I first wrote in 2011. The original site went offline for a while, and although it was later restored, thanks to heroic efforts by Philip Ashlock, I felt the article needed a new home, and I wanted a chance to update it anyway.
Over the years we’ve watched software projects of all sizes make the transition from closed-source to open source. The lesson we consistently draw from them is this:
If you’re running a software project and you plan to make it open source eventually, then just make it open source from the beginning of development.
Waiting will only create more work.
The longer a project is run in a closed-source manner, the harder it will be to open source later.
(Note that being open source from the start doesn’t force you to immediately take on the extra responsibility of community management. People often think that “open source” means “strangers distracting my programmers with questions”, but that’s optional — it’s something you might do down the road, if and when it makes sense for your project. It’s under your control.)
Projects seem to follow the rule pretty convincingly: the longer they run closed-source, the more difficult they are to open up later. Other people we’ve talked to have noticed the pattern too.
Why does it seem to hold so consistently?
I think there’s one underlying cause:
At each step in a project, programmers face a choice: to do that step in a manner compatible with the future open-sourcing, or do it in a manner not compatible with the future open-sourcing. Each time they choose the latter, the project gets just a little bit harder to open source.
The crucial thing is, you can’t help choosing the latter occasionally — all the pressures of development propel you that way. It’s very difficult to give a future event the same present-day consequences as, say, fixing the incoming bugs reported by the testers, or finishing that feature the customer just added to the spec. Also, programmers struggling to stay on budget will inevitably cut corners here and there (that is, they will incur technical debt), with the intention of cleaning things up later.
Thus when it’s time to open source, you will suddenly find:
- Customer-specific configurations and passwords checked into the code repository;
- Sample data constructed from live (and confidential) information;
- Bug reports containing sensitive information that cannot be made public;
- Comments in the code expressing perhaps overly-honest reactions to the customer’s latest urgent request;
- Archives of correspondence among the developer team, in which useful technical information is interleaved with personal opinions not intended for strangers;
- Licensing issues with dependency libraries whose conditions might have been fine for internal deployment (or not even that), but aren’t compatible with open source distribution;
- Documentation written in the wrong format (e.g., that proprietary internal wiki your department uses), with no easy translation tool available to get it into formats appropriate for public distribution;
- Non-portable build dependencies that only become apparent when you try to move the software out of your internal build environment;
- Modularity violations that everyone knows need cleaning up, but that there just hasn’t been time to take care of yet;
- Need I go on? Do some of these sound familiar?
The problem isn’t just the work of doing the cleanups; it’s the extra decision-making they require. For example, if sensitive material was checked into the code repository in the past, your team now faces a choice between cleaning it out of the historical revisions entirely, so you can open source the entire (sanitized) history, or just cleaning up the latest revision and open-sourcing from that (sometimes called a “top-skim”). Neither method is wrong or right — and that’s the problem: now you’ve got one more discussion to have and one more decision to make. In some projects, that decision gets made and reversed several times before the final release. The thrashing itself is part of the cost.
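To give a sense of what the cleanup side of that work involves, here is a minimal sketch of a pre-release audit that flags lines that look like credentials or debt markers. Everything in it is an assumption made for illustration: the pattern list, the `audit_text` and `audit_tree` names, and the sample file are hypothetical, and a real audit would need patterns tuned to the project. Note that it only inspects the files it is given, which corresponds to the "top-skim" case; it says nothing about what is buried in historical revisions.

```python
import re
from pathlib import Path

# Hypothetical patterns, chosen only for illustration. A real project would
# need its own list (customer names, internal hostnames, and so on).
PATTERNS = {
    "possible credential": re.compile(
        r"(?i)(password|passwd|secret|api[_-]?key|token)\w*\s*[:=]\s*\S+"
    ),
    "debt marker": re.compile(r"\b(FIXME|XXX|HACK)\b"),
}


def audit_text(name, text):
    """Return (name, line_number, label) for each line matching a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((name, lineno, label))
    return findings


def audit_tree(root):
    """Run audit_text over every regular file under root (latest revision only)."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            text = path.read_text(errors="ignore")
            findings.extend(audit_text(str(path), text))
    return findings


# A made-up file body, standing in for something checked in under deadline.
sample = '''\
db_host = "localhost"
db_password = "hunter2"
# FIXME: customer wants this by Friday, clean up later
'''

for name, lineno, label in audit_text("settings.py", sample):
    print(f"{name}:{lineno}: {label}")
```

A scan like this can only narrow down where to look. Deciding what to do about each finding, and whether to sanitize the history or start from a top-skim, is still a discussion the team has to have.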
Waiting Just Creates an Exposure Event
The other problem with opening up a completed code base is that it creates a needlessly significant exposure event. Whatever issues there may be in the code (modularity corner-cutting, security vulnerabilities, etc.) are all exposed to public scrutiny at once, and the open-sourcing event becomes an opportunity for the technical blogosphere to pounce on the code and see what they can find.
Contrast that with the scenario where development was done in the open from the beginning: code changes come in one at a time, so problems are handled as they come up (and are often caught sooner, since there are more eyeballs on the code). Because changes reach the public at a low, continuous rate of exposure, no one blames your development team for the occasional corner-cutting or flawed code checkin. Everyone’s been there, after all; these tradeoffs are inevitable in real-world development. As long as the technical debt is properly recorded in FIXME comments and bug reports, and any security issues are addressed promptly, it’s fine. Yet if those same issues were to appear suddenly all at once, unsympathetic observers might jump on the aggregate exposure in a way they never would have if the issues had come up piecemeal in the normal course of development.
Avoiding a needless exposure event is especially important for government projects, even more than for private-sector code. Elected officials and those who work for them are understandably sensitive to negative publicity, and are therefore risk-averse. Even if your team has been very conscientious, a worrying cloud of uncertainty will surround everything by the time you’re ready to open up hitherto closed code. How can you ever know you’ve got it all cleaned up? You do your best, but you can never be totally sure some hawk-eyed hacker out there won’t spot something embarrassing after the release. The team worries, and worry is an energy drain: it causes them to spend time chasing down ghosts, yet at the same time can cause them to unconsciously avoid constructive steps that might risk exposing real problems.
(For private-sector code, there are sometimes competitive reasons to stay closed until the first release, even if the project is intended to be open source in the long run. This is not an exception to the advice given here; it is simply a countervailing factor that should be weighed along with every other strategic consideration related to the project. One temptation to be suspicious of, however, is the notion that creating an exposure event is actually desirable, that dramatically open sourcing the code can itself provide useful publicity. In almost all cases, the better way to achieve that publicity would be by announcing the software’s “1.0” release later on. The feature set and product stability that that event represents are what users and partners actually care about, and that announcement gives potential contributors another natural moment to take interest, or renew their interest, in the open source code base.)
The Good News
The good news is that these are all unforced errors. A project incurs little or no extra cost by avoiding them in the simplest way possible: by running the project in the open from Day One.
“In the open” means the following things are publicly accessible, in standard formats, from the first day of the project: the code repository, bug tracker, design documents, user documentation, wiki (if any), and developer discussion forums. It also means the code and documentation are placed under an open source license, of course. It also means your team’s day-to-day work takes place in the publicly visible area (except for sensitive configuration data and the like — that of course stays behind your firewall).
“In the open” does not have to mean any of the following, among other things:
- Allowing strangers to check code into your repository (they’re free to copy it into their own repository, if they want, and work with it there);
- Allowing anyone to file bug reports in your tracker (you’re free to choose your own QA process, and if allowing reports from strangers doesn’t help you, you don’t have to do it);
- Reading and responding to every bug report filed, even if you do allow strangers to file;
- Responding to every question people ask in the forums (even if you moderate them through);
- Reviewing every patch or suggestion posted, when doing so may cost valuable development time.
Think of it this way: you open source your code, not your developers’ time. One of those resources is infinite; the other is not. You’ll have to determine whether engaging with outside users and developers makes sense for your project or not. In the long run it usually does, when done with care, although different types of projects want different types of engagement. But the important thing is, it’s all under your control. Developing in the open does not change the degree of control you have over the project; it just ensures that everything you do is, by definition, done in a way that’s compatible with being open source. And you get that for free.
If you want your software to be on-time, on-budget, feature-complete, and open source, then just develop it the way you normally would but with everything open source from the start. You’ll still only get two of the first three, of course — there’s no escaping the Project Management Triangle. But you’ll be taking the most efficient and effective route to being open source, and the project will be better off for it.