Note: This is an updated version of an article I first wrote in 2011. The original site went offline for a while, and although it was later restored, thanks to heroic efforts by Philip Ashlock, I felt the article needed a new home, and wanted a chance to update it anyway. This version also incorporates some suggestions from V. David Zvenyach.
Over the years we’ve watched software projects of all sizes make the transition from closed-source to open source. The lesson we consistently draw from them is this:
If you’re running a software project and you plan to make it open source eventually, then just make it open source from the beginning of development.
Waiting will only create more work.
The longer a project is run in a closed-source mode, the harder it will be to open source later.
(Note that being open source from the start doesn’t force you to immediately take on the extra responsibility of community management. People often think that “open source” means “strangers distracting my programmers with questions”, but that’s optional — it’s something you might do down the road, if and when it makes sense for your project. It’s under your control. )
Projects seem to follow the rule pretty convincingly: the longer they run closed-source, the more difficult they are to open up later. Other people we’ve talked to have noticed the pattern too.
Why does it seem to hold so consistently?
I think there’s one underlying cause:
At each step in a project, programmers face a choice: to do that step in a manner compatible with the future open-sourcing, or do it in a manner not compatible with the future open-sourcing. Each time they choose the latter, the project gets just a little bit harder to open source.
The crucial thing is, you can’t help choosing the latter occasionally — all the pressures of development propel you that way. It’s very difficult to give a future event the same present-day consequences as, say, fixing the incoming bugs reported by the testers, or finishing that feature the customer just added to the spec. Also, programmers struggling to stay on budget will inevitably cut corners here and there (that is, they will incur technical debt), with the intention of cleaning things up later.
Thus when it’s time to open source, you will suddenly find:
- Customer-specific configurations and passwords checked into the code repository;
- Sample data constructed from live (and confidential) information;
- Bug reports containing sensitive information that cannot be made public;
- Comments in the code expressing perhaps overly-honest reactions to the customer’s latest urgent request;
- Archives of correspondence among the developer team, in which useful technical information is interleaved with personal opinions not intended for strangers;
- Licensing issues with dependency libraries whose conditions might have been fine for internal deployment (or not even that), but aren’t compatible with open source distribution;
- Documentation written in the wrong format (e.g., that proprietary internal wiki your department uses), with no easy translation tool available to get it into formats appropriate for public distribution;
- Non-portable build dependencies that only become apparent when you try to move the software out of your internal build environment;
- Modularity violations that everyone knows need cleaning up, but that there just hasn’t been time to take care of yet;
- Need I go on? Do some of these sound familiar?
The problem isn’t just the work of doing the cleanups; it’s the extra decision-making they require. For example, if sensitive material was checked into the code repository in the past, your team now faces a choice between cleaning it out of the historical revisions entirely, so you can open source the entire (sanitized) history, or just cleaning up the latest revision and open-sourcing from that (sometimes called a “top-skim”). Neither method is wrong or right — and that’s the problem: now you’ve got one more discussion to have and one more decision to make. In some projects, that decision gets made and reversed several times before the final release. The thrashing itself is part of the cost.
Waiting Causes Needless Infrastructure Adjustment Later
Delaying open sourcing also causes you to risk incurring extra overhead later from switching collaboration tools in mid-stream. A project run behind closed doors inevitably starts to use some of the internal collaboration infrastructure of the organization — infrastructure that won’t be available to other contributors later when the project open sources.
Being open source from day one encourages you to take advantage of the robust ecosystem for open-source projects from day one instead: free project hosting, continuous-integration services that don’t require paid membership if the project is open source, etc. Switching to these tools later, after your project is already under way, is a lot more work than just using them from the beginning.
Waiting Just Creates an Exposure Event
Another problem with opening up a completed code base is that it creates a needlessly significant exposure event. Whatever issues there may be in the code (modularity corner-cutting, security vulnerabilities, etc), they are all exposed to public scrutiny at once — the open-sourcing event becomes an opportunity for the technical blogosphere to pounce on the code and see what they can find.
Contrast that with the scenario where development was done in the open from the beginning: code changes come in one at a time, so problems are handled as they come up (and are often caught sooner, since there are more eyeballs on the code). Because changes reach the public at a low, continuous rate of exposure, no one blames your development team for the occasional corner-cutting or flawed code checkin. Everyone’s been there, after all; these tradeoffs are inevitable in real-world development. As long as the technical debt is properly recorded in FIXME comments and bug reports, and any security issues are addressed promptly, it’s fine. Yet if those same issues were to appear suddenly all at once, unsympathetic observers might jump on the aggregate exposure in a way they never would have if the issues had come up piecemeal in the normal course of development.
The importance of avoiding a needless exposure event is especially true for government projects, even more than for private-sector code. Elected officials and those who work for them are understandably sensitive to negative publicity, and are therefore risk-averse. Even if your team has been very conscientious, a worrying cloud of uncertainty will surround everything by the time you’re ready to open up hitherto closed code. How can you ever know you’ve got it all cleaned up? You do your best, but you can never be totally sure some hawk-eyed hacker out there won’t spot something embarrassing after the release. The team worries, and worry is an energy drain: it causes them to spend time chasing down ghosts, yet at the same time can cause them to unconsciously avoid constructive steps that might risk exposing real problems.
(For private-sector code, there are sometimes competitive reasons to stay closed until the first release, even if the project is intended to be open source in the long run. This is not an exception to the advice given here, it is simply a countervailing factor that should be weighed along with every other strategic consideration related to the project. One temptation to be suspicious of, however, is the notion that creating an exposure event is actually desirable — that dramatically open sourcing the code can itself provide useful publicity. In almost all cases, the better way to achieve that publicity would be by announcing the software’s “1.0” release later on. The feature set and product stability that that event represents is what users and partners actually care about, and that announcement gives potential contributors another natural moment to take interest, or renew their interest, in the open source code base.)
The Good News
The good news is that these are all unforced errors. A project incurs little or no extra cost by avoiding them in the simplest way possible: by running the project in the open from Day One.
“In the open” means the following things are publicly accessible, in standard formats, from the first day of the project: the code repository, bug tracker, design documents, user documentation, wiki (if any), and developer discussion forums. It also means the code and documentation are placed under an open source license, of course. It also means your team’s day-to-day work takes place in the publicly visible area (except for sensitive configuration data and the like — that of course stays behind your firewall).
“In the open” does not have to mean: allowing strangers to check code into your repository (they’re free to copy it into their own repository, if they want, and work with it there); allowing anyone to file bug reports in your tracker (you’re free to choose your own QA process, and if allowing reports from strangers doesn’t help you, you don’t have to do it); reading and responding to every bug report filed, even if you do allow strangers to file; responding to every question people ask in the forums (even if you moderate them through); reviewing every patch or suggestion posted, when doing so may cost valuable development time; etc.
Think of it this way: you open source your code, not your developers’ time. One of those resources is infinite, the other is not. You’ll have to determine whether engaging with outside users and developers makes sense for your project or not. In the long run it usually does, when done with care — different types of projects want different types of engagement. But the important thing is, it’s all under your control. Developing in the open does not change the degree of control you have over the project, it just ensures that everything you do is, by definition, done in a way that’s compatible with being open source. And you get that for free.
Open Source Helps Executive and Stakeholders
For executives and other stakeholders, early open-sourcing brings advantages too. The same collaboration tools used by the developers to coordinate work also provide visibility into that work for others. This is described well by Vinod Sankaranarayanan at ThoughtWorks; although he is writing there from an Agile perspective, his points apply quite clearly to open source as well (which is not a coincidence — there has always been a lot of overlap between the two).
Working in the open from day one also improves stakeholder trust through transparency. Rebecca Refoy-Sidibe of 18F wrote about how they polled the agencies they work with — that is, their customers — to find out how the experience was for them. Here’s just one sample, from the Department of Justice:
Department of Justice’s Office of Information Policy: The biggest benefit we saw from the open source approach was the active feedback, engagement, and buy-in from our public stakeholders. We had more confidence that what would be released to production would be of value because we had solicited and considered input from a diverse stakeholder group.
Open Sourcing Earlier Benefits Others Earlier
As Eric Mill of 18F writes in Working in public from day one:
Even if you’re working on what you think is the most niche project that no one else would ever use: leave the door open for providence.
You can’t know all the other organizations out there and what they need. They will always know their own needs best. Give them the opportunity to benefit from your work — and perhaps contribute to it — on their schedule.
The benefit they get is not only technical. When you make your code open source from the start, you provide an example that internal advocates for open source at other organizations can point to. “Look, they’re doing their project as open source, so why can’t we?” Running your project in the open gives credibility to the notion of open source development as a normal practice, and the sooner you open source, the sooner you make that example available to others who may urgently need it.
Earlier I wrote that community management is under your control: it’s something you do by choice, because of the benefits it brings to your project. But community management is also easier to control when scaled smoothly. It’s much better for your team to start doing it early, instead of suddenly dealing with an influx of interest at the same time as version 1.0 is launched. At the beginning stages of a project, the early adopters who show up tend to bring a lot of value, and receiving that value is relatively low-effort because it’s all coming form a small number of people. But the skills learned from handling those early contributors will pay off down the road. Start building the community management muscle memory early, when it’s cheap, so that later when the community is larger your team already knows how to do it efficiently.
Conclusion
If you want your software to be on-time, on-budget, feature-complete, and open source, then just develop it the way you normally would but with everything open source from the start. You’ll still only get two of the first three, of course — there’s no escaping the Project Management Triangle. But you’ll be taking the most efficient and effective route to being open source, and the project will be better off for it.