Moving Repositories Between Project Hosting Platforms

No matter how tightly developers are committed to their current project hosting provider (GitHub, GitLab, GNU Savannah, or whatever), new ones will come along over time. The history of web services is replete with turnover, and project hosting forges all follow the inevitable trend. But the cost of migration is formidable: It’s quite easy to setup a new project host like GitLab, but how do you move the whole structure of your team’s code, branches, comments, issues, and merge requests into their new home?

Software Heritage, a non-profit with the mission of archiving free software code, faced this daunting challenge when they decided to move from Phabricator to the more vibrant GitLab. For a while, a lot of free and open source projects had found Phabricator appealing, but the forge had been gradually declining and officially ceased development in 2021.

At OTS, we developed an open source tool and framework to support migrating to a new project hosting platform. We used it to move all of Software Heritage’s projects from Phabricator to GitLab, but the framework is robust enough to support migrations between almost any project hosts.

The tool is called Forgerie. Its goal is to automate the migration of projects from one hosting system to another. Forgerie is extendable to any source and destination. It translates input from a project hosting platform into a richly-featured internal format, then exports from that format to the destination platform.

This is the same method used by many tools that perform n-to-n migrations. For instance, the health care field contains many incompatible electronic record systems, so migration tools usually create an intermediate format to cut down on the number of necessary format conversions.

OTS continues to work on Forgerie as part of its offering of migration services to clients. If you would like to use Forgerie, please grab it from Forgerie’s GitLab page or contact us if you would like help with a migration.

The rest of this post offers some technical background on Forgerie. It should be of interest to anybody solving similar project hosting problems or, more generally, to anybody working on moving structured data into a new data store. Many migration projects fall into the traditional category of Extract, Transform, Load (ETL), but the richness of data stores today stretches the category into new realms.

Forgerie

The Forgerie code was initiated by OTS developer Frank Duncan and released under the GNU Affero General Public License v3.0. This post delves into the project goals along with suggestions for the future of this project. We’ll look at the difficulties posed by this major migration project and how we handled them. This story may offer lessons and tips to people dealing with all kinds of data migration.

If you have used a project hosting system, you might well be imagining the massive requirements for even such a limited project. Code in a forge exists in many branches, each created by multiple commits and enhanced by merges. Numerous issues (change requests) have been posted by different users, along with comments that refer to the issues by number. Commit messages also link and refer to the numbers of issues and branches.

The need for a general project hosting migration tool

Tools for importing projects exist for various project hosting platforms, but they are limited. GitLab does a pretty good job importing a repository from GitHub, and GitHub from GitLab, and both allow the uploading of a private repository. Later in this article we’ll examine one particular limitation of all these import tools: handling multiple contributors.

To automate the migration from Phabricator to GitLab, Software Heritage contracted with Open Tech Strategies (OTS), a free and open source software consulting firm. Preliminary research turned up a few tools claiming to perform the migration, but none of them did a complete job. And each migration tool works only with one particular forge as input and another as destination. OTS decided to design its new tool as a general converter that could be adapted to any source and target repositories.

Migration thus requires the automated tool to reproduce, on the target forge, all the projects, branches, commits, merge requests, merges issues, comments, and users recorded in the source repository. If possible, contributors should be associated with their contributions.

OTS chose to create Forgerie in Common Lisp, which seems like an odd choice in the 2020s. But Common Lisp is well-maintained and robust. Its big advantage for the Forgerie project was that Lisp makes database-to-dictionary conversions easy. Because Phabricator stores data in a relational database, database-to-dictionary conversions were the central task in automating the migration.

The Forgerie project has three subdirectories: a set of core files used by all migrations, egress files for Phabricator, and ingress files for GitLab. This design leaves room for future developers to extend the project by adding more ingress and egress options. In order to go from Phabricator to GitHub, for instance, a maintainer can reuse the existing core and Phabricator directories.

Impedance mismatches create challenges

All forges offer basic version control features, along with communication and management tools such as issues. But each forge is also unique. In this case, Duncan had to decide how best toaccommodate features that differ or are missing in the target GitLab platform.

The biggest challenge Duncan faced is that GitLab maps projects to repositories on a one-to-one basis, whereas Phabricator treats a project as a higher-order concept. A project in Phabricator can contain multiple repositories, and a repository can be part of many projects. Phabricator also supports multiple version control tools (Git, Mercurial, etc.). Making Forgerie flexible enough to smooth over these types of differences in data structure was a key goal.

The different approaches to projects introduced several complications. First, Duncan had to make sure that each message and ticket pointed to the right GitLab project.

Merge requests were the hardest elements to migrate, because in Phabricator a changeset can span multiple repositories. The requirement that Duncan had to implement was to preserve the sequence of events in the original forge strictly, so that issue 43 in the old forge remains issue 43 in the new forge. That way, any email message or comment referring to the issue still refers to the right one.

Lots of details had to be tidied up. For instance, Phabricator has its own markup language to add rich text to comments and issues. This language had to be converted to Markdown to store the comments in GitLab.

The question of multiple contributors

When there are many people to credit for their contributions, the import tool has a tough nut to crack. Awarding credit properly is crucial because many contributors rest their reputations on the record provided by their contributions. Statistics about the number of commits they made, the “stars” they got, etc. undergird their strategies for employment and promotions. Losing that information would also hurt the project by making it hard to trace changes back to the responsible person.

On the other hand, security concerns preclude allowing someone to import material and attribute it to somebody else.

GitLab solves this problem if the input repository is set up right: The person doing the import needs master or admin access and has to map contributors from the input repository to the destination respository. If access rights don’t allow the import to add material to a contributor’s repository, GitLab’s import can accurately attribute issues to the contributor, but not commits.

Forgerie goes farther in preserving the provenance of contributors: It keeps track of Phabricator users and creates a user in GitLab for each user recorded in the Phabricator repository. The Software Heritage project did not present difficulties because no contributor had an account in GitLab. To be precise, the email address that identified each Phabrictor contributor didn’t already exist for any GitLab contributor. If GitLab had an account with the same email address as an account being imported, the system would have issued an error and prevented Forgerie from importing the contributor’s commits.

A few implementation details

Forgerie carries out a migration by creating a log of everything that happened in the source repository, and replaying the log in the target forge. Phabricator uses a classic LAMP stack, storing all repository information into a MySQL database. Forgerie queries this database to retrieve each item in order, then invokes the GitLab API to create the item there.

The GitLab API is relatively slow for those particular types of request, requiring one or two seconds for each request, and repositories can contain tens of thousands of items when you count all the merges, comments, etc. So you can expect a migration to take 24 hours or more.

Long runs call for checkpoints and restarts. When Duncan designed the simple version of Forgerie for him to run just once, he figured he could just restart the run if it failed. Later he realized that restarting after 23 hours became unacceptable.

The log solves this problem through a kind of simple transaction. You can conceive of the migration as moving through three stages (Figure 1). In the first stage, items are in the old platform but not the log. In the second stage, Forgerie adds the items to the log. In the third stage, items are safely loaded into the destination platform and can be removed from the log. Should the job fail, the user can restart it from the beginning of the log.

Figure 1: Logging items as they move from source to destination platform.

A classic issue with transactions arises with a log: Suppose an item has just entered the target forge but Forgerie did not have a chance to remove the item from the log before a failure. The item exists in both the target repository and the log, so when Forgerie starts up again, the item will be added a second time to the repository. Forgerie developers do not have to worry about this happening because the insertions are idempotent. The second insertion overwrites the first with no corruption of information.

Assessing the Forgerie project

The Forgerie code base is surprisingly small–a total of 2,726 lines, divided as follows:

     • Core (shared) code: 350 lines

     • Phabricator-specific code: 1,233 lines

     • GitLab-specific code: 1,143 lines

No platform lives forever. Amazing as the capabilities of GitHub and
GitLab are—and they continue to evolve—there will come a time when developers decide they have to pick up and move their code to some glorious new way of working. Forgerie tries to make migration as painless as possible.

Thanks to Andy Oram for assistance drafting this post, to Jim McGowan for making the diagrams, and to Antoine R. Dumont of Software Heritage for contributing technical improvements to the Forgerie project.

Need help migrating off Phabricator?

Open Tech Strategies can help you migrate off of Phabricator now that it has reached end-of-life. We developed Forgerie, an open source tool that aids in migration between code forges. Forgerie extracts data from your Phabricator instance and injects it into a GitLab instance. It can also help move repositories to a GitHub account.

Using Forgerie is a process. It requires setting parameters, running the tool, examining the results, tweaking the parameters, and re-running until the result meets your needs. Our team can help you with this process. We can move you to your own GitLab instance, host an instance for you, or get you migrated to GitLab.com or GitHub.

We’d love to help you transition from Phabricator. Drop us a line at info AT opentechstrategies.com and we’ll get you safely to your new home.

Open Source Readiness Models

The Case Of The Missing Screwdriver, by Brian Kay, CC-BY, Wikimedia Commons: https://commons.wikimedia.org/wiki/File:The_Case_Of_The_Missing_Screwdriver_(115416535).jpeg

(This is the seventh post in our Open Source At Large series.)

Open source is a powerful tool that offers great benefits to organizations that make, maintain, or deploy software. Most teams know they need open source capabilities. The hard part is finding a focus around adding skills. In most cases, the most important consideration is building capability to know how and when to use FOSS strategies.

FOSS is all about execution. Reaping the benefits of open source investment requires nailing a series of difficult steps ranging from designing an initial strategy to building an appropriate community to leveraging the resulting dynamics for strategic advantage. Many organizations are not prepared to travel that path and reap those rewards at scale. At OTS, we talk about that level of capability as organizational readiness, and we describe the journey to mastery as one of gaining capability.

To do that, we might locate a team in a readiness model. This clarifies where they are and also suggests next steps and likely results. Often, organizations use these models to identify areas for potential growth. We find the models are most useful for firms early in their open source journey, and this post focuses on those early stages.

There are several readiness models. OTS has for many years described climbing the readiness ladder (in various publications). Microsoft’s Jeff McAffer sees common patterns in an engagement model. GitHub also sketched maturity model as the start of a discussion on readiness. Similarly, Wikipedia reproduces an interesting model from Qalipso. There are a number of such models, and it is often worth considering more than one when examining an organization. Of the published models, we like McAffer’s in particular because it includes strategic components, accounts for realistic failure modes, and understands that open source readiness will be unevenly distributed in large organizations.

On many teams, initial open source capabilities might be nascent. Most team members have not had significant (or perhaps even any) experience using open source strategies to create value. The team works in an environment where FOSS investment is rare, and many do not see much reason to change that. That lack of knowledge might translate in some quarters into hostility toward FOSS. People will say “It can never work here” even as open source slowly seeps into more and more of the technology around them.

At this stage, FOSS strategies can be difficult to execute. Internal political risks might be high. Policies might explicitly forbid engaging FOSS. Staff doesn’t know how to begin to work with external open source teams. Many people will lack even a basic understanding of what it means to do open source work. Efforts to work in an open source mode are likely to fail in ways that reinforce the belief that open source is not worth further consideration.

Many factors might move an organization from hostility toward tolerating open source, but movement usually comes from external pressure, changing environments, and staff turnover. Conditions around the organization begin to change, and the costs and risks of refusing to engage start to rise.

Those costs might include the pain of maintaining internal forks of external open source projects (or, more precisely, the pain of not maintaining all those forks). Similarly, the benefits of making minor open source investments start to become clear, even if only from watching others make those investments and reap the benefits while your organization loses ground.

However organizations begin to come around, tolerating open source is all about experimentation. Firms usually don’t think of it in those terms. They are making small concessions to necessity. They are seizing unique, one-off advantages. Most don’t think of those small projects as the future direction of the company. They should consider the possibility, though. Sometimes, explicitly labeling such experiments as learning exercises and skill-building allows an organization to maximize the value of their experimental investment. It prioritizes reflective analysis and learning. It gives permission to fail. Those can be useful to companies seeking adaptability. When considering McAffer’s model, we might relabel his “tolerance” phase as “experimentation”.

Experiments come in many forms, but the most common first experiment is using some outside open source code and engaging the open source project. That might involve filing bug reports, offering a contribution, or merely participating in project mailing lists and forums. These are all relatively low-risk ways to begin connecting an organization to outside FOSS projects.

The experimentation phase is usually a skill-building and knowledge-gaining phase because it exercises the skills that cause an organization to shift from merely tolerating open source to trying to harness it. Having those skills provides the vision that starts to shift attitudes.

The problem that arises as companies start to embrace FOSS is that they lack the infrastructure to succeed at it across the entire organization. They are missing policies, auditing, skills, culture, and experience. This is a pivotal, risky moment. A large number of teams will still be in the denial phase. Efforts to move internal culture toward FOSS will be perceived by some as a pointless shift toward the latest buzzword. Those experimental skills will be unevenly distributed internally. Many new open source projects will fail, and this will convince some that FOSS is a failure. Some might even sabotage FOSS projects.

McAffer sees this phase as one of hype, and perhaps that’s because it’s also when an organization embraces open source without quite being ready to execute. Firms in this phase tend to engage FOSS in shallow, unsophisticated ways simply because they don’t yet have the experience to make better strategic use of open source opportunities. The way to move past this stage is not to reduce the hype (though that might help) but rather to increase readiness.

Managers at this stage — especially middle managers — need guidance on using FOSS as a strategic component and on managing teams with more external deliverables. Developers need technical infrastructure, easy-to-follow licensing policies, and permission to engage externally. Perhaps more importantly, they need to develop new habits of working in the open and sharing even early, rough versions of their work. Beyond just technical teams, Human Resources needs hiring and compensation guidance as both skills and performance evaluation criteria shift. Providing all of those pieces is the process of gaining proficiency. Adding skills, process, and policy is how that happens, and it requires management approval and resources. Companies that don’t provide this support from fairly high in the org chart tend to plateau at this level of readiness.

All of the above describes a path from the very beginning toward eventual mastery of open source. We think of readiness in terms of skills and capabilities, but the truth is that doing open source well is primarily a cultural shift. Organizations using open source fluently quickly find that the open approach becomes their default process. That culture shift will be the topic of a future post. For now, though, where is your organization on the path toward readiness? What are the steps you can take to help prepare for an increasingly open source tech landscape?

Keep Your Friends Close

Picture of Mount Rushmore, by Dean Frankling, CC-By-SA
Mount Rushmore

(This is the sixth post in our Open Source At Large series.)

One of the insider secrets of free and open source software (FOSS) is that most of the rules a project uses on a day-to-day basis are not found in the software’s license. There are contribution guidelines, which are enforced by the project only taking contributions that meet them. There are codes of conduct, which are a condition of community participation. There are endorsements, official membership, a voice in setting the project roadmap, and all kinds of other benefits that attach to varying types of community participation. In each case, entirely external to the license, there are official rules and unwritten norms that govern how participants gain the benefits of joining the civic life of a project.

If you were to make an ecosystem map of an open source project, you might place the project in the middle of the page and then depict scale of involvement as distance from that center. The closer to the center a participant sits, the more influence the project has on them; the further from the center, the less sway the project has.

At the center is the project itself: its core developers and the people who have made commitments that affect the project’s outputs and actions. A project has a lot of visibility into how these participants act because tight, highly-connected cooperation is beneficial for everyone, and so participants are motivated to act in ways that avoid damaging that cooperation. This mechanism is so natural that most projects do not often think of it as something they could expand intentionally. But sometimes projects do exactly that: they figure out ways to deliberately widen their sphere of influence.

For example, Joomla, maintains a directory of third party extensions. It is the way most users discover Joomla extensions. For many businesses based on providing Joomla extensions, absence from that directory is akin to not existing at all. When the Joomla project decided to tighten license compliance among its extension developer community, they didn’t ask their lawyer to run around issuing threats. They simply explained that any project that wanted to appear in that directory must abide by community rules, https://docs.joomla.org/Extensions_and_GPL. Extension developers came into line.

A similar example can be seen in the Guidelines for Commercial Entities at the Arches Project. A glance over the guidelines will show the kinds of real-world problems they were developed to address. Only those who agree to the guidelines are listed in the official directory of Arches service providers.

Of course, being in some kind of project-endorsed directory is just one type of gateway. Another is participation in the project at all, that is, the ability to take part in project discussions, to vote (when there are decisions made by vote), and to have one’s contributions evaluated and accepted by the project with full attribution. Getting contributions accepted into the core project on a regular basis is important for those whose businesses depend on the project. If they can’t get their bugfixes and new features accepted upstream, then they may be forced to maintain their own divergent version (the term of art is “vendor branch”) indefinitely — a situation whose technical and organizational costs only get worse over time.

The right techniques will differ from project to project, because they must be based on the particular project’s history (as in the examples above). But the general reason these techniques work is that the non-code parts of a project are valuable in their own right. Those parts are not covered by the code’s license, but rather by the project’s norms and rules. Crucially, these parts cannot be replicated: unlike the code, you can’t make a copy of a community, or of a developer’s attention, or of an endorsement’s value. Equally crucially, none of them can be demanded by bad actors. The benefits of participation flow naturally to community members in good standing and it is equally natural to deny them to people and firms that refuse to align themselves with the community ethos. Creating structures that allow projects to control access to community benefits is a powerful way to enforce norms.

Using community participation as the mechanism for promulgating norms has its limits. Some participants stay far enough from the center of the project that they are effectively immune to community inducements. (Fortunately, projects have other mechanisms available to influence them, and we will cover some of those in a future post.) But in most cases, organizations that have a core reliance on the code will find multiple reasons to stay in good standing with the community, and this means the project has a chance to influence how those organizations behave. Spotting these leverage points takes experience as well as an understanding of project goals and positioning. Projects that want to wield influence over their ecosystem — whether for strategic or ethical ends — should actively look for ways to provide value backed by network effects, until the case for participation is overwhelming.

Thanks to Microsoft for sponsoring the Open Source At Large blog series.

Spot The Pattern: Commoditization

Round Mountain Gold Mine. Pic from https://upload.wikimedia.org/wikipedia/commons/thumb/3/32/Round_Mountain_gold_mine%2C_aerial.jpg/1024px-Round_Mountain_gold_mine%2C_aerial.jpg
Round Mountain Gold Mine

(This is the fifth post in our Open Source At Large series.)

Last week, I spoke with the CEO of a company that makes a proprietary category-leading workflow product. He asked me “what should we open source and when?”

This is one of the most common questions we get. There is no easy, standard answer to this question, but I usually start by looking for features of the ecosystem that either are or could be commoditized.

A commodity is something that is the same no matter who makes it. Gasoline is a commodity. Customers don’t really care where they buy gas. They want it cheap and reliable, but the gas itself should be the same stuff regardless of which oil company refines it and pumps it into your car.

In practice, of course, gas from different companies is not identical. Some places enhance it with detergents, mix it with ethanol, or add stabilizers.  But for the most part, consumers ignore these small differences.  You put it in your car and it works.  People pick gas stations based on price and convenience.

It’s hard to charge a premium for a commoditized good. People are not willing to pay extra for a product if they can get the equivalent for less elsewhere. As a result, gas stations compete on price.  This squeezes margins, and so in the end they make their money on the convenience store, just like movie theaters do with popcorn.

Open source software businesses are often in the same position. There’s a core offering that is, to put it bluntly, boring. Anybody could offer this boring core without innovating. You cannot be in the market without matching the core’s offerings, and customers want standard interfaces these days, so there’s not much opportunity to differentiate on the basic core offering.

Database servers follow this model. From a business perspective, storing rows of data is not interesting. Everybody can do it, and nobody really does it much better than anybody else. There are performance improvements at the margins, and we can build interesting services on top of storing information, but basic data storage and retrieval is just facilities maintenance, not the thing your customers choose you for.

Of course, not all data storage is basic. You can store things faster or in ways that make access easier. You can stack data vertically instead of horizontally. You can keep all the data in memory, duplicate it across multiple servers, or sync with client-side storage. These are features that can set one product apart from another. You might be able to charge a premium for them. None of them, though, is actually the boring, basic task of putting data on a disk and making sure you can find it later.

For a business that manages data but can’t find competitive advantage on storing it, the database is a cost. It’s not worth spending a lot of money to develop or acquire a database that improves on the standard set of features because there’s just not much improvement to be had. Best to source those features as cheaply as possible.

Of course, the same logic applies to everybody else in your industry. We’re all trying to get the boring stuff done as cheaply as possible. This is where open source cooperation comes in. We can’t compete on those features, so there’s no point being competitive about it. We use open source to get together and collaborate with all the other folks who want rock-solid databases. Open source lets us do this even if we’re in competition on a range of other fronts.

We see this pattern a lot in free and open source software. Product categories go open source, then contract, then perhaps expand when somebody figures out the next frontier of differentiation. Most recently, we’ve seen this effect around web rendering engines. Every browser’s goal is to depict HTML in standard ways. Aside from being at Google’s mercy, there’s little reason for most companies to make their own rendering engine when there’s a pretty good one that is open for the taking.

Spotting this pattern early is a crucial step in knowing what to open source and when.  You can lead or let somebody else lead — either might be strategically useful, but spotting the pattern lets you choose which position to take.

Sometimes it makes sense to open source what you have mainly as a signal to others that an area is becoming commoditized and so they should open source their stuff too.  That is, you don’t necessarily have to be aiming for a dominant position in a new open source area — in fact, you often don’t have to decide that in advance.  Instead, you open source because an area is about to become commoditized, and the earlier you shift your investments, the better positioned you will be get the type of influence that serves your needs in the inevitable post-commoditization universe.  (For more about the various forms that influence can take, see Open Source Archetypes.)

Once you start looking for the commoditization pattern, you see it all over the open source world. It is a common tool for understanding the strategic position of all kinds of products. In addition to the basic pattern, there are two more big concepts around commoditization worth considering: the cyclical nature of commoditization and the factors that allow resisting that cycle. We’ll cover those in future posts.

Thanks to Microsoft for sponsoring the Open Source At Large blog series.

Ecosystem Mapping

A photo of the ski trail map at Masik Pass in North Korea. Photo Credit: Uri Tours

(This is the fourth post in our Open Source At Large series.)

All the power of open source comes from throwing in with your neighbors, even the neighbors you don’t like very much. For most projects, the main reason to get involved in open source is to create productive relations with as many participants as possible, including rivals. Doing this well requires understanding who is in your ecosystem and how they relate to each other.

Whenever a team comes together to plan their open source strategy, they need a way to gather that understanding. They want to pool their knowledge and get everybody working from the same set of facts. The best tool we’ve found for this is ecosystem mapping.

There are many ways to visualize groups of stakeholders. We generally recommend starting with who and going from there to what. You can capture users, contributors, service and support providers, partners, funders, investors, deployers, integrators, and competing or adjacent efforts. Grab anything important to the questions directly in front of you, and don’t worry about being complete.

Ecosystem maps are lightweight. They should be messy, quick, and replaced often. The best way to make one is to hand-draw it on a large piece of paper or on a whiteboard, ideally as a group exercise. Snap a pic for future reference, but don’t bother taking the time to redraw them neatly. In a fast-moving project, the terrain these maps describe will shift often. And the reasons why you might draw a map will change even faster. It is not unusual to make several different maps of the same ecosystem in an afternoon.

Here is a simplified version of an ecosystem map drawn by the Arches Project. (We reproduced it in Dia for this article, but normally we wouldn’t bother to digitize a map, beyond photographing the whiteboard or paper it was drawn on.)

Ecosystem Map for the Arches Project

Notice how the diagram is primarily designed to highlight geography but also uses color and shape to distinguish between different types of participants.

The day the team drew this map, we wanted to understand where the project was succeeding geographically and how to support participants spreading the project into new communities. We suspected that having a set of committed users and service providers doing custom deploys were both important, so we mapped it to kick off a group discussion. As we talked through planning, we referred to the map, adjusted it at times and later even drew another map with a new focus. The diagram was a guide for conversation and let everybody agree on parameters quickly.

This is a map drawn of the Tor Project by Dlshad Othman:

Project Map of the Tor Project

This map is more centered on interactions with the Tor Project. It doesn’t mention geography at all, and it uses enclosing shapes to group types of participants in a venn-diagram. It shows roles and relationships with a heavy focus on the project itself.

There is no one right way to draw an ecosystem map. There are, however, some signs that your map is not set up to capture relationship complexity:

  • It is shaped like a star, with all your connections coming back to one central entity. 
  • It is more of a cloud than a map. If the map doesn’t depict relationships between entities, it’s not doing its job.
  • It tries to answer too many questions at once.  Maps are usually single-use snapshots designed to highlight one aspect of your ecosystem.  As two-dimensional representations made quickly with a limited palette of colors and symbols, these maps can show complex relationships, but not easily accommodate high-cardinality data views.

That said, do whatever works for your purpose! Experiment with different techniques, and draw maps that highlight different types of information. If you make a map using some of these techniques, let us know in the comments. We’d love to see pictures of maps that might turn into future examples as we continue to help people apply this approach to crafting open source strategy.

Thanks to Microsoft for sponsoring the Open Source At Large blog series, and also to Dlshad Othman and the Arches Project for kindly letting us use their maps as examples.

Open Source Goal Setting

A soccer goal with a gorgeous snow-capped mountain backdrop.

(This is the third post in our Open Source At Large series.)

Open source is a strategic tool, not an end in itself. It is the stone in your stone soup. You don’t eat it — it’s just the invitation.

You reach for open source to create effects that will support your broader strategy. We’ve talked to dozens of clients about why they invest in open source, and the reasons tend to be fundamental and long-term: to achieve a cultural change in their technical organization, to influence a market’s direction, to recast relationships with partners, etc. Direct revenue is rarely the main goal of open source investment, even for for-profit businesses. Rather, open source is used to create an environment in which revenue-generating activities can thrive.

Below is a checklist, or rather a provocation list. It’s meant to help you answer the question “What effects do we most want from our open source investment?”

Treat this list as a menu, not a buffet. Pick three items and make them your high priority targets. Focus on effects that connect best to your strategy, and, ultimately, to your organization’s mission. You need to know where you want to go before you can chart a course to get there. We’ve broken the goals into three categories, but you can mix and match across or within categories as you please.

Development and Collaboration

  • Expand or amplify developer base
  • Gain market insight
  • Gain or maintain insight in a particular technical domain
  • Influence a technical domain
  • Create a framework for partner collaboration
  • Lead a standardization effort
  • Disrupt an incumbent, hold off insurgents

External Positioning

  • Ease customer fears of vendor lock-in
  • Deepen engagement with users, create more paths for engagement
  • Transparency for customers and partners
  • Establish a basis for product reputation
  • Organizational branding and credibility
  • Product branding and credibility

Internal or Structural Change

  • Improve internal collaboration (cross-individual or cross-departmental)
  • Create opportunities for internal mobility
  • Expand or reshape hiring pool, expedite recruiting
  • Improve morale and retention
  • Create flow-paths for bottom-up innovation
  • Improve and maintain open source capabilities (technical and social)

Again, we emphasize the importance of picking just a few. Winnowing down to just the most important goals is usually illuminating, because it forces your organization to articulate what it’s really after. Every item on the menu might look inviting, and any of them can be pursued opportunistically, but a strategy that tries to chase all these goals at once will go nowhere.

If you have goals for open source investment that don’t appear on this list, we’d love to hear them. The list was built up over years of experience, and we generally find that we can map from it to the specifics of a particular client’s or project’s needs — most open source dreams appear somewhere on this list. But that doesn’t mean we can’t be surprised, and we’re always happy when we are.

Thanks to Microsoft for sponsoring the Open Source At Large blog series.

What Is Open Source Strategy?

Misty mountains, photo by Pixabay user himalayadestination: https://pixabay.com/photos/mountains-climb-mountaineer-3970320/

(This is the second post in our Open Source At Large series.)

There is a lot of documentation out there on how to do open source well at the project level. Historically, many projects have been started by developers, often on their own initiative, and the first non-technical questions they faced tended to be about project coordination (like “What collaboration tools shall we use?” or “What will our code review practices be?”) and about community management (like “How do we decide who has commit access?” and “How do we integrate newcomers smoothly?”). Because developers hate to solve the same problems over and over, there is a wealth of detailed and varied material addressing those sorts of questions (we’ve even written some ourselves, but it’s just a drop in the bucket of what’s available). Taken together, this literature thoroughly answers the question “How do we execute the best tactics for developing open source software?”

But there isn’t yet a lot of material on open source strategic thinking. Indeed, it’s traditionally so under-discussed that often when we talk about it people think that we’re talking about the nuts-and-bolts of how to run projects, rather than the broader question of how an organization uses open source to further its mission.

This blog series is about open source strategic thinking, so the first thing we want to do is define what that is. It overlaps with tactics, of course. For example, the tactical question “How do we integrate newcomers smoothly?” unfolds to become the strategic question “What are the long-term returns we want from engaging with others, who are those others, and what kinds of investments should we make in order to achieve those returns?”

Let’s run with that example for a moment. It’s deceptively easy, with one’s overworked-developer hat on, to think that the answer is obvious: “Oh, we want to bring in others because then they’ll contribute code and bugfixes and documentation to the project, thus lowering the costs of development for everyone else.” But with one’s strategic-thinker hat on, the question starts to look more complex — its many possible, non-mutually-exclusive answers each affect the shape of the investment.

If one of the ways the open source project serves your goals is by providing a channel for closer technical cooperation with customers and potential customers, then perhaps your investment in engaging participants should emphasize fast response times in discussion and deliberate probes to uncover usage scenarios. On the other hand, if the point is to disrupt a competitor’s proprietary product in the marketplace, then it might make more sense to invest heavily in ease of deployment, including fast processing of the relevant bug fixes and documentation contributions. One thing is certain: you cannot make every investment at once. All human endeavors are resource-constrained, and software projects are certainly no exception. One does not have a choice about prioritizing; one merely has a choice about whether to do it purposefully — strategically — or accidentally.

Please do not place too much weight on that one example, in any case. While investment in new participants is an important component of open source strategy, it is not the only component. If we were to make a high-level list of possible strategic concerns, it might look like this:

  • How open source supports your mission or goal.
  • How it affects your relationship with competitors and their products.
  • How it affects your relationship with customers.
  • How it affects the internal structure of your organization.
  • How open source allows you to draw in partners; how it affects where those partners come from; how it defines your relationship with those partners and their relationships with each other. (Project governance is a subset of this, but typically is not the most important subset.)
  • What types of investments you need to make to shape the above relationships in ways that serve your goals.
  • How you sustain your open source efforts over time. (A project’s sustainability model(s) is not the same as any one of its participants’ business models. An open source project that aims to create a diverse ecosystem of lightly-involved support vendors will have a very different sustainability model from a project that supplies a key piece of infrastructure needed by a few large corporations.)
  • What you can do as an open source actor that your proprietary competitors cannot.
  • What collaborative or market opportunities does being open source enable?

These concepts are not just for executives and managers, by the way. Developers benefit from strategic awareness, and of course can help support a strategy most effectively if they know about it. Our target audience for these posts is developers who want to think strategically, as much as it is managers and organization leaders.

In order to do strategic planning around products and projects, we have found a common set of base information and exercises to be useful: explicit goal-setting; mapping the ecosystem that surrounds the project; identifying business models (before identifying sustainability models); understanding the cyclical way in which open source commoditizes product categories and what that implies for the particular product and category in question; how an open source project relates to the procurement and deployment habits of its intended audience; and making choices in the inevitable trade-off between control and reach.

We will discuss each of these in future posts in this series. The point of this post is simply to say that strategy is a thing, and that it is separate from community management, collaboration tools, and everything else that makes things run at the project level. To use open source to meet your goals, it is necessary to structure your open source engagement in ways that align with those goals — and this is fundamentally a strategic question that won’t be easily answered from within the confines of day-to-day technical development.

Thanks to Microsoft for sponsoring the Open Source At Large blog series, and thanks to Josh Gay for sending us copyedits on this post.

Announcing a New Series:
Open Source At Large

Photo credit: CHAND ALi https://upload.wikimedia.org/wikipedia/commons/thumb/c/c2/Glorious-blue-mountain-range.jpg/1024px-Glorious-blue-mountain-range.jpg

Open Tech Strategies has a dual mission. Day to day, we help our clients understand how open source approaches fit into their strategic goals, and we help them implement those approaches. But over the long term, we also try to act at the ecosystem level when possible. The more organizations invest thoughtfully in open source, the better off open source as a whole is — and the more organizations will want to try it, in a virtuous circle.

For years we’ve been digging into the details of our clients’ operations, customer bases, and markets in order to help them recognize and act on specific open source opportunities. While this work is tailored to each client, we are always looking for ways to publish what we learn so it can benefit a wider audience. Our work with Mozilla on Open Source Archetypes and with the World Bank on their investment strategy for the GeoNode project are two examples. We’ve heard from open source practitioners across the field that these materials have been helpful to them (and we’ve received useful criticism and feedback — the sincerest form of flattery). Perhaps most gratifyingly, we’ve heard from internal open source champions at organizations that are still finding their way toward deeper open source engagement, telling us that having strategy-level materials to refer to helps them make their case.

Now we have a chance to do that kind of public analysis in a more regular and focused way. Starting this week, OTS will publish a series of blog posts focused on strategic concerns in open source. The series is kindly sponsored by Microsoft, whose request to us was essentially “help organizations get better at open source” (not a direct quote, but a decent summary). They were clear about the series being independent: they did not want editorial control, and specifically did not want to be involved in any pre-approval before a post is published. It goes without saying — but we’ll say it anyway, just to be explicit — that the views we express in the series may or may not be shared by Microsoft: please blame us, not them.

We’ll focus on the kinds of analysis we do when we advise clients: how to identify opportunities, how to make decisions about prioritizing and shaping open source investments, how to integrate open source methods into one’s business models and goals, monitoring and improving open source project health, and more. Our clients will recognize some of this material — our advice tends to be consistent over time — but much of it will be ideas we have not discussed widely before. We look forward both to offering strategic analysis to newcomers to open source and to engaging our colleagues in the open source field in a wide-ranging discussion.

Our first substantive post discussing “What Is Open Source Strategic Thinking?” is up.  Watch this space for more!

Be Open From Day One, Not Day N.

Note: This is an updated version of an article I first wrote in 2011. The original site went offline for a while, and although it was later restored, thanks to heroic efforts by Philip Ashlock, I felt the article needed a new home, and wanted a chance to update it anyway. This version also incorporates some suggestions from V. David Zvenyach.

Over the years we’ve watched software projects of all sizes make the transition from closed-source to open source. The lesson we consistently draw from them is this:

If you’re running a software project and you plan to make it open source eventually, then just make it open source from the beginning of development.

Waiting will only create more work.

The longer a project is run in a closed-source mode, the harder it will be to open source later. Continue reading “Be Open From Day One, Not Day N.”