<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Open Tech Strategies | Blog - phab2lab</title><link href="https://blog.opentechstrategies.com/" rel="alternate"></link><link href="https://blog.opentechstrategies.com/feeds/phab2lab.tag.atom.xml" rel="self"></link><id>https://blog.opentechstrategies.com/</id><updated>2023-01-10T21:52:00-05:00</updated><subtitle>Maximum return from your open source investments.</subtitle><entry><title>Moving Repositories Between Project Hosting Platforms</title><link href="https://blog.opentechstrategies.com/2023/01/moving-repositories-between-project-hosting-platforms/" rel="alternate"></link><published>2023-01-10T21:52:00-05:00</published><updated>2023-01-10T21:52:00-05:00</updated><author><name>James Vasile</name></author><id>tag:blog.opentechstrategies.com,2023-01-10:/2023/01/moving-repositories-between-project-hosting-platforms/</id><summary type="html">&lt;p&gt;No matter how tightly developers are committed to their
current project hosting provider (GitHub, GitLab, GNU Savannah, or
whatever), new ones will come along over time. The history of web
services is replete with turnover, and project hosting forges all
follow the inevitable trend. But the cost of migration is …&lt;/p&gt;</summary><content type="html">&lt;p&gt;No matter how tightly developers are committed to their
current project hosting provider (GitHub, GitLab, GNU Savannah, or
whatever), new ones will come along over time. The history of web
services is replete with turnover, and project hosting forges all
follow the inevitable trend. But the cost of migration is
formidable: It's quite easy to set up a new project host like
GitLab, but how do you move the whole structure of your team’s
code, branches, comments, issues, and merge requests into their new
home?&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://softwareheritage.org"&gt;Software Heritage&lt;/a&gt;, a non-profit with the mission of archiving free software code, faced this daunting challenge when they decided to move from Phabricator to the more vibrant GitLab. For a while, a lot of free and open source projects had found Phabricator appealing, but the forge had been gradually declining and officially ceased development in 2021.&lt;/p&gt;
&lt;p&gt;At OTS, we developed an open source tool and framework to
support migrating to a new project hosting platform. We used it to
move all of Software Heritage's projects from Phabricator to
GitLab, but the framework is robust enough to support migrations
between almost any project hosts.&lt;/p&gt;
&lt;p&gt;The tool is called &lt;a class="reference external" href="https://code.librehq.com/ots/forgerie"&gt;Forgerie&lt;/a&gt;. Its goal is to automate the migration of projects from one hosting system to
another. Forgerie is extendable to any source and destination. It
translates input from a project hosting platform into a
richly-featured internal format, then exports from that format to the
destination platform.&lt;/p&gt;
&lt;div class="wp-image-616 figure"&gt;
&lt;img alt="Diagram showing repo moving with intermediate representation" src="/uploads/2023/01/Diagram-with-IR.png" /&gt;
&lt;/div&gt;
&lt;p&gt;This is the same method used by many tools that perform n-to-n migrations. For instance, the health care field contains many incompatible electronic record systems, so migration tools usually create an intermediate format to cut down on the number of necessary format conversions.&lt;/p&gt;
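&lt;p&gt;The saving from an intermediate format can be put in rough numbers. The sketch below is only an illustration, and its function names are our own: it counts the one-directional converters needed for n platforms with and without a shared format. The intermediate format pays off once more than three platforms are involved.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
def direct_converters(n):
    """Converters needed when every platform translates straight to every other."""
    return n * (n - 1)


def converters_with_intermediate(n):
    """Converters needed when each platform has one exporter and one importer."""
    return 2 * n


# Columns: platforms, direct converters, converters with a shared format.
for n in (2, 3, 5, 10):
    print(n, direct_converters(n), converters_with_intermediate(n))
```
&lt;/pre&gt;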
&lt;p&gt;OTS continues to work on Forgerie as part of its offering of
migration services to clients. If you would like to use Forgerie,
please grab it
from &lt;a class="reference external" href="https://code.librehq.com/ots/forgerie"&gt;Forgerie's GitLab page&lt;/a&gt; or &lt;a class="reference external" href="https://opentechstrategies.com/#contact"&gt;contact us&lt;/a&gt; if you would like help with a migration.&lt;/p&gt;
&lt;p&gt;The rest of this post offers some technical background on
Forgerie. It should be of interest to anybody solving similar
project hosting problems or, more generally, to anybody working on
moving structured data into a new data store. Many migration
projects fall into the traditional category of Extract, Transform,
Load (ETL), but the richness of data stores today stretches the
category into new realms.&lt;/p&gt;
&lt;div class="section" id="forgerie"&gt;
&lt;h2&gt;Forgerie&lt;/h2&gt;
&lt;p&gt;The Forgerie code was initiated by OTS developer Frank Duncan and
released under the GNU Affero General Public License v3.0. This post
delves into the project goals along
with suggestions for the future of this project. We'll look at the
difficulties posed by this major migration project and how we
handled them. This story may offer lessons and tips to people
dealing with all kinds of data migration.&lt;/p&gt;
&lt;p&gt;If you have used a project hosting system, you might well be imagining
the massive requirements for even such a limited
project. Code in a forge exists in many branches, each created by
multiple commits and enhanced by merges. Numerous issues (change
requests) have been posted by different users, along with comments
that refer to the issues by number. Commit messages also link and refer to the
numbers of issues and branches.&lt;/p&gt;
&lt;div class="wp-image-607 figure"&gt;
&lt;img alt="Diagram showing repo moving without intermediate representation" src="/uploads/2023/01/Diagram-without-IR.png" style="width: 676px; height: 692px;" /&gt;
&lt;/div&gt;
&lt;div class="section" id="the-need-for-a-general-project-hosting-migration-tool"&gt;
&lt;h3&gt;The need for a general project hosting migration tool&lt;/h3&gt;
&lt;p&gt;Tools for importing projects exist for various project hosting
platforms, but they are limited. GitLab does a pretty good job
importing a repository from GitHub, and GitHub from GitLab, and both
allow the uploading of a private repository. Later in this article
we’ll examine one particular limitation of all these import tools:
handling multiple contributors.&lt;/p&gt;
&lt;p&gt;To automate the migration from Phabricator to GitLab, Software
Heritage contracted with Open Tech Strategies (OTS), a free and open
source software consulting firm. Preliminary research turned up a few
tools claiming to perform the migration, but none of them did a
complete job. And each migration tool works only with one particular
forge as input and another as destination. OTS decided to design its
new tool as a general converter that could be adapted to any source
and target repositories.&lt;/p&gt;
&lt;p&gt;Migration thus requires the automated tool to reproduce, on the
target forge, all the projects, branches, commits, merge requests,
merges, issues, comments, and users recorded in the source
repository. If possible, contributors should be associated with their
contributions.&lt;/p&gt;
&lt;p&gt;OTS chose to create Forgerie in Common Lisp, which seems like an
odd choice in the 2020s. But Common Lisp is well-maintained and
robust. Its big advantage for the Forgerie project was that Lisp makes
database-to-dictionary conversions easy. Because Phabricator stores
data in a relational database, database-to-dictionary conversions were
the central task in automating the migration.&lt;/p&gt;
&lt;p&gt;The Forgerie project has three subdirectories: a set of core files used
by all migrations, egress files for Phabricator, and ingress files for
GitLab. This design leaves room for future developers to extend the
project by adding more ingress and egress options. In order to go from Phabricator to GitHub, for instance, a
maintainer can reuse the existing core and Phabricator
directories and write only the GitHub ingress code.&lt;/p&gt;
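&lt;p&gt;The split between egress, core, and ingress can be pictured with a small sketch. This is not Forgerie's code (Forgerie is written in Common Lisp); it is a hypothetical Python illustration, and every class, function, and field name in it is an assumption made for the example. The point is that the export side and the import side share only the neutral data structures in the middle.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
from dataclasses import dataclass, field


# Core: neutral structures shared by every migration (hypothetical names).
@dataclass
class Issue:
    number: int
    title: str
    author_email: str


@dataclass
class Project:
    name: str
    issues: list = field(default_factory=list)


def export_from_source(rows):
    """Egress (hypothetical): turn source database rows into the neutral format."""
    project = Project(name=rows["project"])
    for row in rows["tasks"]:
        project.issues.append(Issue(row["id"], row["title"], row["author"]))
    return project


def import_to_destination(project):
    """Ingress (hypothetical): turn the neutral format into destination payloads."""
    return [
        {"project": project.name, "number": issue.number, "title": issue.title}
        for issue in project.issues
    ]


# Made-up input standing in for rows read from the source platform.
rows = {
    "project": "demo",
    "tasks": [{"id": 43, "title": "Fix build", "author": "dev@example.org"}],
}
payloads = import_to_destination(export_from_source(rows))
print(payloads)
```
&lt;/pre&gt;
&lt;p&gt;In a layout like this, supporting a new destination means writing another function in the role of &lt;tt class="docutils literal"&gt;import_to_destination&lt;/tt&gt;; the export side stays untouched.&lt;/p&gt;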
&lt;/div&gt;
&lt;div class="section" id="impedance-mismatches-create-challenges"&gt;
&lt;h3&gt;Impedance mismatches create challenges&lt;/h3&gt;
&lt;p&gt;All forges offer basic version control features, along with
communication and management tools such as issues. But each forge is
also unique. In this case, Duncan had to decide how best to accommodate
features that differ or are missing in the target GitLab platform.&lt;/p&gt;
&lt;p&gt;The biggest challenge Duncan faced is that GitLab maps projects to
repositories on a one-to-one basis, whereas Phabricator treats a
project as a higher-order concept.
A project in Phabricator can
contain multiple repositories, and a repository can be part of many
projects. Phabricator also supports multiple version control tools
(Git, Mercurial, etc.). Making Forgerie flexible enough to smooth over these types of differences in data structure was a key goal.&lt;/p&gt;
&lt;p&gt;The different approaches to projects introduced several
complications. First, Duncan had to make sure that each message and
ticket pointed to the right GitLab project.&lt;/p&gt;
&lt;p&gt;Merge requests were the hardest elements to migrate, because in
Phabricator a changeset can span multiple repositories. The
requirement that Duncan had to implement was to preserve the sequence
of events in the original forge strictly, so that issue 43 in the old
forge remains issue 43 in the new forge. That way, any email message
or comment referring to the issue still refers to the right one.&lt;/p&gt;
&lt;p&gt;Lots of details had to be tidied up. For instance, Phabricator has
its own markup language to add rich text to comments and issues. This
language had to be converted to Markdown to store the comments in
GitLab.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-question-of-multiple-contributors"&gt;
&lt;h3&gt;The question of multiple contributors&lt;/h3&gt;
&lt;p&gt;When there are many people to credit for their contributions, the
import tool has a tough nut to crack. Awarding credit properly is
crucial because many contributors rest their reputations on the record
provided by their contributions. Statistics about the number of
commits they made, the “stars” they got, etc. undergird their
strategies for employment and promotions. Losing that information
would also hurt the project by making it hard to trace changes back to
the responsible person.&lt;/p&gt;
&lt;p&gt;On the other hand, security concerns preclude allowing someone to import
material and attribute it to somebody else.&lt;/p&gt;
&lt;p&gt;GitLab solves this problem if the input repository is set up
right: The person doing the import needs master or admin access and
has to map contributors from the input repository to the destination
repository. If access rights don't allow the import to add material
to a contributor's repository, GitLab's import can accurately
attribute issues to the contributor, but not commits.&lt;/p&gt;
&lt;p&gt;Forgerie goes farther in preserving the provenance of contributors:
It keeps track of Phabricator users and creates a user in GitLab for
each user recorded in the Phabricator repository. The Software
Heritage project did not present difficulties because no contributor
had an account in GitLab. To be precise, the email address that
identified each Phabricator contributor didn’t already exist for any
GitLab contributor. If GitLab had an account with the same email
address as an account being imported, the system would have issued an
error and prevented Forgerie from importing the contributor’s
commits.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="a-few-implementation-details"&gt;
&lt;h3&gt;A few implementation details&lt;/h3&gt;
&lt;p&gt;Forgerie carries out a migration by creating a log of everything
that happened in the source repository, and replaying the log in the
target forge. Phabricator uses a classic LAMP stack, storing all
repository information into a MySQL database. Forgerie queries this
database to retrieve each item in order, then invokes the GitLab API
to create the item there.&lt;/p&gt;
&lt;p&gt;The GitLab API is relatively slow for these item-creation
requests, taking one or two seconds each, and
repositories can contain tens of thousands of items when you count all
the merges, comments, etc. So you can expect a migration to take 24
hours or more.&lt;/p&gt;
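&lt;p&gt;A back-of-the-envelope calculation shows where an estimate of this size comes from. The sketch below assumes one API request per item, requests issued one after another, and an illustrative workload of 50,000 items; none of these figures are measurements from the Software Heritage migration.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
def estimated_hours(item_count, seconds_per_request):
    """Total wall-clock hours if items are created one request at a time."""
    return item_count * seconds_per_request / 3600


# Assumed workload: 50,000 items at 1 to 2 seconds each.
low = estimated_hours(50_000, 1.0)
high = estimated_hours(50_000, 2.0)
print(f"{low:.1f} to {high:.1f} hours")
```
&lt;/pre&gt;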
&lt;p&gt;Long runs call for checkpoints and restarts. When Duncan designed
the first, simple version of Forgerie, which he expected to run just once, he figured he
could just restart the run if it failed. Later he realized that
starting over after 23 hours of work was unacceptable.&lt;/p&gt;
&lt;p&gt;The log solves this problem through a kind of simple
transaction. You can conceive of the migration as moving through three
stages (Figure 1). In the first stage, items are in the old platform
but not the log. In the second stage, Forgerie adds the items to the
log. In the third stage, items are safely loaded into the destination
platform and can be removed from the log. Should the job fail, the
user can restart it from the beginning of the log.&lt;/p&gt;
&lt;div class="wp-image-602 figure"&gt;
&lt;img alt="Figure 1: Logging items as they move from source to destination platform." src="/uploads/2023/01/Diagram-with-Logs.png" /&gt;
&lt;p class="caption"&gt;Figure 1: Logging items as they move from source to destination platform.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A classic issue with transactions arises with a log: Suppose an
item has just entered the target forge but Forgerie did not have a
chance to remove the item from the log before a failure. The item
exists in both the target repository and the log, so when Forgerie
starts up again, the item will be added a second time to
the repository. Forgerie developers do not have to worry about
this happening because the insertions are idempotent. The second
insertion overwrites the first with no corruption of
information.&lt;/p&gt;
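&lt;p&gt;The restart behavior described above can be sketched in a few lines. This is a hypothetical Python illustration, not Forgerie's implementation: the in-memory list standing in for the log, the &lt;tt class="docutils literal"&gt;FlakyDestination&lt;/tt&gt; class, and its &lt;tt class="docutils literal"&gt;upsert&lt;/tt&gt; method are all assumptions made for the example. A log entry is removed only after its item is stored, and writes are keyed by item ID, so replaying an entry that was already stored does no harm.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
class FlakyDestination:
    """Stand-in for a target forge; fails once to simulate a crash."""

    def __init__(self, fail_on):
        self.items = {}
        self.fail_on = fail_on

    def upsert(self, item_id, payload):
        # Keyed write: repeating it with the same data leaves the same state.
        self.items[item_id] = payload
        if item_id == self.fail_on:
            self.fail_on = None
            raise ConnectionError("simulated failure after write")


def replay(log, destination):
    """Load every logged item, removing each entry only after it is stored."""
    while log:
        item_id, payload = log[0]
        destination.upsert(item_id, payload)
        log.pop(0)  # not reached if upsert raised, so the entry stays in the log


log = [(1, "issue one"), (2, "issue two"), (3, "issue three")]
dest = FlakyDestination(fail_on=2)

try:
    replay(log, dest)
except ConnectionError:
    pass  # item 2 is now in both the destination and the log

replay(log, dest)  # restart: item 2 is written again, harmlessly
print(dest.items)
```
&lt;/pre&gt;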
&lt;/div&gt;
&lt;div class="section" id="assessing-the-forgerie-project"&gt;
&lt;h3&gt;Assessing the Forgerie project&lt;/h3&gt;
&lt;p&gt;The Forgerie code base is surprisingly small, with a total of 2,726 lines divided as follows:&lt;/p&gt;
&lt;div class="aligncenter docutils container"&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Core (shared) code: 350 lines&lt;/li&gt;
&lt;li&gt;Phabricator-specific code: 1,233 lines&lt;/li&gt;
&lt;li&gt;GitLab-specific code: 1,143 lines&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;No platform lives forever. Amazing as the capabilities of GitHub and
GitLab are—and they continue to evolve—there will come a time when developers decide they have to pick up and move their code to some glorious new way of working. Forgerie tries to make migration as painless as possible.&lt;/p&gt;
&lt;p&gt;Thanks to &lt;a class="reference external" href="https://praxagora.com/"&gt;Andy Oram&lt;/a&gt; for assistance drafting this post, to Jim McGowan for making the diagrams, and to Antoine R. Dumont of Software Heritage for contributing &lt;a class="reference external" href="https://code.librehq.com/ots/forgerie/-/merge_requests/1"&gt;technical improvements&lt;/a&gt; to the Forgerie project.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="Posts"></category><category term="phab2lab"></category></entry><entry><title>Need help migrating off Phabricator?</title><link href="https://blog.opentechstrategies.com/2022/02/need-help-migrating-off-phabricator/" rel="alternate"></link><published>2022-02-17T14:09:00-05:00</published><updated>2022-02-17T14:09:00-05:00</updated><author><name>James Vasile</name></author><id>tag:blog.opentechstrategies.com,2022-02-17:/2022/02/need-help-migrating-off-phabricator/</id><summary type="html">&lt;p&gt;Open Tech Strategies can help you migrate off of Phabricator now that it has reached &lt;a class="reference external" href="https://github.com/phacility/phabricator"&gt;end-of-life&lt;/a&gt;. We developed &lt;a class="reference external" href="https://code.librehq.com/ots/forgerie"&gt;Forgerie&lt;/a&gt;, an open source tool that aids in migration between code forges. Forgerie extracts data from your Phabricator instance and injects it into a GitLab instance. It can also help move repositories …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Open Tech Strategies can help you migrate off of Phabricator now that it has reached &lt;a class="reference external" href="https://github.com/phacility/phabricator"&gt;end-of-life&lt;/a&gt;. We developed &lt;a class="reference external" href="https://code.librehq.com/ots/forgerie"&gt;Forgerie&lt;/a&gt;, an open source tool that aids in migration between code forges. Forgerie extracts data from your Phabricator instance and injects it into a GitLab instance. It can also help move repositories to a GitHub account.&lt;/p&gt;
&lt;p&gt;Using Forgerie is a process. It requires setting parameters, running the tool, examining the results, tweaking the parameters, and re-running until the result meets your needs. Our team can help you with this process. We can move you to your own GitLab instance, host an instance for you, or get you migrated to &lt;a class="reference external" href="https://gitlab.com"&gt;GitLab.com&lt;/a&gt; or &lt;a class="reference external" href="https://github.com"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We'd love to help you transition from Phabricator. Drop us a line
at &lt;a class="reference external" href="mailto:info&amp;#64;opentechstrategies.com"&gt;info AT opentechstrategies.com&lt;/a&gt; and we'll get you
safely to your new home.&lt;/p&gt;
</content><category term="Services"></category><category term="phab2lab"></category></entry></feed>