My personal and professional life

2019-02-10

First migrations from CVS to Git

Last May I had the good intention to start gradually migrating my old CVS repositories to Git, but I only managed to begin the process this weekend. Before plunging deeper, I decided to migrate some smaller repositories with linear history, so I could exercise, verify and master the process. As a result, some small projects have already landed on my GitHub profile (see github.com/gdsotirov). I have read a lot about migrating from CVS and SVN to Mercurial and Git in connection with an internal project at the company I work for. I have also experimented with migrating from CVS to SVN in the past and from CVS to Hg/Git more recently, so I considered myself prepared, but there are still some things to consider when doing such a migration.

Repository preparation

Revision log messages. You may have a convention for writing revision log messages in CVS, but I did not have one for my personal projects, so I wrote comments either on a single line or on multiple lines as an unnumbered list, like this:

* Add this
* Modify that

I had to modify these comments in CVS before the migration, because they simply do not look good in TortoiseGit and GitHub. I also aligned similar comments on files committed within a few minutes of each other, so that the migration could group them into a single revision. Depending on whether you had a convention for the format of revision log messages in CVS, you may need to make other adjustments. In such a case the easiest way may be to modify the revision control (,v) files directly with perl, sed, awk, etc., while of course taking care not to ruin your history (i.e. such operations should never be performed on a "live" production repository).
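As an illustration of such a direct edit, here is a minimal Python sketch. It assumes the standard RCS ,v layout, in which each revision's message follows the keyword log as an @-delimited string and a literal @ inside a string is doubled as @@. The flatten_bullets rule (joining a message made only of "* " lines into one line separated by semicolons) is purely hypothetical; substitute your own transformation and run it only on a copy of the repository.

```python
import os
import re
import stat
import sys

# In an RCS ,v file every string is wrapped in '@'; a literal '@' inside is '@@'.
RCS_STRING = re.compile(r"@((?:[^@]|@@)*)@")
# A string is a log message when the text just before it ends with the keyword 'log'.
LOG_KEYWORD = re.compile(r"\blog\s*$")


def flatten_bullets(message: str) -> str:
    """Hypothetical rule: '* Add this\\n* Modify that\\n' -> 'Add this; Modify that\\n'."""
    lines = [line.strip() for line in message.splitlines() if line.strip()]
    if lines and all(line.startswith("* ") for line in lines):
        return "; ".join(line[2:].strip() for line in lines) + "\n"
    return message  # messages in any other layout are left untouched


def rewrite_rcs_logs(data: str, transform) -> str:
    """Return the ,v file text with transform() applied to every log message."""
    out, pos = [], 0
    for m in RCS_STRING.finditer(data):
        gap = data[pos:m.start()]
        body = m.group(1)
        if LOG_KEYWORD.search(gap):
            message = body.replace("@@", "@")             # unescape
            body = transform(message).replace("@", "@@")  # re-escape
        out.append(gap + "@" + body + "@")
        pos = m.end()
    out.append(data[pos:])
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fix_logs.py path/to/COPY/of/file,v ...
    for path in sys.argv[1:]:
        # latin-1 maps every byte to one character, so the bytes round-trip unchanged.
        with open(path, encoding="latin-1", newline="") as f:
            text = f.read()
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IWUSR)  # ,v files are normally read-only
        with open(path, "w", encoding="latin-1", newline="") as f:
            f.write(rewrite_rcs_logs(text, flatten_bullets))
        os.chmod(path, mode)  # restore the original permissions
```

Only strings preceded by log are touched, so file contents stored in the text strings stay byte-for-byte the same.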

Authors. You should be able to identify all your authors, because in CVS repositories they are normally UNIX login names and the users may no longer exist on the system. I've even seen cases where the same username was reused by different developers with the same given and family names. It's hard to identify and fix such revisions unless you know when each developer worked at the company, and even then, what if their periods at the company overlap?
Fortunately, I do not have this problem with my personal repositories, but I still had to map my username to a full name and email address (see more about this in chapter Step One: The Author Map of Eric Raymond's DVCS migration HOWTO). It's very important to rewrite the authors, because otherwise the commits won't be properly attributed on GitHub (see Contributions that are counted). If you need to fix usernames, it's preferable to do it in the CVS repository before the conversion. The easiest way to map usernames to authors, however, is to rewrite the converted Git repository, as I describe below.
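To make the author map idea concrete, here is a small Python sketch that parses a map file and reports CVS logins it does not cover. The one-entry-per-line `login = Full Name <email>` format and the function names are my own assumptions, loosely in the style of the HOWTO's author map. The author regex is only a rough scan of the delta headers in a ,v file and could also match similar text inside log messages.

```python
import re

# Assumed author-map line format: "login = Full Name <email>"; '#' starts a comment.
ENTRY = re.compile(r"^\s*(\S+)\s*=\s*(.+?)\s*<([^>]+)>\s*$")
# Rough scan for "author login;" in the delta headers of an RCS ,v file.
RCS_AUTHOR = re.compile(r"\bauthor\s+([^;\s]+)\s*;")


def parse_author_map(text: str) -> dict[str, tuple[str, str]]:
    """Map each CVS login to a (full name, email) pair."""
    authors = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0]
        if not line.strip():
            continue  # blank or comment-only line
        m = ENTRY.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        login, name, email = m.groups()
        authors[login] = (name, email)
    return authors


def unmapped_logins(rcs_text: str, authors: dict[str, tuple[str, str]]) -> list[str]:
    """Return logins found in a ,v file that the author map does not cover."""
    return sorted(set(RCS_AUTHOR.findall(rcs_text)) - set(authors))
```

Running unmapped_logins over every ,v file before the conversion shows which authors still need an entry.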

Tools

In such a process one of the most important things is to choose the right tools, because you simply cannot do without them. Last year at work we evaluated several options and finally chose cvs2git, which is part of the cvs2svn project. The command is written in Python and supports various options. For me it is important to make the best possible migration, so I had to make the following modifications to the example options file:

KeywordHandlingPropertySetter('expanded'),

from cvs2svn_lib.keyword_expander import _KeywordExpander
# Ensure dates are expanded as YYYY/MM/DD as in CVS 1.11
_KeywordExpander.use_old_date_format()


The first line ensures that CVS keywords (e.g. $Id$, $Date$, etc.) are expanded in text files (see topic cvs2svn changes 'date' string in source codes). The following three lines ensure that dates in keywords are expanded as in CVS 1.11, i.e. in YYYY/MM/DD format, instead of the default YYYY-MM-DD format used by CVS 1.12 and SVN.
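A conversion made with these settings can be spot-checked afterwards. The following Python sketch, with hypothetical function names, renders a timestamp in both layouts and lists expanded $Date$ values in a text that still use dashes. The exact layouts (date and time only, no time zone suffix) are my assumption based on the formats described above.

```python
import re
from datetime import datetime

# Assumed layouts of the date inside an expanded $Date$ keyword.
OLD_FMT = "%Y/%m/%d %H:%M:%S"  # CVS 1.11 style
NEW_FMT = "%Y-%m-%d %H:%M:%S"  # CVS 1.12 / SVN style
DATE_KEYWORD = re.compile(r"\$Date: ([^$]*?) \$")


def expand_date(when: datetime, old_style: bool = True) -> str:
    """Render an expanded $Date$ keyword in the chosen layout."""
    return f"$Date: {when.strftime(OLD_FMT if old_style else NEW_FMT)} $"


def dashed_dates(text: str) -> list[str]:
    """Return expanded $Date$ values that use the YYYY-MM-DD layout."""
    found = []
    for value in DATE_KEYWORD.findall(text):
        try:
            datetime.strptime(value, NEW_FMT)
        except ValueError:
            continue  # some other layout, e.g. the CVS 1.11 one
        found.append(value)
    return found
```

If use_old_date_format() took effect, dashed_dates should return an empty list for every file in a checkout of the converted repository.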

Migration

The way to convert a CVS repository to Git is described in cvs2git's documentation (see chapter Usage), so just read it. I wrote a shell script that executes the different steps described there. Initially I used cvs2git only with command-line options, but because of my requirements for keyword expansion (see above), I had to extend the script to generate an options file and use it instead. This way I kept calling the script the same way: by just providing the path to the CVS repository.
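The script itself is not published here, so the following is only a Python sketch of the same idea, not the original shell script. It assumes a hand-written template of the options file (hypothetically named cvs2git.options.in) containing the token CVS_REPO_PATH where the repository path belongs, e.g. in its run_options.set_project(...) call. It also assumes the template writes its blob and dump files to cvs2git-tmp/git-blob.dat and cvs2git-tmp/git-dump.dat; adjust these to whatever your options file uses.

```python
import subprocess
import sys
from pathlib import Path

TEMPLATE = Path("cvs2git.options.in")    # hypothetical options template
PLACEHOLDER = "CVS_REPO_PATH"            # hypothetical token inside the template
BLOB = Path("cvs2git-tmp/git-blob.dat")  # assumed to match the template
DUMP = Path("cvs2git-tmp/git-dump.dat")  # assumed to match the template


def render_options(template: str, cvs_repo: str) -> str:
    """Fill the CVS repository path into the options template."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template has no {PLACEHOLDER} token")
    return template.replace(PLACEHOLDER, cvs_repo)


def git_repo_for(cvs_repo: str) -> str:
    """Name the bare Git repository after the CVS directory, e.g. proj -> proj.git."""
    return Path(cvs_repo).name + ".git"


def convert(cvs_repo: str) -> None:
    options = Path("cvs2git.options")
    options.write_text(render_options(TEMPLATE.read_text(), cvs_repo))
    subprocess.run(["cvs2git", f"--options={options}"], check=True)
    git_repo = git_repo_for(cvs_repo)
    subprocess.run(["git", "init", "--bare", git_repo], check=True)
    # Blobs first, then the commit stream, in a single fast-import run so that
    # the marks defined by the blob file are visible to the dump file.
    stream = BLOB.read_bytes() + DUMP.read_bytes()
    subprocess.run(["git", "fast-import"], input=stream, cwd=git_repo, check=True)


if __name__ == "__main__" and len(sys.argv) == 2:
    convert(sys.argv[1])  # the only argument: path to the CVS repository
```

Reading both files into memory is fine for small personal repositories; for large ones, git fast-import's --export-marks and --import-marks options allow importing the two files in separate runs instead.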

So far I haven't had problems with the migration: the history was represented properly in Git. However, with many tags and branches over different files, things can get really messy, as I found out with the internal project at work. After the migration I check the result and, if necessary, fix more revision log messages. Then I rewrite the authors with the script given in Changing author info and finally import the repository to GitHub (I use git daemon --verbose --export-all --base-path="/path/to/repo" to serve the converted repository). After the import to GitHub I have to add LICENSE and README.md files, but this is trivial to do directly through the web interface.
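The Changing author info script is written for a single old e-mail address. If several logins need rewriting, a small Python sketch can generate an equivalent --env-filter covering all of them in one pass. This is my own variation, not the script from that page: the dictionary keys are assumed to be the old e-mail values as shown by git log --format='%ae' in the converted repository, and the function names are hypothetical. Values are quoted with shlex.quote so that names containing apostrophes do not break the shell code.

```python
import shlex


def env_filter(authors: dict[str, tuple[str, str]]) -> str:
    """Build --env-filter shell code mapping old e-mails to (name, e-mail)."""
    blocks = []
    for old_email, (name, email) in sorted(authors.items()):
        old, new_name, new_email = (shlex.quote(v) for v in (old_email, name, email))
        for role in ("COMMITTER", "AUTHOR"):
            blocks.append(
                f'if [ "$GIT_{role}_EMAIL" = {old} ]; then\n'
                f"    export GIT_{role}_NAME={new_name}\n"
                f"    export GIT_{role}_EMAIL={new_email}\n"
                "fi"
            )
    return "\n".join(blocks) + "\n"


def filter_branch_argv(authors: dict[str, tuple[str, str]]) -> list[str]:
    """Same filter-branch options as the Changing author info script uses."""
    return ["git", "filter-branch", "--env-filter", env_filter(authors),
            "--tag-name-filter", "cat", "--", "--branches", "--tags"]
```

Run it inside the converted repository, e.g. with subprocess.run(filter_branch_argv(authors), check=True). Note that current Git documentation recommends git filter-repo over filter-branch.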

And that's all. I have more repositories to migrate and I really hope I'll be able to finish the task by the end of the year, but it all depends on how quickly I can prepare the repositories, because after that the tools do the job.