DVCS baby steps
Thoughts on Git and Mercurial
Over the past few days, I've finally taken the time to learn a bit more about Distributed Version Control Systems (DVCSs). In particular, I've read both The Mercurial Book and Pro Git, as well as a few blogs and tutorials.
Most of the time, I work with Subversion, both in development projects at work, and when I contribute to Plone. Subversion is a VCS, but not a DVCS. There is a single, centralised repository, from which everyone checks out a local working copy of a given project. Subversion tracks any number of files, co-ordinating updates by multiple authors, and allows you to roll back to previous versions should you need to. It deals with merges, branches and conflicts, too, although not in quite as sophisticated a way as a DVCS.
DVCSs are different. Typically, there is no central repository. Everyone has a full copy of the repository, including the full revision history. Most commands are local, and changes need to be explicitly pushed to a remote server. Branches are a lot cheaper, and merges are easier. A DVCS supports a number of workflows (which can be mixed on an ad-hoc basis). You can have a single maintainer of the canonical version of a given project "pulling" from others' published repositories, or a model whereby people submit patches exclusively by email. There doesn't even need to be a canonical version at all, just a number of people with their own sandboxes, sharing code in a completely decentralised manner.
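To make the "everyone has a full copy" idea concrete, here is a minimal sketch in Python. It is a toy model, not how Git or Mercurial work internally: the `Repo` class and its `clone`, `commit` and `push` methods are hypothetical names invented for illustration, and commits are reduced to plain message strings, whereas real systems identify them by hashes.

```python
class Repo:
    """Toy stand-in for a DVCS repository: an ordered list of commit messages."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def clone(self):
        # A clone copies the entire history, not just the latest files.
        return Repo(self.history)

    def commit(self, message):
        # Committing touches only this repository; no server is involved.
        self.history.append(message)

    def push(self, remote):
        # Changes reach another repository only when sent explicitly.
        for message in self.history:
            if message not in remote.history:
                remote.history.append(message)


published = Repo(["initial import"])
mine = published.clone()
mine.commit("fix typo")
mine.commit("add feature")

print(len(published.history))  # 1: the local commits have not been shared yet
mine.push(published)
print(len(published.history))  # 3
```

A "pull" is the same transfer seen from the other side: a maintainer copying commits from a contributor's published repository into their own.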
Mercurial and Git are both very powerful systems. Of the two, I prefer Mercurial, mainly because it seems a little easier to work with. Git is probably a bit more powerful (I like the "stash" feature in particular), but whilst the basic commands are relatively straightforward, I found myself worried that I'd need to read the documentation many more times before it became comfortable. Some features, like submodules, seem downright confusing, and there are some niggling inconsistencies in the way the command line interface works. Of course, it helps me that the Mercurial commands closely mimic those of Subversion.
But would I switch to them wholesale? Perhaps, but not yet. The DVCS model, and the enhanced support for branching and merging, is certainly attractive. However, I worry that the fundamental DVCS concepts are just a little too complicated. I normally have to teach version control basics (with Subversion) to the people who join my project teams. It's not always easy for people to grasp, and harder still to get into good working habits. Subversion at least simplifies the process, in that it can be explained by the analogy of a filesystem with history. There is very little "magic" going on. For example, a "tag" is just a copy of the code in the "/tags" directory, nothing more, nothing less. You can understand how you'd do it all manually if you had to.
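The "tag is just a copy" idea can be sketched with ordinary directories. This is a conceptual illustration only, using a temporary directory as a stand-in for a repository; it does not call Subversion, and the file and directory names are made up. (Subversion's own copies are cheap on the server rather than full physical copies, but the mental model is the same.)

```python
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # stand-in for a repository root
trunk = root / "trunk"
trunk.mkdir()
(trunk / "README.txt").write_text("version 1.0\n")

# "Tagging" is just copying trunk to a directory under tags/.
tag = root / "tags" / "1.0"
shutil.copytree(trunk, tag)

# Later work on trunk does not affect the tagged copy.
(trunk / "README.txt").write_text("version 1.1 (in progress)\n")
print((tag / "README.txt").read_text())  # version 1.0
```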
By contrast, I think it would be difficult to explain a DVCS without reference to the underlying data model. Both of the books I read made heavy use of diagrams and examples to illustrate even pretty basic concepts, and felt it necessary to explain the repository data structures in some detail. You need to build a mental model that includes a commit history with branches and merges. You need to understand concepts like "heads", "tips" and "parents". The history is not even static - it can be changed, for example by "rebasing". If you don't have a degree in computer science or at least a good understanding of such concepts as abstract data structures (e.g. linked lists) and pointers, as well as familiarity with the UNIX tool chain for managing diffs and patches, I think you're going to struggle to pick up a DVCS in an hour or two (the time I normally have to get people started). And if you don't understand the DVCS you're using well enough, I fear that the experience could become very frustrating. I don't get the feeling that either system is particularly conducive to trial-and-error. As soon as people start asking, "what the heck happened to my file?", I think you've lost.
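For readers who find code easier than diagrams, the mental model described above can be sketched as a small graph in Python. This is a simplified illustration under my own assumptions, not the actual storage format of either system: the `Commit` class and the `heads` and `ancestors` functions are hypothetical names, and commit ids are single letters rather than hashes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One node in the history graph (a simplified, hypothetical model)."""
    id: str
    parents: tuple = ()  # no parents = first commit, two parents = merge


def heads(commits):
    """Return the ids of commits that no other commit names as a parent."""
    referenced = {p for c in commits for p in c.parents}
    return sorted(c.id for c in commits if c.id not in referenced)


def ancestors(commits, start):
    """Follow parent pointers from `start` back to the first commit."""
    by_id = {c.id: c for c in commits}
    seen, stack = set(), [start]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(by_id[cid].parents)
    return seen


history = [
    Commit("a"),
    Commit("b", ("a",)),
    Commit("c", ("b",)),  # one line of development
    Commit("d", ("b",)),  # a second line, branching from b
]
print(heads(history))  # ['c', 'd']

history.append(Commit("m", ("c", "d")))  # a merge has two parents
print(heads(history))  # ['m']
```

Each commit points back at its parents, much like a linked list that is allowed to fork and rejoin, which is why some familiarity with pointers helps when learning these tools.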
This is a classic "power vs. simplicity" tradeoff. The very concepts of decentralised version control, branching, and merging are complex. There are a lot of edge cases. There are a lot of real-world use cases that require sophisticated solutions, especially when you start getting into projects that have a large number of contributors or a high volume of contributions.
The good news is that even if you're working with a project that uses Subversion for its repository, you can use either Git (via git-svn) or Mercurial (via hgsubversion) as a "super-client" for Subversion. You can have local commits and use the various DVCS tools, and then "push" your changes up to the Subversion server, which won't know the difference between this and any other client. I intend to start trying that out with Mercurial and hgsubversion shortly.
The next time I start a new/non-Plone project, I may also try to use Mercurial (or Git) only. And if I do, I'll almost certainly use BitBucket to do it. BitBucket, like its Git equivalent GitHub, is a hosting service that is free up to a point (you can buy plans with more data allowance). It enables anyone to set up Mercurial repositories for any number of projects.
The interesting thing, though, is that BitBucket and GitHub are a little bit like Facebook for programmers. Whereas traditional "project hosting" services such as Google Code (which supports both Mercurial and Subversion) are centred around the project, BitBucket and GitHub are centred around the people. That is, they both host repositories on URLs like http://<service>/<person>/<project>. There is no single location for a given project. Instead, you are encouraged to create your own fork of someone else's repository (there's even a button on the web GUI) if you want to work on that project, and then either request access to push your work back yourself, or ask the maintainer to pull it into their own repository.
I don't think this is the correct model for all projects. I suspect it is best suited for small, ad-hoc projects, personal projects, or projects that are not yet off the ground. In fact, I think this model could be actively damaging for a large, co-ordinated project such as Plone. Of course, you don't have to use this completely decentralised model, so that is not an argument against Git or Mercurial in itself.
I would say, though, to those working on Plone projects: please continue to use the shared infrastructure we already have, until such time as we are all ready to move to something different, together. Although it is not quite the same, working with a Subversion repository via Mercurial or Git is pretty powerful and gives you most of the benefits of having a fully DVCS-managed environment. By putting things in the places where other people are most likely to find them, and allowing those not ready to make the switch to work with the tool chain with which they are already familiar, you will encourage contributions and aid the cohesiveness of the community. When it comes to "core" Plone code, there is also a legal aspect to consider: the Plone Foundation owns the code in the Plone repository, and this code is protected by the Plone Contributor Agreement. That is an important factor in protecting Plone's intellectual property, and not something to be taken lightly.
Do I think Plone will eventually move to using Mercurial or Git? It seems a bit unlikely, given the amount of code we have in our Subversion repositories. I'm also worried that some of our integrators would find the Mercurial or Git learning curve a little bit too steep. We need to keep the barriers to contribution as low as possible. However, there clearly would be benefits too, so we should have a debate and see what the community wants. It seems a lot of other projects are moving to a DVCS, which is a sign that we should look into it, too. And in the meantime, I would encourage everyone to try out Mercurial or Git, or both.