Git: synchronizing bare repos

Author:

Douglas O’Leary <dkoleary@olearycomputers.com>

Description:

How to keep two bare repos in sync, for example production and disaster recovery (DR)

Disclaimer:

Standard: Use the information that follows at your own risk. If you screw up a system, don’t blame it on me…

At long last, I’m discovering the wonderful tool that is git. I’ve long been an avid user of RCS, accepting its limitations in favor of having a version control system for UNIX OS and configuration files. I got introduced to git when a client asked me to look into it so they could track changes to web configuration files. I have to say, I’m quite impressed.

While working through the various issues, which had more to do with my understanding of a distributed version control system than with git itself, I ran face first into the concept of bare repositories.

Let’s see if I can paraphrase: Normal repos have the magic directory (${repo}/.git) and the work tree. If someone pushed an update to the branch that a normal repo currently has checked out, the magic directory would be out of sync with the work tree. By default, git refuses such a push. So, if you want to be able to update a repo remotely, there are two choices:

  • Check out a branch other than master in the target repo (so the work tree is no longer on the branch being pushed to)

  • Use a bare repo.
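
The refusal is easy to reproduce. What follows is a minimal sketch, assuming a throwaway directory; the repo names server and client and the demo identity are made up for illustration.

```shell
#!/bin/sh
# Sketch: pushing to the checked-out branch of a normal repo is refused.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# A normal (non-bare) repo with master checked out and one commit.
git init -q server
git -C server symbolic-ref HEAD refs/heads/master
git -C server -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'initial commit'

# A clone that makes one more commit and tries to push it back.
git clone -q server client
git -C client -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'second commit'

# With the default receive.denyCurrentBranch setting the push fails.
if git -C client push -q origin master 2>push.err; then
    push_result=accepted
else
    push_result=refused
fi
echo "push to checked-out branch: $push_result"
```

The error text git wrote ends up in push.err, if you want to read its explanation.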

Bare repos just contain the information in the magic directory. The data and files that the repo is tracking aren’t directly available in the directory structure under a bare repo. They have to be in there somewhere, obviously, or you wouldn’t be able to get them out; however, they’re stored in an internal compressed format. As a slight aside: that’s a hell of a compression ratio, too. The normal repo for the web configuration files noted above is ~ 3.18 gigs. The corresponding bare repo: 196K.
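
To see the difference on disk, here is a small sketch. It assumes a scratch directory, and the names normal and bare.git are placeholders, not anything from the client's setup.

```shell
#!/bin/sh
# Sketch: compare the layout of a normal repo and a bare repo.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

git init -q normal            # work tree plus normal/.git
git init -q --bare bare.git   # only what would normally live in .git

ls -a normal                  # .git shows up next to (future) tracked files
ls bare.git                   # HEAD, config, objects, refs, ...

# Ask git itself which kind of repo each one is.
normal_is_bare=$(git -C normal rev-parse --is-bare-repository)
bare_is_bare=$(git -C bare.git rev-parse --is-bare-repository)
echo "normal: $normal_is_bare  bare.git: $bare_is_bare"
```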

A more eloquent discussion along with a demo is available in the gitolite documentation.

Bare repos don’t have work trees, so there’s no problem with accepting pushes from remote clients. But normal update commands like git fetch won’t update the repo’s own branches with the default configuration. So, what happens if you have a bare repo that needs to be kept in sync with another one? Think production -> Disaster Recovery.

I searched for a while on this topic and was starting to think that this wonderfully complete and complex tool had a hole in it. So, almost as a last resort, I asked a question on Google Groups. Konstantin Khomoutov’s answer is an awesome amount of work, astoundingly complete, and accurate. The short version, though, is to update the refspec that the fetch command uses. You do this by running two commands:

git remote add origin ${remote_host}:${repo}
git config remote.origin.fetch '+refs/*:refs/*'

Those commands update the bare repo’s config file, creating a stanza like:

[remote "origin"]
        url = ${remote_host}:${repo}
        fetch = +refs/*:refs/*

Once that’s done, git fetch -v works as advertised:

$ git fetch -v
From ${remote_host}:${repo}
 = [up to date]      master     -> master
 = [up to date]      dr/master  -> dr/master
 = [up to date]      origin/master -> origin/master
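
If you’d like to try the whole thing without touching real hosts, here is a self-contained sketch using two local bare repos. The names prod.git, dr.git and work are invented, and a local path stands in for ${remote_host}:${repo}. Note that this sketch sets remote.origin.fetch without --add, so the mirror refspec is the only fetch line in the stanza.

```shell
#!/bin/sh
# Sketch: mirror one bare repo into another with '+refs/*:refs/*'.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# "Production": a bare repo that receives one commit from a scratch clone.
git init -q --bare prod.git
git -C prod.git symbolic-ref HEAD refs/heads/master
git clone -q prod.git work 2>/dev/null
git -C work -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'initial commit'
git -C work push -q origin HEAD:refs/heads/master

# "DR": an empty bare repo pointed at production.
git init -q --bare dr.git
git -C dr.git remote add origin "$scratch/prod.git"
git -C dr.git config remote.origin.fetch '+refs/*:refs/*'
git -C dr.git fetch -q origin

# After the fetch, both repos should list the same refs.
prod_refs=$(git -C prod.git for-each-ref)
dr_refs=$(git -C dr.git for-each-ref)
[ "$prod_refs" = "$dr_refs" ] && echo "in sync"
```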

From there, it’s a simple cron script on the DR host to update the repo as often as you’d like. That’s left as an exercise for the reader…
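
For readers who’d rather not start from a blank page, here is one possible starting point. It is only a sketch: the script name, the function name sync_bare_repo, the paths in the crontab line, and the choice to pass --prune are all my assumptions, so adjust to taste.

```shell
#!/bin/sh
# sync-bare-repo.sh: hypothetical helper, meant to be run from cron on the DR host.
# Usage: sync-bare-repo.sh /path/to/repo.git

sync_bare_repo() {
    repo=$1
    # Refuse to run against anything that is not a bare repo.
    if [ "$(git -C "$repo" rev-parse --is-bare-repository 2>/dev/null)" != "true" ]; then
        echo "not a bare repo: $repo" >&2
        return 1
    fi
    # --prune also removes refs that were deleted on the production side;
    # drop it if the DR copy should keep them.
    git -C "$repo" fetch --quiet --prune origin
}

# Only sync when a repo path is given on the command line.
if [ "$#" -gt 0 ]; then
    sync_bare_repo "$1"
fi

# Example crontab entry (every 15 minutes; both paths are placeholders):
# */15 * * * * /usr/local/bin/sync-bare-repo.sh /srv/git/repo.git >>/var/log/git-dr-sync.log 2>&1
```

The script assumes the DR repo already has the origin remote and the refspec configured as described above, and that cron’s user can reach the production host without a password prompt.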