Transplanting Subtrees With git

If you have a lot of projects in separate git repositories, you're eventually going to want to do some refactoring and put a subtree or two into their own repository so that you can reuse them. (If everything is under the same repository, you probably moved your tree from Subversion or CVS, and you'll want to split it up. These techniques work for that, too.)

Transplanting Subtrees

Often you want to move one or more subtrees to a different project (usually but not always a new one), but keep them in the same relative position. This can happen if, for example, you want to take a Java package out of a project and make it its own library. If you organize your code sensibly, you'll have source code and tests in separate subtrees, and some build logic in the root. This is a job for git subtree.

This is the one case where you don't have to clone your original repository unless you want to, but it's a good idea anyway because the repository is going to end up with an extra branch in it for each subtree you're moving. And even if you plan on nuking it later, cloning it gives you something to fall back on when a cat walks across your keyboard (as mine has done several times since I started writing this).

Let's start by making our new branches:

      cd ../MyBloatedProject
      git subtree split --prefix=src/com/example/separable/package --branch=src
      git subtree split --prefix=tst/com/example/separable/package --branch=tst
      cd ../ShinyNewLibrary
      git checkout -b working
      git remote add mbp ../MyBloatedProject
      git subtree add --prefix=src/com/example/separable/package mbp src
      git subtree add --prefix=tst/com/example/separable/package mbp tst
      git merge src tst

...and you're done. Post your code review.

The nice thing about this process is that it doesn't matter if people are still working in the subtrees you just moved. That's because every time you run git subtree with the same arguments you get the same set of commits. (That's also why you don't want to rebase the commits you moved.)
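To convince yourself of this, you can split the same prefix twice in a throwaway repository and compare the results. This is only a sketch: the repository layout, the identity, and the branch names split-1 and split-2 are made up for illustration, and it assumes git subtree (which ships in git's contrib directory and is not installed everywhere) is available.

```shell
#!/bin/sh
# Sketch: splitting the same prefix twice should give the same commit.
# All paths and branch names here are hypothetical.
set -e

# git subtree is a contrib command; skip quietly if it is missing.
if [ ! -x "$(git --exec-path)/git-subtree" ]; then
    echo "git subtree is not installed; skipping" >&2
    exit 0
fi

# Made-up identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p src/pkg
echo one > src/pkg/a.txt
echo top > build.txt
git add .
git commit -q -m "first"
echo two >> src/pkg/a.txt
git commit -q -am "second"

# Split the same subtree into two different branches.
git subtree split --prefix=src/pkg --branch=split-1 >/dev/null
git subtree split --prefix=src/pkg --branch=split-2 >/dev/null

first=$(git rev-parse split-1)
second=$(git rev-parse split-2)
echo "split-1: $first"
echo "split-2: $second"
```

If the two ids match, a later split of the same prefix will extend the earlier one instead of replacing it, which is what makes the update step that follows possible.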

So suppose someone has gone and committed some changes to the subtrees you're working in. True story -- it happens a lot on large teams, or if you have a habit of working on two computers. Now you can take advantage of the feature mentioned above: rerun the splits, then pull whatever is new into the library.

    # update the split branches:
      cd ../MyBloatedProject
      git checkout master
      git pull
      git subtree split --prefix=src/com/example/separable/package --branch=src
      git subtree split --prefix=tst/com/example/separable/package --branch=tst
    # Merge the splits, preserving the top commit.
      cd ../ShinyNewLibrary
      git checkout -b splits    # First run only. On later runs, replace these
      git reset --hard HEAD^    # two lines with: git checkout splits
      git subtree pull --prefix=src/com/example/separable/package mbp src
      git subtree pull --prefix=tst/com/example/separable/package mbp tst
      git checkout working
      git rebase splits

The resulting tree looks a little odd, with two extra branches that don't descend from a commit in mainline, but you can continue this way as long as anyone is still working in the original repo. Of course, once you're done, just

 
     git checkout working
     git rebase master
     git checkout master     # merge into master, not into working itself
     git merge working
     cd ../MyBloatedProject
     git rm -r src/com/example/separable/package
     git rm -r tst/com/example/separable/package
     git commit

This, of course, leaves your package's history in two places: the old project, and the new one. Cleaning up the old repo is rarely necessary, but we'll see later how you can do that. Rewriting history in the new project to get all the commits into master in chronological order is left as an exercise for the reader.

Uprooting a Subtree

The next simplest case is when you have a subtree -- a library, or a project in a former svn tree -- and you want to make it a separate project with its own git repo.

The first thing you have to do is clone your original repo, because it's going to be totally destroyed. Then,

     git filter-branch --subdirectory-filter Foo -- --all

This will wipe out your original repo, leaving only the former contents of Foo. It also wipes out any history Foo's contents might have had if they were moved from someplace else. We'll get to that.
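Here is a sketch of that step in a scratch repository, so you can watch what happens before trying it on anything you care about. The file names and the clone name "uprooted" are made up. Recent versions of git print a warning and pause before running filter-branch; setting FILTER_BRANCH_SQUELCH_WARNING skips the pause.

```shell
#!/bin/sh
# Sketch: uproot the subtree Foo in a clone, leaving the original alone.
# File and directory names other than Foo are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

# Made-up identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work"
git init -q original
cd original
mkdir Foo
echo keep > Foo/kept.txt
echo drop > dropped.txt
git add .
git commit -q -m "initial"

# Always work in a clone; the rewrite is destructive.
cd "$work"
git clone -q original uprooted
cd uprooted
git filter-branch --subdirectory-filter Foo -- --all >/dev/null

# Only the former contents of Foo should remain, now at the top level.
files=$(git ls-files)
echo "$files"
```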

A variant of this is where you want all of your repository to end up in the subdirectory; in other words, to make your repo into a self-contained subtree that you can then move into some other project (perhaps using git subtree, as in the previous section).

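    # filter-branch runs its filters with /bin/sh, so use the portable
    # [ ] test rather than the bash-only [[ ]].
    git filter-branch --prune-empty --tree-filter '
        if [ ! -e foo/bar ]; then
            mkdir -p foo/bar
            git ls-tree --name-only "$GIT_COMMIT" | xargs -I files mv files foo/bar
        fi'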

(I got that one from this article on StackOverflow, by the way.)
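A quick way to check the result is to list the tracked paths afterwards and look for anything that is not under the new directory. The sketch below does that in a scratch repository; foo/bar comes from the command above, and everything else (file names, identity) is invented for the example.

```shell
#!/bin/sh
# Sketch: move a whole repository under foo/bar, then verify the result.
# File names are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

# Made-up identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo a > a.txt
mkdir lib
echo b > lib/b.txt
git add .
git commit -q -m "initial"

git filter-branch --prune-empty --tree-filter '
    if [ ! -e foo/bar ]; then
        mkdir -p foo/bar
        git ls-tree --name-only "$GIT_COMMIT" | xargs -I files mv files foo/bar
    fi' >/dev/null

# Collect any tracked path that did NOT end up under foo/bar/.
# grep exits non-zero when nothing matches, hence the "|| true".
outside=$(git ls-files | grep -v '^foo/bar/' || true)
moved=$(git ls-files)
echo "$moved"
```

An empty $outside means every file made the trip.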

Of course, this doesn't work if you want some of the files to remain where they were.

Many Leaves Can Form a Forest

Other times, you're not starting out with a tree, just a collection of files, all at the same level. And maybe you want them to end up in more than one subtree. I did this most recently when I wanted to split up my songbook into my songs, public domain songs, other people's songs, and work in progress.

I started out by moving all the songs that weren't mine into three subdirectories called PD, WIP, and Other. Then, I made a little script:

    #!/bin/bash
    pd=$(echo `ls PD`)
    wip=$(echo `ls WIP`)
    oth=$(echo `ls Other`)
    git filter-branch --prune-empty --tree-filter \
	"for f in $pd;  do if [ -f \$f ]; then mkdir -p PD; mv -f \$f PD; fi done;\
	 for f in $wip; do if [ -f \$f ]; then mkdir -p WIP; mv -f \$f WIP; fi done;\
	 for f in $oth; do if [ -f \$f ]; then mkdir -p Other; mv -f \$f Other; fi done;\
	"

(I later made a more general Perl script -- it turned out that Bash wasn't up to doing the necessary variable substitution.)

Now I had a directory with three subtrees, and my own lyrics in the top level. Then it was a simple matter of cloning this directory three times, and running git filter-branch --subdirectory-filter in each clone.
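The clone-and-filter step can be sketched as a loop. The directory names PD, WIP, and Other come from the text; the repository name "songbook", the clone names, and the sample files are assumptions made for the sake of a runnable example.

```shell
#!/bin/sh
# Sketch: one clone per subtree, each reduced to that subtree's contents.
# "songbook" and the sample files are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

# Made-up identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work"
git init -q songbook
cd songbook
for d in PD WIP Other; do
    mkdir "$d"
    echo "a song" > "$d/song-in-$d.txt"
done
echo "my song" > mine.txt
git add .
git commit -q -m "songs sorted into subdirectories"

cd "$work"
for d in PD WIP Other; do
    git clone -q songbook "songbook-$d"
    (
        cd "songbook-$d"
        git filter-branch --subdirectory-filter "$d" -- --all >/dev/null
    )
done

# Each clone should now hold only the files from its own subtree.
for d in PD WIP Other; do
    echo "songbook-$d: $(cd "songbook-$d" && git ls-files)"
done
```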

Finally, I went back to the directory that still had the subdirectories in it. Just removing them won't do, because their contents would still be in the history. But we can fix that, with:

    git filter-branch --prune-empty --original refs/before-removing-subdirs\
    		      --tree-filter 'rm -rf Other PD WIP'
    git reset --hard HEAD; git gc --aggressive; git prune
    # and, finally, stomp on your shared repo with
    git push -f

You need the --original in there because you probably still have refs/original left over from the previous work -- filter-branch won't run if it might overwrite your original. Just in case you forgot to clone. You'll probably want to run some unit tests before that last push.
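One caveat worth checking: the backup refs that filter-branch saves under the --original namespace, and the reflog, still point at the old commits, so garbage collection keeps the removed files until those are gone. The sketch below follows the checklist for shrinking a repository in the git filter-branch documentation (delete the backup refs, expire the reflog, then gc with --prune=now). The scratch repository and its file names are made up.

```shell
#!/bin/sh
# Sketch: confirm that removed content is really gone after the rewrite.
# The repository contents are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

# Made-up identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir PD
echo "someone else's song" > PD/old.txt
echo "my song" > mine.txt
git add .
git commit -q -m "initial"

# Remember the object id of a file we are about to erase from history.
blob=$(git rev-parse HEAD:PD/old.txt)

git filter-branch --prune-empty --original refs/before-removing-subdirs \
    --tree-filter 'rm -rf Other PD WIP' >/dev/null

# Drop the backup refs, expire the reflog, then collect garbage.
git for-each-ref --format='%(refname)' refs/before-removing-subdirs |
    xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

# The old blob should no longer exist in the object database.
if git cat-file -e "$blob" 2>/dev/null; then
    still_there=yes
else
    still_there=no
fi
echo "old blob still present: $still_there"
```

Do this only once you are sure you no longer need the backup; after the gc there is nothing left to recover from except your clone.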

Whew! The whole process is made feasible by the fact that cloning a local repository is almost instantaneous because it uses hard links. Whenever you make a mistake, you can just clone the original repo and start over.
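If you're curious whether your clones really share storage, you can count the object files in a fresh clone that have more than one hard link. This is a sketch with made-up names; on a filesystem without hard links git falls back to copying, and the count will simply be zero.

```shell
#!/bin/sh
# Sketch: clone a local repository and count hard-linked object files.
# Repository names are hypothetical.
set -e

# Made-up identity so the commit works on a machine with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work"
git init -q original
(
    cd original
    echo hello > file.txt
    git add file.txt
    git commit -q -m "initial"
)

# Cloning from a plain local path lets git hard-link the object files.
git clone -q original scratch

linked=$(find scratch/.git/objects -type f -links +1 | wc -l)
echo "object files hard-linked into the clone: $linked"
```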


Stephen R. Savitzky <steve @ savitzky.net>