Why and How To Keep Your Master Happy

Now that I have your attention, I want to talk about git, and the importance of keeping your master branch happy in the sense of being clean, readable, and informative.

Note that "master" is the main branch in a git repository; it corresponds to what Perforce users call "mainline", and svn users call "trunk". CVS users, you can come out from under your rock now. It's 2013. (I am embarrassed to admit that I discovered when I started this article that I hadn't converted the directory it's in to git yet. Time for another how-to article, right?)

I should also note that the workflow described here is not what you would use with GitHub. I'll talk about the differences toward the end.

Why?

Let's face it: you're not going to go to any extra trouble checking in your code unless you have a good reason. So let's start with

git log

Take a good look. If you're like most people, you have a lot of commits with descriptions like "oops", "fix spelling error", "add Javadoc comments", and "Steve says I should have squashed the last four commits, but I don't know how." Really helpful, isn't it?

If you're using a code review tool like Review Board (and unless you're working all by yourself, why aren't you?), how many commits are there that don't have a code review associated with them? If you're using GitHub, how many commits were in your last pull request?

Ask yourself whether you want your boss to see all those stupid little commits. How about your coworker, who's going to take over your code after you leave or lose interest in it? How about you, a year from now, when you have to go back and try to figure out where that pesky bug came from? How about your mother, who told you always to wear clean underwear because you never know when you might get hit by a bus and wind up in the ER, and who found your GitHub commits via Google?

And do you see any merges? How many of them are headed "merge from master"? Did you notice that git asked for an explanation when you did that? Does the phrase "spaghetti code" come to mind?

 

OK, here's what you want to see in your log: a series of commit messages each of which has a nice descriptive first line saying what that commit did, a brief but clear description if it isn't blindingly obvious, and a link to a code review or bug-tracking entry.

If there are merges, they should be there because you've been doing all of your work on a feature branch -- and master should be nothing but a sequence of nice, clean merges.

How?

It all starts out with a branch:

git checkout -b mybranch origin/master

This puts you on a new branch that's tracking origin/master. Tracking does a couple of good things for you: it makes it easy to post reviews (because Review Board can immediately see where you're coming from), and it makes it easy to see (via git status) whether anything new has appeared in master.
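
Here's a rough sketch of what tracking buys you, run in a throwaway directory. The repository names ("upstream", "mine") and the file name are made up for the demo; "upstream" just stands in for whatever your real origin is.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# "upstream" stands in for the shared repository (hypothetical name).
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "First commit"
cd ..

# Your clone, with a feature branch that tracks origin/master.
git clone -q upstream mine
cd mine
git checkout -q -b mybranch origin/master

# Someone else adds a commit to master in the shared repository.
(cd ../upstream && echo two >> file.txt && git commit -q -am "Second commit")

# After a fetch, tracking lets git tell you how far behind you are.
git fetch -q
tracked=$(git rev-parse --abbrev-ref "mybranch@{upstream}")
behind=$(git rev-list --count "HEAD..@{upstream}")
echo "mybranch tracks $tracked and is $behind commit(s) behind"
git status -sb
```

The first line of the git status -sb output should mention that mybranch is behind origin/master, which is your cue to rebase.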

Your first commit on the new branch should be a statement of your intent -- it should say what you want to accomplish with this branch. If your intent changes, that's ok -- you can always go back and edit the commit message before you merge.

Commitment

Did you catch that? You can edit the commit message.

You do this with the --amend option. Instead of simply doing a commit, you say

git commit --amend

(Possibly with a -a option, if you don't want to bother adding files explicitly.) You will now be in the editor, editing your latest commit message. In most cases you can just exit, and your commit will be updated. You can also amend without any code changes, if you just want to edit the commit message, e.g. to add a link to the code review.
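
A small sketch of the amend habit, in a scratch repository. The file name, the commit messages, and the review URL are placeholders I made up; the script passes --no-edit and -m only so that it runs without opening an editor, where you would normally just type git commit --amend.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q .
git symbolic-ref HEAD refs/heads/master
echo base > file.txt
git add file.txt
git commit -q -m "Base"
git checkout -q -b mybranch

# First commit on the branch: a statement of intent.
echo "feature" >> file.txt
git commit -q -am "Add the frobnicator (work in progress)"

# More work: fold it into the same commit instead of making a new one.
echo "more feature" >> file.txt
git commit -q -a --amend --no-edit

# Message-only amend, e.g. to add a review link (placeholder URL).
git commit -q --amend -m "Add the frobnicator" \
    -m "Review: https://reviews.example.com/r/NNN/"

commits_on_branch=$(git rev-list --count master..mybranch)
subject=$(git log -1 --format=%s)
echo "$commits_on_branch commit on mybranch: $subject"
```

However many times you amend, the branch still carries exactly one commit ahead of master.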

Some people prefer to keep on making tiny little commits and squashing them just before they push, using either "git merge --squash" or "git rebase -i". The problem with this is that sometimes they forget. It's useful, though, if you're working on a branch that you want other people to share -- you mustn't amend a branch that's been published.

Keeping up with the changes

Assuming you're not the only one working on your project, sooner or later somebody is going to push some changes to master while you're working on a branch. Now it's time to rebase. Rebase simply takes all of the work you've done on your branch (in the form of diffs, of course) and re-plays it on top of some other branch.

The easy way to do this, assuming you've set up your branch to track origin/master, is

git pull --rebase

The other way, which my fingers learned before I ever heard of tracking branches, is this:

git checkout master
git pull
git checkout mybranch
git rebase master

This has the advantage of keeping master up to date, as well as your working branch. This can be useful if you want to do diffs (to see what's changed), or start working on another branch. It also keeps you from accidentally merging on top of a master that isn't up to date.

Rebase is not without its hazards: Sometimes you'll get merge conflicts, especially if one or the other of you has done some refactoring or reformatting that affects the files you're working on. That said, it usually works better than merging.
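
To make the "re-play" idea concrete, here is a sketch in a scratch repository. A local master stands in for a freshly pulled one, and the file names and messages are invented for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q .
git symbolic-ref HEAD refs/heads/master
echo base > a.txt
git add a.txt
git commit -q -m "Base"

# Your work, on a branch.
git checkout -q -b mybranch
echo mine > b.txt
git add b.txt
git commit -q -m "My feature"

# Meanwhile, master moves on (as if you had just pulled).
git checkout -q master
echo theirs > c.txt
git add c.txt
git commit -q -m "Someone else's change"

# Replay your branch on top of the new master.
git checkout -q mybranch
git rebase -q master

# History is now a straight line: Base, someone else's change, My feature.
git log --oneline --graph
ahead=$(git rev-list --count master..mybranch)
merges=$(git rev-list --merges --count mybranch)
```

After the rebase, master is an ancestor of mybranch and there isn't a merge commit in sight.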

If you're always going to use git pull --rebase on branches, you may as well make that the default:

git config --global branch.autosetuprebase always
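
A quick way to see what that setting does, sketched in a scratch clone. Here it is set per-repository rather than with --global so the demo leaves your own configuration alone; the repository names are made up. As far as I can tell it only applies to tracking branches created after you set it.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "First commit"
cd ..

git clone -q upstream mine
cd mine

# Repo-local for the demo; the article's version uses --global.
git config branch.autosetuprebase always

# A tracking branch created from now on is marked "pull with rebase".
git checkout -q -b mybranch origin/master
new_branch=$(git config branch.mybranch.rebase)

# The master branch made by the clone predates the setting, so it is
# normally unset (unless your global config already says otherwise).
old_branch=$(git config branch.master.rebase || echo unset)
echo "mybranch: $new_branch, master: $old_branch"
```

For a branch that already exists, setting branch.mybranch.rebase to true by hand has the same effect.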

Contrariwise, if you're publishing your branch and intend to squash it when you merge back onto master, it's ok to use git pull and make merge commits. You'll squash them later -- just remember to do it.

Putting it back together

So... you've made all your changes, and they've passed all their unit tests (building from the command line, not Eclipse), and you've posted your code review, dealt with all the comments, and gotten a "ship it" or equivalent. Good. Now rebase from origin/master and run all the tests again. Fix anything that broke.

Now it's time to merge back to master.

There are two schools of thought on how this should be done. Most people want to see a nice, clean sequence of ordinary commits on master, each one corresponding to a code review. Otherwise known as "what you did on your branch, all squashed down into one commit".

That's easy. If you've been following instructions and using --amend, you only have one commit on your branch.

All you have to do is:

git checkout master
git pull
git merge --ff-only mybranch
git push

Don't forget the pull! That's doing two things: making sure that nobody snuck in with another commit on master while you weren't looking, and making sure that master is in sync with origin/master in case you've been using git pull --rebase.

The --ff-only makes sure that you really did rebase your branch, and it's all up to date.
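
Here is a sketch of --ff-only doing its job as a safety net, in a scratch repository with invented file names. A local master plays the part of the freshly pulled one.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q .
git symbolic-ref HEAD refs/heads/master
echo base > a.txt
git add a.txt
git commit -q -m "Base"

git checkout -q -b mybranch
echo mine > b.txt
git add b.txt
git commit -q -m "My feature"

# Somebody sneaks a commit onto master while you aren't looking.
git checkout -q master
echo theirs > c.txt
git add c.txt
git commit -q -m "Someone else's change"

# mybranch has not been rebased, so a fast-forward is impossible.
if git merge -q --ff-only mybranch 2>/dev/null; then
    first_try=merged
else
    first_try=refused
fi
echo "merge before rebase: $first_try"

# Rebase, then try again: now master can simply move forward.
git checkout -q mybranch
git rebase -q master
git checkout -q master
git merge -q --ff-only mybranch
git log --oneline
```

The refused merge leaves master untouched, so there is nothing to clean up before you rebase and retry.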

If your branch does have multiple commits on it, squash them first. Before the merge, run

git checkout mybranch
git rebase -i origin/master

... and see below under squashing. Then merge as above.

 

Some groups like to see a real merge commit on master. That's easy, too: just replace "--ff-only" with "--no-ff"! That tells git not to do a fast-forward even if it looks like one, and make an explicit merge commit.
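
A sketch of the difference, again in a scratch repository with made-up names. The --no-edit flag is only there so the script doesn't stop to ask for a merge message; in real life you would probably want to write one.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q .
git symbolic-ref HEAD refs/heads/master
echo base > a.txt
git add a.txt
git commit -q -m "Base"

git checkout -q -b mybranch
echo mine > b.txt
git add b.txt
git commit -q -m "My feature"

# A fast-forward would be possible here, but --no-ff insists on
# recording an explicit merge commit on master.
git checkout -q master
git merge -q --no-ff --no-edit mybranch

git log --oneline --graph
merges=$(git rev-list --merges --count master)
```

The merge commit's second parent is the tip of mybranch, so the branch stays visible in the history graph.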

Groups with a really formal process will want you to push your branch and have someone else merge it with a git pull command. That's what the GitHub process expects -- it's the pure git version of a "ship it" in Review Board.

In that case, you probably don't want your branch tracking origin/master, because you want to be able to push it to GitHub or wherever as a branch.

Oops.

Mistakes happen. It's probably a good idea, especially at first, to run "gitk --all" before you push. You either get to admire your nice clean commit, or see something that needs fixing.

In that case, no harm, no foul:

git reset --hard origin/master

and you're back to where you were before your merge.

This, by the way, is why you want to make sure everything is committed before you merge or pull -- it makes it a lot easier to recover if something goes horribly wrong.
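
The recovery can be rehearsed safely in a scratch clone. In this sketch the "mistake" is an unwanted merge commit; the repository and file names are hypothetical.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "First commit"
cd ..

git clone -q upstream mine
cd mine
git checkout -q -b mybranch origin/master
echo mine > feature.txt
git add feature.txt
git commit -q -m "My feature"

# Pretend this merge was a mistake, spotted before pushing.
git checkout -q master
git merge -q --no-ff --no-edit mybranch

# Throw the local merge away; master matches origin/master again.
git reset -q --hard origin/master

still_on_branch=$(git rev-list --count origin/master..mybranch)
echo "master restored; mybranch still has $still_on_branch commit(s)"
```

Note that reset --hard also discards uncommitted changes in your working tree, which is one more reason to commit everything first.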

Keeping It Clean

Here, in no particular order, are some additional tips:

re{formatting, factoring}

Reformatting and refactoring are similar in that they both make a lot of changes that, ideally, have no effect on whether or not your code works.

Reformatting makes dozens, perhaps hundreds, of changes in whitespace, indentation, and line breaks. Refactoring operations (renaming a class, for example) make lots of little changes to dozens of different files.

Both make it extremely hard to merge changes that were made before the operation was done. If there are enough changes to a file, there are almost certain to be conflicts. And in some cases, whole blocks of code changes can get rewritten or deleted in a merge with a reformatted file.

So here's how you do it:

  1. Send email to your team, telling them what you intend to do and what files you're going to be touching. (Ironically, this is easier in a big project, where everyone is working on a different part of the tree.)
  2. Wait until everybody has merged their changes to the files you're going to change.
  3. Make your changes as quickly as possible, run all the tests, get your code review, and merge them in.

Squashing

Every so often you find yourself having to make more than one commit on the same branch. It sometimes happens to me if I'm working on two different machines, or if more than one person is collaborating on the same branch. It's almost impossible to avoid if you're in the habit of committing files from your editor. Here's where squashing comes in handy.

The easy way, if you just want to get it over with, is to check out master and run

git merge --squash mybranch
git commit

This makes all the same changes as merging your branch, but doesn't make a merge commit. When you actually do the commit, on the next line, you get to edit the automatically-generated commit message to something more sensible than "Squashed commit of the following:", followed by all the commit messages on your branch. Hopefully, you can keep the first one, and maybe turn a few of the following ones into bullet points.
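
Here's a sketch of the squash merge from start to finish, in a scratch repository. The file and the three little commits are invented, and -m is used on the final commit only to keep the script from opening an editor.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q .
git symbolic-ref HEAD refs/heads/master
echo base > a.txt
git add a.txt
git commit -q -m "Base"

# Three small commits on the branch.
git checkout -q -b mybranch
echo hello > greet.txt
git add greet.txt
git commit -q -m "Add greeting"
echo "hello, world" > greet.txt
git commit -q -am "oops"
echo "Hello, world!" > greet.txt
git commit -q -am "fix spelling error"

# Stage the combined changes on master, then commit them as one.
git checkout -q master
git merge --squash mybranch >/dev/null
git commit -q -m "Add greeting"

git log --oneline
on_master=$(git rev-list --count master)
```

Master ends up with the branch's content in a single ordinary commit. As far as git is concerned mybranch was never merged, so delete it with git branch -D once you're done.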

You can have more control over which commits to keep and which to squash with the following, which squashes your branch on top of origin/master. Run it on your branch, then merge to master with --ff-only as usual.

git checkout mybranch
git rebase --interactive --autosquash origin/master
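
The --autosquash option only does something for commits whose messages start with "fixup!" or "squash!", which is what git commit --fixup writes for you. A sketch, assuming a scratch repository where a local master stands in for origin/master and the rebase is run from the feature branch; setting GIT_SEQUENCE_EDITOR to "true" just accepts the proposed to-do list so the script runs unattended.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q .
git symbolic-ref HEAD refs/heads/master
echo base > a.txt
git add a.txt
git commit -q -m "Base"

git checkout -q -b mybranch
echo hello > greet.txt
git add greet.txt
git commit -q -m "Add greeting"

# A follow-up change, marked as a fixup of the previous commit.
# This creates a commit titled "fixup! Add greeting".
echo "hello, world" > greet.txt
git commit -q -a --fixup=HEAD

# Autosquash lines the fixup up under its target; accepting the
# to-do list unchanged folds the two commits into one.
GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true \
    git rebase -q -i --autosquash master

git log --oneline
ahead=$(git rev-list --count master..mybranch)
subject=$(git log -1 --format=%s)
```

Run interactively, without the GIT_SEQUENCE_EDITOR override, you get the same list in your editor and can reorder, squash, or drop commits by hand.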

Sorry, wrong number branch

It happens to everyone. You're hacking along contentedly when suddenly, "Oh fiddlesticks! I forgot to start a new branch!"

No problem. Just make your branch, and the changes come along for free. Even if you committed them -- you can simply make your branch, then go back to master (or wherever) and reset it. (See above under "Oops".)
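
For the committed case, the rescue looks roughly like this. It's a sketch in a scratch clone with made-up names, assuming the stray commit has not been pushed yet.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "First commit"
cd ..

git clone -q upstream mine
cd mine

# Oh fiddlesticks: this commit was made directly on master.
echo oops > feature.txt
git add feature.txt
git commit -q -m "Feature work, accidentally on master"

# Make the branch now; it starts at the stray commit and keeps it.
git checkout -q -b mybranch

# Go back and put master where origin says it should be.
git checkout -q master
git reset -q --hard origin/master

# Carry on working on the branch.
git checkout -q mybranch
ahead=$(git rev-list --count origin/master..mybranch)
echo "mybranch is $ahead commit(s) ahead of origin/master"
```

A branch made this way doesn't track origin/master; git branch --set-upstream-to=origin/master fixes that if you want the tracking conveniences.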


Steve Savitzky <steve @ savitzky.net>