Now that I have your attention, I want to talk about git, and the importance of keeping your master branch happy in the sense of being clean, readable, and informative.
Note that "master" is the main branch in a git repository; it corresponds to what Perforce users call "mainline", and svn users call "trunk". CVS users, you can come out from under your rock now. It's 2013. (I am embarrassed to admit that I discovered when I started this article that I hadn't converted the directory it's in to git yet. Time for another how-to article, right?)
I should also note that the workflow described here is not what you would use with GitHub. I'll talk about the differences toward the end.
Let's face it: you're not going to go to any extra trouble checking in your code unless you have a good reason. So let's start with
git log
Take a good look. If you're like most people, you have a lot of commits with descriptions like "oops", "fix spelling error", "add Javadoc comments", and "Steve says I should have squashed the last four commits, but I don't know how." Really helpful, isn't it?
If you're using a code review tool like Review Board (and unless you're working all by yourself, why aren't you?), how many commits are there that don't have a code review associated with them? If you're using GitHub, how many commits were in your last pull request?
Ask yourself whether you want your boss to see all those stupid little commits. How about your coworker, who's going to take over your code after you leave or lose interest in it? How about you, a year from now, when you have to go back and try to figure out where that pesky bug came from? How about your mother, who told you always to wear clean underwear because you never know when you might get hit by a bus and wind up in the ER, and who found your GitHub commits via Google?
And do you see any merges? How many of them are headed "merge from master"? Did you notice that git asked for an explanation when you did that? Does the phrase "spaghetti code" come to mind?
OK, here's what you want to see in your log: a series of commit messages each of which has a nice descriptive first line saying what that commit did, a brief but clear description if it isn't blindingly obvious, and a link to a code review or bug-tracking entry.
If there are merges, they should be there because you've been doing all of your work on a feature branch -- and master should be nothing but a sequence of nice, clean merges.
git checkout -b mybranch origin/master
This puts you on a new branch that's tracking origin/master. Tracking does a couple of good things for you: it makes it easy to post reviews (because Review Board can immediately see where you're coming from), and it makes it easy to see (via git status) whether anything new has appeared in master.
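Here is a minimal sketch of what tracking buys you, run in a throwaway directory. The repository names (upstream, work), the file name, and the commit messages are made up for the demo, and a local repository stands in for the real origin.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the shared repository (hypothetical name: upstream).
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "Initial commit"
cd ..

# Your working clone, with a new branch tracking origin/master.
git clone -q upstream work
cd work
git checkout -q -b mybranch origin/master

# Ask git what mybranch is tracking.
tracked=$(git rev-parse --abbrev-ref 'mybranch@{upstream}')
echo "mybranch tracks $tracked"

# Somebody else commits to master in the meantime...
(cd ../upstream && echo two >> file.txt && git commit -q -am "Someone else's change")

# ...and after a fetch, git can tell you how far behind you are.
git fetch -q
behind=$(git rev-list --count 'HEAD..@{upstream}')
echo "commits on origin/master that mybranch lacks: $behind"
git status -sb
```

The last command is the short form of git status; its first line names the branch, what it tracks, and whether you are ahead or behind.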
Your first commit on the new branch should be a statement of your intent -- it should say what you want to accomplish with this branch. If your intent changes, that's ok -- you can always go back and edit the commit message before you merge.
Did you catch that? You can edit the commit message.
You do this with the --amend option. Instead of simply doing a commit, you say
git commit --amend
(Possibly with a -a option, if you don't want to bother adding files explicitly.) You will now be in the editor, editing your latest commit message. In most cases you can just exit, and your commit will be updated. You can also amend without any code changes, if you just want to edit the commit message, e.g. to add a link to the code review.
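A small sketch of both kinds of amend, in a scratch repository. The file name and messages are invented, and the demo uses --no-edit and -m so that it runs without opening an editor; in real life you would normally let the editor come up.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add release notes"

# More work on the same task: fold it into the existing commit.
# --no-edit keeps the message as it is.
echo "second thoughts" >> notes.txt
git commit -q -a --amend --no-edit

# No code changes at all: just rewrite the message.
# (The text in parentheses is a placeholder, not a real review link.)
git commit -q --amend -m "Add release notes (review link goes here)"

commits=$(git rev-list --count HEAD)
subject=$(git log -1 --format=%s)
echo "$commits commit(s); latest subject: $subject"
```

However many times you amend, the branch still carries a single commit.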
Some people prefer to keep on making tiny little commits and squashing them just before they push, using either "git merge --squash" or "git rebase -i". The problem with this is that sometimes they forget. It's useful, though, if you're working on a branch that you want other people to share -- you mustn't amend a branch that's been published.
Assuming you're not the only one working on your project, sooner or later somebody is going to push some changes to master while you're working on a branch. Now it's time to rebase. Rebase simply takes all of the work you've done on your branch (in the form of diffs, of course) and re-plays it on top of some other branch.
The easy way to do this, assuming you've set up your branch to track origin/master, is
git pull --rebase
The other way, which my fingers learned before I ever heard of tracking branches, is this:
git checkout master
git pull
git checkout mybranch
git rebase master
This has the advantage of keeping master up to date, as well as your working branch. This can be useful if you want to do diffs (to see what's changed), or start working on another branch. It also keeps you from accidentally merging on top of a master that isn't up to date.
Rebase is not without its hazards: Sometimes you'll get merge conflicts, especially if one or the other of you has done some refactoring or reformatting that affects the files you're working on. That said, it usually works better than merging.
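To see the re-play happen, here is a sketch using two scratch repositories. The names upstream and work, the files, and the messages are assumptions for the demo; the conflict-free case is shown, since the two sides touch different files.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the shared repository.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "Initial commit"
cd ..

# Your clone, with work committed on a tracking branch.
git clone -q upstream work
cd work
git checkout -q -b mybranch origin/master
echo mine > mine.txt
git add mine.txt
git commit -q -m "My feature"

# Meanwhile, somebody else's change lands on master.
(cd ../upstream && echo theirs > theirs.txt && git add theirs.txt && git commit -q -m "Their change")

# Fetch origin/master and re-play your commit on top of it.
git pull -q --rebase

parent=$(git log -1 --format=%s HEAD~1)
merges=$(git rev-list --merges --count HEAD)
git log --oneline
```

After the pull, your commit sits directly on top of theirs and the history has no merge commit in it.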
If you're always going to use git pull --rebase on branches, you may as well make that the default:
git config --global branch.autosetuprebase always
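One way to check that the setting took, sketched in a scratch clone. The demo sets the option on the clone only (no --global) so that it doesn't touch your real configuration; repository and file names are made up. Note that the option is applied when a tracking branch is created, so branches that already exist are not changed by it.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the shared repository.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "Initial commit"
cd ..

git clone -q upstream work
cd work

# Repository-local version of the setting, for the demo only.
git config branch.autosetuprebase always

# A tracking branch created from now on is marked for rebase on pull.
git checkout -q -b mybranch origin/master
rebase_flag=$(git config branch.mybranch.rebase)
echo "branch.mybranch.rebase = $rebase_flag"
```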
Contrariwise, if you're publishing your branch and intend to squash it when you merge back onto master, it's ok to use git pull and make merge commits. You'll squash them later -- just remember to do it.
So... you've made all your changes, and they've passed all their unit tests (building from the command line, not Eclipse), and you've posted your code review, dealt with all the comments, and gotten a "ship it" or equivalent. Good. Now rebase from origin/master and run all the tests again. Fix anything that broke.
Now it's time to merge back to master.
There are two schools of thought on how this should be done. Most people want to see a nice, clean sequence of ordinary commits on master, each one corresponding to a code review. Otherwise known as "what you did on your branch, all squashed down into one commit".
That's easy. If you've been following instructions and using --amend, you only have one commit on your branch. All you have to do is:
git checkout master
git pull
git merge --ff-only mybranch
git push
Don't forget the pull! That's doing two things: making sure that nobody snuck in with another commit on master while you weren't looking, and making sure that master is in sync with origin/master in case you've been using git pull --rebase. The --ff-only makes sure that you really did rebase your branch, and it's all up to date.
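The whole sequence can be tried end to end in a sandbox. This is only a sketch: the bare repository origin.git plays the part of the server, and the names seed, work, and feature.txt are invented for the demo.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a bare "server" repository with one commit on master.
git init -q seed
(cd seed && git symbolic-ref HEAD refs/heads/master \
  && echo one > file.txt && git add file.txt && git commit -q -m "Initial commit")
git clone -q --bare seed origin.git

# Your clone, with a single commit on a feature branch.
git clone -q origin.git work
cd work
git checkout -q -b mybranch origin/master
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# The merge back to master, as described in the article.
git checkout -q master
git pull -q
git merge -q --ff-only mybranch
git push -q

server=$(git --git-dir=../origin.git rev-parse master)
mine=$(git rev-parse mybranch)
merges=$(git rev-list --merges --count master)
git log --oneline master
```

If mybranch had not been rebased onto the current master, the --ff-only merge would stop with an error instead of creating a merge commit.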
If your branch does have multiple commits on it, replace the merge with
git merge --squash mybranch
git commit
... and see below under squashing.
Some groups like to see a real merge commit on master. That's easy, too: just replace "--ff-only" with "--no-ff"! That tells git not to do a fast-forward even if it looks like one, and make an explicit merge commit.
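A quick sketch of the difference, in a scratch repository with made-up file names. The -m option supplies the merge message so the demo doesn't stop in an editor.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "Initial commit"

# One commit on a feature branch; master has not moved,
# so a plain merge would simply fast-forward.
git checkout -q -b mybranch
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Force an explicit merge commit anyway.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'mybranch'" mybranch

merges=$(git rev-list --merges --count master)
subject=$(git log -1 --format=%s)
git log --oneline --graph
```

The graph shows the feature commit on its own line of history, joined back to master by the merge commit.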
Groups with a really formal process will want you to push your branch and have someone else merge it with a git pull command. That's what the GitHub process expects -- it's the pure git version of a "ship it" in Review Board.
In that case, you probably don't want your branch tracking origin/master, because you want to be able to push it to GitHub or wherever as a branch.
Mistakes happen. It's probably a good idea, especially at first, to run "gitk --all" before you push. You either get to admire your nice clean commit, or see something that needs fixing.
In that case, no harm, no foul:
git reset --hard origin/master
and you're back to where you were before your merge.
This, by the way, is why you want to make sure everything is committed before you merge or pull -- it makes it a lot easier to recover if something goes horribly wrong.
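Here is a sketch of the recovery in a sandbox, assuming nothing has been pushed yet. Repository names, files, and messages are invented. The thing to notice is that the reset only moves master; the commit is still safe on mybranch.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the shared repository.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "Initial commit"
cd ..

git clone -q upstream work
cd work
git checkout -q -b mybranch origin/master
echo feature > feature.txt
git add feature.txt
git commit -q -m "Work in progress"

# Merge to master, then decide it wasn't ready after all.
git checkout -q master
git merge -q mybranch

# Put master back where origin/master is. Nothing was pushed, so no harm done.
git reset -q --hard origin/master

master_rev=$(git rev-parse master)
origin_rev=$(git rev-parse origin/master)
kept=$(git rev-list --count origin/master..mybranch)
echo "commits still waiting on mybranch: $kept"
```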
Here, in no particular order, are some additional tips:
Reformatting and refactoring are similar in that they both make a lot of changes that, ideally, have no effect on whether or not your code works.
Reformatting makes dozens, perhaps hundreds, of changes in whitespace, indentation, and line breaks. Refactoring operations (renaming a class, for example) make lots of little changes to dozens of different files.
Both make it extremely hard to merge changes that were made before the operation was done. If there are enough changes to a file, there are almost certain to be conflicts. And in some cases, whole blocks of code changes can get rewritten or deleted in a merge with a reformatted file.
So here's how you do it:
Every so often you find yourself having to make more than one commit on the same branch. It sometimes happens to me if I'm working on two different machines, or if more than one person is collaborating on the same branch. It's almost impossible to avoid if you're in the habit of committing files from your editor. Here's where squashing comes in handy.
The easy way, if you just want to get it over with, is
git merge --squash mybranch
git commit
This makes all the same changes as merging your branch, but doesn't make a merge commit. When you actually do the commit, on the next line, you get to edit the automatically-generated commit message to something more sensible than "Squashed commit of the following:", followed by all the commit messages on your branch. Hopefully, you can keep the first one, and maybe turn a few of the following ones into bullet points.
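A sketch of the squash merge in a scratch repository, with three deliberately sloppy commits on the branch. Names and messages are made up, and the demo passes -m to git commit so it runs unattended; without -m you get the editor and the generated message described above.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Three small commits on the feature branch.
git checkout -q -b mybranch
echo parser > parser.txt
git add parser.txt
git commit -q -m "Add parser"
echo oops >> parser.txt
git commit -q -am "oops"
echo spelling >> parser.txt
git commit -q -am "fix spelling error"

# On master: stage the combined changes, then commit them as one.
git checkout -q master
git merge -q --squash mybranch
git commit -q -m "Add parser"

on_master=$(git rev-list --count master)
merges=$(git rev-list --merges --count master)
git log --oneline master
```

Master ends up with one new ordinary commit whose contents match the tip of mybranch.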
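You can have more control over which commits to keep and which to squash with the following, which squashes your branch on top of origin/master. (The --autosquash option only rearranges commits whose messages start with "fixup!" or "squash!"; for anything else, you mark the commits to squash in the editor yourself.)
git checkout mybranch
git rebase --interactive --autosquash origin/master
When the rebase finishes, merge mybranch into master with --ff-only as described earlier.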
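For the curious, here is a sketch of an interactive squash run from the feature branch, with a local master standing in for origin/master. It relies on git commit --fixup to label the follow-up commit, and sets GIT_SEQUENCE_EDITOR=true so the generated to-do list is accepted as-is and the demo needs no editor. File names and messages are invented.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "Initial commit"

git checkout -q -b mybranch
echo parser > parser.txt
git add parser.txt
git commit -q -m "Add parser"

# A follow-up fix, recorded as "fixup! Add parser".
echo fix >> parser.txt
git commit -q -a --fixup=HEAD

# Rebase the branch on master; --autosquash folds the fixup into its target.
GIT_SEQUENCE_EDITOR=true git rebase -q --interactive --autosquash master

on_branch=$(git rev-list --count master..mybranch)
subject=$(git log -1 --format=%s)
echo "$on_branch commit(s) on mybranch; subject: $subject"
```

Run it without GIT_SEQUENCE_EDITOR and you land in your editor, where you can change "pick" to "squash" or "fixup" on any line by hand.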
It happens to everyone. You're hacking along contentedly when suddenly, "Oh fiddlesticks! I forgot to start a new branch!"
No problem. Just make your branch, and the changes come along for free. Even if you committed them -- you can simply make your branch, then go back to master (or wherever) and reset it. (See above under "Oops".)
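A sketch of the committed case, in a sandbox with invented names. It assumes the stray commit has not been pushed yet.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the shared repository.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "Initial commit"
cd ..

git clone -q upstream work
cd work

# Oh fiddlesticks: this commit went straight onto master.
echo feature > feature.txt
git add feature.txt
git commit -q -m "Feature work"

# Make the branch where you are; it keeps the commit.
git branch mybranch

# Put master back where it belongs, then carry on working on the branch.
git reset -q --hard origin/master
git checkout -q mybranch

master_rev=$(git rev-parse master)
origin_rev=$(git rev-parse origin/master)
ahead=$(git rev-list --count master..mybranch)
echo "mybranch is $ahead commit(s) ahead of master"
```

If the changes were not committed yet, it is simpler still: git checkout -b mybranch carries the modified files along with it.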