Let’s state this from the beginning: Git is an amazing tool, and I’ve used it for several years in many projects. But as a Git trainer, I’ve often seen people in my trainings struggle with its new concepts. Today, I want to give you a short introduction to what is, in my opinion, one of Git’s best features, known as rebase, and show how it differs from the better-known merge.
If you’ve never heard of Git, this is the official description:
“Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.”
So if you are reading this post and have never tried Git before, head over to the official homepage http://git-scm.com/ and try the online learning course (it takes around 15 minutes to complete).
Merging (of files) is quite an old concept in computer science; it is used to bring parallel, independent changes back together. Without merge algorithms, a team could not work in parallel on the same set of files and would have to fall back on the lock-modify-unlock mechanism, also known as pessimistic locking (and I hope everyone agrees that this is hardly an optimal solution). Take a look at the following picture, which depicts a rather simple Git tree (starting from the left) with 5 commits in total.
Let’s say the red dots are the commits on the main development path and the green ones are commits from a team member who is currently developing a feature or fixing a bug. To bring these two branches back together, a so-called “merge commit” is created in the process. With the green branch checked out, this is done with Git’s merge command:
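As a minimal sketch of this step, the following script builds a throwaway repository with three "red" and two "green" commits and then merges them. The branch names main and feature, the commit labels and the small commit helper are assumptions made purely for illustration; they are not taken from the project shown in the pictures.

```shell
set -e

# Throwaway repository; branch names and commit labels are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Helper: create one file and commit it under the given label.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

# Main line (red) and a feature branch (green) that forks after red-1.
commit red-1
git checkout -q -b feature
commit green-1
commit green-2
git checkout -q main
commit red-2
commit red-3

# With the green branch checked out, merge the main line into it.
git checkout -q feature
git merge -q --no-edit main

# The newest commit is the merge commit; it has two parents.
git --no-pager log --graph --oneline
parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "parents of the newest commit: $parents"
```

The graph printed at the end should show the two lines of development joined by one extra commit, which corresponds to the blue commit described next.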
The resulting outcome can be seen in the next picture. A new commit – the blue one – is created:
So everything is fine here, right? Many people will say yes: from a functional point of view, the last commit contains all the work of the team members plus the new functionality. But with many team members working on the same application and all using this merge approach, the Git tree can become rather “messy” and hard to read and understand. Ultimately, it could look like the following (an extract from a real-life example):
Not only are around 40% of these commits merely merge commits, it is also nearly impossible to quickly understand the chronological development of the source code and the dependencies between the commits. Furthermore, all of these “Merge branch …” commit messages obscure the “real” commit messages and make the history much harder to read.
Git has another command that serves a similar purpose but produces a totally different resulting graph. This command is called rebase, and its manual page has summarized it as:
“Forward-port local commits to the updated upstream head”
OK… what does this mean? I suspect that no one without deep knowledge of Git would understand this right away. So let’s forget about this statement for now and try it out. Instead of using the merge command as above, we execute the rebase command on the same example as before:
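A rough sketch of the same experiment with rebase could look like this. As before, the branch names main and feature, the commit labels and the commit helper are hypothetical and only serve the illustration.

```shell
set -e

# Throwaway repository; branch names and commit labels are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Helper: create one file and commit it under the given label.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

# Same starting point as in the merge example: three red, two green commits.
commit red-1
git checkout -q -b feature
commit green-1
commit green-2
git checkout -q main
commit red-2
commit red-3

# With the green branch checked out, replay its commits on top of main.
git checkout -q feature
git rebase -q main

# The history of the feature branch is now linear and has no merge commit.
git --no-pager log --graph --oneline
merges=$(git rev-list --merges --count HEAD)
total=$(git rev-list --count HEAD)
echo "merge commits: $merges, total commits: $total"
```

Compared with the merge variant, the branch ends up with five commits instead of six, and the tip of main is now an ancestor of every green commit.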
If we now look at the resulting tree, we’ll see the following:
Instead of creating a new commit that merely combines two branches into one again, we basically copy our green commits onto the main line by “changing” the parent commit reference. The copies are new commits with new hashes, and the two old green commits are no longer visible to the user (but they remain in the repository for a while, e.g. for recovery). Merge conflicts, which can certainly arise during this process, are fixed directly in the respective commit and not in an additional merge commit. The huge benefit becomes visible when many commits are rebased onto the main line of development: if one of these commits causes a merge conflict, the fix is always made in that very commit and never in a totally separate commit at the end. With this approach, the resulting Git tree of the main line is nothing more than a straight line of commits, one on top of another. The following picture is taken from our in-house project inspectIT:
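To make the point about the old commits more tangible, here is a small sketch that records the hash of a green commit before the rebase and checks that the object still exists afterwards. The branch names (main, feature, rescued), the commit labels and the commit helper are assumptions for illustration only.

```shell
set -e

# Throwaway repository; branch names and commit labels are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Helper: create one file and commit it under the given label.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit red-1
git checkout -q -b feature
commit green-1
git checkout -q main
commit red-2
git checkout -q feature

# Remember the green commit, then rebase it onto the main line.
old_tip=$(git rev-parse HEAD)
git rebase -q main
new_tip=$(git rev-parse HEAD)
echo "before rebase: $old_tip"
echo "after rebase:  $new_tip"

# The old commit is no longer on the branch, but the object is still stored.
git cat-file -t "$old_tip"

# One way to make it visible again is to point a new branch at it.
git branch rescued "$old_tip"
git --no-pager log --oneline rescued
```

In day-to-day work the old hash is usually not saved by hand; the output of git reflog is one place to look it up after the fact.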
Basically, the differences between rebase and merge come down to two points:
1. The parent of the first commit of the branch
2. Creation of an additional commit at the end connecting both branches
Regarding the first point, people often prefer to see where their work originally branched off. But the code of the application has progressed in the meantime, and so should your work. If many commits have been pushed to the central repository while you were working on something else, the code you created must be adapted to those changes before you release it to everyone else. By rebasing before you release, and thereby changing the reference to the parent commit, you make sure that every single commit you created contains code that is based entirely on the latest version of the application. You won’t lose meta-information such as the original author or the creation (author) date in this process; only the committer information is refreshed. The work needed to combine two branches is the same no matter whether one uses rebase or merge, so the question is rather how complex the Git tree should become over time.
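The claim about meta-information can be checked with a small experiment. The sketch below creates a green commit with a made-up author and a fixed date, rebases it, and compares the author fields before and after. All names, e-mail addresses, dates and branch names are hypothetical.

```shell
set -e

# Throwaway repository; names, dates and branch names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo red-1 > red-1.txt
git add red-1.txt
git commit -q -m "red-1"

# A green commit written by someone else at a fixed point in the past.
git checkout -q -b feature
echo green-1 > green-1.txt
git add green-1.txt
GIT_AUTHOR_DATE="2015-01-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2015-01-01 12:00:00 +0000" \
  git commit -q -m "green-1" --author="Original Author <author@example.com>"

# Meanwhile the main line moves on.
git checkout -q main
echo red-2 > red-2.txt
git add red-2.txt
git commit -q -m "red-2"

# Compare the metadata of the green commit before and after the rebase.
git checkout -q feature
author_before=$(git log -1 --format='%an <%ae> %ad')
committed_before=$(git log -1 --format='%cd')
git rebase -q main
author_after=$(git log -1 --format='%an <%ae> %ad')
committed_after=$(git log -1 --format='%cd')

echo "author before:    $author_before"
echo "author after:     $author_after"
echo "committed before: $committed_before"
echo "committed after:  $committed_after"
```

The two author lines should be identical, while the commit date of the rebased commit reflects the moment the rebase was run.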
Does this mean merge is deprecated?
From my point of view, there is no advantage in using merge over rebase for a single team developing an application with short-lived feature branches. The only reason to use merge in Git is the special fast-forward merge. This merge does not create a new merge commit as described above but simply moves the current branch forward to the target branch. This works because the target branch is already based entirely on the current branch, so a merge commit is never needed and merge conflicts can never arise.
Still, a merge can be useful if the application development is split across completely separate teams, with specific integration days on which the code is brought together again.
So what is your experience with rebase and/or merge? When do you use one over the other? Do you need to rely entirely on the plain merge approach? I would really like to hear how you handle this in your project!
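One possible way to combine the two commands is sketched below: the feature branch is rebased first, and the main line is then moved forward with a fast-forward-only merge. The branch names main and feature, the commit labels and the commit helper are again assumptions for illustration.

```shell
set -e

# Throwaway repository; branch names and commit labels are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Helper: create one file and commit it under the given label.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit red-1
git checkout -q -b feature
commit green-1
git checkout -q main
commit red-2

# Step 1: bring the feature branch up to date with the main line.
git checkout -q feature
git rebase -q main

# Step 2: move main forward; --ff-only refuses to create a merge commit.
git checkout -q main
git merge -q --ff-only feature

git --no-pager log --graph --oneline
merges=$(git rev-list --merges --count HEAD)
echo "merge commits on main: $merges"
```

The --ff-only option is a deliberate choice here: if the feature branch had not been rebased first, the merge would fail instead of silently adding a merge commit.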