Love your version history, and it’ll love you back
I’m involved in various ways with a lot of different projects. My roles in them vary; in some cases I’m crafting code, while in others my role is more about reviewing code and serving as a mentor or architect. Something that also varies is how the teams I encounter view their version control system.
I’m not talking about whether they praise it or despair over it, though of course I see both happen. The thing I want to talk about in this post transcends the version control system in use. Granted, some make it easier than others to achieve the ideals I’m going to advocate, but at its heart, this isn’t a discussion about what version control system to use; it’s about how to use it.
What is a commit anyway?
Probably every version control system has some notion of “making a commit”.
In this process, we take a set of changes we have made and incorporate them into the versioned history of our software. In modern version control systems, those changes, whether they touch a single file or span many, are grouped together and given some kind of identifier. Terminology varies; some systems call this a changeset, some call it a commit, and there are no doubt other terms out there.
Thus, a commit is a bunch of changes that are grouped together. More than that, they are grouped together in a way that is under our control; it is up to us to decide what goes into a commit. We can also write a commit message, summarizing the contents of the commit or explaining the motivation for it. This raises an interesting question: how do we do that grouping, and how do we describe it?
A commit a day keeps the data loss away
Here’s one approach I’ve heard: “I commit around 5pm, before I go home for the day, so the day’s work isn’t lost if there’s a disk failure.” Well, that’s a fairly straightforward grouping. A commit is a day’s work. I guess a commit before lunch could be good too, just to be on the safe side. This gives version control the role of a glorified backup system, which occasionally moans at you because you changed something that another team member has also changed. Then you have to do one of those darned merge things.
Suppose we work this way, and we go back and look over our version history.
What do we have? A set of groups of changes, each telling us what happened in the period of time since the last commit. If we’re lucky, maybe the developer will have described the things they did in the commit message. However, it’s probably going to be fairly general; it likely spans a range of fixes and features, or maybe records partial progress on one. And even if it does list the things that happened, all of them are mixed together.
In short, this kind of approach leads to a version history that is somewhere between unhelpful and utterly useless if we want to actually use it for something.
If you’re reading this and thinking, “well, what might we want to use it for?”, then you are missing out on a bunch of opportunities. Version control systems are not just glorified backup systems. They’re not even just ways to manage concurrent versions of software.
They’re also your project’s history book, and the project developers are its authors.
“Added more stuff”
This is a real commit message, from a real project. In fact, it wasn’t just used on a single commit; it was used on around twenty different ones. Imagine a book whose table of contents had the same heading again and again, because every section of the book was called the same thing, regardless of what it contained!
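To get a feel for how often a history repeats itself like this, here is a minimal sketch that counts duplicated commit messages. It assumes the messages have already been collected into a list of strings; the function name `repeated_messages` and the sample messages other than “Added more stuff” are hypothetical, made up purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def repeated_messages(messages: Iterable[str], threshold: int = 2) -> Dict[str, int]:
    """Return commit messages that occur at least `threshold` times.

    Messages are compared after stripping whitespace and lowercasing,
    so "Added more stuff" and "added more stuff " count as the same heading.
    """
    counts = Counter(message.strip().lower() for message in messages)
    return {message: n for message, n in counts.items() if n >= threshold}


# Hypothetical sample history, for illustration only.
sample_history = [
    "Added more stuff",
    "Fix tabs in parser to match coding standards",
    "added more stuff ",
    "Added more stuff",
]

duplicates = repeated_messages(sample_history)
```

Any message that shows up in the result is a chapter heading that has been reused, and so tells a reader of the history nothing about what that particular commit contains.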
When I commit to a project’s version control, I see myself as a contributing author to the project’s history book. Making this history meaningful, well-structured and useful matters to me. It’s not just that I try to write good commit messages. It’s more than that: I try to make atomic commits, each capturing a single meaningful transition of the system from one state to another.
This isn’t about atomicity at the version control system implementation level. This is about atomicity at a higher level. Let’s take an example. Suppose that I am working on implementing a new feature. As I look into doing so, I realize that I should probably make a small refactor first. I do so. Along the way, I discover some code that is using tabs rather than spaces for layout, against the project’s coding standards, and decide to fix this also. How many commits would I do during this work?
First, I’d make one for the whitespace fix. The commit message would clearly indicate that the commit is a coding standards fix with no interesting changes. This specifies the motive, but also hints to anybody reading the history that the commit is not at all important to the overall story.
Next, I’d make one for the refactor. The commit message would indicate that it was a refactor: something that moved code around or changed it, but was not intended to make a functional difference to the software. If somebody discovers it does, then we know I screwed up.
Finally, I’d make a third commit for adding the new feature, indicating in the commit message that the commit intends to add new functionality, or perhaps modify existing functionality in some way.
What have we gained?
Our version history is now broken down into more manageable changes, each of them with a clear role. When I’m in code review mode, this is extremely pleasing. The whitespace changes can be skipped over; had they been mixed into a commit I felt I should review, they would have obscured the important changes. When I review the refactor, I know that I’m looking for any accidental functional changes that may have slipped in. When I review the commit that adds the new feature, I can reasonably expect that all of the additions I see directly relate to its implementation in some way. A good version history helps me to deliver better value when I do code review. In an open source setting, easy-to-scan commits have often resulted in the “many eyes” dream coming true; even today I got an email from somebody who happened to read something I committed, pointing out a small mistake that would otherwise have slipped through the cracks.
Of course, bugs do slip into systems. Even if you have tens of thousands of tests, it’s still possible to have gaps in the coverage. If you have a known good version of the system and you know that it doesn’t work in some way now, then it’s often possible to write a test for the issue, then do a binary search over the commits to close in on the one that caused the regression. This isn’t science fiction technology; we really do this in one project I’m involved with (it turns out that Git even has a bisect command that helps with this). Having commits that are very focused means the result of the bisect can be very precise about where things went wrong.
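The binary search described above can be sketched in a few lines. This is an illustrative model rather than how Git implements its bisect command: it assumes a linear history ordered from oldest to newest, a first commit known to be good, a last commit known to be bad, and a single point where the history flips from good to bad. The names `first_bad_commit` and `is_good`, and the commit labels in the example, are hypothetical.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Find the first bad commit by binary search.

    `commits` is ordered oldest to newest. The caller asserts that
    commits[0] is good and commits[-1] is bad; `is_good` stands in for
    checking out a commit and running the regression test against it.
    """
    if len(commits) < 2:
        raise ValueError("need at least one known good and one known bad commit")

    good, bad = 0, len(commits) - 1  # indices of the known good / known bad ends
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(commits[mid]):
            good = mid  # regression was introduced after mid
        else:
            bad = mid   # regression was introduced at or before mid
    return commits[bad]


# Hypothetical history of eight commits where the bug arrives in "c5".
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
tested = []


def passes_regression_test(commit: str) -> bool:
    tested.append(commit)          # record which commits we had to test
    return int(commit[1:]) < 5     # pretend the test passes before c5


culprit = first_bad_commit(history, passes_regression_test)
```

In this example only three of the eight commits need to be tested before the search lands on `c5`. The more focused that commit is, the less code there is to inspect once the search has pointed at it.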
But won’t it slow me down?
Only in good ways. Yes, there are good ways to be slowed down as a developer. Having to take a moment to write a commit message that justifies the step you just took is an example of a good way. There have been multiple occasions when I thought, “screw it, I’ll just throw in a hack” – then tried to write a commit message to justify why it’s tolerable, and realized it’s not, and I should do something better. I’m human, just like every other developer out there.
I need safety nets if I’m to develop with confidence. My tests are one such safety net; the practice of regularly stopping, checking what I just did and ensuring I can explain it is another. After all, if I can’t explain a change, I probably don’t understand it, and that’s a problem.
That aside, though, I generally feel more efficient as a developer when I make atomic commits.
I suspect part of it is psychological: a commit is a step of progress, which feels rewarding.
More objectively, having a small current “working set” of changes makes it much faster to check over my work at each step along the way. And even when I’m focused on developing rather than code reviewing, having a well-organized history of my own recent changes has paid off time and time again.
Give it a go!
Give atomic commits a go. See how they change the way you approach your work. Don’t worry about perfection at first; even as somebody who has practiced atomic commits for quite a while, I still find borderline situations where it’s unclear whether to make two commits or one. The goal is a version history that serves you and the rest of your team, aiding you in building good quality software in an efficient manner.
// Jonathan Worthington, author of this article.
From business applications to compiler writing, and from .NET to Perl, Jonathan has a wide range of software development experience. He deeply believes that good development has to be a strongly holistic activity, drawing on mathematics, engineering, linguistics, economics, psychology and more. By looking at insights from many fields, he works hard to deliver solid and maintainable software solutions. Originally from the UK, and having spent time in Spain and Slovakia, Jonathan is currently based in Sweden and working for Edument AB.