I'm a big fan of the rebase workflow, but not of squashing. I wrote it as several separate commits for a reason: documenting each step, making each step revertible, separating refactors from from semantic changes, etc. Squashing makes the diffs much more painful, such as by burying an important change in a sea of mechanical ones when they were originally in separate commits.
(A workflow that preserves those commits requires actually having useful commits, and obviously if you have PRs with commits like "fix it" and "fix it more" then it might as well get squashed.)
Most developers I've worked with think of Git as a CLI front-end to GitHub. They aren't writing commits with the intention of showing their work, they are using commits to save their work before signing off, or to trigger CI. They aren't proficient enough with Git to go back over these and clean them up, as would be expected in a project like the Linux kernel.
If that's the state of developer proficiency with version control then, ideal workflows aside, we need a way to prevent those noise commits from ending up in master, and the easiest way is to squash with every pull request, as part of the "merge" automation.
Why do you need to squash & merge literally every PR? Why not use squash & merge when appropriate and rebase & merge otherwise? That's what I do for my projects. Most contributions coming from others tend to get squashed & merged, where as I'll rebase & merge my commits. But I'll sometimes rebase & merge PRs from other contributors who have broken their change down into thoughtful commits.
The effective management approach (in opposition to the limp-souled "servant leader" anemia) is to make decisions for your team to eliminate stupid work.
What, you're going to let anyone waste time on tabs versus spaces?
You've presupposed that this is all about stupid stuff in the first place. Moreover, mixing tabs and spaces for indentation leads to sadness. But squash and rebase are perfectly compatible with over another.
Programmers sometimes have to decide stuff. That's okay.
> Why do you need to squash & merge literally every PR? Why not use squash & merge when appropriate and rebase & merge otherwise?
"When appropriate" is the key here. That's asking people to make a decision, take on additional cognitive load, which does not have a clear deliverable or success criteria. In practice the default is what happens.
I'm a fan of the workflow where the PR gets squashed in the upstream git repo, but the individual commits are preserved in the PR in the code review tool. I feel that Phabricator handles this well.
But does that still lose the source commit long term? What I'd love to have is a mechanism that keeps references to the pre-squash commits at blame granularity, allowing one to dig deeper into the commit messages associated with a given line. Kind of like a sourcemap, but for squash instead of transpile.
You don't lose it long term if you're using GitHub PRs — GitHub keeps the "reflog" (quoted because I imagine their implementation may not actually use reflog) of the branch indefinitely, even after force pushes. Graphite (built to replicate the Phabricator workflow) enables viewing and diffing these versions. (disclaimer, I helped build this)
When you squash merge on github the new commit references the old PR. If you don't delete branches on merge you would keep the commit history on that branch, but then you have to battle with branchs persisting forever.
You’re translating the problem from : searching through branches that are named according to their ticket and what they are meant to accomplish to: complex and not-context-free git bisect.
GitHub and Azure DevOps also do that, you just need to know where to look.
I don’t mind squashing either, unless I’m being really intentional or rewriting my history my intermediate commits couldn’t be reverted without leaving stuff broken (totally a me problem of course).
Intermediate commits being snapshots of “broken” state isn’t a problem at all. When I quit for the day, I commit, broken or not, and pick it up in the morning. I want to be able to drop my laptop in a puddle and still pick up where I left off when I get a new one.
I think the problem is putting those broken commits into the trunk. Ideally you want to clean up your commits so if you need to revert you don’t accidentally break your build and so reading thru history isn’t awful.
Squashing is nice IMHO, and even a must after a while. For one recent very small project a squash of the commit history reduced the storage from tens of kilobytes to a few hundred bytes total. Orders of magnitude. That was a very small project, so imagine the storage space savings for larger projects.
I find that the commit history tends to grow viciously for anything I've been involved with. And I fail to see the benefit of amassing that amount of detail once you are past the stages where each individual commit is reversible (or even interesting)
So, for a project that runs for, say, three months, the commits of the first few weeks aren't really very interesting or valuable at all at the end of the period. Just hard drive space being eaten up. YMMV.
Real example: I had to mitigate an outage in the middle of the night, and I found the root cause in ten lines of code. I needed very badly to “git blame” that code and find a specific commit message from three years ago and its author (a former colleague), to figure out what he had been trying to do.
Right now I have a full clone of a pretty large monorepo dating back almost nine years, and the .git dir is less than half of the total space. Sparse checkouts and shallow clones can make clown car hardware sort of work, but I do not want to go back to the pre-git days and try to work without full history to conserve 0.008 TB of SSD. We spend more than that on coffee.
Ok, so it appears we just approach some problems differently. If I had to work in the middle of the night and found a problem spanning ten LOC my procedure would be "fix, commit, back to sleep". I would never feel any need (nor have the time) to investigate the origin of the problem, much less feel a need to "git blame" anyone. Likely I would not even look at the commit history as (to me) historical code is not really relevant from a current-problem-solving standpoint. But that's just me I guess.
> make clown car hardware
I'm not a native English speaker so this appears to me as a bit confrontational and/or an attempt to ridicule me or the points that I am making? I do not work with clowns, cars, or hardware.
> full history
I can see one benefit, and that is the case of sensitive software for special use (eg govt. mil, etc). Here, I agree that in the case of a vuln being discovered it could be a good thing to go back and trace the origin. So, I'm not opposed to keeping history, it's just not relevant to any particular extent for the types of tasks I do in my current $job.
My stance is more like, if I'll never use any of this stuff, why keep it at all? It's not about costs it's just keeping things simple
A clown car is comically small for its use (the joke is that it arrives and clowns keep getting out, seemingly more than could have been inside). I’m sorry, I didn’t mean you as a poster, but someone else who is hiring professionals while under-specifying hardware they provide you. Storing source code is not a problem you should have reason to worry about.
The problem with “fix it now” is that I didn’t know for sure that our behavior was wrong, I just knew that a consumer of our microservice had begun alerting on errors. I had to find out whether this was a mistake (which maybe I could safely fix) or important and intended (and any change must be negotiated with other consumers). It comes with having an old, complex system with a lot of dependencies and without exhaustive documentation.
I can't count how many times this has happened to me. Trace an issue down to some piece of code that makes it look very intentional, then have to go spelunking through whatever history I can find to figure out what the actual correct specified behavior is. Bonus fun when you have to start reaching out to clients to find out if anyone is relying on it acting that way.
There are many reasons for having several commits in the same PR.
PRs often have a lot of overhead. They need a separate branch, CI jobs need to run, there are more notifications for everyone, separate approvals, etc.
Sometimes there's a need for keeping separate commits if they're all related to a single change. Proposing them all as part of the same PR helps maintaining that context while reviewing as well. Reviewers can always choose to focus on individual commits, and see the progression easily.
Sometimes it does makes sense to squash a PR if the work is part of the same change, but the golden rule of atomic commits always applies. Never just blindly squash PRs.
In fact, if the PR is messy and contains fixes and changes from the review, and the PR should have more than one commit, I go back and clean up the history by squashing each change to its appropriate commit. `git commit --fixup` helps with this, so I also prefer addressing comments locally rather than via GitHub's UI. Then it's a simple matter of running `git rebase --autosquash`.
Interesting. I'll give it a try, thanks. At first glance it reads like too much magic that I'd have to undo manually, but if it works reliably, it would save some time. Though usually it's not much work to figure out which commit to reference in `--fixup`. In short histories, I usually just use `HEAD~<n>`, instead of hashes.
BTW, thanks for all your open source work. <3 You're an inspiration!
It is a very nice bit of magic. I've been using it for years, and I don't think it has ever given me a false positive. Plenty of false negatives though, but it tells you. And then you just fall back to what you've been doing anyway. It is literally a tool that has removed a fair amount of tedium from my workflow.
A step is only revertible if the previous state has passed CI, and as you noted that only happens if it is a separate PR.
The only reason to split a PR into separate commits it to make it easier to review and understand. But if it's so big you need to do that, it should be separate PRs anyway really.
IMO the only time you should ever preserve branches when merging them is if they're long-lived ones that multiple people have worked on and the commits in them have passed CI.
> serializing the PRs, i.e. only have one PR outstanding at a time, which increases your development latency
I don't see a significant downside to this. It doesn't affect my development latency - if the commits depend on each serially other then you have to wait for them to be reviewed in order whether or not they are separate PRs.
I do agree that GitHub/Gitlab don't support dependent PRs very well. You pretty much have to wait for one to be merged before submitting the next. Not a big problem in practice but it could definitely be improved.
Once you start doing serialized PRs then doing squash merges increases your chance of a nightmare.
For example you have PR 1 and PR 2 (based on 1). Squash merge PR 1 into main. But while 2 is under review someone merges in 3 into main and you need to bring it into 2 to resolve some conflicts. You’re now hosed because the changes done in 1 are now present TWICE. First in 1’s commit, and also in the squash commit. Now you’re resolving all of 1’s changes as merge conflicts.
This may sound like a contrived examples but it has happened to me EVERY time I’ve worked somewhere that demands squash commits. And this is one of the reasons I hate squash commits.
“Makes the history prettier” vs “Rewriting the actual commit history”. Telling the truth > pretty.
Ah I think where you're going wrong is trying to merge master into PR 2. Instead you should just rebase PR 2 on top of master. Git will automatically figure out that PR 1 has been merged and drop it. If it doesn't for some reason you can just do an interactive rebase.
Git history isn't "the truth" until it's shared. There's absolutely no issue rewriting history history for your own edits that nobody else is using yet. You do that every time you press undo!
> Git will automatically figure out that PR 1 has been merged and drop it.
Unfortunately, that part doesn't work if you let GitHub do a squash, because then the commit on main has no corresponding commit on (the branch of) PR2.
When cherrypicking the commits corresponding to PR1 during the rebase, the merge algorithm will notice that the change is already there and notify you of the empty commit if the change is completely identical. But if it isn't, that falls down.
It's still not too difficult to recover, but it's annoying.
If you're doing proper atomic commits, as most people critiquing squashing probably are, the overhead of making separate PRs for each would be ludicrous. It depends heavily on the situation, but in favorable conditions (greenfield development of some simple CRUD app, for instance) you can easily produce dozens of clean, atomic commits a day. As part of the same PR, those take almost no time at all to review. Put into separate PRs, you'd be wasting a lot of time and effort both on the reviewer's and reviewee's side.
You can "just" do anything. The problem is that things don't "just" work that smoothly. What happens when the feature or bug fix I'm working on demands a refactor or a name change or something of that sort. I could put the refactor in a separate PR, but what if I'm not sure of the changes until I get far enough along with implementing the feature or bug fix? I might want to go back and tweak the refactor I did. So if I put up a PR as soon as the refactor was done, I'll then need to put up another PR with tweaks to it and then a third PR with the actual feature or bug fix. Or I could just put them all up in one PR together broken down by commit. Reviewers can review commit-by-commit. Or I could wait until I've finished everything, and then I'm left with submitting a single PR or splitting them into multiple PRs are submitting them simultaneously. (And dealing with stacking them appropriately.)
This is of course a balancing act. Which is my point. Sometimes it makes sense to split things up into multiple PRs. But sometimes it makes sense to fatten a PR a little bit with multiple commits. You can't just say "small PRs are better." Size is but one dimension of what makes a good PR.
This is why I personally use both "squash & merge" and "rebase & merge." If a PR has a bunch of commits but is really just one logical change, then I'll squash it. But if a PR has thoughtful commits, then I'll rebase it and do a fast-forward merge.
My bottom line is that I try to treat the source history as well as I treat the source. The source history is a tool for communicating changes to other humans, both for review and for looking back on. Squash & merge has a place in that worldview, but so does rebase & merge.
The problem there is that toy systems like Github's review tool don't have good flows for stacking dependent changes, and then people end up not bothering.
For this reason I don't bother with Github and the like and just use Gerrit ;)
If you actually have commits which stand alone - i.e. the build succeeds, the tests pass etc - I see no reason not to land them altogether with a rebase. What I object to is commits which do _not_ meet those criteria, and make life much harder for the next person who has to spelunk through the history trying to work out what has happened and why.
We're in full agreement there. If you run an "every commit should compile" project, rebase. If you have "fix" and "another fix" commits stacked atop the original in a PR, by all means squash them all.
Yeah it can be a bit of a pain. If you have long running branches or do keep hitting having to re-resolve, getting good at git rerere is recommended; "reuse recorded resolution"!
Rebasing generally doesn't require repeated merge conflicts though? Since the formerly conflicting commit disappears and the non-conflicting one is now baked into a linear history.
Regardless, `git rerere` is supposed to solve that problem, but I don't do enough conflicting merges to be intimately familiar with it in practice.
It sounds like you're more describing a stacked PR workflow. I achieve the same thing using stacked PRs and can still have a bunch of the "fix stuff" intermediate commits that get squashed away, because who can really predict whether everything will pass CI in the first attempt. :)
I don't want to have a stack of PRs where each one depends on the previous, where each PR needs to justify itself separately while at the same time being interdependent and ordered. That adds cognitive overhead and makes it take longer to get merged, if it gets merged at all.
It's possible the tooling could handle that case much better, but until it's sufficiently better that it's as simple as `gh pr create` by the author and one click of a merge button (or equivalent "@somebot merge") by the reviewer, that's still too much.
If you're using GitHub or gitlab and merging through pull requests, I've found that these commits become duplicative and, given GitHub's rich comments-based collaboration UX, somewhat lossy. It is much easier/more valuable for me to view a commit off main that points to the PR that brought it in and the discussion that took place (along with the original commit in that branch) than to see the individual commit rebased into main without context.
Also, a lot of people (myself included) write really crappy commit messages that don't tell the whole story behind a change. This is another reason why falling back on the PR has been valuable for me.
Historically any time I try to rebase a branch with more than a few commits in it, it effectively fails unless I squash it because resolving all the conflicts for every commit would take me over an hour. Maybe this is just a 'many contributors' problem though, since our repo lands many large PRs each day.
(Merging has the same problem, so I squash frequently and then rebase.)
Me too. I think git blame is the ultimate documentation tool - you can have a description for each block of code, tied to who and when it happened. If you squash, all the sudden you have a simple explanation for hundreds or thousands of lines.
> Squashing makes the diffs much more painful, such as by burying an important change in a sea of mechanical ones when they were originally in separate commits.
Can you give an example? I don't even think I understand what you're saying. You control your overall summary, so it's up to you to make it useful.
I would cite clarity as my reason for squashing! I think most people are just bad at organizing (& naming) their commits, so they make A LOT of vacuous ones, crowding out the forest for the trees. It's never helpful to see the git blame saying things like "Addressed PR comments" or "Resolved merge conflict", etc.
I do prefer a merge commit when the author has done a good job with their own commits, but that's rare. In all other cases, squashing is greatly preferable to me.
> > Squashing makes the diffs much more painful, such as by burying an important change in a sea of mechanical ones when they were originally in separate commits.
> Can you give an example?
Sure. Here's a common pattern I've used and seen others use:
Commit 1: "Introduce a helper for XYZ", +35,-8, changes 1 file. Introduce a helper for a common operation that appears in various places in the codebase.
Commit 2: "Use the XYZ helper in most places", +60,-260, changes 17 files. Use the helper in all the mechanically easy places.
Commit 3: "Rework ABC to fit with the XYZ helper", +15,-20, changes 1 file. Use the helper in a complicated case that could use some extra scrutiny.
I don't want those squashed together; I want to make it easy, both in the PR and later on, to be able to see each step. Otherwise, the mechanical changes in commit 2 will bury the actual implementation from commit 1 in the middle (wherever the file sorts), and will mask the added complexity in the commit 3 case.
(A workflow that preserves those commits requires actually having useful commits, and obviously if you have PRs with commits like "fix it" and "fix it more" then it might as well get squashed.)