I take it that you don't work on a large corporate code-base / don't have to cod...

coliveira · on March 5, 2022

This has nothing to do with formatting. When you create a change to a code base you should be submitting only the lines that are new/changed. If someone is submitting purely formatting changes, he/she's just wrong and you should reject that during review.

derefr · on March 5, 2022

> If someone is submitting purely formatting changes, he/she's just wrong and you should reject that during review.

If you add a line between two existing lines, and then insert after it a new blank line to serve as a sort of "paragraph marker", is that a "pure formatting" change?

If you add a constant to a group of constants, and its name is longer than the existing ones, do you re-pad the spacing of all the values so they still line up with one another?
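To make the alignment case concrete, here is a minimal Python sketch. The comment doesn't name a language, and `align_constants` is a hypothetical helper that pads every name to the longest one. It shows that appending a single longer name rewrites every existing line in the group, even though none of their values changed.

```python
def align_constants(pairs):
    """Render NAME = VALUE lines with every '=' in the same column."""
    width = max(len(name) for name, _ in pairs)
    return [f"{name.ljust(width)} = {value}" for name, value in pairs]


before = align_constants([("RED", 1), ("GREEN", 2), ("BLUE", 3)])
after = align_constants([("RED", 1), ("GREEN", 2), ("BLUE", 3), ("ULTRAVIOLET", 4)])

# Compare only the lines that existed before; the appended line is ignored.
changed = [old for old, new in zip(before, after) if old != new]
print(len(changed), "of", len(before), "existing lines changed")
# prints: 3 of 3 existing lines changed
```

Whether a team aligns at all is exactly the kind of choice that, left to individuals, shows up in diffs at random. A formatter either always re-pads or never does.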

For that matter, if you fully qualify a previously unqualified and potentially ambiguous identifier, is that a "pure formatting" change? Some auto-formatter tools do this, after all.

These are things that people may or may not do in code-bases, and they "fly under the radar" of even the most stringent human code-reviewers because they're so irrelevant to understanding the code. They're "fluff." But because of this, how people introduce that fluff is essentially random, and so it's the cause of a lot of diff noise. These are the things that auto-formatters can "lock down" so they only ever happen one way.

But I think you're missing the forest for the trees, as I mostly wasn't talking about pure formatting changes. What I'm talking about is more like:

You add a formal parameter to a function. Before, the function's clause head was less than 80 characters. Now it's more than 80 characters. Do you break the formal parameter list onto the next line? If so, how far do you indent it? Do you split the formal-parameter list up so that each parameter is now on its own line? Etc.
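One way to turn those judgement calls into a rule is sketched below in Python. The 80-column limit comes from the comment. Everything else is an assumption chosen for illustration: the hypothetical `format_signature` helper, the all-or-nothing "one parameter per line" fallback, the 4-space indent, and the trailing comma. Real formatters use more elaborate rules, but the property that matters is the same: identical input always yields identical layout.

```python
MAX_WIDTH = 80  # column limit taken from the example above


def format_signature(name, params, indent=4):
    """Lay out a function head deterministically.

    Hypothetical rule: keep the whole head on one line if it fits within
    MAX_WIDTH; otherwise put each parameter on its own line, indented by
    `indent` spaces, each followed by a trailing comma.
    """
    one_line = f"def {name}({', '.join(params)}):"
    if len(one_line) <= MAX_WIDTH:
        return [one_line]
    pad = " " * indent
    return [f"def {name}("] + [f"{pad}{p}," for p in params] + ["):"]


print(format_signature("send", ["socket", "payload"]))
# prints: ['def send(socket, payload):']

long_head = format_signature(
    "send",
    ["socket", "payload", "timeout_in_milliseconds",
     "retry_policy", "on_error_callback", "compression_level"],
)
for line in long_head:
    print(line)
```

Under a rule like this, the reflow happens in exactly the commit whose new parameter pushes the head past the limit, and never later as a side effect of some unrelated fix.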

Done by humans with no strict standard, these sorts of arbitrary one-off judgements add up to "syntax rot." It's not something you observe with your eyes. It's a sort of "potential energy" of un-made formatting changes. Any given semantic change by a sufficiently motivated human might become the impetus for a manual reformatting, so that reformatting happens at a random time and inflates a patch that didn't strictly need to touch that additional code. (If you ask the programmer why they did it, they'll say they needed to "clean up the code they were working on" so that they could understand it well enough to apply the fix.) That is horrible for both code review and merge predictability.

On the other hand, an auto-formatting tool will apply that transformation exactly when it becomes necessary, and it will pick one way of formatting the additional lines and stick to it. There's no "potential energy" there. At all times, the codebase is "at rest", with no chance of anyone introducing "arbitrary" (but actually left-over) formatting changes.

Human formatting is like a sequence of DML statements in an RDBMS. Auto-formatting is like a sequence of operations against a CRDT. Given a bunch of changes run in a random order, the output of human formatters will be arbitrary, while the output of auto-formatting will be deterministic. Which is what you want if you're doing complex things like maintaining long-lived stable 1.x branches that cherry-pick changes from 2.x.
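The CRDT analogy rests on one property: the formatted output is a function of the code's tokens alone, not of the edit history that produced it. The Python sketch below illustrates only that property and is not a CRDT. `canonical` is a hypothetical toy formatter for `NAME = VALUE` blocks that throws away whatever whitespace the author typed (and, for simplicity, skips blank lines) before re-emitting the block in one fixed layout.

```python
def canonical(block):
    """Re-emit a 'NAME = VALUE' block from its names and values alone.

    Hand-typed spacing is discarded, so the result does not depend on who
    edited the block or in what order. Blank lines are skipped in this sketch.
    """
    pairs = []
    for line in block.splitlines():
        if not line.strip():
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        pairs.append((name, value))
    width = max(len(name) for name, _ in pairs)
    return "\n".join(f"{name.ljust(width)} = {value}" for name, value in pairs)


# The same constants, reached through different hand edits on two branches.
branch_1x = "RED = 1\nGREEN =   2\nBLUE=3"
branch_2x = "RED   = 1\nGREEN = 2\nBLUE  = 3"

# History-independent: both branches format to identical text,
# so a cherry-pick between them carries no whitespace noise.
assert canonical(branch_1x) == canonical(branch_2x)

# Idempotent ("at rest"): formatting formatted code changes nothing.
assert canonical(canonical(branch_1x)) == canonical(branch_1x)
```

Once both branches run the same formatter on every commit, any textual difference between them reflects a semantic difference and not one person's spacing habits.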