
I take it that you don't work on a large corporate code-base / don't have to code-review other people's code?

Auto-formatting (especially when run as a pre-commit hook) means that style changes people make are ignored or reverted, and that new code written in a different style is reformatted into the existing style immediately, rather than needing a separate cleanup commit later on. Thus, no spurious diff lines from formatting, and no wading through a bunch of "noise" diff lines to get to the "signal" of semantic changes at code-review time.

Also, running the same auto-formatter on both your main branch and your development branches makes merge/rebase conflicts less likely to happen. (Which basically boils down to "fewer noise diff lines" again.)

In other words: auto-formatting makes code more machine-legible to syntax-blind, line-based tools, which in turn allows tooling like diff(1) to be more helpful.

(Yes, we could just have language-syntax-aware semantic-level diff/merge/etc. tools. Not sure why nobody ever made these. I bet this is one of those things where Lisp users have had it for ages but using their own parallel world of abstractions that doesn't exist in C/POSIX.)



This has nothing to do with formatting. When you create a change to a code base you should be submitting only the lines that are new/changed. If someone is submitting purely formatting changes, he/she's just wrong and you should reject that during review.


> If someone is submitting purely formatting changes, he/she's just wrong and you should reject that during review.

If you add a line between two existing lines, and then insert after it a new blank line to serve as a sort of "paragraph marker", is that a "pure formatting" change?

If you add a constant to a group of constants, and its name is longer than the existing ones, do you re-pad the values of the other constants so they all line up with one another?

For that matter, if you fully-qualify a previously-unqualified and potentially-ambiguous identifier, is that a "pure formatting" change? Some auto-formatter tools do this, after all.

These are things that people may or may not do in code-bases, that "fly under the radar" of even the most stringent of human code-reviewers, because they're so irrelevant to understanding the code. They're "fluff." But because of this, how people introduce that fluff is essentially random, and so the cause of a lot of diff noise. These are the things that auto-formatters can "lock down" to only happen a certain way.
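To put a rough number on the constant-alignment case, here is a minimal sketch using Python's standard difflib. The constant names and the changed_lines helper are made up for illustration; the point is only that re-padding to keep values aligned turns a one-line addition into a diff that touches every line in the group.

```python
import difflib


def changed_lines(before: list[str], after: list[str]) -> int:
    """Count added plus removed lines in a line-based unified diff."""
    diff = list(difflib.unified_diff(before, after, lineterm="", n=0))
    # The first two entries are the "---"/"+++" file headers; skip them.
    return sum(1 for line in diff[2:] if line.startswith(("+", "-")))


# Hypothetical constants block before the change.
BEFORE = ["MAX = 1", "MIN = 0"]

# Style A: add the new constant and leave the others alone.
AFTER_UNALIGNED = ["MAX = 1", "MIN = 0", "DEFAULT_TIMEOUT = 30"]

# Style B: add the new constant and re-pad the others so the "=" lines up.
AFTER_ALIGNED = [
    "MAX             = 1",
    "MIN             = 0",
    "DEFAULT_TIMEOUT = 30",
]

print(changed_lines(BEFORE, AFTER_UNALIGNED))  # 1
print(changed_lines(BEFORE, AFTER_ALIGNED))    # 5
```

Either style is fine on its own; the noise comes from different people picking different ones at different times.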

But I think you're missing the forest for the trees, as I mostly wasn't talking about pure formatting changes. What I'm talking about is more like:

You add a formal parameter to a function. Before, the function's clause head was less than 80 characters. Now it's more than 80 characters. Do you break the formal parameter list onto the next line? If so, how far do you indent it? Do you split the formal-parameter list up so that each parameter is now on its own line? Etc.

Done by humans with no strict standard, these sorts of one-off, arbitrary judgements add up to "syntax rot": not something you observe with your eyes, but a sort of "potential energy" of un-made formatting changes. Any given semantic change by a sufficiently motivated human can become the impetus for a manual reformatting, so the reformatting happens at a random time and inflates the patch with code that didn't strictly need to be touched. (If you ask the programmer why they did it, they'll say they needed to "clean up the code they were working on" so that they could understand it well enough to apply the fix.) Which is horrible for both code review and merge predictability.

On the other hand, an auto-formatting tool will apply that transformation exactly when it becomes necessary; and will pick some way of formatting the additional lines and stick to it. There's no "potential energy" there. At all times, the codebase is "at rest", with no chance of anyone introducing "arbitrary" (but actually left-over) formatting changes.
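As a toy illustration of "one rule, applied exactly when needed", here is a sketch of a formatter for a function header. It is not how any real formatter works internally; format_signature, the 80-column limit, and the wrap style (one parameter per line, 4-space indent, trailing comma) are all assumptions chosen for the example.

```python
MAX_WIDTH = 80  # assumed limit, matching the 80-character example above


def format_signature(name: str, params: list[str]) -> str:
    """Render a Python-style function header using a single fixed rule."""
    one_line = f"def {name}({', '.join(params)}):"
    if len(one_line) <= MAX_WIDTH:
        return one_line
    # Too long: one parameter per line, 4-space indent, trailing comma.
    body = "".join(f"    {p},\n" for p in params)
    return f"def {name}(\n{body}):"


# Hypothetical function and parameter names.
three = ["customer_id", "new_shipping_address", "notify_by_email"]
four = three + ["audit_log_writer"]

print(format_signature("update_customer_record", three))  # fits: stays on one line
print(format_signature("update_customer_record", four))   # too long: wraps
```

The header wraps in the same commit that pushes it past the limit, and always in the same shape, so there is no judgement call left over for a later patch.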

Human formatting is like a sequence of DML statements in an RDBMS. Auto-formatting is like a sequence of operations against a CRDT. Given a bunch of changes run in a random order, the output of human formatters will be arbitrary, while the output of auto-formatting will be deterministic. Which is what you want, if you're doing complex things involving e.g. long-maintained stable branches for 1.x that cherry-pick changes from 2.x.
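A small sketch of the determinism point, under the assumption that the formatter is a pure function of the code's content. The autoformat function below is a toy that only normalizes spacing around "=" in unindented assignment lines, and the two "branches" are invented; what matters is that two histories with different hand-formatting converge on identical text, and that re-running the formatter changes nothing.

```python
import re

# Matches an unindented "NAME = value" line; "=(?!=)" avoids matching "==".
_ASSIGN = re.compile(r"^(\w+)\s*=(?!=)\s*(.+?)\s*$")


def autoformat(source: str) -> str:
    """Toy formatter: output depends only on content, never on edit history."""
    out = []
    for line in source.splitlines():
        m = _ASSIGN.match(line)
        # Assignment lines get exactly one space around "="; others are right-stripped.
        out.append(f"{m.group(1)} = {m.group(2)}" if m else line.rstrip())
    return "\n".join(out) + "\n"


# Hypothetical 1.x branch: constant added later, old lines never re-padded.
branch_1x = "MAX    = 1\nMIN    = 0\nDEFAULT_TIMEOUT = 30\n"

# Hypothetical 2.x branch: someone re-aligned everything by hand.
branch_2x = "MAX             = 1\nMIN             = 0\nDEFAULT_TIMEOUT = 30\n"

print(branch_1x == branch_2x)                          # False
print(autoformat(branch_1x) == autoformat(branch_2x))  # True
```

With both branches at the same canonical form, a cherry-pick between them only has to carry the semantic change.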



