The hardest program I've ever written (2015) (stuffwithstuff.com)
261 points by graderjs on March 5, 2022 | 133 comments


The Dart formatter sounds really advanced, and reflects potential complexity of the language.

I think I'd still prefer to see a formatter attempt to preserve any formatting which is already 'good enough' to pass as an output threshold. Code isn't just a recipe for a computer to do something, it's a language for explaining to other programmers what that thing is and what's important to the structure of accomplishing it. The choice of where to place a break can matter for cognition and can be almost as important as the printed characters for organizing thoughts.


It might be true but as OP says in the article, the moment there are multiple ways to lay out the same piece of code, you’re going to surface engineers with differing opinions in code review. Getting an opinionated formatter is the best way to bring engineers back to what they really need to be doing in code review, which is reviewing the code, not its formatting. I’ll never go back to Python without Black and isort!


I feel like enforcing a specific text style is the wrong approach.

In my opinion, different people have different formatting preferences and forcing someone to use the "wrong" one will lead to slower reading speed and the risk of overlooking errors.

That's why I believe we should treat it like font or color choices. The IDE should display source code in the viewer's preferred style, so that each person sees what they expect. And then the actual source code formatting becomes irrelevant.

Go already goes a good step in this direction by making the language's AST part of its core libraries. And opening Go source code in JetBrains' GoLand will show you additional annotations, spacing, etc. based on the parsed source code tree (and not based on the source code's text).


Having a strictly (formatter) enforced style is actually how you allow people the freedom to use their preferred style.

Everyone can just set up a pre-commit hook to automatically format the code back into the official style, then everyone is free to put the codebase back into their preferred style.

Otherwise, if developers reformatted code to their preferred style regularly, it would create massive diffs.


I feel like I might have explained that badly. In my suggestion, the source code would be diffed as a machine-readable AST tree. That means all source code reformatting actions which do not change the meaning of the source code also do not appear in the diff.

In this explanation about Go:

https://golangdocs.com/golang-ast-package

the text following "and we get a nice structure" is what the source code management tools would be working on. It's an abstract representation of the source code's meaning, but not tied to how things are formatted or indented.


I agree it would (maybe) be better if things had been designed this way from the beginning, but as it is it would be totally incompatible with all existing source code management tools, which is a nonstarter. Even if you can switch to a new editor or IDE, you still have to worry about the GitHub web UI and your error reporting tooling and who even knows what else.


> That means all source code reformatting actions which do not change the meaning of the source code also do not appear in the diff.

The method I described does exactly that. Most modern formatting tools parse code into an AST, then rewrite it based on formatting rules. So as long as you format both files to the same rules before diffing, only AST changes will show up.
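To make the "only AST changes show up" point concrete, here is a minimal sketch in Python (not Go) using the standard-library `ast` module. The helper name `same_ast` and the sample sources are made up for illustration. Note that `ast` discards comments, so this shows the principle rather than a usable diff tool.

```python
import ast


def same_ast(source_a: str, source_b: str) -> bool:
    """Return True when two Python sources parse to the same syntax tree."""
    # ast.dump() leaves out line and column attributes by default,
    # so differences in whitespace and line breaks disappear.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


# Hypothetical sample inputs: same meaning, different layout.
compact = "def area(w,h): return w*h\n"
spread = "def area(w, h):\n    return w * h\n"
# Same layout as `spread`, but the operator differs.
changed = "def area(w, h):\n    return w + h\n"

print(same_ast(compact, spread))   # True: formatting-only difference
print(same_ast(compact, changed))  # False: the tree itself changed
```

A tool built on this idea would report nothing for the first pair and a real change for the second, regardless of how either author laid out the text.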


On paper I like this idea. But I can't shake the feeling that every non-text based format I have ever encountered sucks.


And this is how git works with the different newline styles. It can convert them to a single enforced style on commit and convert them back when you pull changes.


I've never had that work correctly. Mostly it just gets in the way when the line endings actually matter (because my javascript has to load in IE7).


From experience I can tell that one can get used to other people's styles and just work with that oneself. Unifying among an enforced style and sticking to it really reduces the amount of unnecessary discussions.

I genuinely feel that focusing a lot on the detriment of someone else's or an established code style is a sign of a lack of team work.


The AST usually isn't enough, you want a CST (or what a few sources call "full" syntax tree, preserving white space, comments, etc). I think .NET has the best implementation of this out there, unsurprisingly they have incredible tooling support.


For anyone interested in CST tooling outside of the .NET ecosystem, tree-sitter[1] is general purpose, quite fast, supports a wide range of grammars, and has bindings in quite a lot of environments (including WASM, so likely can be used anywhere with some effort).

1: https://tree-sitter.github.io/tree-sitter/


Tree sitter is absurdly complicated to use in a real project, a hand written parser might be slower but it's way easier to implement and build.


I haven’t used it in a real project yet, but I’ve tried it out for a few project ideas, and haven’t found it complicated to use. I’m not necessarily saying it isn’t ever complicated, but I was able to knock out a couple proofs of concept in a few hours. My only complaint so far is that the sexpr representation lacks some details, but I have no problem with working around that.


This seems like the arguments devs stuck on vim and Emacs keep making. “Ohhh I’m tired of moving my fingers from one key to another!!” [1] It’s just code, if you’re in a good team each PR is just a small diff, it’s hard to believe that’s somehow too complicated for an ostensibly good engineer.

[1] Incidentally the only people I know with carpal tunnel are folks who are entrenched in command line text editors. Maybe there’s some benefit to moving your arms around? Similarly maybe there’s benefit in reading code in a different format, you might actually read it slower and hence comprehend it better.


I am one of those folks entrenched in command line editors, and I avoid carpal tunnel by using a real keyboard (in my case, a real IBM Model M) and not the absolute crap that passes for keyboards on laptops (or those keyboards in the $1 discount bin).


Good for you but I stick to a standard Apple chiclet keyboard and thin mouse and feel the most productive. The mix of moving your arm but always having it rested flat helps in my experience.


>In my opinion, different people have different formatting preferences and forcing someone to use the "wrong" one will lead to slower reading speed and the risk of overlooking errors.

Not if everybody using the language is strongly forced (as with Go), since then those with "different formatting preferences" will eventually (and soon) just get used to the enforced style.


Agreed! I use Prettier (driven by ESLint) in all my web projects (including enterprise clients', where I've helped introduce / improve / standardize tooling). IME it's a mistake not to use it.


> The Dart formatter sounds really advanced, and reflects potential complexity of the language.

I would not be surprised if a Lisp formatter following the same philosophy as the Dart formatter would be similarly complex, since the complexity comes from optimizing line length, not parsing.


Incidentally I spent yesterday implementing a Lisp formatter with the algorithm described by Jean-Philippe Bernardy in the simple and elegant paper "A Pretty But Not Greedy Printer."

https://jyp.github.io/pdf/Prettiest.pdf

It's based on introducing choice between vertical and horizontal stacking. It avoids combinatorial explosion by pruning away strictly suboptimal choices. With just a few extra rules for Lisp forms, the results are quite good.

I don't handle comments though, since I only need it for pretty printing values so far.
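For readers who want to see the shape of the idea, here is a rough Python sketch of "choose between horizontal and vertical stacking, prune dominated layouts". It is only inspired by the paper, not a port of its Haskell code. All names (`hcat`, `vcat`, `prune`, `layouts`, `pretty`) are made up here, S-expressions are assumed to be nested non-empty Python lists of strings, and the pruning is a naive quadratic scan.

```python
from itertools import product


def text(s):
    return [[s]]  # one candidate layout consisting of a single line


def hcat(a, b):
    """Attach layout b to the end of a's last line; b's later lines are padded to align."""
    pad = " " * len(a[-1])
    return a[:-1] + [a[-1] + b[0]] + [pad + line for line in b[1:]]


def vcat(a, b):
    """Stack layout b underneath layout a."""
    return a + b


def measure(layout):
    """(height, widest line, width of last line): the quantities used for pruning."""
    return (len(layout), max(len(line) for line in layout), len(layout[-1]))


def prune(candidates, width):
    """Drop layouts that overflow, then drop layouts another candidate beats on every measure."""
    fitting = [lay for lay in candidates if measure(lay)[1] <= width]
    kept = []
    for cand in fitting:
        m = measure(cand)
        beaten = any(
            measure(o) != m and all(x <= y for x, y in zip(measure(o), m))
            for o in fitting
        )
        if not beaten and all(measure(k) != m for k in kept):
            kept.append(cand)
    return kept


def combine(op, xs, ys, width):
    return prune([op(x, y) for x, y in product(xs, ys)], width)


def layouts(expr, width):
    if isinstance(expr, str):
        return prune(text(expr), width)
    parts = [layouts(e, width) for e in expr]
    head = combine(hcat, text("("), parts[0], width)
    if len(parts) == 1:
        return combine(hcat, head, text(")"), width)
    head = combine(hcat, head, text(" "), width)
    flat = stacked = parts[1]
    for p in parts[2:]:
        flat = combine(hcat, combine(hcat, flat, text(" "), width), p, width)
        stacked = combine(vcat, stacked, p, width)
    # The choice: arguments side by side, or stacked and aligned after the head.
    options = combine(hcat, head, flat, width) + combine(hcat, head, stacked, width)
    return combine(hcat, prune(options, width), text(")"), width)


def pretty(expr, width):
    candidates = layouts(expr, width)
    if not candidates:
        raise ValueError("no layout fits in the given width")
    return "\n".join(min(candidates, key=measure))  # fewest lines wins


form = ["define", ["square", "x"], ["*", "x", "x"]]
print(pretty(form, 40))
print(pretty(form, 20))
```

With a width of 40 the whole form fits on one line. With a width of 20 the body drops to a second line aligned under the first argument. Pruning is safe because both combinators are monotonic in the three measures, so a dominated sub-layout can never lead to a better overall layout.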


I also wrote a version of Bernardy's pretty printer, but I never did come up with a satisfactory way of formatting text together with code. The best-looking solution, splitting on whitespace and making each word a group that is either tacked onto the current line or split onto a new line, ends up being exponential even with pretty aggressive optimization. Everything else I tried looked bad.


> The Dart formatter sounds really advanced, and reflects potential complexity of the language.

You got that right -- check out this epic GitHub thread on optional semi-colons [1], and the author's own comment on the subject in an HN thread [2]

[1] https://github.com/dart-lang/language/issues/69

[2] https://news.ycombinator.com/item?id=22706645


I thought so as well. Until one day I just switched on black for python to see what it is like.

I realized there was a tradeoff I never considered — the amount of time it would cost me to manually format only 10% better than the black formatter is not something I am willing to invest.

And this is me talking about ideal situations where I calmly try to write beautiful code. When you consider that often you are not in the mood to spend a ton of time on formatting, because it distracts you from focusing on the actual data flow, logic and so on, a formatter as good as this is just beautiful.

This is not something I would've said before I tried the thing btw.


> I think I'd still prefer to see a formatter attempt to preserve any formatting which is already 'good enough' to pass as an output threshold.

That feels like Prettier sometimes. I think it leaves some objects on multiple lines or one line depending on how you leave them. But I’m not too sure.


I'd really hate a formatter to attempt to preserve formatting if it's "close enough"(tm). Because what is "close enough"(tm) is hugely subjective, like formatting style itself. Formatter should just format to a common style. No funny business.


There is tension between having a single source of truth for source code, including an opinionated formatter; and allowing individual programmers expressive facility to format and structure code in a way that makes most sense to them.

I've been thinking lately of a formatter that would resolve this tension. It would be a local, individual formatter, complementary to the global, opinionated, "Prettier" formatter.

When the source code was committed to remote for review, the opinionated formatter would do its thing, making sure the code was formatted properly according to whatever the team agreed. But locally, the code would remain however the developer liked it.


a recipe for merge conflicts.

i think maybe your only hope is to somehow “version the AST”, and let formatting be a style sheet or something. But i’m 1. talking out my butt and 2. sure i’m missing something.


It's a decent idea!

One problem though is all of the non-code elements, aka comments and how they intentionally line up with the code (mixed with spaces and tabs), so the ideal format is the one the original programmer wrote it in, and the second programmer using a different stylesheet can't edit comments and have them end up pretty for the first programmer. (The solution, obviously, is to not comment any code >_< )


If the display of the code is removed from its representation, I think the same could be done for comments. Comments could be kept as part of the AST and rendered how you like.

E.g.

    (+ 1 2)  ; Add two numbers.
Would become

    (comment (+ 1 2) "Add two numbers")
With semantics like const.

Another would be

    (+  ; Adding
     1  ; one
     2  ; and two.
     )
To

    ((comment + "add")
     (comment 1 "one")
     (comment 2 "two"))
You could display comments as popups, marginalia, or even in a traditional fashion (since some intent is captured by the comment scoping). You could also have different types of comments, like annotations, to have different kinds of display types.
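A small Python sketch of that idea, under loud assumptions: the tree is a nested list of strings, a comment is the three-element list `["comment", node, text]` (so a genuine form whose head is named `comment` would be misread), and the function names are invented for illustration.

```python
def is_comment(node):
    return isinstance(node, list) and len(node) == 3 and node[0] == "comment"


def strip(node):
    """Remove every comment wrapper, leaving the tree the evaluator would see."""
    if is_comment(node):
        return strip(node[1])
    if isinstance(node, list):
        return [strip(child) for child in node]
    return node


def show(node):
    """Print a comment-free tree as a one-line S-expression."""
    if isinstance(node, list):
        return "(" + " ".join(show(child) for child in node) + ")"
    return node


def render_trailing(node):
    """Traditional display: code followed by `; text` for a commented top-level form."""
    if is_comment(node):
        return f"{show(strip(node[1]))}  ; {node[2]}"
    return show(strip(node))


def render_margin(node):
    """Alternative display: (code, note) pairs that a viewer could show as marginalia or popups."""
    notes = []

    def walk(n):
        if is_comment(n):
            notes.append((show(strip(n[1])), n[2]))
            walk(n[1])
        elif isinstance(n, list):
            for child in n:
                walk(child)

    walk(node)
    return notes


whole = ["comment", ["+", "1", "2"], "Add two numbers."]
pieces = [["comment", "+", "add"], ["comment", "1", "one"], ["comment", "2", "two"]]

print(render_trailing(whole))
print(render_margin(pieces))
```

The same stored tree feeds both renderers, which is the point: the comment's scope is data, and how it is drawn is a viewer preference.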


I agree with the decency of the idea

But isn't this the nearest argument why we should start publishing negative results papers?

The most important thing that I have ever learned was a announcement made by Google about how their industry consortium seeking to trade print magazine advertising inventory, failed with reference to the nature of this failure. No matter how lacking in detail this notice was, no matter that I was sent down a seven years solitary and very lonely path of mercifully ultimate discovery, on the heels of my startups exit collapse due to my cofounder tragically dying but we'd have been sunk by the problems indirectly revealed in that Advertising Age news item. However painful, and I'm talking about therapy for years after I emerged from reclusion, and real health issues giving up running and my diet turning to junk energy hits...I have learned more from being told "no can do Z because of y" and I seriously think that we /seriously have to start already getting over the reasons why we don't talk about our failures/. I mean holy smoke wouldn't we ever run out of good conversation ever again if we could have a good old banter and brew over our cockups? I'm thinking that this is how women are so much more successful in reproduction if they actually want to. What could we do if we tried?


Was this GPT-2 trying to write a HN comment?


I'm not sure. Their posting history has multiple posts of this type with disconnected sentences. With GPT-2 at least there is usually some continuity and a semblance of a shared context between the sentences.


After reading several of the comments, I think OP is not a native English speaker, and that makes for some awkward grammar/sentence structure.


I am reminded of Terry Davis.


> let formatting be a style sheet

That's a good expression of the idea. If merge conflicts happen, we're doing it all wrong. Version control shouldn't even really be aware of this.


tree-sitter and git: now kith!


the most extreme validation of typographical precision is the banknote. Discuss?...


This is exactly what Unison is trying to do


> reflects potential complexity of the language.

The language is fairly complex syntactically and that definitely adds some cost to formatting.

But I think much of the complexity comes from two things:

1. A lot of idiomatic Dart uses function literals in a block-like way, as in:

    test(() {
      expect(1 + 2, 3);
    });
2. But Dart doesn't actually have trailing block argument syntax like Smalltalk, Ruby, and Kotlin. So the formatter has to look at the closures passed to an argument list and decide which ones look better using block formatting, versus regular argument list formatting like:

    someFunction(
        () {
          expect(1 + 2, 3);
        });
Also, at the time I first wrote dartfmt, there was a lot of very nicely hand-formatted code in the wild that used different subtle layout choices to make different argument lists look nice. In order to persuade people to adopt the formatter at all, it had to be sophisticated enough to figure out many of those patterns and apply them automatically.

It's not as good as a human (mainly because it doesn't have semantic context) but it had to be pretty close or people wouldn't have tried it.

Now that it's well established, I think it would probably be possible to simplify how it formats while still making users happy. Possibly happier because the results would be a little easier to predict.

> Code isn't just a recipe for a computer to do something, it's a language for explaining to other programmers what that thing is and what's important to the structure of accomplishing it.

An automated formatter will never be as good as carefully crafted artisanal formatting. In particular, automated formatters don't know what stuff means. A good human might choose to line break a function call like so:

    setColor(red: 123, green: 54, blue: 26,
        alpha: 45);
Because they know that "RGB" is a single coherent concept and alpha is less closely related. An automated formatter doesn't (and probably shouldn't) have that domain knowledge.

But the value proposition of automated formatting is not just "how nice is the resulting code to read". You have to look at the total value proposition of completely yielding formatting to a tool versus allowing human control over it. When it's completely automated:

1. You can run it on generated code that contains absolutely no whitespace and still get nice output.

2. Humans can do large-scale refactorings, format, and get output that is consistent with the existing state of the codebase without having to understand any local style preferences.

3. Humans never have to spend time deciding how to format. Further, they don't even have to spend time deciding if they should format.

4. When reading a random codebase, it is likely to be formatted in a style you are used to even if you have zero communication with that team. This is particularly important in open source.

5. The code looks familiar to you wherever you encounter it: IDEs, plain text editors, code review tools, blog posts, StackOverflow answers. As opposed to letting everyone pick their own style and relying on users to apply their preferred style locally, it's just always in a familiar style.

6. Like any automation, the tool doesn't make mistakes. Even very careful humans hand-formatting make more mistakes than they realize. (I know because I've looked at their code). Those mistakes can be distracting for readers.

7. It gets people out of the mindset of being nitpicky about style. It encourages them to stay focused on the structure and naming of their code, which is what really matters.

8. It eliminates style arguments in code reviews. Those take time and, worse, cause disharmony, for next to no benefit.

I think it's well worth accepting some small loss of overall formatting quality to get those in return.


Related:

The Hardest Program I've Ever Written – How a code formatter works (2015) - https://news.ycombinator.com/item?id=22706242 - March 2020 (125 comments)

The Hardest Program I've Ever Written (2015) - https://news.ycombinator.com/item?id=17271963 - June 2018 (76 comments)

The Hardest Program I've Ever Written (2015) - https://news.ycombinator.com/item?id=15063193 - Aug 2017 (48 comments)

The Hardest Program I've Ever Written - https://news.ycombinator.com/item?id=10195091 - Sept 2015 (76 comments)


As a programmer I prefer formatters that don't introduce those heuristic line-breaks based on line length.

I'm still hoping that Rust will eventually get such a formatter. Unfortunately the people responsible for rustfmt seem to have a strong preference for the "ignore line-breaks the user inserted" approach.


Zig's formatter has no notion of line length. If you want an argument list to be stacked vertically, you insert a trailing comma after the last argument, otherwise the formatter will put it all on a single line. I was a bit bothered by this at first but I came to really like it.
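The rule as described is simple enough to sketch in a few lines of Python. This only illustrates the behaviour in the comment above and is not how `zig fmt` is implemented. The function name `format_call` and its parameters are hypothetical.

```python
from typing import List


def format_call(name: str, args: List[str], trailing_comma: bool, indent: str = "    ") -> str:
    """Lay out a call based only on whether the author left a trailing comma."""
    if not trailing_comma:
        # No trailing comma: everything goes on one line, however long it gets.
        return f"{name}({', '.join(args)})"
    # Trailing comma present: one argument per line, each keeping its comma.
    lines = [f"{name}("]
    lines += [f"{indent}{arg}," for arg in args]
    lines.append(")")
    return "\n".join(lines)


print(format_call("setColor", ["red", "green", "blue"], trailing_comma=False))
print(format_call("setColor", ["red", "green", "blue"], trailing_comma=True))
```

Line length never enters the decision, so renaming a variable cannot change where the breaks fall.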


Agreed. Formatters are supposed to remove all cognitive burden related to formatting. But formatters like Black (for python) will do line-length based formatting which reintroduces the cognitive burden again ("oh crap, my variable names are too long, better shorten them so this line stops getting broken up"). I like gofmt better for this reason. It doesn't break up your lines based on some arbitrary line length.


I usually set the line length limit to somewhere around 80 so I can have several columns visible at once without wrapping or truncation.

So far my magic number is three columns: three code files, or one/two code files with a terminal and/or browser window thrown in the mix. Or any of these columns can be split into two vertically stacked boxes for a total of six things.

I’m also a (nonultrawide) single screen coder (after many forays into the multi screen world) which has undoubtedly guided my preference.


Ultimately code is a visual medium (for most users). It is a data format consumed primarily by human eyeballs, so there is no escaping the reality that things like identifier length, line length, wrapping, etc. matter.

Any other strategy is like trying to design a chair without thinking about butts. You may come up with some sort of elegant Bauhaus mathematically perfect work of art, but no one will want to sit in it.


This grinds me gears. Especially when you have multiple similar lines and one is a character longer and the formatter breaks that line.

It’s like, I could scan the code easier and understand it better without you doing that, thank you!

A similar thing is match statements — some arms using braces vs statements.

There’s also a bunch of heuristics in rustfmt that are complex to the point that I literally couldn’t format the code the way it does without building some sort of decision tree annotated with uneven column limits (e.g. “70% of the column limit”). If I can’t and therefore wouldn’t format the code the way the formatter does, there’s an issue.

Formatters are for consistency, I think, and sometimes they work against that.


It's not the maintainers - rustfmt has an official style guide it's not allowed to break without an RFC: https://github.com/rust-dev-tools/fmt-rfcs/blob/master/guide...


This is the second or third article on the difficulty of line breaking I've read on HN just this week. Why isn't there a good *exhaustive* tome on the art of text editors / line breaking / text shaping / text rendering / text on the GPU etc.? I'd pay good money.


There’s not a single reference that I know of, at least not covering all aspects. An interesting place to start may be this PDF and its references: https://mirror.math.princeton.edu/pub/CTAN/info/memdesign/me...

TeX famously does line breaking in a perhaps decent way - but it and text shaping become more of an art than a definite list of rules to follow.


is this /s, in reference to latex/knuth and the precursor to yak shaving?


I'm not sure I understand why dynamic programming wouldn't work (and the author explicitly mentioned Knuth). TeX's main job is literally doing line breaks, which is the exact same problem being tackled here. I would expect a similar approach (progressively build a graph of the most promising breaking points) to be effective. Why wouldn't it be the case here?


As someone very familiar with the Knuth–Plass line-breaking algorithm (https://tex.stackexchange.com/a/423578/48), an important difference I see here is that for paragraphs (the domain of TeX), there is no "state" that needs to be preserved across lines: if you know that your paragraph is going to choose a certain break-point, then you can pretty much typeset the "before" and "after" parts independently, each optimally. (With one exception: there is a penalty for hyphens being on successive lines, so we need to track whether the previous line was hyphenated.) This is the "optimal substructures" property that makes it so amenable to dynamic programming.
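To illustrate why that property helps, here is a deliberately simplified Python sketch of paragraph breaking by dynamic programming. It is not the Knuth–Plass algorithm itself: there is no stretchable glue, no hyphenation and no penalties. The cost is just the squared trailing space on every line except the last, and every word is assumed to fit within the width. The function name `break_paragraph` is made up.

```python
def break_paragraph(words, width):
    """Split words into lines that minimize the sum of squared trailing space (last line is free)."""
    n = len(words)
    # best[i] is the minimal cost of laying out words[i:]. It depends only on i,
    # not on how words[:i] were broken: that is the optimal-substructure property.
    best = [float("inf")] * n + [0.0]
    next_break = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word on the line
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            if length > width:
                break
            slack = width - length
            cost = 0.0 if j == n else float(slack ** 2)
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                next_break[i] = j
    # Follow the recorded break points from the start of the paragraph.
    lines, i = [], 0
    while i < n:
        j = next_break[i]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines


print(break_paragraph("aaa bb cc ddddd".split(), 6))
```

A greedy first-fit would produce `aaa bb`, `cc`, `ddddd` with a cost of 16, while the table finds `aaa`, `bb cc`, `ddddd` with a cost of 10. The table has one entry per word position, which is what stays small for paragraphs.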

With the code formatter, to format the part after a certain character, you need to keep track of the indentation depth of all the expressions that have not yet terminated at this point — because you presumably want parallel expressions to be formatted with the same indentation depth, for closing parentheses to match their corresponding opening parentheses, etc.

For example, in this layout:

    experimental = document.querySelectorAll('link').any((link) =>
        link.attributes['rel'] == 'import' &&
            link.attributes['href'] == POLYMER_EXPERIMENTAL_HTML);
and, say (I'm making this up):

    experimental = 
        document.querySelectorAll('link').any(
            (link) => 
                link.attributes['rel'] == 'import' &&
                    link.attributes['href'] == POLYMER_EXPERIMENTAL_HTML);
— knowing that there's a break after the `&&` is not enough; you also need to know the indentation of the previous expressions, to decide how you're going to format the part after the `&&`.

This is what the author alludes to in the post:

> A line break changes the indentation of the remainder of the statement, which in turn affects which other line breaks are needed. Sorry, Knuth. No dynamic programming this time. […] For most of the time, the formatter did use dynamic programming and memoization. […] It worked fairly well, but was a nightmare to debug.

> It was highly recursive, and ensuring that the keys to the memoization table were precise enough to not cause bugs but not so precise that the cache lookups always fail was a very delicate balancing act. Over time, the amount of data needed to uniquely identify the state of a subproblem grew, including things like the entire expression nesting stack at a point in the line, and the memoization table performed worse and worse.

In TeX, paragraphs have each line of the same width (simple case) or can have a \parshape (in general), but these are "global" constraints that don't depend on what breaks you choose.


Not sure if saying you can't use dynamic programming is accurate though, you simply can't use a direct translation of Knuth's algorithm since this misses indentation.

If I recall correctly Knuth uses Dijkstra on a graph with nodes of 'line break at position x' and I don't see why you couldn't use a graph consisting of 'line break at position x indentation y' or something similar.


It's the "or something similar" that's the catch. For another example, search the post for `scriptLoadingTimeout` — to know how to indent the code after a break immediately after that position, you need to know the indentation of the `.timeout` before it, of the immediately preceding `.then(`, and of the `return` at the top — basically you need to know the indentation level of every parent in the expression tree. That means the graph's states are something like "line break at position x, with indentation of parent expression nodes being …, …, …", and then you have too many states as mentioned in the post. There's a combinatorial explosion of the state space. Using dynamic programming with this large state space is still possible, but approaches the running time of the brute-force algorithm. (I do wonder how extensively it was tried, though.)


> I would expect a similar approach (progressively build a graph of the most promising breaking points) to be effective. Why wouldn't it be the case here?

That's…how he does it.


There are some nice clarifications to the problems he ran into with his DP implementation with the little skulls at the bottom of the article.


Yes, strange, it looks like that's his solution plus some ad hoc logic. At the same time he's more knowledgeable than I am so dunno.


That's something that I was thinking about yesterday, while writing my code in PhpStorm. I thought about how much easier those modern tools make a programmer's life, how intuitive they are and how hard it must have been to get to the current state of the art. Thanks for that, creators!


I have a particular fondness for well-formatted html that I can read via 'view source' - in contrast to the div-overloaded soup that I usually encounter. Periodically, I toy with automated formatting for html until I remember that it's essentially impossible - two html sources can have a single space character difference and simultaneously produce the same output and different output, depending on an external CSS file. This kind of stuff is tricky.


I'm confused, don't the browser dev tools do exactly this? That seems better than sending pretty-printed HTML on the wire, which is a bunch of unnecessary bytes that users have to pay for.


The browser dev tools display something very different to the source of the page - it's a live view of the document, for a start.


That's addressed easily enough: disable JavaScript, so that the document can't change.


Unless I haven't discovered the method, it's also a LOT harder to search the DOM for a given string using the dev tools than it is via source.


In Chrome, you just Ctrl+F on the Elements tab and then start typing.


If I ever have to write a code formatter, it will strictly enforce one line per statement and disallow artificial line breaks. Devs who end up writing 5000-character function chains better have a wide monitor.


Whenever I read something like this it strikes me that current languages (even the higher level ones) are poor at expressing higher-level concepts like that in a practical way and capturing that complexity in an easily manageable form.

One of the hardest parts of programming is understanding what's happening from reading code. And if you abstract too much "the traditional way" then it is just even harder to understand.


I don't understand why people have this fetish for automatic formatters. If you really want this, you should be using old style FORTRAN or something similar. The good thing about modern languages is that you don't depend on the location of code in the page for it to work. If you start worrying too much about exact formatting, you throw away this big advantage. I really prefer code in the location where I put it, not where a machine thinks it is best.

And if you think that formatting is a problem to understand the code, let's get real: this is the smallest of the problems. There are tons of other things that make code complicated to read, like variable and function names, the particular style of your code, how you split it into classes and files, the algorithm you're using, and so many other, more important things. I can guarantee you that if a piece of code is well written, you can understand it independent of where you put braces or the number of spaces you're using.


I take it that you don't work on a large corporate code-base / don't have to code-review other people's code?

Auto-formatting (esp. when used as a pre-commit hook) means that changes people make to the style are ignored/reverted (and/or, that places where people introduce a different style in new code, are auto-formatted back into the existing style immediately, rather than that needing to be an additional commit later on.) Thus, no spurious diff lines from formatting. Thus, not having to wade through a bunch of "noise" diff-lines, to get to the "signal" of semantic changes at code-review time.

Also, having auto-formatting on both your main branch + development branches, makes merge/rebase conflicts less likely to happen. (Which basically boils down to "fewer noise diff-lines" again.)

In other words: auto-formatting makes code more machine-legible to syntax-blind parsers; which in turn allows tooling like diff(1) to be more helpful.

(Yes, we could just have language-syntax-aware semantic-level diff/merge/etc. tools. Not sure why nobody ever made these. I bet this is one of those things where Lisp users have had it for ages but using their own parallel world of abstractions that doesn't exist in C/POSIX.)


This has nothing to do with formatting. When you create a change to a code base you should be submitting only the lines that are new/changed. If someone is submitting purely formatting changes, he/she's just wrong and you should reject that during review.


> If someone is submitting purely formatting changes, he/she's just wrong and you should reject that during review.

If you add a line between two existing lines, and then insert after it a new blank line to serve as a sort of "paragraph marker", is that a "pure formatting" change?

If you add a constant in a group of constants, whose name is longer than the existing ones, do you pad the spacing of the values of those constants so they line up with one-another?

For that matter, if you fully-qualify a previously-unqualified and potentially-ambiguous identifier, is that a "pure formatting" change? Some auto-formatter tools do this, after all.

These are things that people may or may not do in code-bases, that "fly under the radar" of even the most stringent of human code-reviewers, because they're so irrelevant to understanding the code. They're "fluff." But because of this, how people introduce that fluff is essentially random, and so the cause of a lot of diff noise. These are the things that auto-formatters can "lock down" to only happen a certain way.

But I think you're missing the forest for the trees, as I mostly wasn't talking about pure formatting changes. What I'm talking about is more like:

You add a formal parameter to a function. Before, the function's clause head was less than 80 characters. Now it's more than 80 characters. Do you break the formal parameter list onto the next line? If so, how far do you indent it? Do you split the formal-parameter list up so that each parameter is now on its own line? Etc.

Done by humans with no strict standard, these sort of one-off judgements made arbitrarily will add up to "syntax rot" — not something you observe with your eyes, but a sort of "potential energy" of un-made formatting changes, that means that any given semantic change by a sufficiently-motivated human might become the impetus for a manual reformatting during that semantic change, such that that reformatting will happen at a random time, inflating a patch where affecting that additional code wasn't strictly necessary. (If you ask the programmer why they did it, they'll say they needed to "clean up the code they were working on" so that they could understand it well enough to apply the fix.) Which is horrible for both code review and merge predictability.

On the other hand, an auto-formatting tool will apply that transformation exactly when it becomes necessary; and will pick some way of formatting the additional lines and stick to it. There's no "potential energy" there. At all times, the codebase is "at rest", with no chance of anyone introducing "arbitrary" (but actually left-over) formatting changes.
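
As a rough illustration of "apply the transformation exactly when it becomes necessary", here is a small sketch of a fixed wrapping rule for a parameter list. The function `format_signature` and its two-layout rule (one line if it fits, otherwise one parameter per line) are assumptions made for this example, not how any particular formatter works.

```python
def format_signature(name: str, params: list[str], width: int = 80) -> str:
    """Format a function header with one fixed rule and nothing left to taste.

    Rule (assumed for this sketch): keep everything on one line if it fits
    in `width` columns, otherwise put every parameter on its own line.
    """
    one_line = f"def {name}({', '.join(params)}):"
    if len(one_line) <= width:
        return one_line
    body = "".join(f"    {p},\n" for p in params)
    return f"def {name}(\n{body}):"


before = format_signature("resize", ["image", "width", "height"], width=40)
# Adding one parameter pushes the header past the limit, so the layout
# changes at that moment and in exactly one way.
after = format_signature(
    "resize", ["image", "width", "height", "interpolation"], width=40
)
print(before)
print(after)
```

Because the output depends only on the name, the parameters, and the width, two people making the same change get the same text, whatever their editor or habits.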

Human formatting is like a sequence of DML statements in an RDBMS. Auto-formatting is like a sequence of operations against a CRDT. Given a bunch of changes run in a random order, the output of human formatters will be arbitrary, while the output of auto-formatting will be deterministic. Which is what you want, if you're doing complex things involving e.g. long-maintained stable branches for 1.x that cherry-pick changes from 2.x.


Strongly disagree:

- Not having to format my code manually at all, just letting the formatter do it for me, is a significant productivity win. I write code as fast as I can, with the minimum number of keystrokes, in a way that would normally be super ugly, and it comes out the same. I have my editor set up to auto-format the current file on save, so it’s just type a bit of code with zero formatting, cmd+s, then it’s instantly perfectly formatted.

- For a codebase with 10s or 100s of devs working on it, uniform formatting does significantly help readability. Sure I can still read it if there’s dozens of different formatting styles going on, but I can read it faster if the formatting is always consistent

- Re: the above, yes you can keep consistent formatting without a code formatter, with a style guide that everyone learns, and that you enforce in code reviews. But that’s a waste of time both for on-boarding new devs, and a basically neverending waste of time during code reviews. Also a waste of time writing and maintaining the style guide itself

The first point helps me write faster, the second helps me read faster, and the third keeps code reviews and the like quicker.

Code formatters are such a clear, easy win, especially with large teams, that it’s hard for me to understand why anyone would opt out of them. It’s not a MASSIVE win, but IMO it clearly makes for a more productive development environment, and they’re generally dead simple to set up.


I have worked on teams that do automatic formatting and others that didn't. I have never seen any advantage of automatic formatting. In my experience, people who like to complain about simple things like where to put braces or where to break a line will move the goal posts and start to complain about particular parameters of the formatter, or try to change the formatter to something "more powerful". People who don't care about location of braces will continue working without problems, and everything will be the same as before, just with the added complexity.


The point is not that any one code formatting style is best, the point is that consistency in formatting across a codebase helps you read code faster. Our eyes and brains are good at picking up patterns - consistent patterns let our brains parse code faster than if every file is written in a different style.

Furthermore, not worrying at all about indentation, spacing, brace placement, semicolons or not, etc. lets me write code faster, not just read it faster. Type it out with zero effort expended on formatting, save, editor auto-formats.

It’s not that any of this saves crazy amounts of time, but it does make all of code writing, code reading and code reviews slightly faster. When it’s so easy to set up, why not do it?

The only argument I can see against auto-formatters is that people like to put their own artistic touch on the code they write. I get that, but it wastes time, especially when everyone starts doing things in their own style.

I’ve been working professionally as a dev for 9 years, also on teams that use auto-formatters, and teams that don’t. I think they’re a small but clear productivity booster. The only time I’d consider not using one is in languages where the formatter itself is super slow. But fast ones, where you can set up nearly instantaneous “format on save” (go fmt, prettier, etc.), no brainer.


> The good thing about modern languages is that you don't depend on the location of code in the page for it to work. If you start worrying too much about exact formatting, you throw away this big advantage.

Counterpoint: When using a formatter, I stop worrying about formatting. It's a job for a computer, done by a computer. Humans are bad at consistency and discipline, computers are great at it. I want to concentrate on the things that matter, and formatting isn't one of those.

Especially in larger teams, consistent formatting is just nice. No conflicting styles in the same file, and more meaningful diffs.


If you really want to stop worrying about code formatting, just stop doing it. It is not really that important. I have never spent any time worrying about it, and I don't see why people would be upset about formatting.

Moreover, using an automatic formatter will not fix it, because, guess what, there is no universal code formatter. All of them have different results and a long list of parameters. Determining the best way to use one will create more work for you as you manage your team, and will inevitably add a new step to your already complex building process. Just stop worrying and use that time in more productive ways.


> Determining the best way to use one will create more work for you as you manage your team, and will inevitably add a new step to your already complex building process.

I don’t know. I write JavaScript at $DAY_JOB and setting up Prettier on our repos took all of ~30 minutes, with an additional ~15 to determine which options to use. (There aren’t many because Prettier is fairly opinionated.) I have seen far more time wasted quibbling about code styling in code reviews.


In addition to what @derefr said, in order to not want automatic formatting, first you have to get to the place where zero people in your team/company care about formatting & whitespace at all. Disagreements over whitespace consume people time, and those disagreements go away when automated formatting is used. This is the strongest reason in my experience to use automatic formatting: to eliminate time spent talking about formatting.

Auto-formatting tools in editors exist, and they're very common, and they're not always configured properly, so people change formatting on accident. Sometimes formatting changes can cause code reviews to take more time than necessary. Having tabs in code can cause actual problems, for example, since tabs aren't the same size everywhere.

This is not just a code understanding problem, and shouldn't be written off as trivial, IMO.


I've been thinking of working on an automatic formatter for one particular programming language in order to easily be able to guarantee consistency of the documentation examples for it. (I get occasional bug reports about stylistic inconsistency or inconsistent spacing in them.)


I like automatic formatters (if they’re a deterministic function of AST to text) because I think of what I’m writing as a syntax tree, and the fact that it’s stored as text as a historical accident.

I just want to write the tokens without ever thinking about where they go on the page, periodically save and let my formatter deal with it.
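
A quick way to see the "deterministic function of AST to text" idea is Python's own standard library: `ast.parse` followed by `ast.unparse` (Python 3.9+). This is only a sketch of the principle, not a usable formatter, since `ast.unparse` drops comments and ignores line length. The helper name `canonical` is made up for this example.

```python
import ast


def canonical(source: str) -> str:
    """Render source purely from its syntax tree, discarding the original layout."""
    return ast.unparse(ast.parse(source))


# Same tokens, very different placement on the page.
messy = "x=( 1+2 )\nif x>2 :\n        print( x )\n"
tidy = "x = 1 + 2\nif x > 2:\n    print(x)\n"

print(canonical(messy))
print(canonical(messy) == canonical(tidy))
```

Both inputs parse to the same tree, so they print identically, and running `canonical` on its own output changes nothing.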


If you truly think that, you have never worked on a codebase with a team size > 3.


Auto Format is not for the machine, it is for other humans who work with you.


The day job interviews for programmers ask "write me a language formatter, you have 3 hours", I'll probably end up in jail. Those things are way beyond my skillset and I'm glad smarter people than me exist. If you're one of those people: thank you. I love you.


Why jail? Cause you'll lose your mind and do harmful things? Asking for a friend


The result will be so bad, it will be considered a violation of the Geneva Convention.


I’ve been the primary maintainer for a vim plug-in for a number of years. Dealing with indentation expectations of a modern, complex programming language without an AST parser (you need to be able to deal with code that doesn’t compile) is one of the hardest problems I’ve had to work on. Dealing with string and comment detection, dealing with the constant influx of new features, keeping it performant and maintainable in a dedicated scripting language with bespoke debugging tools. The best approach I’ve come up with is to be ruthless about testing and use strictly TDD for everything

Take a look at how complex the code is for something like vim-ruby to get a feel for what I’m talking about


Long statements are one of the reasons I dislike “fluent interfaces”. To me long statements feel like a problem of bad language design. And a super smart formatter feels like a crutch when what you really want is a leg.


Sheesh. That is hard. And that does NOT pale in comparison to me polishing the “tiny” Bash regex for removing inline comments (as denoted by hash, semicolon or double-slash) in an INI-format (version 1.4, 2009) file … while … and while still allowing those same inline comment characters inside a quoted string (along with the subsequent text up to the matching closing single/double quote).

I have a working regex (passed by many regex online testers) but in bash yet, NO!
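
For anyone curious what that problem looks like, here is a minimal sketch in Python rather than Bash, so it is not the regex described above. It assumes the simplest possible rules: `#`, `;` and `//` start a comment anywhere outside a complete quoted string, with no escape sequences and no requirement for whitespace before the marker. The names `_CODE_PART` and `strip_inline_comment` are made up for this example.

```python
import re

# Match the longest prefix of the line that is NOT a comment.
_CODE_PART = re.compile(
    r"""
    ^(
      (?:
          "[^"]*"        # a complete double-quoted string, markers inside are kept
        | '[^']*'        # a complete single-quoted string
        | /(?!/)         # a single slash that does not begin a // comment
        | ["']           # an unterminated quote, kept as an ordinary character
        | [^"'\#;/]      # any other character that cannot start a comment
      )*
    )
    """,
    re.VERBOSE,
)


def strip_inline_comment(line: str) -> str:
    """Return one INI line with its trailing inline comment removed."""
    match = _CODE_PART.match(line)  # always matches, possibly the empty prefix
    return match.group(1).rstrip()


print(strip_inline_comment('name = "a # b; c" ; trailing note'))
print(strip_inline_comment("path = a/b // note"))
```

Under these assumed rules an unquoted URL such as `http://example.com` would be cut at the `//`, which hints at why the real thing is fiddly.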

To graderjs of HN, the author of the Dart code formatter: you got mad respect from me.


Related for JavaScript, Ruby, HTML and many more https://prettier.io

And the creator of prettier/plugin-ruby worked on a pure-Ruby implementation https://github.com/ruby-syntax-tree/syntax_tree


It's interesting that the article specifically mentions the go formatter, but fails to notice that the go formatter sidesteps this problem entirely by not setting a line length constraint: https://news.ycombinator.com/item?id=16434566


> If every statement fit within the column limit of the page, yup. It’s a piece of cake. (I think that’s what gofmt does.) But our formatter also keeps your code within the line length limit.


Maybe a code formatter should be just brutally simple and predictable. Fearing the look of long, complicated statements, coders will shorten their statements and just do one thing per statement.


It also encourages shorter and often less descriptive names. That's not necessarily a good thing.


This really depends a lot on the language syntax, naming of built ins, and common idioms. Like Java is notorious for its long lines because it’s so verbose and long naming is the common convention. Lisps are notoriously far off to the other extreme. There’s quite a lot of room between those extremes, occupied by languages like Python and Ruby. And then there’s languages like JavaScript and especially TypeScript which span most of the range depending on preference.


Does it find a true optimum, or just some approximation?


As long as it doesn't hit a built-in hard limit for search space exploration (which is in practice only encountered on pathological generated code), it will find the optimally scored set of line breaks.
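
A toy version of "optimally scored set of line breaks, unless a hard limit is hit" might look like the sketch below. It is a brute-force enumeration over split points with an invented cost function and budget, written only to illustrate the idea. It is not the Dart formatter's actual search, and `render`, `cost`, `best_splits` and all the weights are assumptions.

```python
from itertools import combinations


def render(chunks, splits, indent=4):
    """Join chunks into lines, breaking after chunk i for every i in `splits`."""
    lines, current = [], chunks[0]
    for i, chunk in enumerate(chunks[1:]):
        if i in splits:
            lines.append(current)
            current = " " * indent + chunk
        else:
            current += " " + chunk
    lines.append(current)
    return lines


def cost(lines, width, split_cost=1, overflow_cost=100):
    """Score a layout: each split is cheap, each overflowing character is expensive."""
    overflow = sum(max(0, len(line) - width) for line in lines)
    return split_cost * (len(lines) - 1) + overflow_cost * overflow


def best_splits(chunks, width, max_states=5000):
    """Try every set of split points, fewest splits first, within a budget."""
    points = range(len(chunks) - 1)
    best, best_cost, explored = None, None, 0
    for k in range(len(chunks)):
        for combo in combinations(points, k):
            explored += 1
            if explored > max_states:
                return best  # budget hit: best so far, not provably optimal
            lines = render(chunks, set(combo))
            c = cost(lines, width)
            if best_cost is None or c < best_cost:
                best, best_cost = lines, c
    return best


chunks = ["result =", "compute(alpha,", "beta,", "gamma)"]
print("\n".join(best_splits(chunks, width=30)))
```

If the budget is not exhausted, the result is the true minimum under this cost function. If it is exhausted, you get whatever was best so far, which is where pathological generated code would show up.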


Approximation


Bravo! Well written and informative! And as someone who's obsessed with Dart at the moment timely! Thanks!


Shouldn't this be marked 2015?


So wise to share the failures and pitfalls, along with the successes.


Language formatters/servers with semantic analysis are the hardest things ever.


(2015)


So many formatters popping up. If the old-timers could do without them, I wonder about the usefulness of them.

The best code formatter is you.


So many hours have been saved thanks to formatters. Not only writing it, but also countless hours in PR-reviews without senseless nitpicking.

One of the best trends recently.


> Not only writing it, but also countless hours in PR-reviews without senseless nitpicking.

I've never understood this point. Programmers will always find something they can nitpick about. Ultimately you want a culture which focuses on the critical parts: Correctness of code, good test coverage, big-picture architecture which has long-term impact. Bringing an auto-formatter into the picture may reduce some senseless nitpicking, but you haven't actually done anything to solve the real culture problem. If your team was getting blocked because people were arguing about formatting you have bigger problems that won't be magically solved by adding an auto-formatter.

To give some examples:

A while back I worked with one person on a project where we had both Prettier and super strict ESLint, and I would still get PRs rejected because they wanted the code to be slightly refactored in a way which was entirely subjective and had no impact on the correctness (e.g. "flip this negation").

And right now I'm working on a team where we explicitly tag some PR comments with "nitpick". This will not block the PR from getting merged, but instead it's a way of saying "I prefer it this way, but it's not that important in the bigger scheme of things". This is also a signal that it's not something that we want to start a bigger discussion around.

(We use auto-formatters and linters as they are very useful.)


It’s not that formatters are essential but they are extremely convenient. They don’t just save you from formatting your own code, they also:

* Prevent arguments on which formatting convention to adopt,

* Save a peer reviewer from shallow comments if formatting conventions are broken,

* Prevent fill-in white space commit from filling in the git history, and

* Decrease the risk of senior developers imposing weird styles on the code base.

Formatters might help you write better code by freeing you from worrying about one aspect of coding, but—much more importantly—they help create and maintain a better culture around your code.


They also:

1. Make programming more boring

2. Prevent me from organizing related thoughts on single lines

3. Prevent me from doing meaningful and more readable indentation in specific contexts (such as aligning on the equals sign etc.)

I don't like them, nor do I like any of the style checkers. It's as if somebody wrote a book and you gave it to GPT-3 to make it more readable for the entire world. Fascinating BS.


> Make programming more boring

This is a terrible argument for anything to do with programming/code. It is pure opinion and preference and therefore is not falsifiable.

If you’re on a team of one, go wild. But if you’re on an actual team trying to get things done, please don’t bore the other people to death with the “interminably soul crushing debates over code formatting” as the article aptly puts it.

The team wants to see exciting results, not “exciting” code. Code is a means, not an end.

And especially if you’re on the job, you are not being compensated with excitement, but money. Go seek excitement in your personal time.

Formatters are not comparable to GPT because the code semantics have not changed, only the form. You just don’t want to retrain yourself to read code that isn’t written exactly to your liking. That’s laziness.


Most code formatters have pragmas to allow you to break out of the autoformatter, for when you really do need to override (your case 3).
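
As one concrete example, Black (the Python formatter mentioned upthread) leaves everything between `# fmt: off` and `# fmt: on` untouched. Other formatters have their own markers, so treat this as an illustration of the idea rather than a universal syntax. The constant names are invented for the example.

```python
# fmt: off
# Hand-aligned on the equals sign; Black skips this region.
WIDTH      = 80
INDENT     = 4
MAX_STATES = 5000

IDENTITY = [
    1, 0, 0,
    0, 1, 0,
    0, 0, 1,
]
# fmt: on

# Outside the markers the formatter is back in charge of the layout.
TOTAL = WIDTH + INDENT
```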


I can see why it is preferred for teams, if you don't work in teams you can decide for yourself of course.


Formatting is NOT the purely stylistic choice many code formatter authors make it out to be, though.

It can greatly affect readability, lead to/prevent unnecessary merge conflicts, and aid in/stand in the way of using nice VCS features (blame, revert, bisect, cherry-pick, …).

Many automatic code formatters ignore and do not optimize for these metrics.


I’ve never seen a formatter cause merge or blame issues, but that could be because I always have them run on a precommit hook and even then always squash all commits on a PR so there is never a commit in mainline history solely for code format. I would not admit a PR solely for formatting, either.

Git history trumps code style IMO, I would not e.g. add a formatter to a legacy codebase and then reformat everything, or do so after changing a rule. Only diffs get formatted.


Old-timers did without memory safety too, which is why I'm switching our company's stack over to good ol' C


Some of the older programming languages had memory safety. C is not the only old programming language.


There's two ways to look at how people of the past compare to people of today:

1. They didn't have X, so obviously we don't need X now.

2. They were the ones who created X, so clearly they felt the absence of X was a problem to be solved.

I'm inclined to believe 2 comes into play more often than 1. The present was created by those living in the past.


Just because they did without them doesn’t mean they wouldn’t have preferred to have had them, given the opportunity. And I really doubt you’ve surveyed them all to make such an authoritative call on this.


Agree. If I didn't want to have opinions, I'd join a cult.


Do you also wonder about the usefulness of git?


Git is how we store all the white space changes as the formatting wars rage.

Apparently some actual code changes are in there but we’ve never found them.


Sometimes, yes.

But I never worked in a big professional team before, where git's features shine.

And git != code formatter.


Preaching to the choir here I suspect, but if other lone developers are reading this:

Git's features shine just as brightly, IMO, if you work for yourself.

Granted a good number of them are often not very relevant to a lone developer, but if you work for or by yourself on any projects over the long term, you should build git (or some other distributed source control, but probably git) into the way you work.

Not to excess, by any means. I use a very small subset of git's features and in practice many of my simpler projects don't even use branches, because it's not in the nature of the changes I am making to those projects or the time management I do.

But for example you should consider git an essential step between dev and live -- using it to deploy -- and you should look at how you could use it to facilitate staging and testing.

Combined with a changelog and relentless use of comments and notes, git helps "structured forgetting", which as a freelancer is pretty crucial; sometimes you work frantically on a thing for a month, get paid and then it comes back to you years later.

> And git != code formatter.

No, obviously, but the needs of the former are supported by the benefits of the latter.

That said, I use one for golang but not, in general, for PHP. I should find a code-formatter I can bend to my will for PHP, but after 17 years of increasingly complex lone PHP development, I know what I need from my own formatting requirements in order to manage projects.


In a team is also where code formatters shine, IMO.

The biggest advantages (to me at least) are that they almost entirely eliminate formatting from code review, and the consistent style makes it easy to read and edit code written by different people.


Git is insanely useful regardless of whether you're in a team or not. Having a history of your changes, being able to define an atomic change, branches, etc are all very useful even when working solo.


I have never regretted issuing a git init command, but I have regretted not doing that.

Granted when I am developing for myself, most/all commits are just going to master but even so.


But you are not me and here we are.


Will you also forgo antibiotics the next time you have a raging infection, or do you only appeal to antiquity _sometimes_?


Keep your straws, strawman. I don't want them, be they short or long.


If you think this is a strawman, then you’re even more deluded than I thought. Best of luck, you’ll need it.


Heuristics? That sounds like a job for machine learning, and I'm not being frivolous. I think that it is doable, and when it gets it wrong, the consequences are almost nil. It would at least make a decent graduation project.

Let's not think of the spin-off: FaaS.


Part of the complexity is if I format my code and commit it, and then you checkout and make a change, we don't want your formatter to have different opinions on how to format my code. For this you need (partial) stability at least across minor versions, which is harder with a less-explainable algorithm.


There is research into using ML for automated formatting. Personally, I'm not a fan. The heuristics are relatively simple and when hand-authored can be explained. Throwing ML at it discards explainability and risks really weird formatting decisions on edge cases for relatively little upside.

My experience is that people prefer formatting that is:

1. Unsurprising.

2. Nice looking.

3. Simple.

In roughly that order. Using ML might increase 2 but at the expense of 1 and certainly 3.


Why replace a domain-specific heuristic with domain-agnostic machine learning? And where do you get high quality training data?



