> Back to TCP. Earlier for the sake of simplicity I told a little fib, and some of you have steam coming out of your ears by now because this fib is driving you crazy. I said that TCP guarantees that your message will arrive. It doesn’t, actually. If your pet snake has chewed through the network cable leading to your computer, and no IP packets can get through, then TCP can’t do anything about it and your message doesn’t arrive.
The argument disqualifies itself at this point. By that standard the whole world is a leaky abstraction, because a freak meteor strike could happen. Once your concept is that all-encompassing, it is useless.
Every abstraction rests on assumptions: this computation will finish eventually (assuming that no one unplugs the computer itself). Such assumptions do not make it leaky.
There are leaky abstractions, I suppose, but not all abstractions are leaky. A garbage collector that could cause memory errors would be leaky. I don’t know much about garbage collectors, but in my experience they don’t.
Then someone says that a garbage collector is leaky because of performance concerns (throughput or latency). That’s not a leak: performance is one of the concerns being abstracted away. To abstract something away is to declare it out of your control, to say “this is implementation-defined”. An abstract list is an abstraction both in the sense that it specifies some behavior and in the sense that it doesn’t say how that behavior is implemented. That’s a freedom, and sometimes a lurking problem. A big reallocation because of an amortized push? Well, you abstracted that away, so can you really complain about it? Maybe your next step is to move beyond the abstraction and down to something more concrete.
What are abstractions without something to abstract away? They are impossible. You have to have the freedom to leave some things blank.
So what Spolsky is effectively saying is that abstractions are abstractions. That looks more like a rhetorical device than a new argument. (Taxes are theft?)
> There are leaky abstractions I guess but not all are. A garbage collector that can cause memory errors would be leaky. I don’t know anything about garbage collectors but in my experience they don’t.
Garbage collectors are a rich source of abstraction leaks, depending on what you do with the runtime. If you color within the lines, no surprises, the garbage collector will work. Unless it has a bug, and hundreds of GC bugs, if not thousands, have shipped over the decades; but while a bug is an abstraction leak, it's not a very interesting one.
But go ahead and use the FFI and things aren't so rosy. Usually the GC can cooperate with allocated memory from the other side of the FFI, but this requires care and attention to detail, or you get memory bugs, and just like that, you're manually managing memory in a garbage collected language, and you can segfault on a use-after-free just like a Real Programmer. It's also quite plausible to write a program in a GC language which leaks memory, by accidentally retaining a reference to something which you thought you'd deleted the last reference to. Whether or not you consider this an abstraction leak depends on how you think of the GC abstraction: if you take the high-level approach that "a GC means you don't have to manage memory" (this is frequently touted as the benefit of garbage collection), sooner or later a space leak is going to bite you.
Then there are finalizers. If there's one thing which really punctures a hole in the GC abstraction, it's finalizers.
> But go ahead and use the FFI and things aren't so rosy. Usually the GC can cooperate with allocated memory from the other side of the FFI, but this requires care and attention to detail, or you get memory bugs, and just like that, you're manually managing memory in a garbage collected language, and you can segfault on a use-after-free just like a Real Programmer.
Now you’ve stepped beyond the walled garden of managed memory. How is that an abstraction leak?
> It's also quite plausible to write a program in a GC language which leaks memory, by accidentally retaining a reference to something which you thought you'd deleted the last reference to.
Memory that the user merely thought they had gotten rid of? If the memory is technically reachable, then that doesn’t sound like the GC’s fault. I’m reminded of the recent Rust Vec thread about how the so-called space leak of reusing allocated memory led to unreasonable memory consumption. But to my recollection that wasn’t a leak in the sense of unreachable-but-not-freed. I do agree, however (with those who made this point), that the Vec behavior was too clever. Which goes to show that Vec should probably just stick to the front-page abstraction it advertises: it will amortize allocations, it can shrink to fit if you tell it to, and nothing much fancier beyond that.
(The memory leak topic seems very fuzzy in general.)
I tend to agree. "All nontrivial abstractions are leaky" reminds me of other slightly-too-cute rules, such as "full rewrites are a mistake" and "never parse JSON manually".
I wouldn't call TCP leaky because it can't deliver data across a broken network cable, for example. It's abstracting away certain unreliable features of the network, like out of order delivery of packets. It's not abstracting away the fact that networking requires a network.
I suppose it should be considered where the abstraction actually exists. If the abstraction exists in logic or mathematics (i.e. a triangle is a three-sided polygon) it probably doesn't make much sense to consider the ramifications that thought occurs in a physical brain that can fail. On the other hand if the abstraction is physical (i.e. hardware), then the fact that it is bound by physical law is obviously implicit. Software encompasses both physical and logical abstractions, so you need to pick a lens or perspective in order to actually view its abstractions.
I unflagged you by vouching for you. I found your post difficult to understand and couldn't figure out what you are trying to say, but I agree it was not deserving of a flag.
Your criticism that a broken wire is outside the scope of the TCP protocol was clear and valid.
The subsequent paragraphs about garbage collection are tough to follow. You have multiple parenthetical remarks, a quotation which I think is used as emphasis, italics used as emphasis, compound sentence fragments joined by an em-dash and a colon in the same sentence, rhetorical questions which presumably have obvious answers but not obvious to me, terms that aren't clear (e.g. what is an "amortized push"?), and concepts that don't seem to be related to GC (e.g. an "abstract list" can be implemented without GC, so why is that included in that paragraph?).
I've read those paragraphs 4-5 times now, and I don't think I understand what you are trying to say.
amortized push: a fundamental data structure, the vector, is a growable array of elements. The standard name for the function that appends an element at the end of the array is "push". You push to the array.
A vector points to an allocated portion of memory. When it is full, a new allocation, usually double the size, is made; everything is copied over, and the original allocation is discarded.
Therefore the cost of memory allocation and copying is amortized over many pushes.
Yes, most of the article is dedicated to describing the “leak”, but there was no call to abolish abstractions. Just the insight that one needs to understand the implementations underneath.
What’s the alternative? That at least N projects cooperate and agree on a common design before they do the implementation? (Then maybe someone can complain about design-by-committee.)
I use Artemis, which was originally written for Mercurial but also supports Git. It stores issues inside a repo, so it doesn't care about where it's hosted and works completely offline without needing a Web browser. Issues are stored in Maildir format, which is a widely supported standard that can be processed using off-the-shelf tools. For example, I write and comment on Artemis issues using Emacs message-mode (which was designed for Email), and I render issues to HTML using MHonArc (which was designed for mailing list archives).
I'm not claiming these tools are the best, or anything. Just that existing standards can work very well, and are a good foundation to build up UIs or whatever; rather than attempting to design something perfect from scratch.
> What’s the alternative? That at least N projects cooperate and agree on a common design before they do the implementation?
That would be ideal, yes. You should solicit comments from the greater community before setting the format in stone. But the very minimum would be to build on existing attempts at issues-in-git like [0] instead of reinventing the wheel unless you have a very very very good reason.
Yes! That's exactly what I would like to see - come together as a working group, create a PR on git itself, and implement standard support for issues, PRs, discussions, projects, votings, project websites, what-have-you. The community will take it from there.
The alternative to that would be the git project itself coming up with an implementation. They have reasonable experience working with the Kernel, and the creation of git itself seems to have worked reasonably well -- although I'm not sure I would want to use something Linus considers ergonomic :)
Ok. That could work if you found a group of people who are interested in such an all-in-one package. Gitlab is apparently working on a sort of cross-forge protocol (probably motivated by connecting Gitlab instances) and it seems that some Gitlab employees are working full time on the Git project. So those are candidates. You probably need a group that both has recognition within the project and is active enough to drive such an effort forward without it fizzling out.
Without such a group you might just send a few emails to the mailing list, get shrugs about how the plan “looks good” with little real interest, and then have to re-argue the big plan in every future patch series that incrementally builds such a thing into Git itself: mostly why such code should be incorporated and how it will pay off once it’s all ready.
The Git tool itself, and the project by extension, is as of now very unopinionated about whole-project solutions like this. The workflow that the project itself uses is very loosely coupled and pretty much requires bespoke scripting by individual contributors, which I guess is telling in itself.
IMO you don’t need anything more than commits. The manual part is writing WIP in the commit message. If that becomes an inconvenience, I’ll make a commit alias.
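Such an alias is a one-liner; a sketch using a throwaway repo (the alias name “wip” and all file names here are made up):

```shell
git init -q demo-wip && cd demo-wip
git config user.name t && git config user.email t@example.com
# stage everything and snapshot it under a WIP message, to be
# rewritten or dropped later
git config alias.wip '!git add -A && git commit -m WIP'
echo draft > notes.txt
git wip
git log -1 --format=%s            # the snapshot is just another commit
```

The snapshot then lives on the branch like any other commit, ready for rebase -i later.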
The index + commits is flexible enough. The stash concept is not needed for me.
I think that you (like me) prefer to use commits in places where many others prefer to stash. Personally I think the stash command is too primitive to bother with for all things except maybe a push+pop that happens within half a minute. Anything more and I risk forgetting about it. A WIP commit however is just there, on the branch, for me to deal with however I want later.
I don’t want to fiddle with dropping the stash one time too many, forgetting what the entries were about, and so on. Or merge conflicts because I didn’t pop before resuming.
Amendment: I just found out that I do have at least one use for git-stash: when I am stuck in the middle of a rebase and have made an edit intended for a different commit.
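The park-and-reapply move looks something like this; a toy illustration with made-up names (no actual rebase here, just the stash mechanics):

```shell
git init -q demo-stash && cd demo-stash
git config user.name t && git config user.email t@example.com
echo one > file && git add file && git commit -qm "first"
echo stray > other.txt            # an edit meant for a different commit
git stash -u                      # park it (-u includes untracked files)
# ...finish the rebase here, then reapply the edit where it belongs:
git stash pop
```

After the pop, the stray edit is back in the working tree and the stash list is empty again.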
You don’t have to interpret it as you-always-do-this. Just sometimes you want a worktree (or clone): maybe the bug fix needs to be applied to an old, old release which would invalidate all the indexes in your project. So then you can make a worktree so that your main worktree is left in peace.
But is it overkill for some fast translation fixes? Yeah, probably. :)
“Not in a position to commit” in Git makes as much sense as “not in a position to save the file” (in your editor) in 2024. Git is in a sense a primitive tool: only about snapshotting, not about high-falutin things like “passes the tests”, “is okay with Bob”, and so on. It’s just about tracking state.
(Again this is what git-reset(1) and friends are for)
> Git is in a sense a primitive tool: only about snapshotting, not about high-falutin things like “passes the tests”, “is okay with Bob”, and so on. It’s just about tracking state.
Git might be, but what many people want from git is useful commit histories, both for any reviewers at the time and any code spelunkers 10 years into the future. Having a tool that tries to make that easier isn't a bad idea.
My comment was totally unclear. By primitive I mean that it scales from very primitive operations (just snapshot) to supporting high-level workflows (refactoring, testing, verifying).
And by primitive I mean that I want to be able to commit whenever I feel like I want a snapshot. For any reason. Not hindered by concepts like does-it-build. That’s the low level. Then at the higher level are things like “public history” and “tested history”. That’s facilitated by the low/mid-level history rewriting tools.
Some people I know use Intellij’s “shelve” feature or whatever it is called. Interestingly it does provide some features that Git does not seem to have, and overlaps with GitButler, but it’s its own bespoke thing, not integrated with Git.
And using these extra concepts on top of (or under?) Git doesn’t make sense for my workflow. Because the VCS is already there. So I don’t need to think about if I’m ready-to-commit—just make a WIP or a TEST commit, maybe discard later or maybe rewrite them.
For me, Git covers everything from snapshotting some private notes I have on the work that I’m currently doing to making a nice:
> useful commit [history], both for any reviewers at the time and any code spelunkers 10 years into the future.
> And using these extra concepts on top of (or under?) Git doesn’t make sense for my workflow
I think this is the key comment - that might be true, but I don't see a real criticism of making tools for not-avgcorrection to facilitate their workflows better. I agree things can be done in Git. I disagree that they must be done in Git.
> The thing that annoys me about worktrees is that you can't have two checked out to the same commit at the same time
Checking out the same commit is fine. You just can’t check out the same branch.
This is not a problem for me because I either use detached head (if say I’m going to build from some commit) or restrict branches to certain worktrees naturally (if I am backporting to `old-version` on a worktree then... it’s not like that backport branch is ever going to be relevant in my main worktree where I do most of my work).
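The detached-head variant can be sketched in a throwaway repo (names invented for illustration): two worktrees end up at the same commit, and there is no branch clash because the second one has no branch checked out at all.

```shell
git init -q demo-wt && cd demo-wt
git config user.name t && git config user.email t@example.com
echo v1 > app && git add app && git commit -qm "release"
# second worktree, detached at the same commit the main worktree is on
git worktree add --detach ../demo-backport HEAD
git -C ../demo-backport rev-parse HEAD   # same commit as the main worktree
```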
The recent big thread on GitButler left me scratching my head. All the material with jargon like “virtual branches” just left me confused. It was like jumping into a how-to on why to choose a Ferrari without first understanding what a car is. This clears things up nicely.
I mostly use worktrees for very separate things. Like: long-running builds, or way old or way new versions of the app. Then it doesn’t make sense to mix-and-match virtual branches. So when I want to build the app for deployment I don’t want to worry about whatever other changes getting in the way. Git worktrees don’t solve the same problem that GitButler does. Worktrees are a streamlined version of the manual separate-clones workflow for the same repository. (Technically they are all distinct repositories once you clone them, but you know.)
I do have use for separate-from-branch files. Like notes to myself and test scripts that aren’t going into the branch. Crucially these files have nothing to do with the main work: the files themselves are not involved, so there can never be merge conflicts.
This GitButler workflow makes sense for things that (1) won’t cause merge conflicts and (2) won’t step on each other’s toes. The example about a translation and code change is nice. Doing a translation at the same time as a code change is not likely to “break the build”.
Maybe there could be some utility for a decently general Graph type that can be used for high-level testing and verification. Maybe you could implement your efficient graph per-problem and then validate it with a more high-level and declarative type. You would have to implement some isomorphism between the types though.