Had a funny experience with this a few weeks ago. I started developing a small side project, and after a week I wondered if it already existed. To my surprise, someone had already built something fairly similar _with the exact same name_ (I had only chosen mine as a placeholder, but it was still funny), posted to Show HN just two weeks earlier.
I took a look at the project and it was a 100k+ LoC vibe-coded repository. The project itself looked good, but it seemed quite excessive relative to the problem it was solving. It made me wonder: does this exist because it is genuinely needed, or simply because it is now so easy for it to exist?
lol, are AI companies patching this answer in real time? I thought a training run took months of effort. How would they make changes in such a short period?
The companies aren’t changing anything. LLM outputs are just more random than people realize. Run the same prompt 10 times if you really want to know how well they can answer.
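To see why the same prompt can yield different answers, here's a minimal sketch of temperature sampling over a fixed set of logits (the numbers are made up for illustration): the "prompt" never changes, yet the sampled token varies across runs whenever temperature > 0.

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=None):
    """Sample one token id from logits using temperature sampling."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Same "prompt" (same logits), ten independent samples: the chosen
# token differs from run to run because sampling is stochastic.
logits = [2.0, 1.8, 0.5, -1.0]
samples = [sample_token(logits, temperature=1.0) for _ in range(10)]
print(samples)
```

Deployed models add more variance on top of this (non-deterministic kernels, batching), but sampling alone already explains most of the run-to-run difference.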
Same here, I find most of these skills/prompts a bit redundant. Some people argue that by including these in the conversation, one is doing latent-space management of sorts, bringing the model closer to where one wants it to be.
I wonder what will happen with new LLMs that contain all of these in their training data.
On desktops and servers, yeah. Bazzite was a bit of a special case as it catered to handheld devices, so it did have that going for it: a one-stop install that supported everything on these devices from the start.
I've been thinking we could eliminate a lot of niche specialized distros by replacing them with system configs for Guix System or NixOS. Maybe if you got Ansible involved it could work for Debian and Arch also. Set your default packages, custom kernel, whatever else in there. Everything needing a big brand, name, logo, website, and so on seems a bit excessive at times.
Bazzite is sort of in that category, though. Fedora Atomic is a podman container image, and Bazzite uses that as the FROM in their Containerfile. It's niche and specialized only to the extent that they're providing gaming-specific setup (like Nvidia drivers). It's mostly a Fedora system.
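The pattern looks roughly like this; the image tag and packages below are illustrative stand-ins, not Bazzite's actual build file:

```dockerfile
# A "niche distro" as just a Containerfile layered on a Fedora Atomic base.
# Image tag and package names are hypothetical examples.
FROM quay.io/fedora/fedora-kinoite:41

# Layer gaming-specific additions on top of the stock Fedora system,
# then commit the result as a new bootable container image.
RUN rpm-ostree install steam-devices mangohud && \
    ostree container commit
```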
Now it’s your responsibility to explain what any of these words mean to an average user who just wants to play their Steam games. Like it or not, brands have power. It’s been hard enough to convince people already willing to try Linux gaming to use one of the dedicated gaming distros, instead of waiting for when SteamOS is going to support their hardware.
Using Fedora Kinoite/Silverblue is not really an option if you are using an Nvidia GPU. With Bazzite, the driver is pre-installed and signed with a Secure Boot key that you can import when installing Bazzite. With plain Fedora Atomic, you have to install and sign the driver manually, and some updates break the whole thing again, so you have to fiddle around with it.
In addition, the Fedora Flatpak remote has been removed, which is a "noobtrap" in normal Fedora Atomic: it lets you install broken browser builds that are missing codecs, so videos don't play. Distrobox also works better than Toolbox, and in general Bazzite's defaults are much more geared towards an immutable system. Silverblue/Kinoite's defaults are just like normal Fedora, and you have to layer dozens of things to achieve the same result, whereas Bazzite is designed for a container workflow from the start.
Even if you ignore any gaming optimizations, etc., this alone makes it a significantly better option than the official Fedora Atomic images.
Both, with caveats. The attention computation is fundamentally quadratic: for every token in the sequence, you're doing work that touches every other token in the sequence. So it's O(N) per token, O(N^2) for the whole sequence.
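A toy single-head version makes the O(N^2) concrete: the score matrix is N x N, so both compute and memory grow quadratically with sequence length. (This is a bare-bones sketch, no batching, heads, or projections.)

```python
import numpy as np

def causal_attention(Q, K, V):
    """Naive single-head causal attention over a full sequence of N tokens.

    The score matrix is (N, N): every token scored against every token,
    hence O(N^2) compute and memory.
    """
    N, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (N, N)
    mask = np.triu(np.ones((N, N), dtype=bool), k=1)
    scores[mask] = -np.inf                             # causal: no looking ahead
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                 # (N, d)

rng = np.random.default_rng(0)
N, d = 8, 4
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = causal_attention(Q, K, V)
print(out.shape)  # (8, 4)
```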
The big mitigation for this is that in causal transformers (i.e. all the chatbot type applications, where each token is only allowed to see tokens before it), you're running inference repeatedly on the same prefix in order to grow it by one token at a time. So if you cache the computations for tokens 0..N-1, on each inference pass you only have to compute O(N) for the newly added token at the end of the sequence.
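The caching trick above, sketched in toy form: each decoding step appends one key/value pair to the cache and computes attention only for the new token, so per-step work is O(N) against the cache rather than O(N^2) over the whole prefix. (The query/key/value here are the same random vector per token for brevity; real models project them separately.)

```python
import numpy as np

def attend(q, K, V):
    """Attention output for a single new query against cached keys/values.

    Cost is O(N) in the number of cached tokens, not O(N^2)."""
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(1)
d = 4
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))

# Incremental decoding: grow the cache by one token per step and only
# compute attention for that new token.
for step in range(6):
    q = k = v = rng.standard_normal(d)       # toy per-token projections
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)        # uses all cached tokens
print(K_cache.shape, out.shape)
```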
That's why caching (and caching charges) appear so prominently everywhere in the pricing of inference.
In practice, caching is most beneficial at inference time, because you typically have relatively long conversations that start with the same cacheable prefix (the system prompt). At training time the same optimization can apply, but you're typically not pushing the same prefixes through the model repeatedly so you end up paying the quadratic cost more often.
The quadratic cost of attention is the fundamental compute bottleneck for transformer architectures, which is why there's research like this trying to find shortcuts in computing attention, as well as research into completely new primitives to replace attention (e.g. SSM, which is O(N) on a cold cache and O(1) on a warm cache).
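For contrast, here's a toy diagonal linear state-space recurrence (random matrices, not a trained SSM): the fixed-size state summarizes the whole prefix, so each new token costs O(1) in sequence length, no N x N score matrix anywhere.

```python
import numpy as np

# Toy diagonal linear SSM. The running state carries everything the model
# remembers about the prefix, so the per-token update touches only
# fixed-size buffers regardless of sequence length.
rng = np.random.default_rng(2)
d_state, d_in = 8, 4
A = rng.uniform(0.5, 0.99, size=d_state)    # diagonal state transition (decay)
B = rng.standard_normal((d_state, d_in))    # input projection
C = rng.standard_normal((d_in, d_state))    # output projection

state = np.zeros(d_state)
outputs = []
for x in rng.standard_normal((16, d_in)):   # 16 input tokens
    state = A * state + B @ x               # O(1) per token: fixed-size update
    outputs.append(C @ state)

outputs = np.array(outputs)
print(outputs.shape)  # (16, 4)
```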
Strictly speaking: no. The "forward pass" terminology does not imply that there exists a "reverse pass" that does the same kind of computation. Rather, it's describing two different kinds of computation, and the direction they occur in.
The forward pass propagates from inputs to outputs, computing the thing the model was trained for. The reverse/backward pass propagates from outputs back to inputs, but it calculates the gradients of the parameters for training (roughly: how much changing each parameter in isolation affects the output, and whether that moves the output closer to the desired training output). The result of the "reverse pass" isn't a set of inputs, but a set of annotations on the model's parameters that guide their adjustment.
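A tiny worked example of the two passes on a one-parameter-pair model, y = w*x + b with squared loss: the backward pass runs output-to-input via the chain rule and yields gradients on the parameters, not a reconstructed input.

```python
# Forward pass: inputs -> output.
def forward(w, b, x):
    return w * x + b

def loss(y_pred, y_true):
    return (y_pred - y_true) ** 2

# Backward pass: chain rule from the loss back to each parameter.
def backward(w, b, x, y_true):
    y_pred = forward(w, b, x)
    dL_dy = 2 * (y_pred - y_true)   # d(loss)/d(prediction)
    dL_dw = dL_dy * x               # d(prediction)/dw = x
    dL_db = dL_dy * 1.0             # d(prediction)/db = 1
    return dL_dw, dL_db

w, b, x, y_true = 1.5, 0.3, 2.0, 4.0
gw, gb = backward(w, b, x, y_true)
print(gw, gb)  # -2.8 -1.4
```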
The computations of the forward pass are not trivially reversible (e.g. they include additions, which destroy information about the operand values). As a sibling thread points out, you can still probabilistically explore what inputs _could_ produce a given output, and get some information back that way, but it's a lossy process.
And of course, you could train a "reverse" model, one that predicts the prefix of a sequence given a suffix (trivially: it's the same suffix prediction problem, but you train it on reversed sequences). But that would be a separate model trained from scratch on that task, and in that model the prefix prediction would be its forward pass.
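The data prep for such a reverse model is trivial: prefix prediction becomes ordinary next-token prediction once every training sequence is reversed. A sketch (token strings here are just toy examples):

```python
def make_reverse_examples(tokens):
    """Turn a sequence into (context, next_token) training pairs over the
    reversed order, so a standard next-token model learns to predict
    earlier and earlier tokens of the original text."""
    rev = list(reversed(tokens))
    return [(rev[:i], rev[i]) for i in range(1, len(rev))]

examples = make_reverse_examples(["the", "cat", "sat", "down"])
# The model learns to predict "the" given ["down", "sat", "cat"], etc.;
# in that model, this prefix prediction is just its forward pass.
print(examples)
```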
I do want to see ChatGPT running upwards on my screen now, predicting earlier and earlier words in a futile attempt to explain a nonsense conclusion. We could call it ChatJeopardy.
Not as trivially as the forward direction; unsurprisingly, some information is lost, but it works better than you might expect. See for example https://arxiv.org/pdf/2405.15012
In the end, this and all the other 89372304 AI projects are just OpenAI/Anthropic API wrappers, but at least this one has first-party support, which maybe gives it a slight advantage?
I was also thinking about this a few days ago. The scaffolding that static languages provide is a good fit for LLMs in general.
Interestingly, since we are talking about Go specifically, I never felt I was spending too much time typing... types. Obviously more than with a Python script, but never at a level where I would consider it a problem. And now that newer Python projects use type annotations, the difference has gotten smaller.