Had a funny experience with this a few weeks ago. I started developing a small side project, and after a week I wondered if it already existed. To my surprise, someone had already built something fairly similar _with the exact same name_ (I had only chosen mine as a placeholder, but it was still funny), posted to Show HN just two weeks earlier.
I took a look at the project and it was a 100k+ LoC vibe-coded repository. The project itself looked good, but it seemed quite excessive relative to the problem it was solving. It made me wonder: does this exist because it is genuinely needed, or simply because it is now so easy for it to exist?
lol, are AI companies patching this answer in real time? I thought a training run took months of effort. How would they make changes in such a short period?
The companies aren’t changing anything. LLM outputs are just more random than people realize. Run the same prompt 10 times if you really want to know how well they can answer.
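To see why the same prompt can yield different answers, here's a minimal sketch of temperature sampling over a fixed set of logits (the numbers are made up for illustration): the "prompt" never changes, yet the sampled token varies across runs whenever temperature > 0.

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=None):
    """Sample one token id from logits using temperature sampling."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Same "prompt" (same logits), ten independent samples: the chosen
# token differs from run to run because sampling is stochastic.
logits = [2.0, 1.8, 0.5, -1.0]
samples = [sample_token(logits, temperature=1.0) for _ in range(10)]
print(samples)
```

Deployed models add more variance on top of this (non-deterministic kernels, batching), but sampling alone already explains most of the run-to-run difference.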
Same here, I find most of these skills/prompts a bit redundant. Some people argue that by including these in the conversation, one is doing latent-space management of sorts, bringing the model closer to where one wants it to be.
I wonder what will happen with new LLMs that contain all of these in their training data.
On desktops and servers, yeah. Bazzite was a bit of a special case as it catered to handheld devices, so it did have that going for it: a one-stop install that supported everything on these devices from the start.
I've been thinking we could eliminate a lot of niche specialized distros by replacing them with system configs for Guix System or NixOS. Maybe if you got Ansible involved it could work for Debian and Arch also. Set your default packages, custom kernel, whatever else in there. Everything needing a big brand, name, logo, website, and so on seems a bit excessive at times.
Bazzite is sort of in that category, though. Fedora Atomic is a podman container image, and Bazzite uses that as the FROM in their Containerfile. It's niche and specialized only to the extent that they're providing gaming-specific setup (like Nvidia drivers). It's mostly a Fedora system.
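The pattern looks roughly like this; the image tag and packages below are illustrative stand-ins, not Bazzite's actual build file:

```dockerfile
# A "niche distro" as just a Containerfile layered on a Fedora Atomic base.
# Image tag and package names are hypothetical examples.
FROM quay.io/fedora/fedora-kinoite:41

# Layer gaming-specific additions on top of the stock Fedora system,
# then commit the result as a new bootable container image.
RUN rpm-ostree install steam-devices mangohud && \
    ostree container commit
```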
Now it’s your responsibility to explain what any of these words mean to an average user who just wants to play their Steam games. Like it or not, brands have power. It’s been hard enough to convince people already willing to try Linux gaming to use one of the dedicated gaming distros, instead of waiting for when SteamOS is going to support their hardware.
Using Fedora Kinoite/Silverblue is not really an option if you are using an Nvidia GPU. With Bazzite, the driver is pre-installed and signed with a Secure Boot key that you can import when installing Bazzite. With plain Fedora Atomic, you have to install and sign the driver manually, and some updates break the whole thing again, so you have to fiddle around with it.
In addition, the Fedora Flatpak remote has been removed, which is a "noobtrap" in normal Fedora Atomic: it lets you install broken browser builds that are missing codecs, so videos don't play. Distrobox also works better than Toolbox, and in general Bazzite's defaults are much more geared towards an immutable system. Silverblue/Kinoite's defaults are just like normal Fedora, and you have to layer dozens of things to achieve the same result, whereas Bazzite is designed for a container workflow from the start.
Even if you ignore any gaming optimizations, etc., this alone makes it a significantly better option than the official Fedora Atomic images.
Both, with caveats. The attention computation is fundamentally quadratic: for every token in the sequence, you're doing work that touches every other token in the sequence. So it's O(N) per token, O(N^2) for the whole sequence.
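A toy single-head version makes the O(N^2) concrete: the score matrix is N x N, so both compute and memory grow quadratically with sequence length. (This is a bare-bones sketch, no batching, heads, or projections.)

```python
import numpy as np

def causal_attention(Q, K, V):
    """Naive single-head causal attention over a full sequence of N tokens.

    The score matrix is (N, N): every token scored against every token,
    hence O(N^2) compute and memory.
    """
    N, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (N, N)
    mask = np.triu(np.ones((N, N), dtype=bool), k=1)
    scores[mask] = -np.inf                             # causal: no looking ahead
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                 # (N, d)

rng = np.random.default_rng(0)
N, d = 8, 4
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = causal_attention(Q, K, V)
print(out.shape)  # (8, 4)
```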
The big mitigation for this is that in causal transformers (i.e. all the chatbot type applications, where each token is only allowed to see tokens before it), you're running inference repeatedly on the same prefix in order to grow it by one token at a time. So if you cache the computations for tokens 0..N-1, on each inference pass you only have to compute O(N) for the newly added token at the end of the sequence.
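The caching trick above, sketched in toy form: each decoding step appends one key/value pair to the cache and computes attention only for the new token, so per-step work is O(N) against the cache rather than O(N^2) over the whole prefix. (The query/key/value here are the same random vector per token for brevity; real models project them separately.)

```python
import numpy as np

def attend(q, K, V):
    """Attention output for a single new query against cached keys/values.

    Cost is O(N) in the number of cached tokens, not O(N^2)."""
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(1)
d = 4
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))

# Incremental decoding: grow the cache by one token per step and only
# compute attention for that new token.
for step in range(6):
    q = k = v = rng.standard_normal(d)       # toy per-token projections
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)        # uses all cached tokens
print(K_cache.shape, out.shape)
```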
That's why caching (and caching charges) appear so prominently everywhere in the pricing of inference.
In practice, caching is most beneficial at inference time, because you typically have relatively long conversations that start with the same cacheable prefix (the system prompt). At training time the same optimization can apply, but you're typically not pushing the same prefixes through the model repeatedly so you end up paying the quadratic cost more often.
The quadratic cost of attention is the fundamental compute bottleneck for transformer architectures, which is why there's research like this trying to find shortcuts in computing attention, as well as research into completely new primitives to replace attention (e.g. SSM, which is O(N) on a cold cache and O(1) on a warm cache).
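For contrast, here's a toy diagonal linear state-space recurrence (random matrices, not a trained SSM): the fixed-size state summarizes the whole prefix, so each new token costs O(1) in sequence length, no N x N score matrix anywhere.

```python
import numpy as np

# Toy diagonal linear SSM. The running state carries everything the model
# remembers about the prefix, so the per-token update touches only
# fixed-size buffers regardless of sequence length.
rng = np.random.default_rng(2)
d_state, d_in = 8, 4
A = rng.uniform(0.5, 0.99, size=d_state)    # diagonal state transition (decay)
B = rng.standard_normal((d_state, d_in))    # input projection
C = rng.standard_normal((d_in, d_state))    # output projection

state = np.zeros(d_state)
outputs = []
for x in rng.standard_normal((16, d_in)):   # 16 input tokens
    state = A * state + B @ x               # O(1) per token: fixed-size update
    outputs.append(C @ state)

outputs = np.array(outputs)
print(outputs.shape)  # (16, 4)
```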
Strictly speaking: no. The "forward pass" terminology does not imply that there exists a "reverse pass" that does the same kind of computation. Rather, it's describing two different kinds of computation, and the direction they occur in.
The forward pass propagates from inputs to outputs, computing the thing the model was trained for. The reverse/backward pass propagates from outputs back to inputs, but it calculates the gradients of the parameters for training (roughly: how much changing each parameter in isolation affects the output, and whether that moves the output closer to the desired training output). The result of the "reverse pass" isn't a set of inputs, but a set of annotations on the model's parameters that guide their adjustment.
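A tiny worked example of the two passes on a one-parameter-pair model, y = w*x + b with squared loss: the backward pass runs output-to-input via the chain rule and yields gradients on the parameters, not a reconstructed input.

```python
# Forward pass: inputs -> output.
def forward(w, b, x):
    return w * x + b

def loss(y_pred, y_true):
    return (y_pred - y_true) ** 2

# Backward pass: chain rule from the loss back to each parameter.
def backward(w, b, x, y_true):
    y_pred = forward(w, b, x)
    dL_dy = 2 * (y_pred - y_true)   # d(loss)/d(prediction)
    dL_dw = dL_dy * x               # d(prediction)/dw = x
    dL_db = dL_dy * 1.0             # d(prediction)/db = 1
    return dL_dw, dL_db

w, b, x, y_true = 1.5, 0.3, 2.0, 4.0
gw, gb = backward(w, b, x, y_true)
print(gw, gb)  # -2.8 -1.4
```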
The computations of the forward pass are not trivially reversible (e.g. they include additions, which destroy information about the operand values). As a sibling thread points out, you can still probabilistically explore what inputs _could_ produce a given output, and get some information back that way, but it's a lossy process.
And of course, you could train a "reverse" model, one that predicts the prefix of a sequence given a suffix (trivially: it's the same suffix prediction problem, but you train it on reversed sequences). But that would be a separate model trained from scratch on that task, and in that model the prefix prediction would be its forward pass.
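The data prep for such a reverse model is trivial: prefix prediction becomes ordinary next-token prediction once every training sequence is reversed. A sketch (token strings here are just toy examples):

```python
def make_reverse_examples(tokens):
    """Turn a sequence into (context, next_token) training pairs over the
    reversed order, so a standard next-token model learns to predict
    earlier and earlier tokens of the original text."""
    rev = list(reversed(tokens))
    return [(rev[:i], rev[i]) for i in range(1, len(rev))]

examples = make_reverse_examples(["the", "cat", "sat", "down"])
# The model learns to predict "the" given ["down", "sat", "cat"], etc.;
# in that model, this prefix prediction is just its forward pass.
print(examples)
```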
I do want to see ChatGPT running upwards on my screen now, predicting earlier and earlier words in a futile attempt to explain a nonsense conclusion. We could call it ChatJeopardy.
Not as trivially as the forward direction; unsurprisingly, some information is lost, but it works better than you might expect. See for example https://arxiv.org/pdf/2405.15012
In the end, this and all the other 89372304 AI projects are just OpenAI/Anthropic API wrappers, but at least this one has first-party support, which maybe gives it a slight advantage?
I was also thinking about this a few days ago. The scaffolding that static languages provide is a good fit for LLMs in general.
Interestingly, since we are talking about Go specifically, I never felt I was spending too much time typing... types. Obviously more than with a Python script, but never at a level where I would consider it a problem. And now that newer Python projects use type annotations, the difference has gotten smaller.