For example, Claude can fluently generate Bevy code as of the training cutoff date, and there's no way there's enough training data on the web to explain this. There's an agent somewhere in a compile-test loop generating Bevy examples.
A custom LLM language could have fine-grained fuzzing, mocking, concurrent calling, memoization, and other features that allow LLMs to generate and debug synthetic code more effectively.
If that works, there's a pathway to a novel language having higher quality training data than even Python.
I recently had Codex convert a script of mine from bash to a custom, Make-inspired language for HPC work (think Nextflow, but an actual language). The bash script submitted a bunch of jobs based on some inputs. I wanted this converted to use my pipeline language instead.
I wrote this custom language. It's on GitHub, but the example code that would have been available for training is very limited.
I gave it two inputs -- the original bash script and an example of my pipeline language (unrelated jobs).
The code it gave me was syntactically correct, and was really close to the final version. I didn't have to edit very much to get the code exactly where I wanted it.
This is to say -- if a novel language is somewhat similar to an existing syntax, the LLM will be surprisingly good at writing it.
Hill-climbing a password would only be possible if intermediate KV cache entries were stored. To hill-climb "hunter2", you're going to try "a", "b", "c", etc., until you notice that "h" comes back faster. Then you try "ha", "hb", and so on.
But that's only going to work if the cache looks like: "h", "hu", "hun", ..., "hunter2"
If just "hunter2" is in the cache, you won't get any signal until you stumble on exactly that password. And that's before getting into the block size granularity of the caches discussed elsewhere in this thread.
That's not to say timing attacks aren't possible. I haven't looked at Claude Code's prompt generation, but there's no intrinsic reason why you couldn't do things like figure out what open source code and research papers your competitors are loading into context.
Sharing caches between orgs would be an incredible misstep.
Right, you can't actually guess a letter (byte) at a time, but you can guess a token at a time (I believe the vocabulary is ~200,000 possible tokens in GPT-5).
So you could send each of the 200,000 possible tokens, see which one is cached, and then send 200,000 more to find the next cached token.
Certainly less efficient, but well within the realm of a feasible attack.
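As rough back-of-the-envelope arithmetic (the secret length below is made up for illustration), the brute force scales as vocabulary size times the number of tokens in the cached secret:

    fn main() {
        // Assumes, as above, a cache entry at every token boundary of the secret.
        let vocab: u64 = 200_000;   // ~GPT-5 vocabulary size, per the comment above
        let secret_tokens: u64 = 8; // hypothetical length of the cached secret
        let worst_case_probes = vocab * secret_tokens;
        println!("worst case: {worst_case_probes} probe requests"); // 1,600,000
    }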
It's a good call-out re: tokens vs. letters, but I think you might have misunderstood my point -- you can't do it a token at a time unless the intermediate KV cache is stored after each token is generated.
This won't be the case in any non-toy implementation, as it would be unnecessary and slow.
Ah, fair enough. Anthropic caches at a block level (basically a single message), so for non-trivial messages this is much less of a concern, although I definitely understand why they still scope the cache to a single tenant.
The author also proposes an incorrect optimization to Tokio here, which suggests a lack of understanding of the specific guarantees given:
https://github.com/tokio-rs/tokio/pull/7622
The tests do not appear to simulate the queue in Loom, which would be a very, very good idea.
This stuff is hard. I almost certainly made a mistake in what I've written above (edit: I did!). In practice, the queue is probably fine to use, but I wouldn't be shocked if there's a heisenbug lurking in this codebase that manifests something like: it all works fine now, but in the next LLVM version an optimization pass is added which breaks it on ARM in release mode, and after that the queue yields duplicate values in a busy loop every few million reads which is only triggered on Graviton processors.
Or something. Like I said, this stuff is hard. I wrote a very detailed simulator for the Rust/C++ memory model, have implemented dozens of lockless algorithms, and I still make a mistake every time I go to write code. You need to simulate it with something like Loom to have any hope of a robust implementation.
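For a sense of what that looks like, here is roughly the canonical Loom example (adapted from Loom's README): a load-then-store "increment" that passes ordinary tests, but loses an update under an interleaving Loom will find.

    use loom::sync::atomic::AtomicUsize;
    use loom::sync::atomic::Ordering::{Acquire, Relaxed, Release};
    use loom::sync::Arc;
    use loom::thread;

    #[test]
    #[should_panic]
    fn buggy_concurrent_inc() {
        // Loom exhaustively explores the interleavings (and permitted
        // memory-model reorderings) of the closure passed to loom::model.
        loom::model(|| {
            let num = Arc::new(AtomicUsize::new(0));

            let handles: Vec<_> = (0..2)
                .map(|_| {
                    let num = num.clone();
                    thread::spawn(move || {
                        // Not atomic as a whole: the load and store can interleave
                        // with the other thread's, losing an increment.
                        let curr = num.load(Acquire);
                        num.store(curr + 1, Release);
                    })
                })
                .collect();

            for h in handles {
                h.join().unwrap();
            }

            // Loom will find a schedule where this assertion fails.
            assert_eq!(2, num.load(Relaxed));
        });
    }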
For anyone interested in learning about Rust's memory model, I can't recommend Mara Bos's Rust Atomics and Locks enough.
> The tests do not appear to simulate the queue in Loom, which would be a very, very good idea.
Loom is apparently this: https://github.com/tokio-rs/loom
I've used Tokio a bit in the past but wasn't aware of that tool at all. It looks really useful, and I'm probably not alone in never having heard of it before. Any tips, tricks, or gotchas one should know beforehand?
In development, you import Loom's mutex; in production, you import a regular mutex. This has zero overhead in production builds, but the simulation testing itself is usually quite slow: only one thread can execute at a time, and many iterations are required to cover the interleavings.
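Concretely, the type swap is just a cfg gate. This is the pattern Loom's docs describe; the module and test names below are illustrative.

    // Production builds use std; Loom builds are selected with
    // RUSTFLAGS="--cfg loom" cargo test
    #[cfg(loom)]
    use loom::sync::{Arc, Mutex};
    #[cfg(not(loom))]
    use std::sync::{Arc, Mutex};

    // Loom-only tests live behind the same cfg and drive the real code paths.
    #[cfg(all(loom, test))]
    mod loom_tests {
        #[test]
        fn runs_under_the_model() {
            loom::model(|| {
                // Construct the structure under test here and exercise it from
                // a couple of loom::thread::spawn'd threads; Loom explores the
                // permitted interleavings and memory-model reorderings.
            });
        }
    }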