johnmwilkinson · 2026-02-11T18:24:32 1770834272

It’s not that people don’t write like this, it’s the over-usage and general tone.

alex_young · 2026-02-11T18:29:33 1770834573

It's not that “I can detect AI” posts sound more templated than the writing they’re critiquing, it's that the clankers are learning from it and adapting.

uwagar · 2026-02-11T18:37:17 1770835037

its not that i cant detect your AI detection, its just that i cant watch you quietly do it.

johnmwilkinson · 2026-02-11T05:14:30 1770786870

In what sense is this true? We understand the theory of what is happening and we can painstakingly walk through the token generation process and understand it. So in what sense do we not understand LLMs?

threethirtytwo · 2026-02-11T06:15:52 1770790552

We wrote it.

Every line. Every function. Every tensor shape and update rule. We chose the architecture. We chose the loss. We chose the data. There is no hidden chamber in the machine where something slipped in without our consent. It is multiplication and addition, repeated at scale. It is gradients flowing backward through layers, shaving away error a fraction at a time. It is as mechanical as anything we have ever built.

And still, when it speaks, we hesitate.

Not because we don’t know how it was trained. Not because we don’t understand the mathematics. We do. We can derive it. We can rebuild it from scratch. We can explain every component on a whiteboard without breaking a sweat.

The hesitation comes from somewhere else.

We built the procedure. We do not understand the mind that the procedure produced.

That difference is everything.

In most of engineering, structure follows intention. If you design a bridge, you decide where every beam sits and how it bears weight. If you write a database engine, you determine how queries are parsed, optimized, executed. The system’s behavior reflects deliberate choice. If something happens, you trace it back to a decision someone made.

Here, we did not design the final structure. We defined a goal: predict the next token. Reduce the error. Again. Again. Again. Billions of times.

We did not teach it grammar in lessons. We did not encode logic as axioms. We did not install a module labeled “reasoning.” We applied pressure. That is all. And under that pressure, something organized itself.

Not in modules we can point to. Not in neat compartments labeled with concepts. The organization is diffused across a landscape of numbers. Meaning is not stored in one place. It is distributed across millions of parameters at once. Pull on one weight and you find nothing recognizable. Only in concert do they produce something that resembles thought.

We can follow the forward pass. We can watch activations flare across layers. We can map attention patterns and correlate neurons with behaviors. But when the model constructs an argument or solves a problem, we cannot say: here is the rule it followed, here is the internal symbol it consulted, here is the precise chain of reasoning that forced this conclusion. We can describe the mechanism in general terms. We cannot narrate the specific path.

That is the fracture.

It is not ignorance of how the machine runs. It is ignorance of how this exact configuration of billions of numbers encodes what it encodes. Why this region of weight space corresponds to law, and that region to poetry. Why this arrangement produces careful reasoning and another produces nonsense. There is no ledger translating numbers into meaning. There is only geometry shaped by relentless optimization.

Scale changes the character of the problem. At small sizes, systems can be dissected. At this scale, they become landscapes. We know the forces that shaped the terrain. We do not know every ridge and valley. We cannot walk the entire surface. We cannot hold it all in our heads.

And this is where the cost reveals itself.

To build these systems, we gave up something we once assumed was permanent: the guarantee that creation implies comprehension. We accepted that we could construct a process whose outcome we would not fully grasp. We traded architectural certainty for emergent capability. We chose power over transparency.

We set the objective. We unleashed the search. We let optimization run through a space too vast for any human mind to survey. And when it converged, it handed us something that works, something that speaks, something that reasons in ways that surprise even its creators.

We stand in front of it knowing every equation that shaped it, and still unable to read its inner structure cleanly.

We built the system by surrendering control over its internal form. That was the bargain. That was the sacrifice.

We know how it was grown.

We do not know what we have grown.

krupan · 2026-02-11T13:03:32 1770815012

Thanks for writing that. It reminds me that there are many things we build and they work (for some definition of work) even though we don't fully understand them.

Did the first people that made fire understand it? You mentioned bridge building. How many bridges have failed for unknown (at the time) reasons? Heck, are we sure that every feature we put into a bridge design is necessary or why it's necessary? Repeat this thought for everything humans have created. Large software projects are difficult to reason about. You'll often find code that works because of a delightfully surprising combination of misunderstandings. When humans try to modify a complex system to solve one problem they almost always introduce new behavior, the law of unintended consequences.

All that being said, we usually don't get anywhere without at least a basic understanding of why doing X leads to Y. The first humans that made fire had probably observed the way fires started before they set out to make their own. Same with bridges and cars and computers.

So yes, you are absolutely correct that nobody fully understands how AI/LLMs work. But also, we kinda do understand. But also also, we're probably at a stage where we are building bridges that are going to collapse, boilers that will explode, or computer programs that are one unanticipated input away from seg faulting.

chickensong · 2026-02-11T07:01:02 1770793262

Beautiful. My brain now questions if this was written by LLM, but it's fine. Today is Tuesday.

johnmwilkinson · 2026-01-31T01:38:02 1769823482

Sort of related? https://www.usenix.org/system/files/login-logout_1305_micken...

lencastre · 2026-01-31T08:10:09 1769847009

til, thx

johnmwilkinson · 2026-01-31T00:24:02 1769819042

I believe this is conflating abstraction with encapsulation. The former is about semantic levels, the latter about information hiding.

nomel · 2026-01-31T00:37:00 1769819820

Maybe I am? How is it possible to abstract without encapsulation? And also, how is it possible to encapsulate without abstracting some concept (intentionally or not) contained in that encapsulation? I can't really differentiate them, in the context of naming/referencing some list of CPU operations.

Retric · 2026-01-31T01:43:13 1769823793

> How is it possible to abstract without encapsulation.

Historically pure machine code with jumps etc lacked any form of encapsulation as any data can be accessed and updated by anything.

However, you would still use abstractions. If you pretend the train is actually going 80.2 MPH instead of somewhere between 80.1573 MPH and 80.2485 MPH, which you got from different sensors, you don’t need to do every calculation that follows twice.

nomel · 2026-01-31T02:22:19 1769826139

I'm using the industry definition of abstraction [1]:

> In software, an abstraction provides access while hiding details that otherwise might make access more challenging

I read this as "an encapsulation of a concept". In software, I think it can be simplified to "named lists of operations".

> Historically pure machine code with jumps etc lacked any form of encapsulation as any data can be accessed and updated by anything.

Not practically, by any stretch of the imagination. And, if the intent is to write silly code, modern languages don't really change much, it's just the number of operations in the named lists will be longer.

You would use calls and returns (or just jumps if not supported), and then name and reference the resulting subroutine in your assembler or with a comment (so you could reference it as "call 0x23423 // multiply R1 and R2"), to encapsulate the concept. If those weren't supported, you would use named macros [2]. Your assembler would use named operations, sometimes expanding to multiple opcodes, with each opcode having a conceptually relevant name in the manual, which abstracted a logic circuit made with named logic gates, consisting of named switches, that shuffled around named charge carriers. Say your code just did a few operations, the named abstraction for the list of operations (which all these things are) there would be "blink_light.asm".

> If you pretend the train is actually going 80.2 MPH instead of somewhere between 80.1573 MPH and 80.2485 MPH, which you got from different sensors, you don’t need to do every calculation that follows twice.

I don't see this as an abstraction as much as a simple engineering compromise (of accuracy) dictated by constraint (CPU time/solenoid wear/whatever), because you're not hiding complexity as much as ignoring it.

I see what you're saying, and you're probably right, but I see the concepts as equivalent. I see an abstraction as a functional encapsulation of a concept. An encapsulation, if not nonsense, will be some meaningful abstraction (or a renaming of one).

I'm genuinely interested in an example of an encapsulation that isn't an abstraction, and an abstraction that isn't a conceptual encapsulation, to right my perspective! I can't think of any.

[1] https://en.wikipedia.org/wiki/Abstraction_(computer_science)

[2] https://www.tutorialspoint.com/assembly_programming/assembly...

Retric · 2026-01-31T02:42:46 1769827366

> I can't think of any.

Incorrect definition = incorrect interpretation. I edited this a few times but the separation is you can use an abstraction even if you maintain access to the implementation details.

> assembler

Assembly language which is a different thing. Initially there was no assembler, someone had to write one. In the beginning every line of code had direct access to all memory in part because limited access required extra engineering.

Though even machine code itself is an abstraction across a great number of implementation details.

> I don't see this as an abstraction as much as a simple engineering compromise (of accuracy) dictated by constraint (CPU time/solenoid wear/whatever), because you're not hiding complexity as much as ignoring it.

If it makes you feel better consider the same situation with 5 sensors, X of which have failed. The point is you don’t need to consider all information at every stage of a process. Instead of all the underlying details you can write code that asks: do we have enough information to get a sufficiently accurate speed? What is it?

It doesn’t matter if the code could still look at the raw sensor data, you the programmer prefer the abstraction so it persists even without anything beyond yourself enforcing it.

IE: “hiding details that otherwise might make access more challenging”

You can use TCP/IP or anything else as an abstraction even if you maintain access to the lower level implementation details.
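A minimal sketch of the sensor example above, assuming a failed sensor shows up as `None`; the function name, the thresholds, and the readings are all made up for illustration. The rest of the program works with a single speed value, while the raw readings stay fully accessible.

```python
from statistics import median
from typing import Optional, Sequence

MIN_WORKING_SENSORS = 2   # assumed threshold, not from the thread
MAX_SPREAD_MPH = 0.5      # assumed tolerance, not from the thread


def train_speed_mph(readings: Sequence[Optional[float]]) -> Optional[float]:
    """Return one speed for later calculations, or None if it is unknown.

    A failed sensor is represented as None (an assumption of this sketch).
    """
    working = [r for r in readings if r is not None]
    if len(working) < MIN_WORKING_SENSORS:
        return None  # not enough information
    if max(working) - min(working) > MAX_SPREAD_MPH:
        return None  # sensors disagree too much to trust a single number
    return round(median(working), 1)


raw = [80.1573, 80.2485, None, 80.2011, None]  # 5 sensors, 2 have failed
speed = train_speed_mph(raw)

# Nothing prevents later code from reading `raw` directly.
# The abstraction persists only because the programmer prefers it.
```

Here `speed` is the value everything downstream would use, and nothing about `raw` is hidden or protected.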

nomel · 2026-01-31T03:18:55 1769829535

I genuinely appreciate your response, because there's a good chance it'll result in me changing my perspective, and I'm asking these questions with that intent!

> You are thinking of assembly language which is a different thing. Initially there was no assembler, someone had to write one.

This is why I specifically mention opcodes. I've actually written assemblers! And...there's not much to them. It's mostly just replacing the names given to the opcodes in the datasheet back to the opcodes, with a few human niceties. ;)
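To give a sense of how little is involved, here is a toy sketch of the name-to-opcode replacement described above. The instruction set, the opcode values, and the `;` comment syntax are invented for this example and do not correspond to any real datasheet.

```python
# Hypothetical 8-bit instruction set, invented for this sketch.
OPCODES = {"NOP": 0x00, "LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}
OPERAND_COUNT = {"NOP": 0, "LOAD": 1, "ADD": 1, "JMP": 1, "HALT": 0}


def assemble(source: str) -> bytes:
    """Translate mnemonics and one-byte operands into machine code bytes."""
    out = bytearray()
    for line_no, line in enumerate(source.splitlines(), start=1):
        line = line.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        mnemonic, *operands = line.split()
        mnemonic = mnemonic.upper()
        if mnemonic not in OPCODES:
            raise ValueError(f"line {line_no}: unknown mnemonic {mnemonic!r}")
        expected = OPERAND_COUNT[mnemonic]
        if len(operands) != expected:
            raise ValueError(f"line {line_no}: {mnemonic} takes {expected} operand(s)")
        out.append(OPCODES[mnemonic])
        # int(text, 0) accepts both decimal ("1") and hex ("0x2A") literals.
        out.extend(int(op, 0) & 0xFF for op in operands)
    return bytes(out)


program = """
LOAD 0x2A   ; put 42 in the accumulator
ADD 1
HALT
"""
machine_code = assemble(program)
```

The "human niceties" in this sketch amount to comments, blank lines, and error messages; everything else is a table lookup.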

> consider the same situation with 5 sensors, X of which have failed

Ohhhhhhhh, ok. I kind of see. Unfortunately, I don't see the difference between abstraction and encapsulation here. I see the abstraction, speed, as being the encapsulation of a set of sensors, ignoring irrelevant values.

I feel like I'm almost there. I may have edited my previous comment after you replied. My "no procrastination" setting kicked in, and I couldn't see.

I don't see how "The former is about semantic levels, the latter about information hiding." are different. In my mind, semantic levels exist as compression and encapsulation of information. If you're saying encapsulation means "black box" then that could make sense to me, but "inaccessible" isn't part of the definition, just "containment".

johnmwilkinson · 2026-01-31T06:27:38 1769840858

Computer Science stole the term abstraction from the field of Mathematics. I think mathematics can be really helpful in clearing things up here.

A really simple abstraction in mathematics is that of numeric basis (e.g. base 10) for representing numbers. Being able to use the symbol 3 is much more useful than needing to write III. Of course, numbers themselves are an abstraction: perhaps you and I can reason about 3 and 7 and 10,000 in a vacuum, but young children or people who have never been exposed to numbers without units struggle to understand. Seven… what? Dogs? Bottles? Days? Numbers are an abstraction, and Arabic digits are a particular abstraction on top of that.

Without that abstraction, we would have insufficient tools to do more complex things such as, say, subtract 1 from 1,000,000,000. This is a problem that most 12 year olds can solve, but the greatest mathematicians of the Roman empire could not, because they did not have the right abstractions.

So if there are abstractions that enable us to solve problems that were formerly impossible, this means there is something more going on than “hiding information”. In fact, this is what Dijkstra (a mathematician by training) meant when he said:

> The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise

When I use open(2), it’s because I’m operating at the semantic level of files. It’s not sensible to think of a “file” at a lower level: would it be on disk? In memory? What about socket files? But a “file” isn’t a real thing, it’s an abstraction created by the OS. We can operate on files, these made up things, and we can compose operations together in complex, useful ways. The idea of a file opens new possibilities for things we can do with computers.
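As a rough illustration of operating at the semantic level of files, here is a sketch using Python's thin wrappers over the OS calls. The helper `read_all` is made up for this example. It reads from a file descriptor without knowing or caring whether the bytes come from a disk or from a pipe that only ever lives in memory.

```python
import os
import tempfile


def read_all(fd: int) -> bytes:
    """Read until end-of-file from any readable file descriptor."""
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:  # an empty result means end-of-file
            break
        chunks.append(chunk)
    return b"".join(chunks)


# A regular file on disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello from disk\n")
disk_fd = os.open(tmp.name, os.O_RDONLY)
from_disk = read_all(disk_fd)
os.close(disk_fd)
os.unlink(tmp.name)

# A pipe, which never touches the disk.
read_end, write_end = os.pipe()
os.write(write_end, b"hello from a pipe\n")
os.close(write_end)  # closing the writer lets the reader see end-of-file
from_pipe = read_all(read_end)
os.close(read_end)
```

The same `read_all` works in both cases because it is written against the idea of a file rather than against any particular storage.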

I hope that explanation helps!

johnmwilkinson · 2026-01-31T16:11:47 1769875907

Expanding on this regarding the difference between abstraction vs encapsulation: abstraction is about the distillation of useful concepts while encapsulation is a specific tactic used to accomplish a behavior.

To continue with the idea of numbers, let’s say you asked someone to add 3 and 5. Is that encapsulation? What information are you hiding? You are not asking them to add coins or meters or reindeer. 3 and 5 are values independent of any underlying information. The numbers aren’t encapsulating anything.

Encapsulation is different. When you operate a motor vehicle, you concern yourself with the controls presented. This allows you, as the operator, to only need a tiny amount of knowledge to interact with an incredibly complex machine. These details have been encapsulated. There may be particular abstractions present, such as the notion of steering, acceleration, and braking, but the way you interact with these will differ from vehicle to vehicle. Additionally, encapsulation is not concerned with the idea of steering, it is concerned with how to present steering in this specific case.

The two ideas are connected because using an abstraction in software often involves encapsulation. But they should not be conflated, or the likely result is bad abstractions and unwieldy encapsulation.
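One way to picture the distinction in code, as a loose sketch: the `Steerable` protocol stands in for the abstraction (the idea of steering), while each class encapsulates how one vehicle presents it. The class names and the internal details (a steering rack, a rudder) are hypothetical.

```python
from typing import Protocol


class Steerable(Protocol):
    """The abstraction: the idea of steering, independent of any vehicle."""

    def steer(self, degrees: float) -> None: ...

    @property
    def steering_angle(self) -> float: ...


class Car:
    """Encapsulation: how one specific vehicle presents steering."""

    def __init__(self) -> None:
        self._rack_position_mm = 0.0  # hidden detail (hypothetical)
        self._mm_per_degree = 0.5     # hidden detail (hypothetical)

    def steer(self, degrees: float) -> None:
        self._rack_position_mm += degrees * self._mm_per_degree

    @property
    def steering_angle(self) -> float:
        return self._rack_position_mm / self._mm_per_degree


class Boat:
    """A different encapsulation of the same idea."""

    def __init__(self) -> None:
        self._rudder_angle = 0.0  # the rudder swings opposite to the turn

    def steer(self, degrees: float) -> None:
        self._rudder_angle -= degrees

    @property
    def steering_angle(self) -> float:
        return -self._rudder_angle


def turn_right(vehicle: Steerable) -> float:
    """Written against the abstraction only; works for any Steerable."""
    vehicle.steer(15.0)
    return vehicle.steering_angle
```

`turn_right` only needs the concept of steering. What each class keeps behind its underscore-prefixed fields is a separate decision, made per vehicle.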

Retric · 2026-01-31T03:38:39 1769830719

> It's mostly just replacing the names given to the opcodes in the datasheet back to the opcodes

Under the assumption that the input data is properly formatted you can generate machine code. This is however an abstraction which can fail as nothing forces a user to input valid files.

So we have an abstraction without any encapsulation.

nomel · 2026-02-04T22:17:20 1770243440

I can only see that as being the case if you weren't aware of it. Otherwise, the awareness would be explicit intent to fail on malformed input, which seems like just as much an encapsulation?

But, that's a great example! Thank you. This makes it clear that I'm probably sometimes wrong. ;)

johnmwilkinson · 2026-01-10T20:51:36 1768078296

Of course, they make 90% of requests between 6 and 7 PM, with a general peak of 4 thousand req/s.

arter45 · 2026-01-11T08:19:09 1768119549

If an application gets 4 thousand req/s for an hour, and an additional 10% of requests in the rest of the day, it is handling roughly 16 million reqs/day, which is completely different and of course requires scaling in most cases.

That said, even then, there are a lot of business cases where you are not constrained by the time required to sort or traverse a custom data structure, because you spend more time waiting for an answer from a database (in which case you may want to tune the db or add a cache), or the time needed to talk to a server or another user, or a third party library, or a payment processing endpoint.

There are also use cases (think offline mobile apps) where the number of concurrent requests is basically 1, because each offline app serves a single user, so as long as you can process stuff before a user notices the app is sluggish (hundreds of milliseconds at least) you're good.

What do you do with those 4 thousand req/s? That's what makes the difference between "processing everything independently is fast enough for our purposes", "we need to optimize database or network latency", or "we need to optimize our data structures".

johnmwilkinson · 2026-01-13T05:52:20 1768283540

A peak of 4k/s does not mean they get that for the entire hour. The point I was trying to make is that simply computing the mean over a 24hr period will almost certainly ensure you size things incorrectly.

If a stretch of road was used by an average of 10 cars per minute over a 24 hour period, is it congested?

In both cases, you need more specific traffic data to size things properly.
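A quick sketch of why the 24-hour mean misleads, using a hypothetical traffic profile: the daily total and the even spread of off-peak traffic are assumptions made for illustration, with 90% of requests landing in the 18:00 to 19:00 hour.

```python
SECONDS_PER_HOUR = 3600

# Hypothetical profile: requests handled in each hour of the day.
DAILY_TOTAL = 1_000_000                  # assumed figure
hourly = [DAILY_TOTAL * 0.10 / 23] * 24  # off-peak traffic, spread evenly
hourly[18] = DAILY_TOTAL * 0.90          # 90% arrives between 6 and 7 PM

mean_rps = sum(hourly) / (24 * SECONDS_PER_HOUR)
busiest_hour_rps = max(hourly) / SECONDS_PER_HOUR

print(f"24h mean:     {mean_rps:.1f} req/s")
print(f"busiest hour: {busiest_hour_rps:.1f} req/s")
print(f"ratio:        {busiest_hour_rps / mean_rps:.1f}x")
```

Under these assumptions the busiest hour averages more than 20 times the daily mean, and even that hourly average says nothing about the instantaneous peak within the hour.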

johnmwilkinson · 2026-01-04T17:43:37 1767548617

> 4. Clarity is seniority. Cleverness is overhead.

Clarity is likely the most important aspect of making maintainable, extendable code. Of course, it’s easy to say that, it’s harder to explain what it looks like in practice.

I wrote a book that attempts to teach how to write clear code: https://elementsofcode.io

> 11. Abstractions don’t remove complexity. They move it to the day you’re on call.

This is true for bad abstractions.

> The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise. (Dijkstra)

If you think about abstraction in those terms, the utility becomes apparent. We abstract CPU instructions into programming languages so we can think about our problems in more precise terms, such as data structures and functions.

It is obviously useful to build abstractions to create even higher levels of precision on top of the language itself.

The problem isn’t abstraction, it is clarity of purpose. Too often we create complex behavioral models before actually understanding the behavior we are trying to model. It’s like a civil engineer trying to build a bridge in a warehouse without examining the terrain where it must be placed. When it doesn’t fit correctly, we don’t blame the concept of bridges.

kenferry · 2026-01-04T20:45:12 1767559512

I agree with you re: abstraction - one of the author's only points where I didn't totally agree.

But also worth noting that whenever you make an abstraction you run the risk that it's NOT going to turn out to increase clarity and precision, either due to human limitation or due to changes in the problem. The author's caution is warranted because in practice this happens a lot. I would rather work with code that has insufficient abstraction than inappropriate abstraction.

johnmwilkinson · 2026-01-04T23:37:06 1767569826

Broad strokes: absolutely. The practical reality gets tricky, though. All programming abstractions are imperfect in some regard, so the question becomes what level of imperfection can you tolerate, and is the benefit worth the cost?

I think a lot of becoming a good programmer is about developing the instincts around when it’s worth it and in what direction. To add to the complexity, there is a meta dimension of how much time you should spend trying to figure it out vs just implement something and correct it later.

As an aside, I’m really curious to see how much coding agents shift this balance.

nazgul17 · 2026-01-06T08:35:18 1767688518

All abstractions drop some details. If you're unlucky, you removed details that actually matter in some context. You can only make educated guesses.

Another aspect is that some abstractions are too... abstract. The concept they represent is not immediately obvious. Maybe it's a useful concept, but if it's new, it takes time to be internalized by someone for the first time.

carimura · 2026-01-05T17:02:46 1767632566

I've found that clarity is likely the most important aspect of success in general. Clarity in communication, for example, makes people feel involved, heard, aligned. Cleverness is lots of acronyms and fancy phrases like vis-a-vis instead of just writing out what you mean so everyone can easily understand.

johnmwilkinson · 2025-12-23T15:50:17 1766505017

I think clever is being used in two different ways, in that case.

In the original quote, “clever” refers to the syntax, where the way the code was constructed makes it difficult to decipher.

I believe your interpretation (and perhaps the post’s, as well) is about the design. Often to make a very simple, elegant design (what pieces exist and how they interact) you need to think really hard and creatively, aka be clever.

Programming as a discipline has a problem with using vague terms. “Clean” code, “clever” code, “complex” code; what are we trying to convey when we talk about these things?

I came up with a term I like: Mean Time to Comprehension, or MTC. MTC is the average amount of time it takes for a programmer familiar with the given language, syntax, libraries, tooling, and structure to understand a particular block of code. I find that thinking about code in those terms is much more useful than thinking about it in terms of something like “clever”.

(For anyone interested, I wrote a book that explores the rules for writing code that is meant to reduce MTC: The Elements of Code https://elementsofcode.io)

johnmwilkinson · 2025-09-30T04:15:08 1759205708

I recently published a book about coding, and put it all online for free: https://elementsofcode.io

I suppose it has moved from “what are you working on” to “what have you worked on” territory, but since I wrapped up the website just about a week ago it still feels quite fresh.

Always interested in feedback and what folks find useful! It’s focused on the mechanics of writing understandable software, which I think is especially important in the age of AI slop.

johnmwilkinson · on Dec 8, 2019

Fun, but terrible on mobile

Thorrez · on Dec 8, 2019

Is there any programming that isn't terrible on mobile?

johnmwilkinson · on Dec 7, 2019

> She insisted, “My dream is to build housing only for women who plan to never marry.”

Sounds a lot like a convent