More

profsummergig · 2026-01-14T17:46:10 1768412770

To my naive brain, the rules seem to be:

- it's okay when Side A goes after Assange (a journalist) for possessing classified material. Also, Side A encourages journalists in certain countries to do exactly what Assange did.

- it's not okay when Side B goes after journalists aligned with Side A

profsummergig · 2026-01-14T14:59:48 1768402788

Elaborate, please? (The excerpt that ended up in the book.)

RyJones · 2026-01-14T15:21:39 1768404099

here you go.

https://blog.ryjones.org/adams

profsummergig · 2026-01-14T17:49:08 1768412948

That email he sent you... boy, the last paragraph is such a gem of persuasion.

profsummergig · 2026-01-14T10:51:42 1768387902

The "people acknowledge my existence, people hold the door for me" is not about them being idiots. It's Scott arguing that women have it easy compared to men (which may or may not be true, feminists will disagree).

cloudfudge · 2026-01-14T14:44:13 1768401853

"Men are nice to you because of sex," is not an argument that women have it easy compared to men. It's another joke about men being idiots.

jpadkins · 2026-01-14T15:52:04 1768405924

[flagged]

dennis_jeeves2 · 2026-01-14T18:25:27 1768415127

>I think there is some underlying personality trait that might cause both aspects.

Low intelligence?

cloudfudge · 2026-01-14T17:41:40 1768412500

I suspect this tendency is not correlated to political leaning in any way, and the suggestion that it is says more about how you want to perceive people of a particular leaning than anything about them.

jpadkins · 2026-01-16T00:00:03 1768521603

There is a lot of research that personality traits on political leanings are correlated. So maybe that same common cause also affects humor?

https://pmc.ncbi.nlm.nih.gov/articles/PMC3809096/#:~:text=Th...

profsummergig · 2026-01-14T10:45:31 1768387531

You write beautifully. I decided to click on your other comments and found the same. Rare combination of high-density, high-impact vocabulary, and yet high-clarity.

profsummergig · 2026-01-14T10:09:33 1768385373

If you could share any links to DNRC related content, I'd love to see it. Can't find anything online, just broken links.

profsummergig · 2026-01-12T04:58:39 1768193919

Maybe they'll finally find the nuclear device lost on Nanda Devi, that has the potential to - *checks notes* - poison North India (via the glacier that feeds the Ganges).

ninjin · 2026-01-12T05:31:25 1768195885

https://en.wikipedia.org/wiki/Nanda_Devi_Plutonium_Mission

Guestmodinfo · 2026-01-12T08:25:22 1768206322

What's your opinion on a sudden flooding that happened some years ago in that region. I am an Indian so for some days our news were showing only that flooding news. It was sudden and super mssive and some news people suspected that same device or maybe one of the devices being accidentally going off. It was all speculation but the sudden and massive flooding was also unexplained to some extent. There has been several massive flooding in the region recently but all are due to extensive rain and cloud bursts. But one was unexplained in my untrained opinion. I remember it was some huge construction site. Wha they were building now I have forgotten that

ceejayoz · 2026-01-12T11:44:54 1768218294

The entire world would notice a nuke going off. Like we did with North Korea’s nuclear tests.

krasin · 2026-01-12T05:43:44 1768196624

> that (checks notes) has the potential to poison most of North India.

How large is the amount of plutonium in there? I highly doubt that it has the claimed potential.

krasin · 2026-01-12T06:00:42 1768197642

I found the specs for the fuel source: https://upload.wikimedia.org/wikipedia/en/e/e9/SNAP-19C_Moun...

The high-power unit had 300 grams of Pu-238 in 1965. Given its 87.7 years half-life, only 187g of Pu-238 remaining. It's very hard to do much damage with this amount of radioactive material.

onion2k · 2026-01-12T06:25:52 1768199152

It decays to uranium-234 though, which still isn't exactly nice. It'll be a long time before it's a block of inert lead.

krasin · 2026-01-12T07:09:47 1768201787

U-234 is ~3000x less radioactive than Pu-238, so having ~120g of U-234 is negligible.

I really fail to see a problem with these tiny amounts of non-brittle material embedded into a solid case. It's still very dangerous, but it's locally dangerous (meters away), not at the scale of whole countries.

khuey · 2026-01-12T06:06:35 1768197995

Around three pounds, and something like 40% of it has already decayed away since this happened in the 60s.

hahahahhaah · 2026-01-12T09:32:58 1768210378

What notes did you check?

profsummergig · 2026-01-02T00:24:32 1767313472

This is fascinating, because from what I've heard, Warren Buffett did not favor tech stocks. Does anyone know what gave Buffett the faith that this company was a real deal?

PlanksVariable · 2026-01-02T04:51:01 1767329461

It's a car and battery company, isn't it? Framing it as a tech company is a bit weird, but I guess Tesla did the same.

Buffett didn't love automobile stocks either, but Berkshire Hathaway held General Motors from 2012 to 2023.

holmesch · 2026-01-02T01:13:40 1767316420

It was Charlie Munger who became enthusiastic about BYD after learning about it from investor Li Lu, leading him to convince Buffett to make Berkshire Hathaway's $230 million investment in 2008.

epolanski · 2026-01-02T08:40:44 1767343244

I think this was mostly a Munger pet investment, he had an extremely high opinion about the CEO and could see he was delivering on his goals one after another.

jimbokun · 2026-01-02T00:42:47 1767314567

Maybe he was more afraid of software dominated stocks?

BYD is at heart an automobile manufacturer and so maybe he felt more confident evaluating it using his normal tools.

drited · 2026-01-02T03:45:55 1767325555

Yes the circumstances are well known. Li Liu convinced Munger and Munger talked to Buffett.

csomar · 2026-01-02T04:04:56 1767326696

Munger believed in the founder from the very early days before it was a car company.

dalyons · 2026-01-02T00:27:54 1767313674

They “just” were a battery company then. Is that considered tech?

rswail · 2026-01-02T12:44:46 1767357886

I remember vaguely an interview about it and he said he could understand the business and what was driving the industry and the technology.

I think he said something about equivalent of selling shovels to miners in a boom, that PV was going to need storage etc.

Ekaros · 2026-01-02T14:52:56 1767365576

Berkshire was never tech investor. They looked for solid manufacturing with good price and potential to scale like manufacturing. Not everything is tech and you can still grow without being tech.

rdtsc · 2026-01-02T13:45:29 1767361529

He owns the largest railroad company in US. That’s no less “tech” than batteries, motors and other EV car bits.

profsummergig · 2025-12-31T23:51:27 1767225087

Question:

Can a model's weights be hard-coded into a physical chip for cheap fast local AI?

ori_b · 2026-01-01T00:13:23 1767226403

Not cheap, unless that one specific model is going to be used across tens of millions of devices, with no updates, for the physical lifetime of the device.

profsummergig · 2026-01-02T00:58:55 1767315535

Ah, that makes sense. Economies of scale. Thanks.

profsummergig · 2025-12-22T19:28:14 1766431694

Haven't watched it yet...

...but, if you have favorite resources on understanding Q & K, please drop them in comments below...

(I've watched the Grant Sanderson/3blue1brown videos [including his excellent talk at TNG Big Tech Day '24], but Q & K still escape me).

Thank you in advance.

roadside_picnic · 2025-12-22T21:13:27 1766438007

It's just a re-invention of kernel smoothing. Cosma Shalizi has an excellent write up on this [0].

Once you recognize this it's a wonderful re-framing of what a transformer is doing under the hood: you're effectively learning a bunch of sophisticated kernels (though the FF part) and then applying kernel smoothing in different ways through the attention layers. It makes you realize that Transformers are philosophically much closer to things like Gaussian Processes (which are also just a bunch of kernel manipulation).

0. http://bactra.org/notebooks/nn-attention-and-transformers.ht...

leopd · 2025-12-22T19:38:13 1766432293

I think this video does a pretty good job explaining it, starting about 10:30 minutes in: https://www.youtube.com/watch?v=S27pHKBEp30

oofbey · 2025-12-22T19:41:09 1766432469

As the first comment says "This aged like fine wine". Six years old, but the fundamentals haven't changed.

andoando · 2025-12-22T20:19:24 1766434764

This wasn't any better than other explanation I've seen.

red2awn · 2025-12-22T19:34:51 1766432091

Implement transformers yourself (ie in Numpy). You'll never truly understand it by just watching videos.

D-Machine · 2025-12-22T19:52:29 1766433149

Seconding this, the terms "Query" and "Value" are largely arbitrary and meaningless in practice, look at how to implement this in PyTorch and you'll see these are just weight matrices that implement a projection of sorts, and self-attention is always just self_attention(x, x, x) or self_attention(x, x, y) in some cases, where x and y are are outputs from previous layers.

Plus with different forms of attention, e.g. merged attention, and the research into why / how attention mechanisms might actually be working, the whole "they are motivated by key-value stores" thing starts to look really bogus. Really it is that the attention layer allows for modeling correlations and/or multiplicative interactions among a dimension-reduced representation.

tayo42 · 2025-12-23T02:45:42 1766457942

>the terms "Query" and "Value" are largely arbitrary and meaningless in practice

This is the most confusing thing about it imo. Those words all mean something but they're just more matrix multiplications. Nothing was being searched for.

D-Machine · 2025-12-23T03:19:51 1766459991

Better resources will note the terms are just historical and not really relevant anymore, and just remain a naming convention for self-attention formulas. IMO it is harmful to learning and good pedagogy to say they are anything more than this, especially as we better understand the real thing they are doing is approximating feature-feature correlations / similarity matrices, or perhaps even more generally, just allow for multiplicative interactions (https://openreview.net/forum?id=rylnK6VtDH).

profsummergig · 2025-12-22T20:06:31 1766433991

Do you think the dimension reduction is necessary? Or is it just practical (due to current hardware scarcity)?

D-Machine · 2025-12-23T03:09:28 1766459368

Definitely mostly just a practical thing IMO, especially with modern attention variants (sparse attention, FlashAttention, linear attention, merged attention etc). Not sure it is even hardware scarcity per se / solely, it would just be really expensive in terms of both memory and FLOPs (and not clearly increase model capacity) to use larger matrices.

Also for the specific part where you, in code for encoder-decoder transformers, call the a(x, x, y) function instead of the usual a(x, x, x) attention call (what Alammar calls "encoder-decoder attention" in his diagram just before the "The Decoder Side"), you have different matrix sizes, so dimension reduction is needed to make the matrix multiplications work out nicely too.

But in general it is just a compute thing IMO.

roadside_picnic · 2025-12-22T21:22:39 1766438559

I personally don't think implementation is as enlightening as far as really understanding what the model is doing as this statement implies. I had done that many times, but it wasn't until reading about the relationship to kernel methods that it really clicked for me what is really happening under the hood.

Don't get me wrong, implementing attention is still great (and necessary), but even with something as simple as linear regression, implementing it doesn't really give you the entire conceptual model. I do think implementation helps to understand the engineering of these models, but it still requires reflection and study to start to understand conceptually why they are working and what they're really doing (I would, of course, argue I'm still learning about linear models in that regard!)

krat0sprakhar · 2025-12-22T20:20:02 1766434802

Do you have a tutorial that I can follow?

jwitthuhn · 2025-12-22T22:21:25 1766442085

If you have 20 hours to spare I highly recommend this youtube playlist from Andrej Karpathy https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxb...

It starts with the fundamentals of how backpropagation works then advances to building a few simple models and ends with building a GPT-2 clone. It won't taech you everything about AI models but it gives you a solid foundation for branching out.

roadside_picnic · 2025-12-22T21:27:05 1766438825

The most valuable tutorial will be translating from the paper itself. The more hand holding you have in the process, the less you'll be learning conceptually. The pure manipulation of matrices is rather boring and uninformative without some context.

I also think the implementation is more helpful for understanding the engineering work to run these models that getting a deeper mathematical understanding of what the model is doing.

throw310822 · 2025-12-22T20:35:13 1766435713

Have you tried asking e.g. Claude to explain it to you? None of the usual resources worked for me, until I had a discussion with Claude where I could ask questions about everything that I didn't get.

sakesun · 2025-12-22T23:55:09 1766447709

Perhaps we have already reached ASI. :)

throw310822 · 2025-12-23T07:42:23 1766475743

In some respects, yes. There is no single human being with a general knowledge as vast as that of a SOTA LLM, or able to speak as many languages. Claude knows about transformers more than enough to explain them to a layperson, elucidating specific points and resolving doubts. As someone who learns more easily by prodding other people's knowledge rather than from static explanations, I find LLMs extremely useful.

machinationu · 2025-12-22T21:51:23 1766440283

Q, K and V are a way of filtering the relevant aspects for the task at hand from the token embeddings.

"he was red" - maybe color, maybe angry, the "red" token embedding carries both, but only one aspect is relevant for some particular prompt.

https://ngrok.com/blog/prompt-caching/

oedemis · 2025-12-23T13:36:35 1766496995

there is also very good explanation from Luis Serrano, https://youtu.be/fkO9T027an0

bobbyschmidd · 2025-12-22T20:42:16 1766436136

tldr: recursively aggregating packing/unpacking 'if else if (functions)/statements' as keyword arguments that (call)/take them themselves as arguments, with their own position shifting according to the number "(weights)" of else if (functions)/statements needed to get all the other arguments into (one of) THE adequate orders. the order changes based on the language, input prompt and context.

if I understand it all correctly.

implemented it in html a while ago and might do it in htmx sometime soon.

transformers are just slutty dictionaries that Papa Roach and kage bunshin no jutsu right away again and again, spawning clones and variations based on requirements, which is why they tend to repeat themselves rather quickly and often. it's got almost nothing to do with languages themselves and requirements and weights amount to playbooks and DEFCON levels

profsummergig · 2025-12-19T23:52:13 1766188333

What breath work do you do every week please?

Reason why I'm asking: the book contains many techniques, and I'm curious about what's working best for people.

rramadass · 2025-12-20T05:21:48 1766208108

Not the person you asked the question of, but you may find this helpful - https://news.ycombinator.com/item?id=46333769

profsummergig · 2025-12-22T19:29:44 1766431784

Thank you.