
> billion parameter count models

Do you really mean to suggest that with models as large as 32 billion, 100 billion, or 1 trillion parameters this would not happen?



There's no 100% with AI, but yes, I'm implying that a mistake like this is unlikely to happen with high parameter count models, and very likely to happen with low parameter count models.


This is bait to "remind" anyone who replies that LLMs are fundamentally unreliable technology and it doesn't matter how many floating point numbers are in the magic black box, you can't trust it because it's fundamentally flawed.

I'm just grateful other engineers haven't let "obvious fundamental flaws" stop them from building the other tech we rely on.

Our SSDs are fundamentally flawed tech constantly lying through their teeth about what they do. Most consumer hardware is relying on RAM that can lie about what was stored due to a random bitflip several times a month.

Even CPUs have entered regimes where fundamentals like "transistors turn off with zero signal at the gate" are no longer true.

Realistically, if you were willing to dedicate enough compute for 1 trillion parameters to each email, you could get the reliability of this one feature to match or exceed the overall reliability of email as a system. That's a lot of compute right now, but over time both the amount of compute you'd need and the cost of that compute are going down.
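To put rough numbers on that (the 0.7% single-pass error rate and the independence assumption here are mine, purely for illustration): if one summarization pass is wrong with probability e and you take a majority vote over n independent passes, the error rate falls off fast.

    # Illustrative only: compound error rate of an n-pass majority vote,
    # assuming each pass fails independently with probability e.
    from math import comb

    def majority_error(e: float, n: int) -> float:
        """Probability that more than half of n independent passes are wrong."""
        return sum(comb(n, k) * e**k * (1 - e)**(n - k)
                   for k in range(n // 2 + 1, n + 1))

    for n in (1, 3, 5, 7):
        print(n, majority_error(0.007, n))
    # roughly: 7e-3, 1.5e-4, 3.4e-6, 8.3e-8

Under those (admittedly idealized) assumptions, a handful of passes already cuts the error rate by several orders of magnitude.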


> This is bait to "remind" anyone who replies that LLMs are fundamentally unreliable technology and it doesn't matter how many floating point numbers are in the magic black box, you can't trust it because it's fundamentally flawed

sarkiness aside, yes, they are unreliable. fundamentally flawed? no. they are non-deterministic/non-heuristic systems based on probability. the problem is one of use case. for use cases that require integrity and trust in the information — i.e. the answer needs to be right — do not use a machine learning model by itself.

trouble is, lots of people are running off trying to make a quick buck by forcing machine learning into every use case. so when these people screw things up, it makes everyone question the integrity, reliability, etc. of “AI” as a whole … guess what, people push back because maybe some things don’t need AI.

like email subjects. which have worked fine for over 20 years now.

> Our SSDs are fundamentally flawed tech constantly lying through their teeth about what they do. Most consumer hardware is relying on RAM that can lie about what was stored due to a random bitflip several times a month.

all of these have mitigations somewhere which either stop or reduce the impact of these things occurring for the end user. i’ve never in my life had my day affected by a bit flip in one of my devices’ RAM.
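for example, ECC RAM corrects single-bit flips in hardware before software ever sees them. a toy sketch of the underlying idea using a Hamming(7,4) code (real ECC uses SEC-DED codes over 64-bit words; this is illustration only):

    # Toy Hamming(7,4): 4 data bits + 3 parity bits; any single flipped
    # bit can be located and repaired. Real ECC RAM does this per word.
    def encode(d):  # d: list of 4 data bits
        p1 = d[0] ^ d[1] ^ d[3]
        p2 = d[0] ^ d[2] ^ d[3]
        p4 = d[1] ^ d[2] ^ d[3]
        return [p1, p2, d[0], p4, d[1], d[2], d[3]]  # codeword positions 1..7

    def decode(c):  # c: 7-bit codeword, at most one bit flipped
        s = ((c[0] ^ c[2] ^ c[4] ^ c[6])
             | (c[1] ^ c[2] ^ c[5] ^ c[6]) << 1
             | (c[3] ^ c[4] ^ c[5] ^ c[6]) << 2)
        if s:                  # syndrome gives the 1-based error position
            c[s - 1] ^= 1
        return [c[2], c[4], c[5], c[6]]

    word = encode([1, 0, 1, 1])
    word[4] ^= 1               # simulate a random bit flip in "memory"
    assert decode(word) == [1, 0, 1, 1]

that’s the kind of mitigation i mean: the flaw still exists physically, but the end user never sees it.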

loads of people just got very confused because yahoo mail jumped on the “ai” hype train with summaries which are objectively wrong. the impact is real and is not being mitigated. this isn’t just yahoo mail. apple news is no longer summarising bbc news because the summaries were not only wrong, but potentially dangerous/harmful. so it’s industry-wide and systemic.

> Realistically, if you were willing to dedicate enough compute for 1 trillion parameters to each email, you could get the reliability of this one feature to match or exceed the overall reliability of email as a system.

that’s quite a big claim with no real evidence. which gets right back to the heart of why people “remind” others so much about how unreliable these systems are. because there’s a whole contingent of people claiming things that are more akin to hopes than anything actually based in evidence.

> I'm just grateful other engineers haven't let "obvious fundamental flaws" stop them from building the other tech we rely on.

i’m grateful i’m the kind of engineer that doesn’t buy into hype and actually thinks about whether tech should be used for specific use cases.


"they are non-deterministic/non-heuristic systems based on probability" is a fundamental flaw for a deterministic task.

You seem to be confusing "fundamental" with "terminal".

> all of these have mitigations somewhere which either stop or reduce the impact of these thing occurring for the end user.

You missed that their comment is talking about what could be implemented, not what is implemented.

> that’s quite a big claim with no real evidence.

A zero-shot single pass of summarization with a model like Gemini 2.0 Flash benchmarks at around 99.3% accuracy.

Gemini Flash slots in between their 120B Pro model and their 8B Flash model, and is generally estimated to be <70B parameters: https://x.com/ArtificialAnlys/status/1867292015181942970

So we could give Gemini multiple passes with multiple prompts, multiple shots of each prompt, time for thinking, self-verification, and have plenty of room left over.
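As a rough sketch of what that multi-pass setup could look like (call_model here is a stand-in for whatever LLM API you'd use; none of this is Gemini-specific, and the prompts are made up):

    # Hypothetical sketch: N independent summarization shots, a verification
    # pass over each candidate, then majority preference among survivors.
    from collections import Counter

    def summarize_with_verification(email, call_model, shots=5):
        candidates = [call_model(f"Summarize this email in one line:\n{email}")
                      for _ in range(shots)]
        verified = [s for s in candidates
                    if call_model("Does this summary faithfully describe the email? "
                                  f"Answer YES or NO.\nEmail:\n{email}\nSummary:\n{s}")
                       .strip().upper().startswith("YES")]
        if not verified:
            return None  # fall back to showing the plain subject line
        return Counter(verified).most_common(1)[0][0]

Each extra shot multiplies the cost, which is exactly why I framed it as "enough compute for 1 trillion parameters per email".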

I wrote the comment in a way that assumes the reader understands just how much compute 1 trillion parameters represents for current models.

That's an insane amount of compute for each email, so it's not a big claim at all.

But I tend to write assuming people actually know the topics they're commenting on, so please forgive the lack of evidence.



