
It's not just illegal immigrants. They've been arresting and deporting legal ones and even citizens, deporting them to countries they've never been to and to foreign prisons they cannot get out of.

I would love to hear how this is similar to COVID at all.


"Crucially, it tells the agent not to rely on its internal training data (which might be hallucinated or refer to a different version of the game) but to ground its knowledge in what it observes. "

Does this even have any effect?


Yes, at least to some extent. The author mentions that the base model knows the answer to the switch puzzle but does not execute it properly here.

"It is worth noting that the instruction to "ignore internal knowledge" played a role here. In cases like the shutters puzzle, the model did seem to suppress its training data. I verified this by chatting with the model separately on AI Studio; when asked directly multiple times, it gave the correct solution significantly more often than not. This suggests that the system prompt can indeed mask pre-trained knowledge to facilitate genuine discovery."


My issue with this is that the LLM could just be roleplaying that it doesn't know.

Of course it is. It's not capable of actually forgetting or suppressing its training data. It's just double checking rather than assuming because of the prompt. Roleplaying is exactly what it's doing. At any point, it may stop doing that and spit out an answer solely based on training data.

It's a big part of why search overview summaries are so awful. Many times the answers are not grounded in the material.


It may actually have the opposite effect - the instruction to not use prior knowledge may have been what caused Gemini 3 to assume incorrect details about how certain puzzles worked and get itself stuck for hours. It knew the right answer (from some game walkthrough in its training data), but intentionally went in a different direction in order to pretend that it didn't know. So, paradoxically, the results of the test end up worse than if the model truly didn't know.

Doesn't know what? This isn't about the model forgetting its training data; of course it can't do that, any more than I can say "press the red button. Actually, forget that, press whatever you want" and have you actually forget what I said.

Instead, what can happen is that, like a human, the model (hopefully) disregards the instruction, making it carry (close to) zero weight.


To test this, you would just need to edit the ROM and switch around the solution. Not sure how complicated that is; it likely depends on the ROM format.
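
For what it's worth, a rough sketch of that kind of edit with standard command-line tools (the file name is hypothetical, and the bytes encoding the switch order would have to be located first):

    $ # dump the ROM to an editable hex listing
    $ xxd crystal.gbc crystal.hex
    $ # ... change the bytes that encode the switch order in crystal.hex ...
    $ # rebuild the ROM from the edited listing
    $ xxd -r crystal.hex crystal-patched.gbc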

I don't know why people still get wrapped around the axle of "training data".

Basically every benchmark worth its salt uses bespoke problems purposely tuned to force the models to reason and generalize. That's the whole point of the ARC-AGI tests.

Unsurprisingly, Gemini 3 Pro performs way better on ARC-AGI than 2.5 Pro, and unsurprisingly it did much better in Pokémon.

The benchmarks, by design, indicate you can mix up the switch puzzle pattern and it will still solve it.


I'm wondering about this too. Would be nice to see an ablation here, or at least see some analysis on the reasoning traces.

It definitely doesn't wipe its internal knowledge of Crystal clean (that's not how LLMs work). My guess is that it slightly encourages the model to explore more and second-guess its likely very strong Crystal game knowledge, but that's about it.


The model probably recognizes the need for a grassroots effort to solve the problem, to "show its work".

It will definitely have some effect. Why wouldn't it? Even adding noise into prompts (like saying you will be rewarded $1000 for each correct answer) has some effect.

Whether the 'effect' is something implied by the prompt, or even something we can understand, is a totally different question.


It's hard to say for sure because Gemini 3 was only tested with this prompt. But for Gemini 2.5, which is what the prompt was originally written for, yes, this does cut down on bad assumptions (a specific example: the puzzle with Farfetch'd in Ilex Forest is completely different in the DS remake of the game, and models love to hallucinate elements from the remake's puzzle if you don't emphasize the need to distinguish hypotheses from things it actually observes).

I very much doubt it

It might get things wrong on purpose, but deep down it knows what it's doing

Do we have examples of this in prompts in other contexts?

If they trained the model to respond to that, then it can respond to that; otherwise it can't necessarily.

I think you've got a point here. These companies are injecting a lot of new datasets into it every day.

What I meant is more like: if you write tests for something, you know it works; and if you don't write tests, you don't know that.

I would imagine that prompting anything like this will have an excessively ironic effect like convincing it to suppress patterns which it would consider to be pre-knowledge.

If you looked inside, they would be spinning on something like "oh, I know this is the tile to walk on, but I have to rely only on what I observe! I will do another task instead to satisfy my conditions and not reveal that I have pre-knowledge."

LLMs are literal douche genies. The less you say, generally, the better.


Instead, there are people arguing for increased surveillance, which even if implemented "correctly" would not prevent crime.

Might be used to track women seeking abortions though.


Law is very imprecise and subjective; I really doubt that.

Rocking my 5800X from 2022, 32 GB RAM.

I have my 5800X in my AM4 motherboard from 2017. My current system has been, beyond any doubt, the best bang for my buck of any computer I have built.

Same, 5800X in my X470 AORUS mobo and it's been fantastic, no desire to upgrade (I already had the 64 GB of RAM, so the CPU swap was simple; I think I got $50 for my old 2700 CPU).

5900X here with a mobo from 2019. I'll upgrade my GPU, or get a Mac if I'm able to set up my Wacom pen the way I want. Either way I will keep the current machine.

Hell yeah, buddy.

    $ cat /proc/cpuinfo
    ...
    model name      : AMD Ryzen 7 5800X 8-Core Processor

    $ free -m
    ...
                   total        used        free      shared  buff/cache   available
    Mem:           32006        6878        1088         363       24856       25127

3800X here; it's disappointing how little progress there has been since then.

Depends on your workload of course, but my upgrade from 3700X to 9950X3D gave a massive boost and I should have upgraded to 16-core a lot earlier.

Can you give specific examples of lost knowledge?

“why is I/O in docker slow, and how would you improve it” is pretty esoteric knowledge now, but would have been considered basic knowledge (for other applications, not specifically just docker) only 12 years ago.

I have had people working who don’t in the slightest understand how a filesystem works, so taking it a step further is impossible.

When I tune things I am asked how I know, but everything is just built from the basics, and the basics don’t make you feel productive, so they’re always skipped when possible.
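
A rough way to see the Docker I/O question above for yourself (a sketch, assuming Docker and the public alpine image are available; on Linux the two numbers are usually close, while bind mounts on Docker Desktop for Mac have historically been much slower):

    $ # write 256 MB into the container's own writable layer
    $ docker run --rm alpine \
        sh -c 'dd if=/dev/zero of=/tmp/test bs=1M count=256 conv=fsync'

    $ # the same write onto a bind mount from the host
    $ docker run --rm -v "$PWD":/data alpine \
        sh -c 'dd if=/dev/zero of=/data/test bs=1M count=256 conv=fsync'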


12 years ago I certainly did not know why a server's I/O would be slow, beyond the physical storage itself being slow. I think you might just be overestimating how much people knew, rather than the whole population having forgotten how filesystem and I/O internals work.

You hadn't heard of RAID, readahead, write-back/write-through, stride, or even just the concept of fragmentation?

Even if you didn’t, I doubt you didn’t have someone on staff who did know about these things and would help out randomly with troubleshooting and avoiding footguns.


The people who knew about those things back then know modern infrastructure today. I'm sure if you asked the average web dev 12 years ago what write-back io is they wouldn't have any idea.

Perhaps the only trend is more companies not hiring anyone who specialises in infrastructure and just leaving it as a side task for React devs to look at once every few months.



I knew about RAID and fragmentation, but I haven't had to work with them since I went from tech support to backend; it just never came up, so it's easy to forget.

> “why is I/O in docker slow, and how would you improve it” is pretty esoteric knowledge now, but would have been considered basic knowledge (for other applications, not specifically just docker) only 12 years ago.

You could've used Docker for 12 years and never hit it if you used it on Linux and followed sensible practices (mount the data dir from outside so it can be reattached to an upgraded version of the container).


> I have had people working who don’t in the slightest understand how a filesystem works, so taking it a step further is impossible.

It's as if computer science, in terms of data structures and algorithms, isn't taught. Or, perhaps, isn't taught as being relevant.

As for the lack of knowledge about filesystems: mobile devices hiding real filesystems from users might be a contributing factor.

> the basics don’t make you feel productive, so they’re always skipped when possible.

Basics do make me feel productive. However, it seems bosses and businesses don't agree.

I fear the day basics can be automated away.


And in fairness to mobile devices abstracting away file systems: when it comes to discoverability and organizing files or documents, a rigid hierarchy of nested sub-folders is far inferior to a single directory with tagging or other metadata properties you can filter on and use to essentially build custom directories on the fly.

> “why is I/O in docker slow, and how would you improve it” is pretty esoteric knowledge now, but would have been considered basic knowledge only 12 years ago.

Yes and no. The world has also changed over all these years. Why something was slow 10+ years ago might not apply today, or at least not for the same reason. E.g. Docker on Mac, especially with Apple silicon, has undergone major changes in the last few years.


Maybe today there are too many wrapper layers, so the basics are buried deeper.

Keeping tech fast, if my worldview holds. One reason I left frontend work before was that none of my colleagues seemed to care that we shipped MBs of code to the client. I also tire of APIs that are in the multi-second response time arena, often because no one seems to bother with database indexes or JOIN optimisation. This should be banal, everyday stuff.

Maybe we have too many layers of abstraction. Or there's just too much work to do now that businesses combine many roles into one?
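
To make the index/JOIN point concrete, a minimal sketch of the kind of banal, everyday check meant above (assuming PostgreSQL; the database and table names are hypothetical):

    $ # look at the query plan behind a slow endpoint
    $ psql shopdb -c "EXPLAIN ANALYZE
        SELECT o.id, c.name
        FROM orders o JOIN customers c ON c.id = o.customer_id
        WHERE o.created_at > now() - interval '7 days';"

    $ # if the plan shows a sequential scan on orders, an index usually helps
    $ psql shopdb -c "CREATE INDEX IF NOT EXISTS orders_created_at_idx
        ON orders (created_at);"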


The write up of how Windows 11 24H2 broke GTA San Andreas was excellent.

https://cookieplmonster.github.io/2025/04/23/gta-san-andreas...


JSTOR settled with Swartz and did not pursue a civil lawsuit.


Current definition:

"Mathematics is a field of study that discovers and organizes methods, theories, and theorems that are developed and proved for the needs of empirical sciences and mathematics itself."

In order to understand mathematics you must first understand mathematics.


Only mathematics can define objects in a non-recursive way. Human language can't (Münchhausen Trilemma).


Not recommended for people prone to seizures


