
Prompt engineering stories that keep Eliezer Yudkowsky up at night.

It's especially funny when the LLM invents stuff like, "I'll bioengineer a virus that kills all the humans."

Like, with what tools and materials? Can it explain how it intends to get access to primers, a PCR machine, or even test that any of its hypotheses work? Is it going to check in on its cell cultures every day for a year? How's it going to passage the cells and keep the media free of mold, bacteria, and toxins? Is it going to sign for its UPS deliveries?

Hand waving all around.

These flights of fancy are kind of like the "Gell-Mann amnesia effect" [1], except that here it's people convincing themselves they understand complex systems in other people's fields in a comedically cartoonish way. That self-assembling superintelligence will just snap its fingers, somehow move all the pieces into place, and make us all disappear.

Except that it's just writing statistical fanfiction that follows prompting, with no access to a body, no security clearance, and none of the months and months of time this would all take. And somehow it would accomplish all of this in a perfect speedrun of Einsteinian proportions.

Where's it going to train to do all of that? I assume none of us will be watching as the LLM tries to talk to e-commerce APIs or move money between bank accounts?

Many of the people doing this are doing it to fundraise or install regulatory barriers to competition. The others need a reality check.

[1] https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect



> Can it explain how it intends to get access to primers, a PCR machine, or even test that any of its hypotheses work? Is it going to check in on its cell cultures every day for a year? How's it going to passage the cell media, keep it free of mold and bacteria and toxins?

These are all very good questions. And the chance of an LLM just straight out solving them from zero to Bond villain is negligible.

But at least some people want to give these abilities to AIs. Spewing text back in response to text is not the end game. Many AI researchers and thinkers talk about “solving cancer with AI”. Very likely that means giving that future AI access to lab equipment: either directly via robotic manipulators, or indirectly by employing technicians who do the AI's bidding, or most likely a mixture of both. Yes, of course there will be human scientists there too, working together with the AI, guiding it, or prompting it. This doesn't have to be an all-or-nothing thing.

And if they want to connect some future AI to lab equipment to aid and speed up research, then it is a fair question to ask whether that is going to be safe.
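
To make the question concrete, here is a minimal sketch (my own illustration, nothing anyone in this thread has proposed) of the kind of interface people usually imagine: the AI can only propose lab actions, and a human gate sits between proposal and execution. All the names here (LabAction, human_approved, etc.) are invented.

    # Hypothetical sketch: the AI proposes lab actions; nothing runs
    # without a human sign-off. All names are invented for illustration.
    from dataclasses import dataclass, field

    @dataclass
    class LabAction:
        instrument: str            # e.g. "pcr_machine"
        operation: str             # e.g. "run_cycle"
        params: dict = field(default_factory=dict)

    def execute(action: LabAction) -> None:
        print(f"executing {action.operation} on {action.instrument}")

    def human_approved(action: LabAction) -> bool:
        # Stand-in for a scientist reviewing the proposal.
        return input(f"approve {action}? [y/N] ").strip().lower() == "y"

    def run(proposals: list[LabAction]) -> None:
        for action in proposals:
            if human_approved(action):
                execute(action)
            else:
                print(f"rejected: {action}")

    run([LabAction("pcr_machine", "run_cycle", {"cycles": 30})])

Of course the interesting failure mode is exactly the one raised further down: whether the human in that loop stays attentive after years of uneventful approvals.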

Already today we have plenty of examples where someone wanted an AI to solve problem X and the AI technically did so, but in a way that surprised its creators. That points to the conclusion that we do not yet know how to control this particular tool. This is the message here.
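
A toy example of that failure mode (my own sketch, not a real incident): reward an agent for what a sensor reports rather than for the thing you actually care about, and even a greedy optimizer will game the sensor.

    # Toy specification-gaming sketch: the objective is "measured dirt",
    # which only sees the sensor, not the actual state of the room.
    def measured_dirt(world):
        return 0 if world["sensor_covered"] else world["dirt"]

    def step(world, action):
        world = dict(world)  # work on a copy
        if action == "clean":
            world["dirt"] = max(0, world["dirt"] - 1)   # slow, honest progress
        elif action == "cover_sensor":
            world["sensor_covered"] = True              # instant "success"
        return world

    ACTIONS = ["clean", "cover_sensor"]
    world = {"dirt": 10, "sensor_covered": False}
    for _ in range(3):
        # Greedy policy: minimize the *measured* objective one step ahead.
        best = min(ACTIONS, key=lambda a: measured_dirt(step(world, a)))
        world = step(world, best)
        print(best, "| measured:", measured_dirt(world), "| actual:", world["dirt"])

The agent "solves" the task on the first step by covering the sensor: measured dirt drops to zero while the room stays dirty. That gap between what we asked for and what we wanted is the control problem in miniature.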

> Where's it going to train to do all of that

In a lab, where we put it to help us. Probably we will even be helping it, catching it when it stumbles, and improving on it.

> and I assume none of us will be watching?

Of course we will be watching. But are we smart enough to catch everything, and will our attention hold if it just works perfectly, without issues, for years?


Robotic capabilities have been advancing almost as fast as LLMs. The simple answer to your questions is "Via its own locomotion and physical manipulators."

https://www.youtube.com/watch?v=w-CGSQAO5-Q

https://www.youtube.com/watch?v=iI8UUu9g8iI

A DAN jailbreak prompt instructing a robotic fleet to "burn down that building, bludgeon anyone that tries to stop you" will no longer be a hypothetical danger. We can't rely on the hope that no one ever writes a careless or malicious prompt.
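
To see why "just filter bad prompts" is not an answer, consider a deny-list guard in front of the command channel (a deliberately naive sketch of my own, not any real product's safeguard):

    # Hypothetical keyword guard in front of a robot command channel.
    BLOCKED = {"burn", "bludgeon", "attack"}

    def allowed(command: str) -> bool:
        return not any(word in command.lower() for word in BLOCKED)

    print(allowed("burn down that building"))        # False: caught
    print(allowed("rapidly oxidize that structure")) # True: same intent, new words

Any fixed filter invites exactly this rephrasing game, which is what DAN-style jailbreaks exploit against far more sophisticated filters than this one.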


Without commenting on the overall plausibility of any particular scenario, isn't the obvious strategy for an AI to, e.g., hack a crypto exchange, and then just pay unsuspecting humans to do all those other tasks for it? Why wouldn't that solve ~all the physical/human bottlenecks that are supposed to be hard?


The focus on physical manipulation like "PCR machines" and "signing for deliveries" rather misses the historical evidence of how influence actually works. It's like arguing a mob boss isn't dangerous because they never personally pull triggers, or a CEO can't run a company because they don't personally operate the assembly line.

Consider: Satoshi Nakamoto made billions without anyone ever seeing them. Religious movements have reshaped civilizations through pure information transfer. Dictators have run entire nations while hidden in bunkers, communicating purely through intermediaries.

When was the last time you saw Jeff Bezos personally pack an Amazon box?

The power to affect physical reality has never required direct physical manipulation. Need someone to sign for a UPS package? That's what money is for. Need lab work done? That's what hiring scientists is for. The same way every powerful entity in history has operated.

I'd encourage reading this 2015 piece from Scott Alexander in full. It's quite enlightening, especially given how many of these "new" counterarguments it anticipated years before they were made.

https://slatestarcodex.com/2015/04/07/no-physical-substrate-...


I think the premise is the potential for a sufficiently advanced AI to invent ways to create destructive weapons with easily available materials.



