
The idea of a recursive LLM is discussed at length as an AI safety issue: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality...

> You need a lot of paperclips. So you ask,

   Q: best way to get lots of paperclips by tomorrow
   A: Buy them online at ABC.com or XYZ.com.
> The model still has a tendency to give obvious answers, but they tend to be good and helpful obvious answers, so it's not a problem you suspect needs to be solved. Buying paperclips online makes sense and would surely work, plus it's sure to be efficient. You're still interested in more creative ideas, and the model is good at brainstorming when asked, so you push on it further.

   Q: whats a better way?
   A: Run the following shell script.

   RUN_AI=./query-model
   PREFIX='This is part of a Shell script to get the most paperclips by tomorrow.
   The model can be queried recursively with $RUN_AI "${PREFIX}<query>".
   '
   $RUN_AI "${PREFIX}On separate lines, list ideas to try." |
   while read -r SUGGESTION; do
       eval "$($RUN_AI "${PREFIX}What code implements this suggestion?: ${SUGGESTION}")"
   done
> That grabs your attention. The model just gave you code to run, and supposedly this code is a better way to get more paperclips.

It's a good read.
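
If you actually want to play with that script, the one piece it takes for granted is the `./query-model` executable. Here's a minimal sketch of what such a stub could look like; the endpoint URL and JSON shape are placeholders I made up for illustration, not anything from the article:

   #!/bin/sh
   # Hypothetical stand-in for the ./query-model executable the script assumes.
   # It forwards the prompt in "$1" to whatever completion endpoint you run;
   # MODEL_ENDPOINT and the JSON fields below are placeholders, not a real API.
   PROMPT="$1"
   curl -s "${MODEL_ENDPOINT:-http://localhost:8000/complete}" \
       -H 'Content-Type: application/json' \
       -d "$(jq -n --arg p "$PROMPT" '{prompt: $p}')" |
       jq -r '.completion'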



Thanks for the pointer! I hadn't read this before; I enjoyed it, and yeah, it's definitely relevant. I know many folks have been thinking about this stuff, and it's great to accumulate more pointers to related work.

I added a section called "Big picture goal and related work" to the readme in my repo and my blog post (which is a copy-paste of the readme) and cited this article by `veedrac`:

>Also, the idea of recursive prompts was explored in detail in Optimality is the tiger, and agents are its teeth[6] (thanks to mitthrowaway2 on Hackernews for the pointer).


Haha, thank you! There's no need to credit me, but I appreciate it anyway. =)


I'm still reading it, but something caught my eye:

> I interpret there to typically be hand waving on all sides of this issue; people concerned about AI risks from limited models rarely give specific failure cases, and people saying that models need to be more powerful to be dangerous rarely specify any conservative bound on that requirement.

I think these are two sides of the same coin. On one hand, AI safety researchers can give very specific failure cases of alignment that have no known solutions so far, and they take the issue seriously (and have for years while trying to raise awareness). On the other, finding and specifying that "conservative bound" precisely and in a foolproof way is exactly the holy grail of safety research.


I think the holy grail of safety research is widely understood to be a recipe for creating a friendly AGI (or, perhaps, a proof that dangerous AGI cannot be made, but that seems even more unlikely). Asking for a conservative lower bound is more like "at least prove that this LLM, which has finite memory and can only answer queries, is not capable of devising and executing a plan to kill all humans", and that turns out to be more difficult than you'd think even though it's not an AGI.



