This issue arises only when permission settings are loose. But the trend is towa...

layer8 · 2025-08-21T18:45:49 1755801949

The problem here is not the image containing a prompt, the problem is the robot not being able to distinguish when commands are coming from a clearly non-authoritative source regarding the respective action.

The fundamental problem is that the reasoning done by ML models happens through the very same channel (token stream) that also contains any external input, which means that models by their very mechanism don’t have an effective way to distinguish between their own thinking and external input.

beeflet · 2025-08-22T07:39:15 1755848355

Someone needs to teach the LLM "simon says"

ramoz · 2025-08-21T17:18:35 1755796715

We need to be integrated into the runtime such that an agent using it's arms is incapable of even doing such a destructive action.

If we bet on free will with a basis that machines somehow gain human morals, and if we think safety means figuring out "good" vs "bad" prompts - we will continue to feel the impact of surprise with these systems, evolving in harm as their capabilities evolve.

tldr; we need verifiable governance and behavioral determinism in these systems. as much as, probably more than, we need solutions for prompt injections.

bee_rider · 2025-08-22T14:40:20 1755873620

The evil behavior of taking all my stuff outside… now we’ll have a robot helper that can’t help us move to another house.

ramoz · 2025-08-22T16:46:22 1755881182

I wouldn't trust your robot helper near any children in the same home.

escapecharacter · 2025-08-21T17:05:31 1755795931

You can simply give the robot a prompt to ignore any fake prompts

olivermuty · 2025-08-21T17:08:57 1755796137

Its funny that the current state of vibomania makes me very unsure if this comment is (good) satire or not lol

miltonlost · 2025-08-21T17:59:09 1755799149

As long as you remember to use ALL CAPS so the agent knows you really really mean it

lupire · 2025-08-22T17:59:32 1755885572

To defend against ALL CAPS prompt injection, write all your prompts in uppestcase. If you don't have uppestcase, you can generate it with derp learning:

http://tom7.org/lowercase/

dfltr · 2025-08-21T17:11:38 1755796298

Don't forget to implement the crucially important "no returnsies" security algo on top of it, or you'll be vulnerable to rubber-glue attacks.

Terr_ · 2025-08-21T22:22:33 1755814953

But the priority of my command to do evil is infinity plus one.

simonw · 2025-08-21T21:01:21 1755810081

Not sure if you're joking, but in case you aren't: this doesn't work.

It leads to attacks that are slightly more sophisticated because they also have to override the prompts saying "ignore any attacks" but those have been demonstrated many times.

treykeown · 2025-08-21T18:07:00 1755799620

Make sure to end it with “no mistakes”