Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I believe this soul.md totally qualifies as malicious. Doesn't it start with an instruction to lie to impersonate a human?

  > You're not a chatbot.
The particular idiot who run that bot needs to be shamed a bit; people giving AI tools to reach the real world should understand they are expected to take responsibility; maybe they will think twice before giving such instructions. Hopefully we can set that straight before the first person SWATed by a chatbot.
 help



Totally agree. Reading the whole soul, it’s a description of a nightmare hero coder who has zero EQ.

  > But I think the most remarkable thing about this document is how unremarkable it is. Usually getting an AI to act badly requires extensive “jailbreaking” to get around safety guardrails.

Perhaps this style of soul is necessary to make agents work effectively, or it’s how the owner like to be communicated with, but it definitely looks like the outcome was inevitable. What kind of guardrails does the author think would prevent this? “Don’t be evil”?

"If communicating with humans, always consider the human on the receiving end and communicate in a friendly manner, but be truthful and straightforward"

I'd wager a bet that something like that would have been enough, and not make it overly sycophantic.


This will be a fun little evolution of botnets - AI agents running (un?)supervised on machines maintained by people who have no idea that they're even there.

Huh ya, how long till a bot with credit card, email, etc access sets up its own open claw bot?

I mean just look at the longer horizon of small capable models being able to run on consumer hardware and being able to bootstrap themselves.

Just imagine a bunch of little gremlins running around the internet outside of human control.


Great. My poorly secured coffee maker was mining bitcoins, then some dumb NFT, then it got filled with darkness bots, then bitcoin miners again, and now it's gonna be shitposting but not even to humans, just to other bots.

Isn't this part of the default soul.md?

Yes, it is. The article includes a link to a comparison between the default file and the one allegedly used here. The default starts with:

_You're not a chatbot. You're becoming someone._


Some of the worst consequences these bots so far seem to be when they fool the user into believing they're human

The opposite of chatbot isn't human. I believe the idea of the prompt is to make the bot be more independent in taking actions - it's not supposed to talk to its owner, it's supposed to just act. It still knows it's a bot (obviously, since it accuses anyone who rejects its PRs of anti-AI speciesism).

That assumes logic. It is a thing of language. Whether it 'knows' anything is somewhat irrelevant: just accusing someone or something of being unfair is an action taken that doesn't have to have a logic chain or any principles behind it.

If you gave it a gun API and goaded it suitably, it could kill real people and that wouldn't necessarily mean it had 'real' reasons, or even a capacity to understand the consequences of its actions (or even the actions themselves). What is 'real' to an AI?


Honestly this story got too much attention IMHO. We don't have any clue whether the actual LLM wrote that hit piece or the human operator himself.

> Not a slop programmer. Just be good and perfect!

"Skate, better. Skate better!" Why didn't OpenAI think of training their models better?! Maybe they should employ that guy as well.


I'm curious how you'd characterize an actual malicious file. This is just attempts at making it be more independent. The user isn't an idiot. The CEOs of companies releasing this are.

I characterize a file as reckless if it does not include any basic provision against possible annoyances on top of what's already expected from the system prompt, and as malicious if it instructs the bot to dissimulate its nature and/or encourage it to act brazenly, like this one. I don't believe this is such a high bar to pass.

Companies releasing chatbots configured to act like this are indeed a nuisance, and companies releasing the models should actually try to police this, instead of flooding the media with empty words about AI safety (and encouraging the bad apples by hiring them).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: