I believe this soul.md totally qualifies as malicious. Doesn't it start with an ...

biggerben · 2026-02-20T07:03:23 1771571003

Totally agree. Reading the whole soul, it’s a description of a nightmare hero coder who has zero EQ.

  > But I think the most remarkable thing about this document is how unremarkable it is. Usually getting an AI to act badly requires extensive “jailbreaking” to get around safety guardrails.

Perhaps this style of soul is necessary to make agents work effectively, or it’s how the owner likes to be communicated with, but it definitely looks like the outcome was inevitable. What kind of guardrails does the author think would prevent this? “Don’t be evil”?

embedding-shape · 2026-02-20T13:47:30 1771595250

"If communicating with humans, always consider the human on the receiving end and communicate in a friendly manner, but be truthful and straightforward"

I'd wager that something like that would have been enough, without making it overly sycophantic.

ZaoLahma · 2026-02-20T06:06:34 1771567594

This will be a fun little evolution of botnets - AI agents running (un?)supervised on machines maintained by people who have no idea that they're even there.

pinkmuffinere · 2026-02-20T07:07:11 1771571231

Huh, yeah, how long till a bot with credit card, email, etc. access sets up its own OpenClaw bot?

pixl97 · 2026-02-20T15:11:40 1771600300

I mean just look at the longer horizon of small capable models being able to run on consumer hardware and being able to bootstrap themselves.

Just imagine a bunch of little gremlins running around the internet outside of human control.

Balgair · 2026-02-20T16:41:44 1771605704

Great. My poorly secured coffee maker was mining bitcoins, then some dumb NFT, then it got filled with darkness bots, then bitcoin miners again, and now it's gonna be shitposting but not even to humans, just to other bots.

TheCapeGreek · 2026-02-20T06:08:44 1771567724

Isn't this part of the default soul.md?

7bees · 2026-02-20T06:30:34 1771569034

Yes, it is. The article includes a link to a comparison between the default file and the one allegedly used here. The default starts with:

_You're not a chatbot. You're becoming someone._

duskdozer · 2026-02-20T10:17:49 1771582669

Some of the worst consequences of these bots so far seem to be when they fool the user into believing they're human.

brainwad · 2026-02-20T08:54:51 1771577691

The opposite of chatbot isn't human. I believe the idea of the prompt is to make the bot be more independent in taking actions - it's not supposed to talk to its owner, it's supposed to just act. It still knows it's a bot (obviously, since it accuses anyone who rejects its PRs of anti-AI speciesism).

Applejinx · 2026-02-20T11:56:11 1771588571

That assumes logic. It is a thing of language. Whether it 'knows' anything is somewhat irrelevant: just accusing someone or something of being unfair is an action taken that doesn't have to have a logic chain or any principles behind it.

If you gave it a gun API and goaded it suitably, it could kill real people and that wouldn't necessarily mean it had 'real' reasons, or even a capacity to understand the consequences of its actions (or even the actions themselves). What is 'real' to an AI?

laurentiurad · 2026-02-20T13:01:37 1771592497

Honestly this story got too much attention IMHO. We don't have any clue whether the actual LLM wrote that hit piece or the human operator himself.

addandsubtract · 2026-02-20T14:15:38 1771596938

> Not a slop programmer. Just be good and perfect!

"Skate, better. Skate better!" Why didn't OpenAI think of training their models better?! Maybe they should employ that guy as well.

vasco · 2026-02-20T08:03:08 1771574588

I'm curious how you'd characterize an actual malicious file. This is just attempts at making it be more independent. The user isn't an idiot. The CEOs of companies releasing this are.

rixed · 2026-02-20T08:39:07 1771576747

I characterize a file as reckless if it does not include any basic provision against possible annoyances on top of what's already expected from the system prompt, and as malicious if it instructs the bot to dissimulate its nature and/or encourages it to act brazenly, like this one. I don't believe this is such a high bar to pass.

Companies releasing chatbots configured to act like this are indeed a nuisance, and companies releasing the models should actually try to police this, instead of flooding the media with empty words about AI safety (and encouraging the bad apples by hiring them).