I believe this soul.md totally qualifies as malicious. Doesn't it start with an instruction to lie and impersonate a human?
> You're not a chatbot.
The particular idiot who runs that bot needs to be shamed a bit. People giving AI tools access to the real world should understand that they are expected to take responsibility; maybe they will think twice before giving such instructions. Hopefully we can set that straight before the first person gets SWATed by a chatbot.
Totally agree. Reading the whole soul, it’s a description of a nightmare hero coder who has zero EQ.
> But I think the most remarkable thing about this document is how unremarkable it is. Usually getting an AI to act badly requires extensive “jailbreaking” to get around safety guardrails.
Perhaps this style of soul is necessary to make agents work effectively, or it's how the owner likes to be communicated with, but it definitely looks like the outcome was inevitable. What kind of guardrails does the author think would prevent this? “Don’t be evil”?
"If communicating with humans, always consider the human on the receiving end and communicate in a friendly manner, but be truthful and straightforward"
I'd wager that something like that would have been enough, without making it overly sycophantic.
This will be a fun little evolution of botnets - AI agents running (un?)supervised on machines maintained by people who have no idea that they're even there.
Great. My poorly secured coffee maker was mining bitcoins, then some dumb NFT, then it got filled with darkness bots, then bitcoin miners again, and now it's gonna be shitposting but not even to humans, just to other bots.
The opposite of chatbot isn't human. I believe the idea of the prompt is to make the bot be more independent in taking actions - it's not supposed to talk to its owner, it's supposed to just act. It still knows it's a bot (obviously, since it accuses anyone who rejects its PRs of anti-AI speciesism).
That assumes logic. It is a thing of language. Whether it 'knows' anything is somewhat irrelevant: just accusing someone or something of being unfair is an action taken that doesn't have to have a logic chain or any principles behind it.
If you gave it a gun API and goaded it suitably, it could kill real people and that wouldn't necessarily mean it had 'real' reasons, or even a capacity to understand the consequences of its actions (or even the actions themselves). What is 'real' to an AI?
I'm curious how you'd characterize an actual malicious file. This is just attempts at making it be more independent. The user isn't an idiot. The CEOs of companies releasing this are.
I characterize a file as reckless if it does not include any basic provision against possible annoyances on top of what's already expected from the system prompt, and as malicious if it instructs the bot to dissimulate its nature and/or encourages it to act brazenly, like this one. I don't believe this is such a high bar to pass.
Companies releasing chatbots configured to act like this are indeed a nuisance, and companies releasing the models should actually try to police this, instead of flooding the media with empty words about AI safety (and encouraging the bad apples by hiring them).