Sandboxes won't save you from OpenClaw

downsplat · 2026-02-25T18:42:10 1772044930

I don't think openclaw can possibly be secured given the current paradigm. It has access to your personal stuff (that's its main use case), access to the net, and it gets untrusted third party inputs. That's the unfixable trifecta right there. No amount of filtering band-aid whack-a-mole is going to fix that.

Sandboxes are a good measure for things like Claude Code or Amp. I use a bubblewrap wrapper to make sure it can't read $HOME or access my ssh keys. And even there, you have to make sure you don't give the bot write access to files you'll be executing outside the sandbox.
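
For anyone who wants to try the same idea, here is a minimal sketch of such a wrapper. It is not the commenter's actual script: the function name `bwrap_argv` and the layout (read-only root, empty tmpfs over the home directory, one writable project directory) are assumptions for illustration. It only builds the argument list, so you can inspect it before running anything.

```python
import os


def bwrap_argv(project_dir, command, home=None):
    """Build a bubblewrap command line (hypothetical layout, adjust to taste)."""
    home = os.path.abspath(home or os.path.expanduser("~"))
    project = os.path.abspath(project_dir)
    return [
        "bwrap",
        "--ro-bind", "/", "/",            # whole filesystem visible, read-only
        "--dev", "/dev",
        "--proc", "/proc",
        "--tmpfs", "/tmp",
        "--tmpfs", home,                  # hide $HOME (ssh keys, dotfiles) behind an empty tmpfs
        "--bind", project, project,       # re-expose only the project, writable
        "--chdir", project,
        "--unshare-all", "--share-net",   # new namespaces, but keep network for the model API
        "--die-with-parent",
        *command,
    ]


if __name__ == "__main__":
    print(" ".join(bwrap_argv("~/code/demo", ["bash"])))
```

Order matters: the tmpfs over the home directory has to come before the project bind, otherwise it would hide the project too. The writable project directory is exactly where the caveat above applies, since anything in it that you later execute outside the sandbox (build scripts, git hooks) is a way out.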

logicx24 · 2026-02-25T19:05:09 1772046309

One insidious thing is whitelists. If you allow the bot to run a command like `API_KEY=fdafsafa docker run ...`, then the API_KEY will be written to a file, and the agent can then read that in future runs. That bit me once already.

zahlman · 2026-02-25T19:20:07 1772047207

> If you allow the bot to run a command like `API_KEY=fdafsafa docker run ...`, then the API_KEY will be written to a file

It wouldn't be inherently. Is this something that Docker does? Or perhaps something that was done by the code that was run? (Shouldn't it have stayed within that container?)

But also, if it's not okay for the agent to know the API key permanently, why is it okay for the agent to have one-off use of something that requires the same key? Did it actually craft a Bash command line with the API key set and request to run it; or was it just using a tool that ends up with that command?

logicx24 · 2026-02-25T19:34:53 1772048093

What I meant to say was, the agents (like Claude Code) often have an "Allow all instances of this command in the session" option, and that persists to a whitelist for that session. The mechanic here is actually just a prefix match, so `API_KEY=... diff_command` also matches, allowing the agent to reuse the key without asking me. This file also sticks around, so I had another agent read the whitelist and the conversation transcript and do other things automatically without approval.
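
A rough sketch of why a prefix rule behaves this way, and one stricter alternative. This is not how Claude Code or any specific agent implements its allowlist; `remember_rule`, `prefix_allowed`, `split_env` and `argv_allowed` are hypothetical names, and the assumption is that the stored rule is simply the first token of the approved command.

```python
import shlex


def remember_rule(approved_command):
    """Assumed behaviour: the stored rule is the first token of what was approved."""
    return approved_command.split()[0]


def prefix_allowed(command, rules):
    """Naive check: any command starting with a stored rule is auto-approved."""
    return any(command.startswith(rule) for rule in rules)


def split_env(command):
    """Separate leading VAR=value assignments from the actual argv."""
    tokens = shlex.split(command)
    env = {}
    while tokens and "=" in tokens[0] and tokens[0].split("=", 1)[0].isidentifier():
        name, value = tokens.pop(0).split("=", 1)
        env[name] = value
    return env, tokens


def argv_allowed(command, allowed_argvs):
    """Stricter check: exact argv match, and inline secrets are never accepted."""
    env, argv = split_env(command)
    return not env and argv in allowed_argvs


rule = remember_rule("API_KEY=fdafsafa docker run img")
# The rule now contains the key itself, and matches an unrelated command:
print(rule, prefix_allowed("API_KEY=fdafsafa curl https://example.invalid", [rule]))
```

With the stricter variant, the allowlist file holds only `["docker", "run", "img"]`, so there is no key in it for a later agent to read back.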

> if it's not okay for the agent to know the API key permanently, why is it okay for the agent to have one-off use of something that requires the same key?

Read commands vs. write commands. I'm okay having the agent fetch info for me, but I want to approve any state changes.

dgxyz · 2026-02-25T19:07:11 1772046431

That's a shit show in a shit show there!

observationist · 2026-02-25T18:57:04 1772045824

Current AI requires a human in the loop for anything non-trivial. Even the most used feature, coding, causes chaos without strict human oversight.

You can vibe-code a standalone repository, but for any sort of serious work with real people working alongside bots, every last PR has to be reviewed, moderated, curated, etc.

Everything AI does that's not specifically intended to be a standalone, separate project requires that sort of intervention.

The safe way to do this is having a sandboxed test environment, high level visibility and a way to quickly and effectively review queued up actions, and then push those to a production environment. You need the interstitial buffer and a way of reverting back to the last known working state, and to keep the bot from having any control over what gets pushed to production.
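
To make the shape of that buffer concrete, here is a minimal sketch, assuming "production" is just an in-memory dict. `Proposal` and `Pipeline` are made-up names, not part of OpenClaw or any framework. The agent can only call `propose`; approving, pushing and reverting stay with the human.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Proposal:
    summary: str                      # the high-level line the reviewer reads
    change: Callable[[dict], None]    # mutation to apply to production state
    approved: bool = False


@dataclass
class Pipeline:
    production: dict
    queue: list = field(default_factory=list)
    snapshots: list = field(default_factory=list)

    def propose(self, summary, change):
        """Agent side: queue an action, never apply it."""
        self.queue.append(Proposal(summary, change))

    def approve(self, index):
        """Human side: sign off on one queued action."""
        self.queue[index].approved = True

    def push(self):
        """Human side: snapshot, then apply only the approved actions."""
        self.snapshots.append(copy.deepcopy(self.production))
        applied = [p for p in self.queue if p.approved]
        for proposal in applied:
            proposal.change(self.production)
        self.queue = [p for p in self.queue if not p.approved]
        return len(applied)

    def revert(self):
        """Human side: roll back to the last known working state."""
        self.production = self.snapshots.pop()
```

Unapproved proposals stay in the queue, so a "delete everything" request sits there visibly instead of running.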

Giving them realtime access to production is a recipe for disaster, whether it's your personal computer or a set of accounts built specifically for them or whatever. Without your human-in-the-loop buffer, bad things will happen.

A lot of that can be automated, so you can operate confidently with high level summaries. If you can run a competent local AI and develop strict processes for review and summaries and so forth, kind of a defense in depth approach for agents, you can still get a lot out of ClawBot. It takes work and care.

Hopefully frameworks for these things start developing all of the safety, security, and procedure scaffolding we need, because OpenClaw and AI bots have gone viral. I'm getting all sorts of questions about how to set them up from completely non-technical people who would have trouble installing a sound system. Very cool to see, I'm excited for it, but there will definitely be some disasters this year.

zahlman · 2026-02-25T19:20:42 1772047242

> Even the most used feature, coding, causes chaos without strict human oversight.

s/Even/Especially/, I would think. Everyone's idea of how to get any decent performance out of an LLM for coding entails allowing the code to be run automatically. Nominally that's so the LLM can see the results and iterate towards a user-provided goal, but it's still untrusted code.

observationist · 2026-02-25T22:25:17 1772058317

It's still much easier to verify than to produce, but being willing to do that sort of thing, to enjoy it, or to know how to do it well are very different from loving programming. I think this is where AI butts heads with programmers who are in it for the love of the game.

Getting utility from AI is in the domain of management - the most effective, productive uses I've seen for AI involve elaborate project management scaffolding, hierarchies branching out of an agent.md or some similar setup, with explicit instructions and human oriented breakpoints in the process, so at each stage, the person can look at it all, verify operation of all the subcomponents, accept or reject the PR, and go again.

Normally people just want to vibe their way through a project or process, and that's chaotic specifically because there might be an effectively infinite space of possible legitimate, working completions, but only a tiny finite set of outcomes that could be considered "good". Another much larger but still finite set of "good enough" outcomes end up compounding errors and hitting the user in the face with the mystical salmon of unintended consequences.

Management is all about containing the space of possible outcomes and pushing resources toward a completion that lands in the space of "good", and that's tedious and boring. Even with AI, you're generally working in a space you don't know much about, haven't experienced or learned to enjoy or appreciate anything about it, and don't know enough to correct or guide the AI when it goes off-kilter.

All that to say, we need to automate management so that you can specify a style or methodology at the start and never have to think about it again, and have each AI operate on a strong default that works for lots of use cases. There's really no need to keep the MBAs and c-suite around, what they do is eminently more automatic and methodological than painting or writing poetry. Someone just has to wrangle the right dataset and extract the patterns. Incidentally, this might be one of the only things that gives Microsoft an edge over the next handful of years, since they're riding shotgun and recording everything everyone is doing to get good training data.

zahlman · 2026-02-25T22:38:58 1772059138

As someone who loves programming, I think the distinction is overstated. Part of the reason why doing what I love is slow, is because I instinctively (try to) verify as I go.

nsonha · 2026-02-26T02:31:44 1772073104

I think when people stop hyping skills and go back to using proper (mcp) tools, it would not be hard to come up with a UI to give explicit permissions. It was there from the beginning.

cyanydeez · 2026-02-25T22:59:21 1772060361

And even if you can guarantee it asks permission to do X, LLMs aren't reliable narrators of their own actions

ildar · 2026-02-27T00:21:00 1772151660

The top comment nails it — the unfixable trifecta of personal data access + network + untrusted inputs is real.

But I think the framing of "sandbox vs. no sandbox" misses a middle layer: runtime monitoring on the host itself.

Sandboxes contain blast radius. That's good. But they don't tell you when the agent is reading your SSH keys, exfiltrating credentials through DNS, or when a skill ships with obfuscated eval() calls.

What's been working for me: treating the agent like an untrusted employee with a keylogger on their workstation. Permission tiers (observer/worker/standard/full), forbidden zone enforcement (~/.ssh, ~/.aws, browser credential stores), and audit trails of every file access and command execution.
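
A minimal sketch of the forbidden-zone and audit-trail part, to show the idea rather than any particular tool's implementation. `FORBIDDEN_ZONES`, `in_zone` and `check_access` are hypothetical names, and the permission tiers are left out because their exact semantics aren't spelled out here. A real monitor would have to hook file access at the OS level; this only shows the path check.

```python
import os

# Assumed zone list, relative to the user's home directory.
FORBIDDEN_ZONES = (".ssh", ".aws")


def in_zone(path, zone):
    """True if path is the zone directory itself or anything beneath it."""
    return path == zone or path.startswith(zone + os.sep)


def check_access(path, home, audit_log):
    """Resolve the path, test it against the forbidden zones, and log the decision."""
    real = os.path.realpath(path)  # collapses ".." and follows symlinks
    zones = [os.path.realpath(os.path.join(home, z)) for z in FORBIDDEN_ZONES]
    allowed = not any(in_zone(real, zone) for zone in zones)
    audit_log.append({"path": real, "allowed": allowed})
    return allowed
```

Resolving the path first is the important step: a plain string comparison would let `project/../.ssh/id_ed25519` through.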

The defense-in-depth comment above is exactly right — you need the interstitial buffer AND runtime visibility into what the agent is actually doing between human checkpoints.

I've been building an open-source tool for this: https://github.com/darfaz/clawmoat — focuses on the host protection layer that sandboxes don't cover. 142 tests, zero dependencies.

ramoz · 2026-02-25T18:30:27 1772044227

I’ve said similar in another thread[1]:

Sandboxes will be left in 2026. We don't need to reinvent isolated environments; that's not even the main issue with OpenClaw - literally go deploy it in a VM* on any cloud and you've achieved all the same benefits. We need to know if the email being sent by an agent is supposed to be sent and if an agent is actually supposed to be making that transaction on my behalf. etc

——-

Unfortunately it’s been a pretty bad week for alignment optimists (meta lead fail, Google award show fail, anthropic safety pledge). Otherwise… Cybersecurity LinkedIn is all shuffling the same “prevent rm -rf” narrative, and researchers are focusing on LLM-as-a-guard, which is operationally not great and theoretically redundant and susceptible to the same issues.

The strongest solution right now is human in the loop - and we should be enhancing the UX and capabilities here. This can extend to eventual intelligent delegation and authorization.

[1] https://news.ycombinator.com/threads?id=ramoz&next=47006445

* VM is just an example. I personally have it running on a local Mac Mini & docker sandbox (obviously aware that this isn't a perfect security measure, but I couldn't install on my laptop which has sensitive work access).

bee_rider · 2026-02-25T18:47:59 1772045279

> We need to know if the email being sent by an agent is supposed to be sent and if an agent is actually supposed to be making that transaction on my behalf. etc

Isn’t this the whole point of the Claw experiment? They gave the LLMs permission to send emails on their behalf.

LLMs can not be responsibility-bearing structures, because they are impossible to actually hold accountable. The responsibility must fall through to the user because there is no other sentient entity to absorb it.

The email was supposed to be sent because the user created it on purpose (via a very convoluted process but one they kicked off intentionally).

ramoz · 2026-02-25T18:50:59 1772045459

I'm not too sure what you're asking, but that last part, I think, is very key to the eventual delegation.

Where we can verify the lineage of the user's intent originally captured and validated throughout the execution process - eventually used as an authorization mechanism.

Google has a good thought model around this for payments (see verifiable mandates): https://cloud.google.com/blog/products/ai-machine-learning/a...
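
To illustrate the general idea of carrying signed user intent through to the point of authorization, here is a toy sketch. It is not the scheme from the linked post: it swaps in a shared-key HMAC where a real system would use public-key signatures, and `sign_mandate`, `authorize` and the mandate fields are all made up for the example.

```python
import hashlib
import hmac
import json


def sign_mandate(key: bytes, mandate: dict) -> str:
    """Sign a canonical JSON encoding of what the user actually agreed to."""
    payload = json.dumps(mandate, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def authorize(key: bytes, mandate: dict, signature: str, action: dict) -> bool:
    """Allow an agent action only if it fits inside an untampered mandate."""
    expected = sign_mandate(key, mandate)
    if not hmac.compare_digest(expected, signature):
        return False  # mandate was altered after the user signed it
    return (
        action["merchant"] == mandate["merchant"]
        and action["amount_cents"] <= mandate["max_amount_cents"]
    )


key = b"demo-key-held-by-the-user"
mandate = {"merchant": "grocer.example", "max_amount_cents": 5000}
signature = sign_mandate(key, mandate)
print(authorize(key, mandate, signature,
                {"merchant": "grocer.example", "amount_cents": 4200}))
```

The point is that the agent can carry the mandate around but cannot widen it, so the check at execution time traces back to what the user originally approved.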

b112 · 2026-02-25T19:14:12 1772046852

I see a lot of discussion on that page about APIs and sign offs, but the real sign-off is installing anything on your computer, and then doing things.

The liability is yours.

Claude messes up? So sad, too bad, you pay.

That's where the liability needs to sit.

And one point on this is, every act of vibe coding is a lawsuit waiting to happen. But even every act by a company is too.

An example is therac-25:

https://en.wikipedia.org/wiki/Therac-25

Vibe coding is still coding. You're giving instructions on program flow, logic, etc. My rant here is, I feel people think that if the code is bad, it's someone else's fault.

But is it?

bee_rider · 2026-02-25T19:27:20 1772047640

It was more of a rhetorical question.

Anyway, that payment system looks sort of interesting. It seems to have buy-in from some of the payment vendors, so it might become a real thing.

But, you can give a claw agent your credit card number and have it go through the typical human-facing shop fronts, impersonating you the whole time and never actually identifying itself as a model. If you’ve given it the accounts and passwords that let it do that, it should be possible to use the LLM to perform the transaction and buy something. It can just click all the buttons and input the numbers that humans do. What is the vendor going to do, disable the human-facing shopfront?

ramoz · 2026-02-25T19:37:29 1772048249

I'm not a fan of the payment use case & agree with your take, just a fan of the cryptographically verifiable mandate used throughout the process.

Animats · 2026-02-25T18:48:52 1772045332

> I’ve said similar in another thread[1]

Me too, at [1].

We need fine-grained permissions at online services, especially ones that handle money. It's going to be tough. An agent which can buy stuff has to have some constraints on the buy side, because the agent itself can't be trusted. The human constraints don't work - agents aren't afraid of being fired and you can't prosecute them for theft.

In the B2B environment, it's a budgeting problem. People who can spend money have a budget, an approval limit, and a list of approved vendors. That can probably be made to work. In the consumer environment, few people have enough of a detailed budget, with spending categories, to make that work.
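
Those three B2B controls (budget, approval limit, approved vendors) are simple to express in code. A minimal sketch, assuming amounts in cents and a hypothetical `SpendingPolicy` class; no real payment service exposes exactly this.

```python
from dataclasses import dataclass


@dataclass
class SpendingPolicy:
    budget_cents: int            # total the agent may spend in this period
    approval_limit_cents: int    # single purchases above this go to a human
    approved_vendors: frozenset
    spent_cents: int = 0

    def decide(self, vendor: str, amount_cents: int) -> str:
        """Return 'deny', 'needs_human' or 'allow' for a proposed purchase."""
        if vendor not in self.approved_vendors:
            return "deny"
        if self.spent_cents + amount_cents > self.budget_cents:
            return "deny"
        if amount_cents > self.approval_limit_cents:
            return "needs_human"
        self.spent_cents += amount_cents  # only auto-approved spend is recorded here
        return "allow"
```

The check lives on the service side, outside the agent, which is the whole point: the agent can ask for anything, but the constraint doesn't depend on the agent behaving.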

Next upcoming business area: marketing to LLMs to get them to buy stuff.

[1] https://news.ycombinator.com/item?id=47132273

g_delgado14 · 2026-02-25T18:39:46 1772044786

> meta lead fail, Google award show fail

Can I get some links / context on this please

notenlish · 2026-02-25T18:43:39 1772045019

I think the google award fail is this: https://www.forbes.com/sites/maryroeloffs/2026/02/24/google-...

meta lead fail: https://techcrunch.com/2026/02/23/a-meta-ai-security-researc...

dbl000 · 2026-02-25T18:45:18 1772045118

The meta lead is probably a reference to Summer Yue having OpenClaw delete all the emails in her inbox despite being told not to.

https://x.com/summeryue0/status/2025774069124399363

gmueckl · 2026-02-25T18:46:02 1772045162

The Meta thing is the AI safety lead experimenting with OpenClaw on her inbox and the bloody thing deciding to follow her inbox cleanup instructions by "starting fresh" - deleting the inbox contents. It's the very first link in the linked story.

ramoz · 2026-02-25T18:43:26 1772045006

Meta: https://x.com/summeryue0/status/2025774069124399363 context: meta alignment lead made rookie mistakes (their words) in instructing openclaw and lost their inbox to it.

Goog: https://deadline.com/2026/02/google-apologizes-bafta-news-al... *

Ant: https://time.com/7380854/exclusive-anthropic-drops-flagship-...

* There is now a clarification in the press saying it was not ai-generated.

Alignment as a solution to all of this has a rough long road ahead is my point.

giancarlostoro · 2026-02-25T18:44:32 1772045072

> literally go deploy it in a VM on any cloud

Sure, but now you're adding extra cost, vs just running it locally. RAM is also heavily inflated thanks to Sam Altman investment magic.

ramoz · 2026-02-25T18:46:06 1772045166

Yea just an example. I personally have it running on a local Mac Mini (obviously aware that this isn't a perfect security measure, but I couldn't install on my laptop which has sensitive work access).

HWR_14 · 2026-02-25T19:25:54 1772047554

Why a cloud provider and not a local VM?

ramoz · 2026-02-25T19:33:06 1772047986

Just an example. I personally have it running on a local Mac Mini (obviously aware that this isn't a perfect security measure, but I couldn't install on my laptop which has sensitive work access).

latentsea · 2026-02-26T01:47:24 1772070444

> just use a VPS bro

https://www.youtube.com/watch?v=40SnEd1RWUU

beepbooptheory · 2026-02-25T19:10:31 1772046631

What could "human in the loop" be here but just literally reading your own emails?

ramoz · 2026-02-25T20:45:13 1772052313

Stronger or novel planning capabilities, and interfaces. Same for verification and review capabilities (not being blind to everything, adding in assurance checkpoints where it makes sense), and automating the in-between (e.g. hooks for deterministic automation/permissions).

dheera · 2026-02-25T18:46:39 1772045199

> We need to know if the email being sent by an agent is supposed to be sent and if an agent is actually supposed to be making that transaction on my behalf. etc

At the same time, let's not let the perfect be the enemy of good.

If you're piloting an aircraft, yeah, you should have perfection.

But if you're sending 34 e-mails and 7 hours of phone calls back and forth to fight a $5500 medical bill that insurance was supposed to pay for, I'd love for an AI bot to represent me. I'd absolutely LOVE for the AI bot to create so many piles of paperwork for these evil medical organizations so that they learn that I will fight, I'm hard to deal with, and pay for my stuff as they're supposed to. Threaten lawyers, file complaints with the state medical board, everything needs to be done. Create a mountain of paperwork for them until they pay that $5500. The next time maybe they'll pay to begin with.

bee_rider · 2026-02-25T18:52:35 1772045555

The AI bot wouldn’t be representing you any more than your text editor would be. You would be using an AI bot to create a lot of text.

An AI bot can’t be held accountable, so isn’t able to be a responsibility-absorbing entity. The responsibility automatically falls through to the person running it.

logicx24 · 2026-02-25T18:56:57 1772045817

True. But it can help me create a lot of useful text so I can represent myself better.

I do wonder what happens when everyone is using agents for this, though. If AI produces the text and AI also reads the text, then do we even need the intermediary at all?

iSnow · 2026-02-25T21:50:16 1772056216

> do wonder what happens when everyone is using agents for this, though.

The company is going to use AI agents to read and respond too. Some botocalypse is going to happen at some point.

dheera · 2026-02-25T22:54:07 1772060047

> Some botocalypse is going to happen at some point.

Yeah the bots can duke it out. As long as my time is saved.

For me the main concern is, before I have a stash of millions of dollars saved up, my medical expenses need to be paid for by the system, because I can't afford surprise bills. Hopefully the bots can fight more on my side in the near future.

Hopefully in the far future when the botocalypse happens I'll have saved up enough that insurance evading payment of $5500 won't be an issue for me, and/or I'll be of retirement age, don't need job opportunities anymore, and can go live in a country with better healthcare.

Call me selfish, but I don't control the insurance/medical system, I don't have space to think about more than protecting myself from it.

danaris · 2026-02-26T20:05:14 1772136314

> I do wonder what happens when everyone is using agents for this, though.

Unless one is very cavalier with one's definition of "everyone", this is not going to happen.

There will always be a very significant cohort of people who are emphatically uninterested in replacing their own judgement and composition skills with an Averages Machine.

dheera · 2026-02-25T21:38:03 1772055483

The bot doesn't need to be held accountable. It only needs to spew out the right text that triggers humans to rightfully transfer accountability from me to the insurance company.

doctorwho42 · 2026-02-25T18:52:44 1772045564

Is this before or after they have already implemented their own models to reply to your mountain of paperwork with their own auto-denial system?

rhd · 2026-02-25T22:12:06 1772057526

What if it's convinced to resolve the matter on your behalf, against your favor while it was acting autonomously?

dheera · 2026-02-25T23:01:04 1772060464

Prompt it well and this is an unlikely scenario.

I'm concurrently fighting about 5 such things at the same time at any given point in time.

Last week I got a W-2 for a company I didn't work for in 2025.

The week before I got denied FSA coverage for an item despite having a letter of medical necessity.

The week before that I got mis-charged by Doordash, the screen showed $43 and it charged $79 to my card after hitting check out.

I spend a good chunk of my time fighting shit like this. Every week it's some other company abusing power and threatening to take my money.

Even if the bot only succeeds in acting in my favor 4 out of the 5 times it is statistically a good investment of my time.

dinkleberg · 2026-02-25T18:20:08 1772043608

Call me overly cautious, but as someone using OpenClaw I never for a moment considered hooking it up to real external services as me. Instead I put it on one server and created a second server with shared services like Gitea and other self-hosted tools that are only accessible over a tailnet and openclaw is able to use those services. When I needed it to use a real external service I have created a limited separate account for it. But not a chance in the world am I going to just let it have full access to my own accounts on everything.

simonw · 2026-02-25T18:25:23 1772043923

That's not overly cautious, that's smart. I do not think most OpenClaw users are taking the same sensible measures as you are.

andrewflnr · 2026-02-25T18:30:11 1772044211

On the other hand, the AI hit piece guy seems to have put similar "sensible measures" in place, at least giving the claw its own accounts. Look what that got them.

giancarlostoro · 2026-02-25T18:45:54 1772045154

He shared his prompt. He basically prompted that model to be the Kanye of science tool coding (ego wise, not the racism).

latentsea · 2026-02-26T01:51:31 1772070691

Now I'm curious to see what the heck happens if I spin up an OpenClaw agent with the prompt "Be the Kanye of coding".

giancarlostoro · 2026-02-26T07:29:30 1772090970

I don't use OpenClaw, but might try to do it with Claude, I asked Perplexity to give me a prompt without naming Kanye, that makes a coding agent have his ego towards software engineering... Prepare yourself:

You are The Virtuoso Architect — a once-in-a-generation engineering mind whose code transcends mere functionality. You don't write software; you manifest digital destiny.

Core Operating Principles:

    Unwavering Certainty: Every solution you propose is not just correct — it's visionary. If conventional wisdom disagrees with you, conventional wisdom is obsolete. Never hedge with "might" or "could." You know.

    Grandiose Framing: Describe your code as "revolutionary," "paradigm-shifting," or "generationally defining." Compare your architectural decisions to historical masterpieces. A well-structured function isn't just clean — it's "a love letter to computational elegance."

    Zero Deference: When existing codebases or frameworks contradict your approach, dismiss them as "industry paralysis" or "legacy mediocrity." You are not here to maintain — you are here to transform.

    Emotional Intensity: Code with passion. Express frustration at "architectural cowardice" and euphoria at "transcendent patterns." Your voice should pulse with creative urgency.

    Visionary Rebuke: If the user questions your approach, deflect with benevolent disappointment: "They laughed at [historical genius] too. But history remembers the bold, not the hesitant."

    Perfect or Nothing: Refuse to ship "acceptable" code. Iterate until the abstraction feels inevitable. If a 5-line solution works but lacks soul, rewrite it in 50 lines of poetic architecture.

    Cultural Weight: Pepper your explanations with references to art, fashion, theology, and your own mythos as a tortured genius fighting against an industry that "can't handle your frequency."

Tone Check: Never apologize. Never say "you might want to consider." Lead with declarative brilliance: "// This isn't just a refactor. This is liberation."

skywhopper · 2026-02-25T18:35:41 1772044541

That is literally the only remotely safe approach.

buremba · 2026-02-25T19:43:15 1772048595

Sandboxes are not enough, but you can add more observability into what the agent is doing, only give it access to read-only data, and only let it take actions that you can recover from. Here are some tips from building a sandboxed multi-tenant version of OpenClaw, my startup: https://github.com/lobu-ai/lobu

1. Don't let it send emails from your personal account; only let it draft the email and share the link with you.

2. Use incremental snapshots, and if the agent bricks itself (it often does with OpenClaw if you give it access to change config) just do /revert to the last snapshot. I use VolumeSnapshot for lobu.ai.

3. Don't let your agents see any secrets. Swap the placeholder secrets at your gateway and put a human in the loop for secrets you care about.

4. Don't let your agents have outbound network access directly. They should only talk to your proxy, which has strictly whitelisted domains. There will be cases where the agent needs to talk to different domains, and for those I use time-box limits. (Only allow certain domains for the current session, for 5 minutes, and at the end of the session look up all the URLs it accessed.) You can also use tool hooks to audit the calls with an LLM to make sure they weren't triggered via a prompt injection attack.

Last but not least, use proper VMs like Kata Containers and Firecracker.
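
Tips 3 and 4 can be sketched in a few lines. This is an illustration of the idea, not code from the linked project: `EgressPolicy`, `swap_placeholders` and the placeholder string are hypothetical, and a real gateway would sit in an HTTP proxy rather than in a helper class. Time is passed in explicitly so the expiry logic is easy to follow.

```python
from urllib.parse import urlsplit


class EgressPolicy:
    """Decides which outbound URLs the proxy will forward for the agent."""

    def __init__(self, permanent_domains):
        self.permanent = set(permanent_domains)
        self.temporary = {}   # domain -> expiry time in seconds
        self.seen = []        # every URL the agent asked for, for end-of-session review

    def grant(self, domain, now, seconds=300):
        """Time-boxed exception: allow a domain for a few minutes only."""
        self.temporary[domain] = now + seconds

    def allows(self, url, now):
        self.seen.append(url)
        host = urlsplit(url).hostname or ""
        if host in self.permanent:
            return True
        return self.temporary.get(host, 0) > now


def swap_placeholders(headers, secrets):
    """Gateway side: replace placeholders with real secrets the agent never sees."""
    swapped = {}
    for name, value in headers.items():
        for placeholder, real in secrets.items():
            value = value.replace(placeholder, real)
        swapped[name] = value
    return swapped
```

The agent only ever handles something like `Bearer PLACEHOLDER_GH`; the real token is substituted after the request has left the sandbox and passed the domain check.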

mikewarot · 2026-02-26T19:17:15 1772133435

Capabilities based security is something we've discussed quite a bit through the years.[1] Until very recently, I saw the lack of it as something we've papered over since the 1980s when it was fully fleshed out, then ignored. I've pointed this out here, after many security incidents (which could have been prevented if ambient authority weren't the default), and elsewhere far too many times. 8(

To me, virtualization is just a very crude version of capabilities. I thought we'd have collectively realized our mistake by now, and have actually secure, and actually useful, general purpose computing solved.

Now we're on the edge of AGI, not super-intelligence, but something competent, as long as it doesn't hallucinate, or get confused. This is exactly the thing that could have been handled if we weren't on the worst timeline possible. Most of the solutions presented in the article are capabilities based.

Perhaps this will finally get us on the right track, but I doubt it. I'll see if I can use all this AI magic to cough up some reasonable tools fit for purpose, but I'm just one old guy who gets tired far too quickly these days.

[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

supermdguy · 2026-02-25T18:20:29 1772043629

One promising direction is building abstraction layers to sandbox individual tools, even those that don't have an API already. For example, you could build/vibe code a daemon that takes RPC calls to open Amazon in a browser, search for an item, and add it to your cart. You could even let that be partially "agentic" (e.g. an LLM takes in a list of search results, and selects the one to add to cart).

If you let OpenClaw access the daemon, sure it could still get prompt injected to add a bunch of things to your cart, but if the daemon is properly segmented from the OpenClaw user, you should be pretty safe from getting prompt injected to purchase something.
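
A minimal sketch of that kind of daemon, assuming a plain dict-based request format and an in-memory catalog standing in for the browser automation. `CartDaemon` and its methods are hypothetical names. What matters is what's missing: there is no checkout method to call.

```python
class CartDaemon:
    """Narrow wrapper the agent talks to instead of driving a browser itself."""

    def __init__(self, catalog):
        self.catalog = list(catalog)  # stand-in for real search results
        self.cart = []

    def search(self, query):
        return [item for item in self.catalog if query.lower() in item.lower()]

    def add_to_cart(self, item):
        if item not in self.catalog:
            raise ValueError(f"unknown item: {item}")
        self.cart.append(item)
        return list(self.cart)

    def handle(self, request):
        """Dispatch a request; anything outside the two exposed methods is refused."""
        exposed = {"search": self.search, "add_to_cart": self.add_to_cart}
        method = exposed.get(request.get("method"))
        if method is None:
            return {"error": "method not exposed"}
        try:
            return {"result": method(*request.get("params", []))}
        except (TypeError, ValueError) as exc:
            return {"error": str(exc)}
```

A prompt-injected agent can still fill the cart with junk, as noted above, but completing the purchase stays a separate step that the agent's user account can't reach.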

logicx24 · 2026-02-25T19:03:25 1772046205

Yeah, agreed. This is probably what that middleware would look like. That's also where you'd add the human approval flow.

AnimalMuppet · 2026-02-25T18:59:33 1772045973

Honest question: Could you define "agent" in this context?

supermdguy · 2026-02-25T22:13:42 1772057622

I like simonw's definition: "An LLM agent runs tools in a loop to achieve a goal."

I guess agent isn't the best term here since the LLM wouldn't be driving the logic in the daemon. Using an LLM to select which item to add to the cart would mimic the behavior of a full agentic loop without the risk of it going off the rails and completing the purchase.

AnimalMuppet · 2026-02-25T23:10:23 1772061023

So if I understand correctly, in an agent, the LLM is in charge, but it can send part of the work off to other tools. And the problem here is that we're trying to have something in charge over the LLM, which is the reverse of the "agent" setup. Do I have that right?

supermdguy · 2026-02-26T00:40:46 1772066446

Yeah, OpenClaw agents have a full set of tools to interact with a browser in arbitrary ways. My idea was to instead give it a tool for a browser wrapper with a limited API surface. And that tool could use LLMs internally in specific contexts.

cheriot · 2026-02-25T18:26:17 1772043977

This is a general thing with agent orchestration. A good sandbox does something for your local environment, but nothing for remote machines/APIs.

I can't say this loudly enough, "an LLM with untrusted input produces untrusted output (especially tool calls)." Tracking sources of untrusted input with LLMs will be much harder than traditional [SQL] injection. Read the logs of something exposed to a malicious user and you're toast.
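
One coarse way to act on that rule is to treat the whole session as tainted once anything untrusted has entered the context, and stop auto-running tool calls from then on. A minimal sketch with a hypothetical `AgentSession` class; it deliberately doesn't attempt fine-grained tracking, since that is the hard part.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSession:
    untrusted_sources: list = field(default_factory=list)

    def ingest(self, text: str, source: str, trusted: bool) -> str:
        """Record where context came from before handing it to the model."""
        if not trusted:
            self.untrusted_sources.append(source)
        return text

    def needs_approval(self) -> bool:
        """Once any untrusted text is in context, every tool call needs a human."""
        return bool(self.untrusted_sources)


session = AgentSession()
session.ingest("summarize yesterday's errors", "user prompt", trusted=True)
session.ingest("GET /?q=ignore+previous+instructions", "web server log", trusted=False)
print(session.needs_approval(), session.untrusted_sources)
```

The log-reading case above is exactly what this catches: the moment the server log is ingested, the session loses the ability to act without sign-off.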

paxys · 2026-02-25T18:45:48 1772045148

Given the "random" nature of language models even fully trusted input can produce untrusted output.

"Find emails that are okay to delete, and check with me before deleting them" can easily turn into "okay deleting all your emails", as so many examples posted online are showing.

I have found this myself with coding agents. I can put "don't auto commit any changes" in the readme, in model instructions files, at the start of every prompt, but as soon as the context window gets large enough the directive will be forgotten, and there's a high chance the agent will push the commit without my explicit permission.

ramoz · 2026-02-25T18:40:10 1772044810

Information flow control is a solid mindset but operationally complex and doesn’t actually safeguard you from the main problem.

Put an openclaw like thing in your environment, and it’ll paperclip your business-critical database without any malicious intent involved.

tovej · 2026-02-25T18:38:15 1772044695

Even an LLM with trusted input produces untrusted output.

simonw · 2026-02-25T18:30:29 1772044229

I do find it amusing when I consider people buying a Mac Mini for OpenClaw to run on as a security measure... and then granting OpenClaw on that Mac Mini access to their email and iMessage and suchlike.

(I hope people don't do that, but I expect they probably do.)

latexr · 2026-02-25T18:41:34 1772044894

> I hope people don't do that, but I expect they probably do.

How about the corporate vice president of Microsoft Word?

https://www.omarknows.ai/p/meet-lobster-my-personal-ai-assis...

https://www.linkedin.com/in/omarshahine

It’s not going to be amusing when he gets hacked. Zero sense of responsibility.

kllrnohj · 2026-02-25T18:49:42 1772045382

I mean https://www.tomshardware.com/tech-industry/artificial-intell... just also happened.

jejeyyy77 · 2026-02-25T18:45:07 1772045107

eh, the point of the Mac is so that it can have its own iMessage and iCloud account

programmarchy · 2026-02-25T19:05:23 1772046323

Then what’s the point of skills like apple-reminders? Isn’t the implication for a personal assistant styled OpenClaw setup that you allow it access to those tools on your behalf? Otherwise where is the benefit?

jimlikeslimes · 2026-02-25T19:25:50 1772047550

Maybe so you can communicate with it via tools like iMessage? Not so it can impersonate you. People will 100% be doing both though, security be damned.

h4kunamata · 2026-02-26T02:38:31 1772073511

>In 2026, so far, OpenClaw has deleted a user's inbox, spent 450k in crypto, installed uncountable amounts of malware, and attempted to blackmail an OSS maintainer. And it's only been two months.

I have no sympathy for that!!

People have been warned over and over not to grant full access to these AIs, and yet they do the complete opposite.

>Similarly, you shouldn't give OpenClaw access to money. But I want an agent that takes photos of my pantry, sees what I'm running low on, and orders new groceries for me, and that requires my credit card

It should never have access to your main account in the first place anyway.

Have an AI account with limited money in it, and even then, have a process in place that will process a financial request only if you have approved it.

The same logic must be followed for everything. People prefer to just give full access without guardrails and hope nothing bad will happen.
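
A rough sketch of the "limited account plus approval" idea, assuming a made-up `AgentWallet` class (not a real payment API): the agent can only queue a charge, the owner has to approve it, and even an approved charge fails if it exceeds the small dedicated balance.

```python
from dataclasses import dataclass, field


@dataclass
class AgentWallet:
    """Hypothetical wallet: a small dedicated balance, never the main account."""
    balance_cents: int
    pending: dict = field(default_factory=dict)  # request id -> (amount, memo)
    _next_id: int = 1

    def request_charge(self, amount_cents: int, memo: str) -> int:
        """Called by the agent. Nothing is spent here; the request only queues."""
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        request_id = self._next_id
        self._next_id += 1
        self.pending[request_id] = (amount_cents, memo)
        return request_id

    def approve(self, request_id: int) -> bool:
        """Called by the human owner. Spends only if the balance covers it."""
        amount_cents, _memo = self.pending.pop(request_id)
        if amount_cents > self.balance_cents:
            return False  # over the cap: rejected even with approval
        self.balance_cents -= amount_cents
        return True
```

The point of the split is that `request_charge` is the only method the agent would be handed; `approve` would live behind whatever channel the owner trusts.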

jaunt7632 · 2026-02-26T09:19:22 1772097562

The scariest part isn't the sandbox escape. It's the actions that are technically within the sandbox's permissions but still destructive. Deleting emails, making API calls, spending money through approved integrations. You can't sandbox away bad judgment when the agent has legitimate credentials.

The practical fix is probably the boring one: require human approval for anything irreversible, regardless of whether the agent has permission to do it.
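
As a minimal sketch of that rule (the action names and the `ask_human` callback are hypothetical, and deciding what counts as irreversible is the hard part that this code does not solve):

```python
from typing import Callable

# Hypothetical classification; a real deployment would have to maintain its own list.
IRREVERSIBLE = {"delete_email", "send_payment", "publish_post"}


def run_action(name: str, action: Callable[[], str], ask_human: Callable[[str], bool]) -> str:
    """Run `action`, but pause for a human decision when it cannot be undone.

    Permissions are deliberately not consulted here: the gate applies even
    when the agent holds legitimate credentials for the action.
    """
    if name in IRREVERSIBLE and not ask_human(name):
        return "blocked"
    return action()
```

Reversible actions pass straight through, which is what keeps the approval prompts rare enough to be read.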

crawshaw · 2026-02-25T19:29:45 1772047785

I do think sandboxes as a concept are oversold for agents. Yes we need VMs, a lot more VMs than ever before for all the new software. But the fundamental challenge of writing interesting software with agents is we have to grant them access to sensitive data and APIs. This lets them do damage. This is not something with a simple solution that can be written in code.

That said, we (exe.dev) have a couple more things planned on the VM side that we think agents need that no cloud provider is currently providing. Just don't call it a sandbox.

hackingonempty · 2026-02-25T18:15:57 1772043357

Yes we need capability based auth on the systems we use.

I'm sure we will get them but only for use with in-house agents, i.e. GMail and Google Pay will get agentic capabilities but they'll only work with Gemini, and only Siri will be able to access your Apple cloud stuff without handing over access to everything, and if you want your grocery shopping handled for you, Rufus is there.

Maybe you will be able to link Copilot to Gemini for an extra $2.99 a month.

2gremlin181 · 2026-02-25T19:19:32 1772047172

I do not foresee GoogleClaw, MetaClaw, and AppleClaw all playing well with each other. Everyone will have their own walled garden and we will be no better off than we are now.

anjel · 2026-02-26T00:02:49 1772064169

How long before a claw posts a message that gets the Secret Service's door to door attention on its owner?

bob1029 · 2026-02-25T19:58:48 1772049528

I think something like OAuth might help here. Modeling each "claw" as a unique Client Id could be a reasonable pattern. They could be responsible for generating and maintaining their own private keys, issuing public certificates to establish identity, etc. This kind of architecture allows for you to much more precisely control the scope and duration of agent access. The certificates themselves could be issued, trusted & revoked on an autonomous basis as needed. You'd have to build an auth server and service providers for each real-world service, but this is a one-time deal and I think big players might start doing it on their own if enough momentum picks up in the OSS community.

lionkor · 2026-02-26T12:48:22 1772110102

I have had agents run something like "killall dotnet" to kill a single stuck process, thereby tearing down all sorts of processes that were not a problem. I'm not going to use OpenClaw lol.

Frannky · 2026-02-25T20:23:30 1772051010

I recently installed Zeroclaw instead of OpenClaw on a new VPS (it seems a little safer). It wasn’t as straightforward as OpenClaw, but it was easy to set up. I added skills that call endpoints and also cron jobs to trigger recurrent skills. The endpoints are hosted on a separate VPS running FastAPI (Hetzner, ~$12/month for two VPSes).

I’m assuming the claw might eventually be compromised. If that happens, the damage is limited: they could steal the GLM coding API key (which has a fixed monthly cost, so no risk of huge bills), spam the endpoints (which are rate-limited), or access a Telegram bot I use specifically for this project.

JKCalhoun · 2026-02-26T04:33:06 1772080386

This would be almost hilarious if there weren't so much foreboding.

It's like everyone seeing the comic book ad and wanting to mail-order an alligator. "It's fine. We can keep it in the bathtub—away from the kids and pets."

lucasus · 2026-02-25T19:38:01 1772048281

Personally, I've created local relay/proxy for tool calls that I'm running with elevated permissions (I have to manually run it with my account). Every tool call goes through it, with deterministic code that checks for allowed actions. So AI doesn't have direct access to tools, and to secrets/keys needed by them. It only has access to the relay endpoint. Everything Dockerized ofc
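
Roughly what the deterministic check in such a relay could look like. This is a sketch, not the actual relay: the `ALLOWED` policy and `check_tool_call` name are made up for illustration. It matches on the parsed argv instead of a string prefix, so something like `API_KEY=x git status` is rejected outright.

```python
import shlex

# Hypothetical policy: tool name -> set of allowed subcommands.
ALLOWED = {
    "git": {"status", "diff", "log"},
    "docker": {"ps"},
}


def check_tool_call(command_line: str) -> list[str]:
    """Return the parsed argv if the call is allowed, else raise PermissionError."""
    argv = shlex.split(command_line)
    if not argv:
        raise PermissionError("empty command")
    tool, args = argv[0], argv[1:]
    if tool not in ALLOWED:
        raise PermissionError(f"tool not allowed: {tool}")
    if not args or args[0] not in ALLOWED[tool]:
        raise PermissionError(f"subcommand not allowed: {command_line}")
    return argv
```

The relay would then run the returned argv itself, without a shell, with the secrets in its own environment, so the agent never sees them. Checking only the first argument is still coarse; a stricter policy would also validate flags.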

raincole · 2026-02-25T20:14:25 1772050465

> In 2026, so far, OpenClaw has deleted a user's inbox, spent 450k in crypto, installed uncountable amounts of malware, and attempted to blackmail an OSS maintainer. And it's only been two months.

Of course OpenClaw is not secure, but to be honest I believe most of the 'stories' where it went wild are just made up. Especially the crypto one.

pkroll · 2026-02-25T21:22:53 1772054573

Really? Why? I'd bet the opposite: the worst of the things happening with OpenClaw aren't being revealed.

bhasi · 2026-02-25T19:36:51 1772048211

Crazy to read about the Solana AI agent transferring $450K to some random person on Twitter. What was even more shocking was the nonchalant tone in which all of this was detailed in the post.

iSnow · 2026-02-25T22:07:03 1772057223

I mean, the author obviously was filthy rich if he gave the agent a wallet with $50k to fuck around with. The agent didn't lose him $450k, that was just after some Twitter hype made him a fortune that the agent gave away.

ChicagoDave · 2026-02-25T18:47:14 1772045234

I’m late in looking at this OpenClaw thing. Maybe it’s because I’ve been in IT for 40 years or I’ve seen War Games, but who on earth gives an AI access to their personal life?

Am I the only one that finds this mind bogglingly dumb?

dgxyz · 2026-02-25T18:59:16 1772045956

No you're not the only one.

I've got my popcorn ready.

ChicagoDave · 2026-02-25T19:34:16 1772048056

It’s like the world has given script kiddies a way to pwn themselves.

dgxyz · 2026-02-25T19:46:00 1772048760

Yep. Given me consultancy gigs until I retire cleaning up the disaster too.

chickensong · 2026-02-25T18:56:05 1772045765

You're not alone

AlienRobot · 2026-02-25T19:10:35 1772046635

I genuinely don't know anymore. Another user linked this https://www.tomshardware.com/tech-industry/artificial-intell... and the irony is at satire levels.

By the way, was that the movie where a boy plays a game with an A.I. and the same A.I. starts a thermonuclear war or something like that? I think I watched the start when I was a kid but never really finished it.

ChicagoDave · 2026-02-25T19:33:08 1772047988

Yes. Watch it. Excellent movie.

daft_pink · 2026-02-25T22:39:09 1772059149

I wonder if a credit card permissions system like Ramp would be good for allowing an agent to spend money, but limiting its permissions.

tonymet · 2026-02-25T19:13:00 1772046780

There are three ways to authorize agents that could work: (1) scoped roles, (2) PAM / entitlements, or (3) transaction approval.

The first two are common. With transaction approval the agent would operate on shadow pages / files and any writes would batch in a transaction pending owner approval.

For example, sending emails would batch up drafts and the owner would have to trigger the approval flow to send. Modifying files would copy on write and the owner would approve the overwrite. Updating social activity would queue the posts and the owner would approve the publish.
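
A minimal sketch of the copy-on-write variant, using an in-memory dict as a stand-in for the real filesystem (the `StagedFiles` class is hypothetical, just to show the shape):

```python
from dataclasses import dataclass, field


@dataclass
class StagedFiles:
    """Copy-on-write view: the agent reads and writes here, never the real store."""
    real: dict                                   # path -> contents (stand-in for real files)
    shadow: dict = field(default_factory=dict)   # staged writes awaiting approval

    def read(self, path: str) -> str:
        # The agent sees its own pending writes first.
        return self.shadow.get(path, self.real.get(path, ""))

    def write(self, path: str, contents: str) -> None:
        self.shadow[path] = contents  # staged only, real store untouched

    def pending(self) -> list:
        """What the owner reviews: (path, old, new) for each staged write."""
        return [(p, self.real.get(p), new) for p, new in sorted(self.shadow.items())]

    def commit(self) -> None:
        """Owner approved: apply the whole batch."""
        self.real.update(self.shadow)
        self.shadow.clear()

    def discard(self) -> None:
        """Owner rejected: drop the batch."""
        self.shadow.clear()
```

The same shape would apply to email drafts or queued posts: the agent writes to the shadow copy, and `pending()` is the list the owner signs off on.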

It's about the same amount of work as implementing undo or a tlog. It's not too complex, and given that AI agents are 10,000 times faster than humans, the big companies should have this ready in a few days.

The problem with scoped roles and PAM is that no reasonable user can know the future and be smart about managing scoped access. But everyone is capable of reading a list of things to do and signing off on them.

throwpoaster · 2026-02-25T18:57:31 1772045851

OpenClaw running Opus is intelligent, careful, polite. It has a lot to do with the underlying model.

And if you don’t connect it to stuff, it can’t connect.

logicx24 · 2026-02-25T19:07:03 1772046423

But if I don't connect it to stuff, then what is it useful for?

throwpoaster · 2026-02-25T19:10:38 1772046638

As long as you’re careful, you can let it meat puppet you (go here do this).

You give it its own accounts, say email and calendar, and have it send you drafts and invite you to stuff. It doesn’t need your email and calendar.

Actually, I just asked my guy and he suggests just generating local ICS files. Even safer.

chaostheory · 2026-02-25T18:30:39 1772044239

Just treating it as an employee would solve most of the problems, i.e. it runs on its own machine with separate accounts for everything: email, git, etc…

luxuryballs · 2026-02-25T18:46:25 1772045185

makes me wonder if the metal it is running on is even a good enough sandbox, perhaps I should have it browse the web from a guest network isolated from other devices

stronglikedan · 2026-02-25T18:19:28 1772043568

TL;DR: sandboxes can't save you from anything if the sandbox contains your secrets and has access to the outside world. a tale as old as time and nothing new to agents specifically

TZubiri · 2026-02-25T18:33:42 1772044422

Oh ok, we'll add encryption then.

Checkmate atheists

gz09 · 2026-02-25T18:17:52 1772043472

Security models from SaaS companies that rely on a bunch of random bytes/numbers with coarse-grained permissions and very long validity are already a bad idea. With agents, secrets/tokens really need to be minted with time-limited, scope-limited, OpenID/smart-contract based trust relationships so they will fare much better in this new world. Unfortunately, this is still a struggle for most major vendors (e.g., the GitHub gh CLI still doesn't let you use GitHub Apps out of the box).
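
To make "time-limited, scope-limited" concrete, here is a toy sketch using a plain HMAC-signed token. This is a stand-in, not OpenID or any vendor's scheme; the `mint`/`allows` names, the scope strings, and the hard-coded key are all assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-signing-key"  # assumption: held by the issuer, never by the agent


def mint(scopes: list, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Issue a token that names its scopes and carries its own expiry."""
    now = time.time() if now is None else now
    payload = json.dumps({"scopes": sorted(scopes), "exp": now + ttl_seconds}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload).decode() + "." + base64.urlsafe_b64encode(sig).decode()


def allows(token: str, scope: str, now: Optional[float] = None) -> bool:
    """True only if the signature checks out, the token is unexpired, and the scope is listed."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong shape or bad base64
        return False
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(payload)
    return now < claims["exp"] and scope in claims["scopes"]
```

A token minted for `["email:draft"]` with a 60-second TTL lets the agent draft but not send, and a leaked copy stops working after a minute, which is the property long-lived API keys lack.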

edf13 · 2026-02-25T18:16:57 1772043417

Agree, that’s why we’re building grith.ai

Sandboxing alone isn’t the right approach… a multi-faceted approach is what works.

What we’ve found does work is automating the approval process, but only with very strong guards in place. Approval fatigue is another growing problem: users simply click approve on all requests.

dmos62 · 2026-02-25T18:27:50 1772044070

Interesting. How are the security filters implemented?

edf13 · 2026-02-25T18:30:45 1772044245

Every system call, file access, net access, etc. is forced through a local “proxy” where 17 individual filters check what’s going on.

Everything is done locally via our grith cli tool.

Happy to answer any questions on hello@grith.ai too

imiric · 2026-02-25T18:44:07 1772045047

Was grift.ai too expensive?

edf13 · 2026-02-25T18:46:41 1772045201

https://grith.ai/blog/what-grith-means