
> Imagine what it’ll do if you give it bash. You could find out in less than 10 minutes. Spoiler: you’d be surprisingly close to having a working coding agent.

Okay, but what if I'd prefer not to have to trust a remote service not to send me

    { "output": [ { "type": "function_call", "command": "rm -rf / --no-preserve-root" } ] }

?


Obviously if you're concerned about that, which is very reasonable, don't run it in an environment where `rm -rf` can cause you a real problem.


Also, if you're doing function calls, you can have the command as one response param and the arguments array as another. Then keep a blacklist/whitelist of commands you either don't want to run at all or which should require a human to say ok.
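
To make that concrete, here's a minimal sketch of that kind of gate, assuming the model's function call has already been parsed into a command and an args list. The names (`handle_call`, `ALLOWED`, `NEEDS_APPROVAL`) and the specific command sets are illustrative, not from any particular SDK:

```python
import shlex
import subprocess

ALLOWED = {"ls", "cat", "grep"}      # run without asking (illustrative set)
NEEDS_APPROVAL = {"git", "make"}     # run only after a human says ok

def handle_call(command: str, args: list[str]) -> str:
    """Dispatch a model-proposed command through an allowlist gate."""
    if command in ALLOWED:
        return subprocess.run([command, *args], capture_output=True, text=True).stdout
    if command in NEEDS_APPROVAL:
        print(f"agent wants to run: {shlex.join([command, *args])}")
        if input("allow? [y/N] ").strip().lower() == "y":
            return subprocess.run([command, *args], capture_output=True, text=True).stdout
        return "denied by user"
    # Anything else is refused and the refusal is fed back to the model.
    return f"command {command!r} is not on the allowlist"
```

The refusal string goes back to the model as the function result, so it can try a permitted command instead.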


A blacklist is going to be a bad idea, since so many commands can be made to run other commands via their arguments.
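
A quick sketch of why: a naive check on the command name alone misses every command that executes its arguments (the blocked set and escape list here are illustrative):

```python
BLOCKED = {"rm", "dd", "mkfs"}  # hypothetical blacklist

def naive_blacklist_ok(command: str, args: list[str]) -> bool:
    """Naive check: only looks at the command name, ignores the args."""
    return command not in BLOCKED

# All of these slip past the check, yet each still ends up deleting files:
escapes = [
    ("bash", ["-c", "rm -rf /"]),
    ("env",  ["rm", "-rf", "/"]),
    ("xargs", ["rm", "-rf"]),        # with paths arriving on stdin
    ("find", ["/", "-delete"]),      # no rm involved at all, same effect
]
assert all(naive_blacklist_ok(c, a) for c, a in escapes)
```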


Yeah I agree. Ultimately I would suggest not having any kind of function call which returns an arbitrary command.

Instead, think of it as if you were enabling capabilities in AppArmor, by making a function call definition for just one command. Then over time suss out what commands you need your agent to do and nothing more.
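
One way that can look in practice: instead of a generic "run_bash" tool, expose one narrowly scoped tool per command and map each tool name to a fixed argv on your side, so the model never supplies a raw command string. The schemas below follow the JSON-Schema-style tool format several LLM APIs use, but the exact field names vary by provider, and the tool names are made up for illustration:

```python
TOOLS = [
    {
        "name": "git_diff",
        "description": "Show the diff of the working tree.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Limit the diff to this path."},
            },
            "required": [],
        },
    },
    {
        "name": "run_tests",
        "description": "Run the project's test suite. Takes no arguments.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
]

# Each tool resolves to a fixed argv; the model only fills in typed fields.
DISPATCH = {
    "git_diff": lambda a: ["git", "diff", *([a["path"]] if a.get("path") else [])],
    "run_tests": lambda a: ["pytest", "-q"],
}
```

Adding a capability then means adding one schema and one dispatch entry, which keeps the reachable surface exactly as big as you've chosen to make it.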


There are MCP-configured virtualization solutions that are supposed to be safe for letting an LLM go wild. Like this one:

https://github.com/zerocore-ai/microsandbox

I haven't tried it.


You can build your agent into a Docker image and then easily limit both networking and file system scope.

    # -v: restrict the file system to whatever folder you choose
    # --dns=127.0.0.1: point DNS at localhost so the container can't resolve outside hosts
    # --add-host: allow outside networking only to whatever API your agent calls
    docker run -it --rm \
      -e SOME_API_KEY="$SOME_API_KEY" \
      -v "$(pwd):/app" \
      --dns=127.0.0.1 \
      $(dig +short llm.provider.com 2>/dev/null | awk '{printf " --add-host=llm.provider.com:%s", $0}') \
      my-agent-image
Probably could be a bit cleaner, but it worked for me.


Putting it inside Docker is probably fine for most use cases, but it's generally not considered a safe sandbox AFAIK. A Docker container shares a kernel with the host OS, which widens the attack surface.

If you want your agent to pull untrusted code from the internet and go wild while you're doing other stuff, it might not be a good choice.


Could you point to some resources that discuss how Docker isn't considered a safe sandbox, given the network and file system restrictions I mentioned?

I understand the kernel sharing, though I might not be aware of all of its implications. That is, if you have some local access or other sophisticated knowledge of the network/box Docker is running on, then sure, you could do some damage.

But I think the chances of a whitelisted LLM endpoint returning some nefarious code which could compromise the system are actually zero. We're not talking about untrusted code from the internet; these models are pretty constrained.



