
You know what would be an interesting service:

Chaos Monkey/failure injection-as-a-service, where you define the parameters you want to be assessed against...

Kind of like pen test contractors...

So OK, let me spin up an environment and attack the fuck out of it. Show me where I'm weak, so that in prod... I'm good.



You mean like https://www.gremlin.com/ ?

One of the founders of Gremlin is an engineer who worked at Netflix and probably worked on Chaos Monkey as well :)


If I understand correctly, one of the limitations of doing application level failure injection with Gremlin is that you need to integrate it into your code: https://www.gremlin.com/docs/application-layer/installation/

It might be interesting to combine these approaches and use a traffic split to send a percentage of traffic to Gremlin instead of integrating into the code directly.
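
To make the traffic-split idea concrete, here is a minimal sketch of the routing decision only. It is not Gremlin's API or any particular proxy's configuration; `pick_route`, `fault_percent`, and the route names are hypothetical names chosen for illustration.

```python
import random


def pick_route(fault_percent: float, rng: random.Random) -> str:
    """Choose a route for one request.

    fault_percent is the share of traffic (0 to 100) sent to the
    fault-injecting path; everything else goes to the normal backend.
    Both route names are hypothetical labels, not real endpoints.
    """
    # rng.random() is in [0.0, 1.0), so 0 never injects and 100 always does.
    if rng.random() * 100 < fault_percent:
        return "fault-injection"
    return "normal"


# Example: send roughly 5% of 10,000 simulated requests to the faulty path.
rng = random.Random(42)  # seeded so repeated runs give the same split
routes = [pick_route(5.0, rng) for _ in range(10_000)]
injected = routes.count("fault-injection")
```

In a real setup this decision would presumably live in the proxy or load balancer rather than in application code, which is the point of the suggestion: the application itself stays unmodified.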


Shh, you're going to wake up the TinkerPop Gremlin guy.


These kinds of errors are going to happen in production, whether you inject them or let them occur naturally. Any release process that doesn't go perfectly with a drain / rebalance / start new version / rebalance per (backend, proxy) combo is going to have a timeout or broken connection between the proxy and the backend as it restarts. Should you return 502 to your users when that happens? Nope, just retry on a different backend. This lets you test that.
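
A rough sketch of the "retry on a different backend" behavior, assuming the proxy has a list of backends and a function that sends the request to one of them. `BackendError`, `call_with_retry`, and `send` are hypothetical names for illustration, not any specific proxy's API.

```python
from typing import Callable, Optional, Sequence


class BackendError(Exception):
    """A timeout or broken connection between the proxy and a backend."""


def call_with_retry(
    backends: Sequence[str],
    send: Callable[[str], str],
    max_attempts: int = 3,
) -> str:
    """Try each backend in order, moving on when one fails.

    Only if every attempted backend fails does the error reach the
    caller, which is the point where a proxy would return a 502.
    """
    last_error: Optional[BackendError] = None
    # Each attempt uses a different backend, up to max_attempts of them.
    for backend in list(backends)[:max_attempts]:
        try:
            return send(backend)
        except BackendError as err:
            last_error = err  # remember the failure, try the next backend
    raise BackendError(f"all attempts failed: {last_error}")


# Example: the first backend is mid-restart, the second answers normally.
def flaky_send(backend: str) -> str:
    if backend == "backend-a":
        raise BackendError("connection reset")
    return "200 from " + backend


result = call_with_retry(["backend-a", "backend-b"], flaky_send)
```

One caveat: blind retries like this are only safe for requests that are idempotent, since the failed backend may have partially processed the request before the connection broke.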



