I'd argue you're starting from a few orders of magnitude more competence than the credit union had. Their non-banking site was hosted by some podunk company in Texas with no sense of redundancy anywhere. Their provider had a near-total networking outage, and the credit union had no plan to recover from that.

As for proactively monitoring a single /24: you (probably) don't. I don't think it's (usually) a company's job to monitor their customers' ISPs. The failures that "my" credit union had were due to their own choice of infra (Armor, Cloudflare). When Sonic nuked my config on their DSLAM after some maintenance, I raised an issue with Sonic, not with whatever other companies became inaccessible as a result.

> Does that count as an outage?

My POV may very well differ from whatever contracts and SLAs you have in place, but yeah, maybe. If you can't fail over to the alternative ISP, then yes, that's an outage. Of course, a transatlantic fiber break would also likely be a lot more noticeable than fat-fingering a route for a /24. And sure, I've been stuck at megacorp when the VPN started handing out addresses in a new subnet and our department's networking team hadn't caught up. That's why you listen to your customers instead of throwing out a "someone else screwed up, there's nothing we can do" response.

Personally, I don't think a 20-minute banking outage is a massive problem (I've long since moved my money elsewhere); even the 20-hour outage was relatively minor. It just speaks to the credit union's unwillingness to be highly available. They knew about the Armor outage and didn't actually test the remediation. I assume they didn't know about the Cloudflare outage. Both worry me. What happens when they're faced with a total failure of their online banking system?

But it isn't an outage. My monitoring point in Singapore could reach both ends; they just couldn't talk to each other, due to a routing issue on a third-party network out on the internet.

On my own network, which I control, I accept that if a circuit breaks I'll have a 1- or maybe 2-second outage while traffic reroutes. For some of my services that would be a problem; for others it's not. If Facebook loads 2 seconds later, nobody cares. If the winning penalty in the World Cup final blacks out, that's a big problem.