One thing to be aware of is that up/down alerting bakes downtime into the incide...

29athrowaway · on Aug 18, 2024

What you call pressure is often called saturation. Saturation means the resource is at 100% utilization.

But saturation is not the same as errors.

nrr · on Aug 18, 2024

I'm talking beyond saturation.

There are actually quite a few resources for which I'd like to maintain something resembling steady-state saturation, like CPU and RAM utilization. However, it's when I've overcommitted those resources (e.g., for RAM, no cache pages left that can simply be purged to make more room for RSS) that I start to see problems. (Of note, if I start paging in and out too much, that can also affect task switching, which leaves the kernel doing way more work, which itself can lead to a fun cascade of problems.)
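
For what it's worth, one way to watch for that "past full" state on Linux is the kernel's pressure stall information (PSI) under /proc/pressure/. This is only a sketch, and it assumes a kernel with PSI enabled; `parse_psi` and `SAMPLE` are names made up for illustration, and the sample numbers are invented, not measurements.

```python
# Sketch: parse the text format used by Linux PSI files such as
# /proc/pressure/memory. Each line looks like:
#   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
# "some" = share of time at least one task was stalled on the resource,
# "full" = share of time all non-idle tasks were stalled at once.

def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Return {"some": {...}, "full": {...}} from PSI file contents."""
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Every remaining field is key=value, e.g. avg10=1.50
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


# Invented sample contents, for illustration only.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=12345\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=678\n"
)

if __name__ == "__main__":
    pressure = parse_psi(SAMPLE)
    print(pressure["some"]["avg10"])
```

A nonzero "full" average is roughly the situation described above: tasks are waiting on memory rather than doing work, even though utilization alone would just read "full".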

29athrowaway · on Aug 18, 2024

Saturation is not a boolean; it's how far beyond 100% utilization the resource is.

https://www.brendangregg.com/usemethod.html
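
A rough sketch of the difference for CPU, assuming you already have a busy fraction and a count of runnable tasks from somewhere (for example a load average). `cpu_use` and its parameters are hypothetical names for illustration, not part of any tool:

```python
# Sketch: utilization is capped at 1.0, while saturation measures the
# work queued beyond what the CPUs can service, as a fraction of capacity.

def cpu_use(busy_fraction: float, runnable: float, cpus: int) -> tuple[float, float]:
    """Return (utilization, saturation) for a pool of `cpus` CPUs.

    busy_fraction: share of time the CPUs were busy, 0.0 to 1.0 (assumed input)
    runnable:      average number of tasks running or waiting to run (assumed input)
    """
    utilization = min(busy_fraction, 1.0)
    # Tasks beyond the CPU count are waiting in the run queue.
    saturation = max(0.0, runnable - cpus) / cpus
    return utilization, saturation


if __name__ == "__main__":
    # 8 CPUs fully busy with 12 runnable tasks: 4 are queued.
    print(cpu_use(1.0, 12, 8))
    # 8 CPUs at 60% with 3 runnable tasks: nothing is queued.
    print(cpu_use(0.6, 3, 8))
```

Both 12 and 40 runnable tasks on 8 CPUs read as 100% utilization, but the saturation figure separates them.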