
I agree with a lot of the statements at the top of the article, but some of them are just nonsense. This one, in particular:

> If we analyse accidents more deeply, we can get by analysing fewer accidents and still learn more.

Yeah, that's not how it works. The failure modes of your system might be concentrated in one particularly brittle area, but you really need as much breadth as you can get: the bullets are always fired at the entire plane.

> An accident happens when a system in a hazardous state encounters unfavourable environmental conditions. We cannot control environmental conditions, so we need to prevent hazards.

I mean, I'm an R&D guy, so my experience is biased, but... sometimes the system is just broken, and no amount of saying "the system is in a hazardous state" can paper over the fact that you shipped (or, best case, stress-tested) trash. You absolutely have to run these cases through the failure analysis pipeline; there's no way around that. But the analysis flow looks a bit different for things that should have worked versus things that could never have worked. And, yes, it will roll up on management, but... still.
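
To make that distinction concrete, here's a toy sketch (my framing, not the article's; every name in it is invented for illustration) of the quoted model, where an accident needs both a hazard and bad environmental luck, next to the "could never have worked" case:

    # Toy model, in Python, of "accident = hazardous state +
    # unfavourable environment". All names are hypothetical.
    from dataclasses import dataclass
    import random

    @dataclass
    class System:
        hazardous: bool  # design allows a dangerous state
        broken: bool     # the "could never have worked" category

    def environment_unfavourable() -> bool:
        # We can't control this; at best we estimate its likelihood.
        return random.random() < 0.1

    def accident_occurs(system: System) -> bool:
        if system.broken:
            # No environmental trigger needed: it never could work.
            return True
        # The article's framing: hazard AND environmental bad luck.
        return system.hazardous and environment_unfavourable()

The point of separating the two branches is exactly the parent's complaint: the second branch is the article's model, but the first branch exists too, and calling it a "hazardous state" adds nothing.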



> you really need as much breadth as you can get

Sure, more is always better. Practically, though, depth and breadth trade off against each other. In my experience, many problems that look dissimilar after a shallow analysis turn out to be caused by the same thing when analysed in depth. In that case, it is more economical to analyse fewer incidents in greater depth and actually find their common factors than to make a shallow pass over many incidents and keep papering over symptoms of the undiscovered deeper problem.
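
As a toy illustration of that economics (my own sketch; the incident data is invented): if superficially different incidents share one deep cause, a deep pass on a few of them retires more failure modes per unit of effort than a shallow pass over all of them:

    # Hypothetical incident records: (surface symptom, deep root cause).
    incidents = [
        ("timeout on checkout",  "clock skew"),
        ("stale cache entries",  "clock skew"),
        ("double-charged order", "clock skew"),
        ("OOM in worker",        "unbounded queue"),
        ("slow batch job",       "unbounded queue"),
    ]

    # Shallow pass: five distinct symptoms look like five separate fixes.
    print(len({symptom for symptom, _ in incidents}))  # -> 5

    # Deep pass: two root causes retire all five symptoms.
    print(len({cause for _, cause in incidents}))      # -> 2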


I guess that's where our experience differs. I am 100% on board with chasing as many incidents to true root cause as possible, and I agree that doing so can be extremely helpful in ways that might not be easy to foresee.

But my experience is also that you cannot ignore anything. Even the little stuff. The number of difficult system-level bugs I have resolved by remembering "you know, two weeks ago, it briefly did this weird thing that really shouldn't have been possible, but this might be related to that if only..." is crazy. It's been a superpower for me through the years.

However, I mostly work on hardware, and hardware's complexity envelope is straight-up different from software's, so that might explain some of the difference in our perspectives. Hardware never misbehaves truly at random: all of its bad behavior has some cause one can reasonably hope to ascertain, and that cause isn't hiding at a level a hardware engineer can't reach. Software, though, carries enough state, including state from other levels of the stack, that I would not make the same claim. So the fault-chasing priorities aren't quite the same.



