That's only a problem with one of the arguments, and is exactly why there are two approaches. Some errors are hard to detect by reading or working on the code, but easier to detect from behavior. Some are the other way around.
In this case, I do not agree that this could have gone unnoticed for a long time, even looking only at the behavior of the running system. Multiply the number of installations by how heavily filesystems are used, and it gets run a lot. It's also the kind of thing people investigate when it happens in certain settings ("we lost a block and our whole filesystem was corrupted"), and the kind of thing people test for.