> Again, you miss the point. 95%+ of Linux systems are single disk. That's the e...

jcalvinowens · on Aug 25, 2024

> I have no idea why you added this remark telling me that I should try it, when you clearly didn't.

You are hilariously hostile here, I don't get it. si_errno is the second field in the struct after si_signo, saying "si_errno etc." is obviously in reference to the rest of the fields in the structure...

krilovsky · on Aug 25, 2024

> You are hilariously hostile here, I don't get it.

I apologise if it came out hostile. That was not my intention. I was in a bit of hurry when I made the edit, and I just trying to expand my comment in response to your edit, and explain that non-I/O and non-disk SIGBUS errors sometimes look exactly like disk and filesystem errors that return EIO (not just signum being SIGBUS, but also si_code being set to BUS_ADRERR, etc.), so looking at the siginfo_t fields alone wouldn't be enough to diambiguate.

Then there's the address field, which can be probably be used in combination with parsing /proc/self/maps, but my point in that comment was that the information on the manpage alone wouldn't have helped people trying to implement a handler correctly.

In any case, I already described a scenario where crashing would be the wrong thing to do IMO, which you seemed to ignore. Even in scenarios where crashing is reasonable, I'm sure there's a solution for every edge case that I would bring up, but I never said that it was impossible, so I'm not sure why asking me to list every possible edge case is relevant when my point was just that there are edge cases, and that you'd need to consider them (and they would be different for different apps), thus making an implementation not trivial. That doesn't mean that it's necessarily difficult, just that it might be a more complex solution when compared to dealing with a failing VFS operation.

As it seems that we've reached an impasse, I'll just say that simplicity depends on the context and is sometimes a matter of personal taste. I don't have anything against mmap, and I was only trying to argue that there's a trade-off, but you are of course free to disagree and use mmap everywhere if that works for you.

I don't think I have anything more to add to what I already said, and I'm sorry again if you felt personally attacked, or that I had something against mmap and trying to spread FUD.

jcalvinowens · on Aug 25, 2024

> but most Linux systems are servers, which generally aren't deployed in a single disk configuration.

You are incorrect about that: most Linux servers in the world have one disk. Most servers are not storage servers.

> I didn't say anything about difficulty. I only said that it wasn't trivial as you made it out to be

...and I demonstrated by counterexample that you're wrong, it is trivial. If you think I'm missing some detail, you are free to explain it. You're just handwaving.

> First you claimed that I invented an ambiguity that doesn't exist, and that SIGBUS causes can be identifiable if I just read the sigaction(7) manpage. Now you say that there is an ambiguity, but that it can be resolved using the address, so which is it?

Both, obviously? If you only look at signo there's an "ambiguity", but with the rest of siginfo_t the "ambiguity" ceases to exist. There is no case where you cannot unambiguously handle -EIO in a mmap via SIGBUS.

You claimed that you could only use SIGBUS with mmap if you were sure there were no other sources of SIGBUS. Quoting you directly:

> So yeah, if you know that apart from the disk (or filesystem, at any rate) your hardware is in order, and that the only reason for SIGBUS could be a failed I/O through a memory mapped file, and you know that all of the code in your process is well behaved, writing a SIGBUS handler that terminates the process with a message indicating an mmap I/O error might be reasonable

That statement is completely wrong: you can always tell whether it came from the mmap or something else, by looking at the siginfo_t fields.

> and by that I meant that terminating it with an abort() wouldn't cause side-effects due to e.g. atexit() handlers not running

Any system that breaks if atexit() handlers don't run is fundamentally broken by design. There are a dozen reasons the process can die without running those.

> All I did say was that it isn't trivial to deal with errors

Yes, and that statement is wrong. Most of the time it is trivial, because you just call abort(). There is no possibly simpler error handling than printing a message and calling abort(). For 95% of the workloads running across the world on Linux, that is entirely sufficient.

It is very unusual to try to recover from I/O error, and most programmers who try are really shooting themselves in the foot without realizing it.

You're free to disagree obviously, but I'm directly refuting the points you're making. Calling it a "strawman" make you look really really silly.