Keep in mind that executable code on Linux is mapped in from disk, either from a...

mpol · on June 16, 2024

Is this really from real-world experience? And does it only hold under certain conditions?

My experience is really the opposite of this. Thrashing was a normal occurrence on my desktop, and it would end in an unrecoverable state and a manual hard reset.

Also, with 16 GB of RAM and 4 GB of swap, my running applications got moved to swap. Switching tabs in Firefox was slow because the pages had to come back from swap. My swappiness was set to 1 so that it shouldn't swap, but it always did.

Now, without swap and with earlyoom, everything is fine. When /proc/vmstat shows that there has been a kill, it is time to reboot.
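For anyone who wants to script that check, here is a minimal sketch. It assumes the counter in question is the `oom_kill` line that reasonably recent kernels expose in /proc/vmstat; the function names are made up for illustration.

```python
from typing import Optional


def oom_kill_count(vmstat_text: str) -> Optional[int]:
    """Return the oom_kill counter from /proc/vmstat text, or None if absent."""
    for line in vmstat_text.splitlines():
        parts = line.split()
        # Each /proc/vmstat line has the form "<name> <value>".
        if len(parts) == 2 and parts[0] == "oom_kill":
            return int(parts[1])
    return None


def read_oom_kill_count(path: str = "/proc/vmstat") -> Optional[int]:
    """Read the counter from the live system (path is the usual default)."""
    with open(path) as f:
        return oom_kill_count(f.read())
```

Calling `read_oom_kill_count()` after boot and again later shows whether a kill happened in between; `None` would suggest the kernel does not expose the counter.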

On my laptop, though, my use case is different. It only has 2 GB of RAM, so I prefer swap over a hard kill. And I reboot it more than once a day when I am using it.

blueflow · on June 16, 2024

Yes, I learned it the hard way when debugging production outages. The recommended VM sizes for GitLab's Praefect were too small for our use case, and we had, per provisioning defaults, no swap on any machine. 150 MB of binaries in virtual memory and only 50 MB of disk cache left: this is where it clicked for me.

Whether you want a hard OOM kill, I don't know. I'm only talking about the I/O lockup that happens in these situations.

mpol · on June 16, 2024

Thank you. So RAM was quite minimal, just barely enough to run the applications, with almost none left for disk I/O. My laptop is in the same situation. On my desktop, however, I have way more RAM than needed to run the applications. So I assume whether you want (or need) swap depends on the situation.

wbkang · on June 16, 2024

This is the correct answer and it needs to be at the top. No swap doesn't mean the OOM killer magically kicks in earlier. It just means the anonymous memory has nowhere to go, so your executable pages get evicted, and then you are really hosed.

more_corn · on June 16, 2024

And the machine crashes, which in production environments is far preferable to dog slow.

blueflow · on June 16, 2024

Unfortunately, no crash. This is the dog-slow case: too slow for an SSH session to even start. But the machine might catch itself and get back on track without an OOM kill happening.

lokar · on June 16, 2024

If you turn off swap (which I do for large fleets of highly uniform and tightly managed systems) you should also mlock your executable pages.
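One way to do that from inside a process is `mlockall(2)`. The sketch below calls it through `ctypes`; it is only an illustration, not necessarily the approach used on those fleets. Note that `mlockall` pins every mapped page of the process, not just the executable ones. The flag values are an assumption that holds on Linux x86-64 and arm64, and the call usually needs a raised `RLIMIT_MEMLOCK` or `CAP_IPC_LOCK`.

```python
import ctypes
import os

# Flag values from <sys/mman.h> on Linux x86-64 and arm64.
# Assumption: some other architectures define different values.
MCL_CURRENT = 1  # lock pages that are mapped right now
MCL_FUTURE = 2   # also lock pages that get mapped later


def lock_all_memory(flags: int = MCL_CURRENT | MCL_FUTURE) -> None:
    """Pin this process's pages in RAM; raise OSError if the kernel refuses."""
    # CDLL(None) gives access to the symbols of the already loaded libc.
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.mlockall(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

A service would call `lock_all_memory()` once at startup and treat an `OSError` as a sign that the memlock limit is too low.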

blueflow · on June 16, 2024

I went with enabling swap and monitoring for page pressure. At the end of the day, the disk cache for the application data is also highly performance critical.
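The comment doesn't say which metric was monitored. As one possible sketch, kernels built with pressure stall information (PSI) expose /proc/pressure/memory, which can be parsed like this. The function names and the 10 percent threshold are arbitrary choices for illustration.

```python
def parse_pressure(text: str) -> dict:
    """Parse PSI text into {"some": {"avg10": ..., ...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Line format: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, pairs = fields[0], fields[1:]
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def memory_pressure_high(text: str, threshold: float = 10.0) -> bool:
    """True if the 10-second "some" stall average exceeds threshold percent."""
    some = parse_pressure(text).get("some", {})
    return some.get("avg10", 0.0) > threshold
```

Feeding it the contents of /proc/pressure/memory on a timer, and alerting when `memory_pressure_high` returns True, is one way to notice trouble before the machine locks up.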

blueflow · on June 16, 2024

How the lock-up looks in practice: RAM is mostly full of heap and stacks, only a few MB are available for disk cache, and all processes fight each other to keep their own code mapped into those remaining MB. Disk read I/O is fully saturated at this point.
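To see how close a machine is to that state, one rough sketch is to compare the file-backed pages on the LRU lists with total RAM, using /proc/meminfo. The helper names are hypothetical, and what counts as "too little" depends on the workload.

```python
def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field name to its numeric value (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            values[name.strip()] = int(rest.split()[0])
    return values


def file_cache_share(text: str) -> float:
    """Fraction of MemTotal currently holding file-backed (cache/code) pages."""
    m = parse_meminfo(text)
    file_kb = m.get("Active(file)", 0) + m.get("Inactive(file)", 0)
    return file_kb / m["MemTotal"]
```

A share that stays near zero while `AnonPages` is close to `MemTotal` matches the picture described above: almost nothing is left for code and disk cache.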