If your monolithic service OOMs, hits a long GC pause that makes dependent requests time out, locks a shared file descriptor, or any of a dozen other things, then the service as a whole can fault or stall even though other threads/tasks are still executing. Whereas classes of errors like OOMs stop taking the whole system down once the work is split across multiple processes.
A monolith can also scale vertically, with mechanisms to redeploy on fatal errors. If everything starts failing you have a problem, but you get the same problem with a microservice that sits in the critical path.
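To make the "redeploy on fatal errors" part concrete, a rough sketch of the idea (the binary name is made up; in practice this is what systemd, a process manager, or your orchestrator would be doing for you):

```go
package main

import (
	"log"
	"os/exec"
	"time"
)

func main() {
	for {
		// Hypothetical monolith binary; restart it whenever it exits, so a
		// fatal error becomes a brief outage rather than a permanent one.
		cmd := exec.Command("./store-monolith")
		cmd.Stdout = log.Writer()
		cmd.Stderr = log.Writer()

		err := cmd.Run()
		log.Printf("monolith exited (err=%v), restarting in 1s", err)
		time.Sleep(time.Second) // crude backoff to avoid a tight crash loop
	}
}
```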
Networks can have unexpected delays, routing errors and other glitches. At least with a monolith you can usually get a stack trace for debugging. I have seen startups with very limited tracing and logging once they moved to microservices.
When a small startup has to manage "scalable" K8s infrastructure in the cloud, distributed tracing and monitoring are rarely prioritized; you are a team of 5 developers trying to find product-market fit.
I am not against microservices (I work with them daily), but you just trade one type of stability problem for another.
Right, I'm not advocating for one over the other; I was just explaining the issues microservices solve. Now, instead of the OOM killer taking your service down, you have a flaky NIC on another microservice's box and you need to figure out how to degrade gracefully.
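To make "degrade gracefully" concrete, a minimal sketch (the endpoint, payload type and timeout are all made up): bound the call to the recommendations service with a short timeout and fall back to showing nothing, rather than failing the whole page when that box or its network is flaky.

```go
package main

import (
	"context"
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// Recommendation is a made-up payload type for illustration.
type Recommendation struct {
	ProductID string `json:"product_id"`
}

// fetchRecommendations asks the (hypothetical) recommendations service and
// degrades to "no recommendations" on any error or timeout.
func fetchRecommendations(ctx context.Context, userID string) []Recommendation {
	ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet,
		"http://recommendations.internal/v1/users/"+userID, nil)
	if err != nil {
		return nil
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		// Timeout, connection reset, DNS failure, flaky NIC on the other box...
		log.Printf("recommendations unavailable, degrading: %v", err)
		return nil
	}
	defer resp.Body.Close()

	var recs []Recommendation
	if err := json.NewDecoder(resp.Body).Decode(&recs); err != nil {
		return nil
	}
	return recs
}

func main() {
	recs := fetchRecommendations(context.Background(), "user-123")
	log.Printf("rendering page with %d recommendations", len(recs))
}
```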
I love working with microservices at the scale of $WORK, but we're Big Tech. I can't imagine why a 5-person startup would want k8s and microservices. You don't need that scale until you have more than 2 teams, and by that point you're at the very least 15 engineers, plus usually sales and marketing staff, to make that investment worth it.
I don't think it was well expressed, but to reuse my last example: the OOM killer ending the recommendations process mid-request is less of a big deal if the main store server can keep running and serving traffic.
If the recommendations team writes code that causes the OOM killer to end their process, making them run it on separate infrastructure insulates the "main store" team from the bugs they write.
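A single-machine analogue of that isolation, as a rough sketch (the worker binary name and the limit are placeholders; prlimit here is the util-linux tool, so this is Linux-only): run the recommendations code as its own process with a hard address-space cap, so a runaway allocation kills that process alone while the store server keeps serving.

```go
package main

import (
	"log"
	"os/exec"
)

func main() {
	// Cap the worker at roughly 512 MiB of address space. If its allocations
	// blow past the limit, it dies alone; the store server in this parent
	// process keeps serving.
	cmd := exec.Command("prlimit", "--as=536870912", "--", "./recommendations-worker")
	cmd.Stdout = log.Writer()
	cmd.Stderr = log.Writer()

	if err := cmd.Run(); err != nil {
		log.Printf("recommendations worker died: %v; store keeps running", err)
	}
}
```

Separate boxes or pods with their own memory limits are the same idea, just with the blast radius drawn at the infrastructure level instead of the process level.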
It was about the OOM killer, as the sibling comment says, yeah. I'm surprised you're so incredulous; the OOM killer and GC stalls are things I've run up against frequently in my career. I'm sorry my comment didn't live up to your expectations; it was hastily typed on mobile.
His point was that the comment was unclear if you'd also read it hastily :-)
I imagine his logic was something like: "How can OOMs happen less often if you run more processes (possibly on the same machine)?", while what your comment actually meant was: "if a specific service is hit by an OOM, with microservices only that microservice goes down, since it's probably running on its own hardware".