bourbonproof's comments

bourbonproof · 2025-10-07T00:16:50 1759796210

Honestly, Docker Swarm is so great. We have been using it for years on 6 very beefy machines (each with 1 TB of memory and 96 CPU cores). It's so stable and well done: no restarts, crashes, or weird behaviors. We run it in a Hetzner VLAN and performance is excellent. We are very satisfied with it, and I would use it for even bigger scenarios. I even thought about building something like coolify/flightcontrol on top of it, so I can really easily have proper deployments of my stuff.

bourbonproof · 2025-09-22T22:49:47 1758581387

Do I understand this right: if these 3 nodes shut down for some reason, all data is lost and you have to actually restore from backup instead of just starting the machines again? And even if you have to restart one node (due to updates or crashes), you also have to restore from backup? If so, why not pick a hosting provider that doesn't wipe the disk when a machine shuts down?

mattrobenolt · 2025-09-22T23:54:23 1758585263

It's more than just shutting down. You'd have to have an actual failure. Data isn't lost on a simple restart. It'd require 3 nodes to die in 3 different AZs.

While that's not impossible, the reality is that the likelihood is very low.

So simply restarting nodes wouldn't trigger restoring from backup, but yes, in our case, replacing nodes entirely does require that node to restore from a backup/WALs and catch back up in replication.

EBS doesn't entirely solve this; you still have failures and still need/want to restore from backups. This is built into our product as a fundamental feature. It's transparent to users, but the upside is that restoring from backups and creating backups is tested multiple times every day for a database. We aren't afraid of restoring from backups and replacing nodes, by choice or by failure. It's the same to us.

We do all of the same operations already on EBS. This magic is what enables us to use NVMe drives, since we already treat EBS as ephemeral.