The fact they can't capitalize on the current trainwreck of GitHub speaks volumes...

u_fucking_dork · 2026-05-11T22:43:01 1778539381

Brother there’s nothing to capitalize on. They really don’t want an avalanche of free users bringing their shit down too I think.

PunchyHamster · 2026-05-11T22:44:50 1778539490

Companies using paid GitHub are in the same spot, though I'd imagine many have already moved over.

tokioyoyo · 2026-05-12T03:16:05 1778555765

Most companies are signing up to the idea that GitHub will fix their issues, rather than going through the operational pain of migration. Everyone I know jokes about GH downtime, but has zero internal talks about migration. Obviously a small data point, but GitLab going this route shows not a lot of people are switching.

Game_Ender · 2026-05-11T23:02:24 1778540544

Unless you pay for Enterprise, in which case you are on the Enterprise Cloud instance: https://us.githubstatus.com/posts/dashboard

Marsymars · 2026-05-11T23:19:02 1778541542

I've never actually seen that status page before, and I'm not clear what it's measuring. My company pays for Enterprise Cloud, and we see all the same downtime as what gets posted to https://www.githubstatus.com/

semiquaver · 2026-05-11T23:52:08 1778543528

That is the ghe.com status page, which it seems like no one actually uses, hence the good uptime. Most Enterprise Cloud customers don’t use it.

https://docs.github.com/en/enterprise-cloud@latest/admin/dat...

everfrustrated · 2026-05-12T16:12:46 1778602366

No, Enterprise Cloud is just the same GitHub.com with the same shitty reliability.

However, the newer Enterprise Cloud with Residency (aka on Azure) is a separate partition and has a different reliability domain (still subject to Azure's bad reliability, so not an entirely compelling offer). This is what you linked.

KaiserPro · 2026-05-12T08:54:07 1778576047

GitLab used to be about as reliable as GitHub (ignoring the security oopses they used to have).

They simply don't (or didn't) have the skills to scale. They were talking about using Ceph to run things, which gives you an idea of how green their infra team was.

darkwater · 2026-05-12T10:36:07 1778582167

Are you implying they should create more in-house solutions, or that specifically Ceph is not a good solution and there is some other 3rd party solution that could be used instead?

oblio · 2026-05-12T09:38:35 1778578715

What's wrong with Ceph?

KaiserPro · 2026-05-12T10:51:18 1778583078

What's right with it?

It's slow, large, excessively complex and not that resilient to failure.

You either want a bunch of NFS machines backed by ZFS on NVMe, with a central jumping-off point that allows sharding (this is critical, so that one or more NFS servers can fuck up without killing access to everything else).

Or pay the money and use GPFS.

antongribok · 2026-05-12T12:11:41 1778587901

As someone who's in charge of close to an exabyte on Ceph, I couldn't disagree with you more.

Done correctly, Ceph is extremely reliable, resilient, and fast. Once you get over the initial learning curve, dare I say, even a joy to work with.

KaiserPro · 2026-05-12T19:48:07 1778615287

For parallel read/write access across many thousands of large-ish files (i.e. multiples of the minimum chunk size), I'm sure it does grand.

But for metadata-heavy operations, i.e. git, it's not the FS I would choose. Like Lustre, it can be fast if your workload aligns with its tradeoffs, but high metadata loads are not CephFS's strong point (nor that of many other distributed filesystems).

arpa · 2026-05-12T12:54:10 1778590450

I concur, even though I have only used it as a hobbyist.

__turbobrew__ · 2026-05-12T16:12:56 1778602376

You are calling infra teams green for using Ceph, but then recommending hand-rolling a sharded system on top of NFS? I think you have it backwards.

KaiserPro · 2026-05-12T18:40:49 1778611249

It's a pattern that works well in VFX. It has the advantage over something like Isilon in that hotspots are isolated to individual servers, not spread across the namespace. So if one of your git stores is being hammered, you can migrate hot/cold repos to other servers fairly simply. Also, if one of the servers dies, it has a limited blast radius.
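
As a rough illustration of that sharding pattern, here is a minimal Python sketch of the central lookup. Everything in it is a hypothetical stand-in (the `ShardRouter` class, the server names, the `/mnt/<server>/<repo>.git` path layout); it only shows the shape of the idea, not how any real deployment does it.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ShardRouter:
    """Central jumping-off point: maps each repo to exactly one NFS server.

    All names here are hypothetical and chosen for illustration only.
    """
    servers: list[str]
    placement: dict[str, str] = field(default_factory=dict)
    down: set[str] = field(default_factory=set)

    def assign(self, repo: str) -> str:
        # Place a new repo on the server currently holding the fewest repos.
        if repo not in self.placement:
            counts = Counter(self.placement.values())
            self.placement[repo] = min(self.servers, key=lambda s: counts[s])
        return self.placement[repo]

    def migrate(self, repo: str, target: str) -> None:
        # Move a hot (or cold) repo to another server. A real system would
        # copy the data first; this only flips the pointer.
        if target not in self.servers:
            raise ValueError(f"unknown server: {target}")
        self.placement[repo] = target

    def resolve(self, repo: str) -> str:
        # A dead server only breaks the repos placed on it.
        server = self.placement[repo]
        if server in self.down:
            raise ConnectionError(f"{server} is down; {repo} is unavailable")
        return f"/mnt/{server}/{repo}.git"
```

The point of keeping the placement explicit is that you decide where each repo lives, so a hot repo can be moved by hand and a dead server takes out only its own shard.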

The problem with things like CephFS (and Lustre and, to a lesser extent, GPFS, although it's not entirely comparable) is that the metadata store is your weak link. Ceph scales great if you have loads of large files that you're reading/writing in parallel (i.e. pulling thousands of PAR files or images, videos or binaries); it scales almost linearly with the number of object stores. It also works well when you're writing to the middle of a file (far fucking better than S3-like systems).

git is a monster metadata eater. Everything git-wise is a metadata lookup. That means that when you are running thousands of concurrent git ops on a distributed filesystem, your object throughput will fall through the floor. So you could have 100 OSDs, all on a 100 gig network with massive NVMe stripes, but your global throughput will be shite because your MDS is the limiting factor. You can add more metadata servers, but then Ceph is choosing how to shard, not you.
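
A back-of-the-envelope way to see that bottleneck, as a Python sketch. The function name and every number below are made-up assumptions for illustration, not measurements of Ceph or of git.

```python
def git_ops_per_second(
    mds_ops_per_s: float,
    metadata_ops_per_git_op: float,
    osd_count: int,
    osd_bytes_per_s: float,
    bytes_per_git_op: float,
) -> float:
    """Toy model: throughput is capped by the slower of metadata and data paths."""
    metadata_bound = mds_ops_per_s / metadata_ops_per_git_op
    data_bound = osd_count * osd_bytes_per_s / bytes_per_git_op
    return min(metadata_bound, data_bound)


# Hypothetical figures: one MDS doing 50k ops/s, 500 metadata ops per git op,
# 1 GB/s per object store, 10 MB moved per git op.
small = git_ops_per_second(50_000, 500, 10, 1e9, 10e6)
large = git_ops_per_second(50_000, 500, 100, 1e9, 10e6)

# Under these assumptions both clusters are metadata bound, so ten times
# the object stores buys no extra git throughput.
print(small, large)
```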

Either way, delete a large git repo and all your metadata operations start crawling.

This means that you need to think about optimisations like keeping git inside a tar or some other container that is pulled out, loaded into RAM, operated on, and pushed back as a binary blob. The result is that your thousands of metadata ops are reduced to two or three, and you're back to being network bound.
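
A minimal Python sketch of that pull, operate, push cycle, assuming the repo is stored as a single uncompressed tar blob. The `operate_on_archive` name, the `operate` callback and the `scratch` argument are hypothetical; pointing `scratch` at a tmpfs mount such as `/dev/shm` would be one way to keep the unpacked tree in RAM.

```python
import io
import tarfile
import tempfile
from pathlib import Path
from typing import Callable, Optional


def operate_on_archive(
    blob: bytes,
    operate: Callable[[Path], None],
    scratch: Optional[str] = None,
) -> bytes:
    """Unpack a tar blob locally, run `operate` on it, return the repacked blob.

    The shared filesystem only ever sees one read and one write of the blob;
    all the small-file churn happens in the local scratch directory.
    """
    with tempfile.TemporaryDirectory(dir=scratch) as tmp:
        root = Path(tmp)
        # Use the safer "data" extraction filter where this Python provides it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
            tar.extractall(root, **extra)

        operate(root)  # e.g. shell out to git inside `root`

        out = io.BytesIO()
        with tarfile.open(fileobj=out, mode="w") as tar:
            tar.add(root, arcname=".")
        return out.getvalue()
```

The caller would fetch `blob` from wherever the archive lives and write the returned bytes back, which is where the "two or three" operations on the shared store come from.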

So yeah, no, I stand by my original position.

mh- · 2026-05-11T23:42:01 1778542921

Yeah, but they could make it up in volume.

bmitch3020 · 2026-05-12T10:33:43 1778582023

I'm not sure there's a lot to capitalize on, considering the state of hosting OSS development. But this really is a case study in watching your biggest competitor face-plant into a wall, and responding by breaking into a head-first sprint.