you've never fully implemented raft or paxos. it is well known that such distrib...

ideal0227 · on July 9, 2017

I wrote a raft implementation that is used by quite a few well known systems (etcd, cockroachdb, docker swarmkit, tikv and a number of closed source systems). It probably meets the "production ready" standard :P.

With that experience, I can say that a real world raft implementation is not easy and VERY time consuming.

However, preparing for the coding interview is definitely an order of magnitude more difficult for me. It requires me to waste all my time on something meaningless, and makes me feel sick.

dis-sys · on July 9, 2017

must be etcd then. ;)

it took etcd about 2 years to become mature on its _single_ group raft implementation. the abstraction in raft.go is pretty good, the test suite is the best I can find, message passing and tick handling are handled correctly, but to be honest, everything else (entry management, transport, snapshot streaming, request handling etc) needs to be rewritten/added to further scale it to support multi raft groups. that is actually what cockroachdb did.
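For anyone wondering what "tick handling" means here: one common design is a logical clock, where the application calls a tick method at a fixed interval and the library counts ticks instead of reading wall time. Below is a minimal sketch of that idea, not etcd's actual code. The class name `ElectionTimer`, its fields, and the randomization range are assumptions made up for illustration.

```python
import random


class ElectionTimer:
    """Hypothetical tick-driven election timer (illustrative only)."""

    def __init__(self, election_ticks, rng):
        self.election_ticks = election_ticks  # base timeout, measured in ticks
        self.rng = rng                        # injected so tests are deterministic
        self.reset()

    def reset(self):
        # Called when a heartbeat arrives or after the timer fires.
        self.elapsed = 0
        # Assumed range [election_ticks, 2 * election_ticks - 1]; randomizing
        # the timeout makes simultaneous candidacies (split votes) less likely.
        self.timeout = self.election_ticks + self.rng.randrange(self.election_ticks)

    def tick(self):
        # Advance the logical clock by one tick.
        # Returns True when the node should start an election.
        self.elapsed += 1
        if self.elapsed >= self.timeout:
            self.reset()
            return True
        return False


timer = ElectionTimer(election_ticks=10, rng=random.Random(0))
ticks_until_election = timer.timeout
for _ in range(ticks_until_election - 1):
    assert not timer.tick()   # still within the timeout
assert timer.tick()           # timeout reached: start an election
```

Because time only moves when `tick()` is called, a test can drive the clock by hand, which is a big part of why this style is easy to test.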

etcd raft is great, I just want to give my understanding of how time consuming it is to write a production ready raft library.

closeparen · on July 9, 2017

Okay, fair, we didn't do multiple Raft groups or membership changes. If I can find the code, I'll dust it off and give those test suites a try. The project's test suite had a 0MQ broker that could be programmed to selectively drop and delay messages while operating the KV store implemented by the student code. This is pretty similar to Jepsen, though Jepsen might have scenarios that the instructors didn't consider.
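To make the broker idea concrete, here is a rough in-memory sketch of a message router that can be told to drop or delay traffic on specific links. This is not the course's 0MQ broker and not Jepsen; `FaultyBroker`, `Message`, and every field name are hypothetical, and time is modeled as discrete steps rather than real network delay.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    src: str
    dst: str
    payload: str


@dataclass
class FaultyBroker:
    """Hypothetical programmable broker: drops or delays messages per link."""
    drop_links: set = field(default_factory=set)     # (src, dst) pairs to drop
    delay_links: dict = field(default_factory=dict)  # (src, dst) -> extra steps
    now: int = 0                                     # logical time, in steps
    in_flight: list = field(default_factory=list)    # (deliver_at, seq, Message)
    seq: int = 0                                     # tie-breaker, keeps send order

    def send(self, msg):
        link = (msg.src, msg.dst)
        if link in self.drop_links:
            return  # silently lost, as on a partitioned link
        deliver_at = self.now + 1 + self.delay_links.get(link, 0)
        self.seq += 1
        self.in_flight.append((deliver_at, self.seq, msg))

    def step(self):
        # Advance one step and return the messages that are due for delivery.
        self.now += 1
        due = sorted(item for item in self.in_flight if item[0] <= self.now)
        self.in_flight = [item for item in self.in_flight if item[0] > self.now]
        return [msg for _, _, msg in due]


broker = FaultyBroker()
broker.drop_links.add(("n1", "n2"))       # n1 -> n2 is partitioned
broker.delay_links[("n1", "n3")] = 2      # n1 -> n3 is slow
broker.send(Message("n1", "n2", "append"))
broker.send(Message("n1", "n3", "append"))
broker.send(Message("n2", "n1", "vote"))
first = broker.step()                     # only the n2 -> n1 message is due
```

A test harness built this way would run the KV store's nodes against the broker, script the drops and delays, and then check that the reads and writes the clients observed are still consistent.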

I didn't claim it was easy to implement tens of thousands of Raft groups among the same machines (why do you need that?) or to be resilient to Byzantine failures (if that's what you mean by hardware errors).

I don't think we are fundamentally disagreeing: you argue it's difficult to implement various extensions to Raft, and we seem to agree it's not that hard to implement the core of Raft.

dis-sys · on July 9, 2017

I'd recommend having a look at the etcd raft tests in the link below. You'd probably need to port them to work with your raft library.

https://github.com/coreos/etcd/tree/master/raft