
It's easy to get into situations where you're paying massive costs for serialization, deserialization, and network I/O, and I believe graph operations in Spark are one of those situations. I'd be curious whether running Spark in local mode with a single thread would actually improve the runtime, or whether it would reveal other issues with the Spark graph libraries.
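For a rough experiment along those lines, here's a minimal sketch, assuming GraphX and a toy edge list (the dataset and timing harness are just illustrative). Setting the master to "local[1]" pins Spark to one thread and takes network I/O out of the picture, which isolates the serialization and layout costs:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.graphx._

    object SingleThreadBaseline {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("single-thread-baseline")
          .master("local[1]") // one thread, no shuffle over the network
          .getOrCreate()
        val sc = spark.sparkContext

        // Toy edge list; swap in a real graph to get a meaningful number.
        val edges = sc.parallelize(Seq(
          Edge(1L, 2L, ()), Edge(2L, 3L, ()), Edge(3L, 1L, ())))
        val graph = Graph.fromEdges(edges, defaultValue = ())

        val t0 = System.nanoTime()
        val n = graph.pageRank(tol = 0.0001).vertices.count()
        println(s"pagerank over $n vertices: ${(System.nanoTime() - t0) / 1e9}s")

        spark.stop()
      }
    }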


Generally, memory layout is extremely important for graph problems, even on a single node. As I understand it, the Spark approach does not embrace a "flat" layout, but rather does lots of pointer chasing, which can really slow things down. Because Spark isn't very careful about memory usage and layout, you outgrow a single node quite fast, and then you're back to really bad distributed scaling characteristics.
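To make "flat" concrete: a compressed sparse row (CSR) layout packs every edge into contiguous primitive arrays, so traversal streams sequentially through memory, while a node-and-pointer representation takes a cache miss per hop. A minimal sketch of the contrast (the names are illustrative, not Spark internals):

    // Pointer-chasing layout: one heap object per vertex; traversal
    // follows a reference for every neighbor visited.
    final case class VertexNode(id: Long, neighbors: List[VertexNode])

    // Flat CSR layout: two primitive arrays, contiguous in memory.
    // targets(offsets(v) until offsets(v + 1)) are vertex v's neighbors.
    final class CsrGraph(offsets: Array[Int], targets: Array[Int]) {
      // Sum of out-degrees: a tight loop over sequential memory.
      def totalDegree: Long = {
        var sum = 0L
        var v = 0
        while (v < offsets.length - 1) {
          sum += offsets(v + 1) - offsets(v)
          v += 1
        }
        sum
      }
    }

The CSR variant also avoids boxed heap objects entirely, which is the kind of layout that fast single-node graph frameworks exploit.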



