> Jason Goldman, who served as Vice President of Product at Twitter between 2007 and 2010, responded to Weaver’s tweets with the observation that early Twitter was “held together by sheer force of will.”
I would dispute that; I don't think they can take that much credit. Regardless of their "sheer force of will," the site was down very, very frequently.
Much of that was just bad design decisions in the early days, creating a momentum that was unstoppable and irreplaceable at the speed we were growing.
In the early days they implemented a Ketama system for memcache. It worked great so long as nothing failed, since a node coming in and going out could lead to bad results. In the old days they would just flush caches when that happened. Later, when memcache was the only way we could serve the loads we had, it became imperative that memcache never be restarted. A single memcache restarting would overload the MySQL backend so badly that the site would be down for hours. Adding memcache nodes had more or less the same issues, though it was a little easier to prewarm things. We wrote a kernel module that allowed us to change the ulimits of a running process so we could increase the file descriptor limits for memcache without restarting it. Replacing it completely was damn near impossible given the growth rate and the inability to get the data out of MySQL quickly enough.
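For anyone who hasn't run into it, Ketama is a consistent-hashing scheme: each cache node gets many points on a hash ring, and a key goes to the first point at or after its own hash. Below is a minimal sketch of that idea, not libketama itself and certainly not the code we ran. The `HashRing` class, the `points_per_node` default, the MD5-based point placement, and the `cache1`..`cache4` hostnames are all made up for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring in the spirit of Ketama (illustrative only)."""

    def __init__(self, nodes, points_per_node=160):
        self.points_per_node = points_per_node  # assumed value, tune to taste
        self._ring = []  # sorted list of (point_hash, node_name)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # Use the first 4 bytes of MD5 as an unsigned 32-bit ring position.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big")

    def add(self, node):
        # Scatter several points per node so load spreads evenly.
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (self._hash(f"{node}-{i}"), node))

    def remove(self, node):
        # Drop every point owned by the node (e.g. it crashed or restarted).
        self._ring = [point for point in self._ring if point[1] != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


# Hypothetical hostnames and keys, just to exercise the ring.
ring = HashRing(["cache1", "cache2", "cache3", "cache4"])
keys = [f"user:{i}" for i in range(2000)]
before = {k: ring.node_for(k) for k in keys}

ring.remove("cache3")  # simulate one node dropping out
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Only the keys that lived on the removed node get remapped; everything else stays put, which is why it works so well while nothing fails. The flip side is that when the node rejoins, those keys map straight back to it, so anything it still holds from before the outage may be out of date unless it is flushed.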
Us reliability guys worked 16-hour days, 7 days a week, for years trying to keep things working well enough to not fail completely. Some of the crazy hacks we did just to survive were fantastic and impressive, and I am still proud of and disgusted by them to this day. =)