I have a lot of respect for Rob Pike. Doing what he says guarantees correct operation in typical cases, when the memory is cacheable. If the CPU architecture doesn't support unaligned loads, the compiler must additionally be able to deduce the pointer's alignment; otherwise it's forced to generate separate loads.
However, if performance is important, doing what Pike says doesn't always make sense. The case discussed in the article in question is one of those.
http://commandcenter.blogspot.com/2012/04/byte-order-fallacy...
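
To make the trade-off concrete, here is a minimal sketch in C of the two styles being compared. The function names are hypothetical and are not taken from Pike's article; the first follows the shift-and-or pattern he recommends, the second is a plain native-order load that gives up byte-order independence. Whether the first compiles down to a single load depends on the compiler and target, so treat this as an illustration rather than a benchmark.

```c
#include <stdint.h>
#include <string.h>

/* Byte-order-independent load of a little-endian 32-bit value.
 * Correct on any host, and places no alignment requirement on p.
 * (Hypothetical name, for illustration only.) */
uint32_t load_le32_portable(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Native-order load: copies the 4 bytes as they sit in memory.
 * The result depends on host byte order, so the caller must know
 * the data already matches it. memcpy keeps the access well-defined
 * even when p is not 4-byte aligned. (Hypothetical name.) */
uint32_t load_native32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

On a target without unaligned loads, a compiler that cannot prove `p` is aligned generally has to treat both functions as four byte accesses, which is the cost mentioned above.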