I expect that's because it was designed very well for the days of serial, in-order, static execution. But, CPUs today are all about pipelining and parallelism. Herb Sutter's "Free Lunch" has been over for longer than most people here have been writing software. http://www.gotw.ca/publications/concurrency-ddj.htm
Sutter's solution was to write more concurrent code, but that's not what helps in this case. Rather the CPU vendors have been throwing transistors at doing more math operations in parallel. All you need to do is make sure you're taking advantage of the vector instructions, whether that's done by the compiler or with intrinsic calls.
Traditionally the fastest CRC code has used lookup tables, I wonder if that's creating cache pressure these days? Or maybe that approach was abandoned while I wasn't paying attention.
> Rather the CPU vendors have been throwing transistors at doing more math operations in parallel. All you need to do is make sure you're taking advantage of the vector instructions, whether that's done by the compiler or with intrinsic calls.
Instruction-level parallelism is incredibly important. I'd say that any optimizing programmer needs to fully understand ILP, and how it interacts with pipelines and dependency cutting (and register renaming).
Modern CPUs are extremely well parallelized with ILP. Any good, modern hash function will take advantage of this feature of modern CPUs.
Case in point, it seems like xxhash is SCALAR 32-bit / 64-bit code. No vectorization as far as I can tell, its purely using ILP to get its speed.
Intel Assembly has a 64-bit multiplier (but vectorized only has 32-bit multipliers). I've theorized to myself that this 64-bit multiplier could lead to better mixing than the vectorized instructions, and it seems like xxhash goes for that.
The 32-bit version of xxhash can likely be vectorized and optimized even further.
https://en.wikipedia.org/wiki/Cyclic_redundancy_check#Standa... suggests that Castagnoli has better "error detection capacity". What that means exactly (in terms of information theory) and whether that difference matters in practice is getting beyond my expertise level. In any case, existing formats like Zip are what they are, and as I mentioned in the blog post comments, it is still possible to SIMD-accelerate the IEEE flavor.