I need to add two factoids to this insightful discussion: There is a big differe...

jandrewrogers · on Aug 11, 2012

While there is a distinction between space-filling curves and what you are calling "hyperspace hashing", they are theoretical siblings and the conceptual gap between the two is very small, arguably differing interpretations of the same abstract construct. The "hash" interpretation was well-known when the space-filling curve indexing literature in the late 1980s was written though they were primarily interested in access methods for sequential storage.

With respect to "hyperspace hashing", it has been the standard way of interacting with large space decompositions for decades. The mathematics is from the 1960s (Morton et al) and the practical computer science was mature by the 1980s. When I was writing parallel codes for supercomputers, virtually all of our data structures were built this way. It is also the algorithm used for some large-scale geospatial systems; some of the Google Earth backends used it back in the day. While I've never personally worked on it, I've heard some graphics processing pipelines also use the algorithm. I first learned about it ten years ago because I ran into a couple systems using it.

I've never heard it called "hashing" until recently, largely because technically the algorithms are not a hash even if used as one in many implementations. This, and the fact that most of the literature on this idea significantly pre-dates electronic publishing, probably explains why you've seen so few references to hyper-dimensional hashing.

As you point out, the curse of dimensionality is a classic problem for these structures. The partitioning technique in the paper works for some use cases but is not used as much anymore. There have been some clever and fairly general attacks on the dimensionality problem over the last few years by a couple companies but I have not seen anything published.

All that said, this general approach is overdue for large-scale distributed systems. It has been proven out in niche markets like supercomputing for decades and is superior to distributed hash tables for many purposes. However, most of the recent research does not seem to be published.