That is a good point. The distance computation and displacement checking could indeed be changed to use the squared forms. However, I was unable to remove the square rooting entirely because the square roots are used in the core math.[1]
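For the places that only compare a distance against a threshold, the squared form is a straightforward swap. This is just a minimal sketch with made-up names, not the exact code from the repo:

    fn displacement_exceeds(dx: f32, dy: f32, threshold: f32) -> bool {
        let dist_sq = dx * dx + dy * dy;
        // Same result as dist_sq.sqrt() > threshold, without paying for the sqrt.
        dist_sq > threshold * threshold
    }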
I don't think there's a way to make this math work without taking the square root to get `distance`. There may be a way to rewrite it in terms of the squared forms, but unfortunately I don't understand the underlying math well enough to say.
Looking at the perf metrics for the full function's disassembly[2], square root-related instructions only account for a tiny fraction of the runtime anyway.
That all being said, thank you so much for taking the time to read the post closely enough to come up with this idea!
I'm not entirely clear on what's going on with gs and hs, but at least gs seems to only be used with an additional multiply by distance. Still, I don't think that immediately lets you simplify away the sqrt by squaring both sides of the divide in its definition? The relevant snippet:
    let gs = 2. * weight * (distance - ideal_distance)
        / (ideal_distance_squared * distance);
    let distance_cubed = distance_squared * distance;
    unsafe { *self.g.get_unchecked_mut(i * n + u) += distance * gs };
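For what it's worth, writing out the product from that snippet (with d = distance, d0 = ideal_distance, w = weight; this is just my reading of it, not anything from the post):

    d * gs = d * 2 * w * (d - d0) / (d0^2 * d) = 2 * w * (d - d0) / d0^2

So the multiply does cancel the divide-by-distance, but the (d - d0) factor still needs d itself rather than d^2, which is why squaring both sides doesn't obviously get rid of the sqrt.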
In the same spirit as the comment above: the webcola core seems to be doing O(n^2) comparisons. Probably somewhere in the top 10 of cool foundational scientific computing algorithms for people here are Barnes-Hut, which cuts the comparisons to O(n log n), and, fancier still, the fast multipole method. Basically, there's no need to precisely compare far-apart things, which are most of them. By the time you're doing something like interactive UIs over 100K+ nodes, the asymptotic costs start dominating everything, and 2-8X C/SIMD speedups get overwhelmed if you don't scale the core algorithm. This is at the heart of many modern ML algorithms as well: lots of them need to iteratively compare vectors, such as kNN and UMAP.
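To make the pruning idea concrete, here's a minimal Barnes-Hut sketch in Rust (made-up types and force law, not webcola or Graphistry code): each quadtree cell caches the total mass and center of mass of everything under it, and a cell that looks small enough from the target point (size / distance < theta) is treated as a single pseudo-node instead of being descended into. That pruning is what turns the all-pairs O(n^2) repulsion pass into roughly O(n log n).

    const THETA: f32 = 0.5; // opening criterion; smaller = more accurate, slower

    enum Node {
        Leaf {
            pos: (f32, f32),
            mass: f32,
        },
        Cell {
            size: f32,           // side length of this cell's bounding square
            mass: f32,           // total mass below this cell
            com: (f32, f32),     // center of mass of everything below this cell
            children: Vec<Node>, // up to four child quadrants
        },
    }

    /// Accumulate the repulsive force that `target` feels from the subtree `node`.
    fn accumulate_force(node: &Node, target: (f32, f32), force: &mut (f32, f32)) {
        match node {
            Node::Leaf { pos, mass } => add_pair_force(*pos, *mass, target, force),
            Node::Cell { size, mass, com, children } => {
                let dx = com.0 - target.0;
                let dy = com.1 - target.1;
                let dist = (dx * dx + dy * dy).sqrt().max(1e-6);
                if *size / dist < THETA {
                    // Far enough away: approximate the whole cell by its center of mass.
                    add_pair_force(*com, *mass, target, force);
                } else {
                    // Too close to approximate: descend into the children.
                    for child in children {
                        accumulate_force(child, target, force);
                    }
                }
            }
        }
    }

    /// Simple inverse-square repulsion between one pseudo-node and the target;
    /// a real layout kernel would plug in its own force law here.
    fn add_pair_force(pos: (f32, f32), mass: f32, target: (f32, f32), force: &mut (f32, f32)) {
        let dx = target.0 - pos.0;
        let dy = target.1 - pos.1;
        let dist_sq = (dx * dx + dy * dy).max(1e-6);
        let dist = dist_sq.sqrt();
        let f = mass / dist_sq;
        force.0 += f * dx / dist;
        force.1 += f * dy / dist;
    }

The other half of the work is (re)building the quadtree each iteration, which is where the SIMD/GPU tree construction mentioned below comes in.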
Pretty neat as well is that Barnes-Hut and multipole are also SIMD friendly, and even better, GPU friendly. It's a bit trickier because you're now dealing with ideas like SIMD/GPU tree construction, but it still works! For Graphistry, our engine prototype had a multicore wasm frontend impl, a GPU WebCL frontend impl, and a GPU OpenCL backend impl, and just by adding that we got some pretty cool numbers. Nowadays we do wilder stuff and focus more on other layers like interactive analytics, bigger-than-memory data, connectors/collaboration/embedding/etc., but those are all logical steps building on what's being described here. Bravo!
Edit: it's good the blog post included rendering. Rendering gets pretty fun with graphs because, while points are ~easy, edges bring (a) dynamic geometry (b) that can span the screen and (c) that blows up in count pretty quickly. Ouch! The Pathfinder project for vector text rendering is a fun related effort showing what it takes to speed that up with SIMD/GPU. If someone is interested in a WebGL contract around helping our users explore bigger interactive graphs (WebGL/React client <-Apache Arrow-> Python/CUDA server), ping build@graphistry.com!
[1] https://github.com/Ameobea/webcola-wasm/blob/master/src/wasm...
[2] https://ameo.link/u/91u.html