A nitpick, but my name is Orson Peters, Peters is my last name :)
> I assume that's one of the advantages of building upon powersort? The blocky nature of quadsort is both a blessing and a curse in that regard.
I don't know, I only use a binary search when splitting up merges, and almost no time is spent in this routine. I use this for the low-memory case, as well as to create more parallelism to use for instruction-level parallelism.
> As for instruction-level parallelism, I do think it's actually almost entirely memory-level parallelism. I could be wrong though.
I didn't do specific research into which effect it is, when I say ILP I also mean the memory effects of that.
I actually knew that, not sure what went wrong in my brain.
>I don't know, I only use a binary search when splitting up merges, and almost no time is spent in this routine.
You'll still get a definite speed-up, worth benching the difference just in case. I guess there's no easy way to avoid a binary search.
>I didn't do specific research into which effect it is, when I say ILP I also mean the memory effects of that.
They're indeed related. I did some empirical testing and I'm quite sure it's cache related. Thinking about it, one issue I may have had was not benching sorting 100M elements, which might be where a bidirectional partition might benefit on my system.
A nitpick, but my name is Orson Peters, Peters is my last name :)
> I assume that's one of the advantages of building upon powersort? The blocky nature of quadsort is both a blessing and a curse in that regard.
I don't know, I only use a binary search when splitting up merges, and almost no time is spent in this routine. I use this for the low-memory case, as well as to create more parallelism to use for instruction-level parallelism.
> As for instruction-level parallelism, I do think it's actually almost entirely memory-level parallelism. I could be wrong though.
I didn't do specific research into which effect it is, when I say ILP I also mean the memory effects of that.