Hacker News

You can bump upwards without an overflow check even if the size is variable.

I’ve seen multiple ways to do it. And I’ve lost days of my life to coming up with pointlessly amusing variants that get there in weird ways.

Yeah I’m also amused that they didn’t get into the fact that downward bump is just not what the HW expects you to do. You’d need a real benchmark to see that effect. And, it’s an open question whether this would matter unless your allocation rate was obscene (if it isn’t then most scans of memory will be normal loops and not the allocator).



> fact that downward bump is just not what the HW expects you to do.

Where do you get this fact from? ARM and x86 both have a descending stack (ARMv7 and earlier supported the stack growing in either direction).


What does a downward-growing stack have to do with the CPU predicting that you’ll most likely scan memory in the forward direction?


Your stack is entirely data; why do you think a CPU couldn’t predict access patterns on it?

Prefetchers have been able to predict negative offsets/decrements since the 1980s.


Because stacks aren’t unbounded. You grow and shrink them. Therefore, regardless of which direction it grows in, it also shrinks in the opposite direction with the same likelihood.


And?

> fact that downward bump is just not what the HW expects you to do.

What fact is this? And what does it have to do with the stack being bounded? (The bounds of the stack are unknown to most microarchitectural state and are only stored in OS structures.)

HW prefetchers support access patterns in either direction, and so do load/store instructions (STM(FD)/LDM(FD); LDP/STP).


The fact that stacks are bounded means you both push and pop. Just as frequently as a function calls another one, causing memory ops to move to lower addresses, functions also return, causing memory ops to move back to higher addresses. Pushes and pops are balanced. How do we know they are balanced? Because the stack is bounded.

So - the direction of stack growth isn’t interesting to prefetching strategy.


Sure, but push/pop are not the only ways to access the stack.

Moreover, I'm still wondering what you mean by this:

> Yeah I’m also amused that they didn’t get into the fact that downward bump is just not what the HW expects you to do.

What is this fact? Why does HW not expect you to bump addresses downwards?


Even when size is a full 64-bit value (…in that it might actually overflow the address space)?


Phil style: track end and remaining. End doesn’t change. To allocate size, check that size <= remaining and subtract size from remaining. End minus remaining is the result (using the old value of remaining). Has two variants - one burns an extra register but saves an instruction. Empirically, this one is fast enough that I just use it all the time.
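
A minimal C sketch of that end-plus-remaining scheme (names and the struct layout are mine, and the slow path is elided). Because size is compared against remaining rather than added to a pointer, the check can’t overflow even for an arbitrary 64-bit size, which is what makes variable-size upward bumping safe without an overflow check:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uintptr_t end;       /* one past the last byte of the arena; never changes */
    uintptr_t remaining; /* bytes still free */
} bump_t;

/* Bump upward with no overflow check: size is compared against remaining,
   never added to a pointer, so no wraparound is possible. */
static void *bump_alloc(bump_t *b, uintptr_t size) {
    uintptr_t old = b->remaining;
    if (size > old) return NULL; /* slow path / refill would go here */
    b->remaining = old - size;
    return (void *)(b->end - old); /* end minus the OLD value of remaining */
}
```

Note the result address depends only on the old remaining, so the dependency chain is just load remaining, compare, subtract.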

Bmalloc style bump: independently subtract from remaining and add to bump. It’s more instructions but they are instruction level parallel.
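
A sketch of that idea as described above (not bmalloc’s actual code; names are mine). The subtract and the add have no data dependency on each other, so an out-of-order core can issue them in parallel:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uintptr_t bump;      /* current allocation cursor */
    uintptr_t remaining; /* bytes left in the arena */
} bmalloc_bump_t;

/* Two independent updates: remaining -= size and bump += size don't depend
   on each other, so they're instruction-level parallel; only the branch
   waits on remaining. */
static void *bmalloc_bump_alloc(bmalloc_bump_t *b, uintptr_t size) {
    if (size > b->remaining) return NULL;
    void *result = (void *)b->bump;
    b->remaining -= size;
    b->bump += size;
    return result;
}
```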

Another friend’s bump style: have bump/end ptrs like usual. Fast path checks if size < end - bump. Then you add to bump like usual.
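
That variant might look like this (my naming; the slow path is elided). Since end >= bump is an invariant, end - bump can’t underflow, so the comparison is safe even for a full 64-bit size. Taken literally, the strict < from the description leaves the final byte of the arena unused:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uintptr_t bump; /* next free byte */
    uintptr_t end;  /* one past the arena */
} bump_end_t;

/* Fast path: size < end - bump, per the description above. end - bump
   can't underflow because end >= bump always holds. */
static void *bump_end_alloc(bump_end_t *b, uintptr_t size) {
    if (size < b->end - b->bump) {
        void *result = (void *)b->bump;
        b->bump += size;
        return result;
    }
    return NULL; /* slow path / refill would go here */
}
```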

Note that instruction counts on these don’t matter so much on modern CPUs. But dependency chain length does matter, a lot. Register usage matters only if you want to inline the allocator (and it’s not always beneficial to do that).



