Not just architectures, but different OSes and ABIs have found ways to repurpose...

jeffbee · on July 14, 2023

Interesting, thanks for pointing this out! Just yesterday I was gazing at some program containing two consecutive xor rax, rax instructions. I thought, what’s the point? But as you point out, it might be a NOP sled designed to be that specific length.

gizmo686 · on July 15, 2023

That would be surprising. xor is often used like that to set a register to 0, which is far from a nop. I'm not sure why it would do it twice, but it might be as simple as the compiler being stupid.

messe · on July 15, 2023

The second one is effectively a nop though.

The fact that it’s xor rax, rax rather than xor eax, eax is also interesting, as it’s one byte longer for exactly the same effect (writing a 32-bit register zero-extends the result, so the upper 32 bits of the 64-bit register are cleared anyway). It makes me think there’s something weird going on other than compiler stupidity. I’d be interested in seeing the code it was compiled from.
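
To make the size difference concrete, here is a rough sketch in Python. The byte strings are hand-assembled from the x86-64 opcode map (31 /r, with a REX.W prefix for the 64-bit form), and the two `emulate_*` helpers are hypothetical names that only model the register write, not a real CPU:

```python
# Hand-assembled encodings (opcode 31 /r = XOR r/m, r; ModRM C0 = eax, eax).
XOR_EAX_EAX = bytes([0x31, 0xC0])        # xor eax, eax
XOR_RAX_RAX = bytes([0x48, 0x31, 0xC0])  # xor rax, rax (0x48 is the REX.W prefix)

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def emulate_xor_eax_eax(rax: int) -> int:
    """Model a 32-bit register write: the result is zero-extended to 64 bits."""
    eax = rax & MASK32
    return (eax ^ eax) & MASK32  # upper 32 bits of rax end up as 0


def emulate_xor_rax_rax(rax: int) -> int:
    """Model the full 64-bit form."""
    return (rax ^ rax) & MASK64


before = 0xDEADBEEFCAFEBABE
print(len(XOR_EAX_EAX), len(XOR_RAX_RAX))
print(emulate_xor_eax_eax(before) == emulate_xor_rax_rax(before))
```

Both forms leave rax at zero; the only difference in this model is the extra prefix byte.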

jchw · on July 14, 2023

I wonder if this is still true. Whenever I go to hook Win32 API functions, I use an off-the-shelf length disassembler to create a trampoline with the first n bytes of instructions and a jmp back, and then just patch in a jmp to my hook, but if this hot-patch point exists it'd be a lot less painful since you can avoid basically all of that.

Though I guess even if it were, it'd be silly to rely on it, even on x86 only. Maybe it would still make for a nice fast path? Dunno.
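
For anyone who wants to see the fast path spelled out, here is a minimal sketch that simulates it on a plain `bytearray` instead of live process memory. It assumes the classic 32-bit layout: five padding bytes directly before the entry point and a 2-byte `mov edi, edi` (8B FF) at the entry point. The `hotpatch` function, the 0x90 padding value, and the offsets are all made up for illustration; a real hook would also need to change page protections and write the 2-byte jump atomically.

```python
import struct

PAD = 5                    # assumed number of padding bytes before the entry point
MOV_EDI_EDI = b"\x8b\xff"  # 2-byte no-op expected at the entry point


def hotpatch(mem: bytearray, entry: int, hook: int) -> None:
    """Redirect the function at offset `entry` to offset `hook` (simulation only)."""
    if entry < PAD or bytes(mem[entry:entry + 2]) != MOV_EDI_EDI:
        raise ValueError("no hot-patch point; fall back to a trampoline")
    # Step 1: long jump in the padding. E9 rel32 is relative to the end of
    # the jump, which is exactly `entry`.
    mem[entry - PAD:entry] = b"\xe9" + struct.pack("<i", hook - entry)
    # Step 2: short jump over the entry point. EB rel8 is relative to
    # entry + 2, so rel8 = (entry - PAD) - (entry + 2) = -(PAD + 2) = -7.
    mem[entry:entry + 2] = b"\xeb" + struct.pack("<b", -(PAD + 2))


# Fake image: 5 padding bytes, mov edi, edi, then push ebp / mov ebp, esp / ret.
mem = bytearray(b"\x90" * PAD + MOV_EDI_EDI + b"\x55\x8b\xec\xc3")
hotpatch(mem, entry=PAD, hook=0x40)
print(mem.hex())
```

The original prologue after the first two bytes is untouched, so the hook can call the original function by jumping to `entry + 2` with no length disassembler involved. If the check fails, the trampoline approach is still needed.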

gcoakes · on July 14, 2023

Good read. Thank you.

This just worsens my fear of changing "unnecessary" code when I don't know the original motivation for it.

pclmulqdq · on July 14, 2023

Intel VTune will do this with 5-byte NOPs directly. I think LLVM's XRay tracing suite also did this, with a much bigger NOP, to capture more information.
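
A small sketch of why five bytes is the convenient size: the recommended 5-byte NOP encoding (0F 1F 44 00 00) is exactly as long as a `call rel32`, so a tool can swap one for the other in place. Whether VTune or XRay use this exact encoding is an assumption on my part, and `call_rel32` is a hypothetical helper name:

```python
import struct

NOP5 = bytes.fromhex("0f1f440000")  # nop dword ptr [rax + rax*1 + 0]


def call_rel32(site: int, target: int) -> bytes:
    """Encode `call target` placed at address `site` (E8 + rel32 from the next instruction)."""
    return b"\xe8" + struct.pack("<i", target - (site + 5))


patch = call_rel32(0x1000, 0x2000)
print(len(NOP5), len(patch), patch.hex())
```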