It's not just architectures: different OSes and ABIs have also found ways to repurpose no-ops. One example[1] is Windows using the 2-byte "MOV EDI, EDI" as a hot-patch point: it gets replaced by a 2-byte "JMP $-5" instruction, which jumps 5 bytes before the start of the function into a spot reserved for patching. Those 5 bytes are enough to contain a full jump instruction that can then jump wherever you need it to.
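To make the byte layout concrete, here is a minimal sketch that simulates the patch on a plain byte buffer rather than on live code. The function name `hot_patch`, the addresses, and the NOP padding are assumptions for illustration; only the "MOV EDI, EDI" to "JMP $-5" swap and the 5 reserved bytes come from the description above.

```python
import struct

MOV_EDI_EDI = bytes([0x8B, 0xFF])  # the 2-byte no-op at function entry
# "JMP $-5" as a short jump: rel8 is counted from the END of the 2-byte
# instruction, so the encoded displacement is -7 (0xF9).
SHORT_JMP_BACK = bytes([0xEB, 0xF9])


def hot_patch(image: bytearray, func_off: int, base: int, target: int) -> None:
    """Simulate hot-patching the function at image[func_off:] (hypothetical helper).

    base   -- assumed load address of image[0]
    target -- assumed address of the replacement function
    """
    if bytes(image[func_off:func_off + 2]) != MOV_EDI_EDI:
        raise ValueError("function does not start with MOV EDI, EDI")
    pad_off = func_off - 5
    # The 5-byte JMP rel32 ends exactly at the function start, so its
    # displacement is measured from the function's own address.
    rel32 = (target - (base + func_off)) & 0xFFFFFFFF
    # Write the long jump first; nothing executes the padding yet.
    image[pad_off:func_off] = b"\xE9" + struct.pack("<I", rel32)
    # Then flip the 2-byte entry instruction to redirect into the padding.
    image[func_off:func_off + 2] = SHORT_JMP_BACK


# 5 bytes of padding, then: mov edi, edi / push ebp / mov ebp, esp
image = bytearray(b"\x90" * 5 + b"\x8B\xFF\x55\x8B\xEC")
hot_patch(image, func_off=5, base=0x1000, target=0x2000)
print(image.hex(" "))
```

The ordering is the point of the design: the only bytes a running thread can observe changing are the two at the entry point, and those can be swapped in a single write.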
## Why do Windows functions all begin with a pointless MOV EDI, EDI instruction?
Interesting, thanks for pointing this out! Just yesterday I was gazing at some program containing two consecutive xor rax, rax instructions. I thought, what’s the point? But as you point out, it might be a NOP sled designed to be that specific length.
That would be surprising. xor is often used like that to set a register to 0, which is far from a nop. I'm not sure why it would do it twice, but it might be as simple as the compiler being stupid.
The fact that it’s xor rax, rax rather than xor eax, eax is also interesting, as it’s one byte longer for exactly the same effect (writing to the bottom 32 bits of a register zeroes the upper 32 bits). It makes me think there’s something weird going on other than compiler stupidity. I’d be interested in seeing the code it was compiled from.
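For reference, the size difference comes from the REX.W prefix. A small sketch with the common encodings written out by hand (each instruction also has an alternative opcode form, so treat these as one valid encoding each, not the only one):

```python
# One common machine-code encoding for each instruction (x86-64).
XOR_EAX_EAX = bytes([0x31, 0xC0])        # xor eax, eax
XOR_RAX_RAX = bytes([0x48, 0x31, 0xC0])  # xor rax, rax: same opcode plus REX.W (0x48)

# The 64-bit form is the 32-bit form with a one-byte prefix in front.
assert XOR_RAX_RAX[1:] == XOR_EAX_EAX
print(len(XOR_EAX_EAX), len(XOR_RAX_RAX))
```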
I wonder if this is still true. Whenever I go to hook Win32 API functions, I use an off-the-shelf length disassembler to create a trampoline with the first n bytes of instructions and a jmp back, and then just patch in a jmp to my hook, but if this hot-patch point exists it'd be a lot less painful since you can avoid basically all of that.
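Roughly, the trampoline approach looks like the sketch below, simulated on byte strings. The helper names and addresses are made up for illustration, the instruction lengths are passed in by hand where a real length disassembler would supply them, and it assumes the copied instructions are position-independent (no relative branches or calls among them).

```python
import struct

JMP_REL32_LEN = 5  # E9 + 32-bit displacement


def jmp_rel32(src: int, dst: int) -> bytes:
    """Encode a JMP rel32 located at address src that lands on dst."""
    rel = (dst - (src + JMP_REL32_LEN)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", rel)


def bytes_to_steal(insn_lengths) -> int:
    """Count whole instructions until at least 5 bytes are covered."""
    total = 0
    for n in insn_lengths:
        if total >= JMP_REL32_LEN:
            break
        total += n
    if total < JMP_REL32_LEN:
        raise ValueError("function too short to hook")
    return total


def build_hook(code: bytes, func_addr: int, tramp_addr: int,
               hook_addr: int, insn_lengths):
    """Return (trampoline, patch) for hooking the function at func_addr."""
    n = bytes_to_steal(insn_lengths)
    # Trampoline: the displaced instructions, then a jump back past them.
    trampoline = code[:n] + jmp_rel32(tramp_addr + n, func_addr + n)
    # Patch: jump to the hook, NOP-padded to an instruction boundary.
    patch = jmp_rel32(func_addr, hook_addr) + b"\x90" * (n - JMP_REL32_LEN)
    return trampoline, patch


# push ebp / mov ebp, esp / sub esp, 0x10 -> lengths 1, 2, 3
prologue = bytes([0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x10])
tramp, patch = build_hook(prologue, 0x1000, 0x5000, 0x3000, [1, 2, 3])
print(tramp.hex(" "), "|", patch.hex(" "))
```

With a hot-patch point available, none of the instruction copying is needed, since the original body after the 2-byte entry instruction stays untouched.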
Though, I guess even if it was, it'd be silly to rely on it even on x86 only. Maybe it would still make for a nice fast-path? Dunno.
Intel VTune will do this with 5-byte NOPs directly. I think LLVM's XRay tracing suite also did this, with a much bigger NOP, to capture more information.
[1]: https://devblogs.microsoft.com/oldnewthing/20110921-00/?p=95...