Out of interest, why does this need a NOP? Why not replace an existing instruction with a branch, bounce down to the end of the code segment, put a longer sequence of code there, put the instruction that you replaced there, and then jump back?
Is it that the patch could be conditional and not incur as much of a performance penalty if the condition isn't met and the patch doesn't run? Would this have made a significant difference to performance?
Depending on the instruction, it might not be long enough to hold a branch. Then you'll have to replace two instructions, and if the program is executing (a live/hot patch), you may have to be extra careful that no thread has executed the first instruction but not yet the second.
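To make the length problem concrete, here is a minimal sketch that counts how many whole instructions a branch would overwrite. It assumes x86, where a near `jmp rel32` is 5 bytes; the function name and the example instruction lengths are made up for illustration, and a real patcher would get the lengths from a disassembler.

```python
JMP_REL32_LEN = 5  # bytes in an x86 near jump: opcode E9 plus a 32-bit offset


def instructions_displaced(lengths, branch_len=JMP_REL32_LEN):
    """Return how many whole instructions a branch of branch_len bytes overwrites.

    lengths: byte lengths of consecutive instructions starting at the patch
    site (hypothetical input for this sketch).
    """
    covered = 0
    for count, length in enumerate(lengths, start=1):
        covered += length
        if covered >= branch_len:
            return count
    raise ValueError("not enough instruction bytes to hold the branch")


# A 5-byte instruction can be swapped for the branch on its own.
print(instructions_displaced([5, 3]))

# A 2-byte instruction followed by a 3-byte one: both get overwritten,
# which is the case where a live patch has to worry about a thread
# sitting between the two.
print(instructions_displaced([2, 3, 1]))
```

Whenever the result is greater than 1, every displaced instruction has to be copied to the out-of-line code, and the patch is only safe if nothing is about to resume at the second instruction.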