Yeah, I was actually thinking about this particular case for code specialization. In code where the inner loop is very branchy, you can get considerable gains from being able to remove unnecessary branches (and the code behind them).
This kind of technique was (is?) fairly common in the demoscene. Often it's just modifying constants in existing code, but actual specialization (AFAIK usually by block concatenation) isn't unheard of either.
(By the way, at least on x86, it might pay off to watch out for things like 16-byte alignment of inner-loop branch targets to avoid penalties.)
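To make the "block concatenation" idea concrete, here's a rough sketch in Python rather than machine code. The names (`generic_loop`, `make_specialized_loop`) and the pixel options are made up for illustration; real demo code would paste together machine-code blocks, not source lines. The point is only that the option checks happen once, when the loop is built, instead of on every iteration.

```python
from typing import Callable, List, Optional


def generic_loop(pixels: List[int], gain: int, invert: bool,
                 threshold: Optional[int]) -> List[int]:
    """Branchy reference version: every option is re-tested per pixel."""
    out = []
    for p in pixels:
        if gain != 1:
            p = p * gain
        if invert:
            p = 255 - p
        if threshold is not None:
            p = 255 if p >= threshold else 0
        out.append(p)
    return out


def make_specialized_loop(gain: int, invert: bool,
                          threshold: Optional[int]) -> Callable[[List[int]], List[int]]:
    """Build a loop for one fixed option set by concatenating source blocks.

    Hypothetical helper: options are tested here, once, and only the
    blocks that are actually needed end up in the generated loop body.
    """
    lines = ["def loop(pixels):", "    out = []", "    for p in pixels:"]
    if gain != 1:
        # The constant is baked into the code instead of read at run time.
        lines.append(f"        p = p * {gain}")
    if invert:
        lines.append("        p = 255 - p")
    if threshold is not None:
        lines.append(f"        p = 255 if p >= {threshold} else 0")
    lines += ["        out.append(p)", "    return out"]

    namespace: dict = {}
    exec("\n".join(lines), namespace)  # compile the concatenated blocks
    return namespace["loop"]
```

Usage would be something like `loop = make_specialized_loop(1, True, None)` once per effect setup, then calling `loop(pixels)` in the hot path. With those options the generated body contains only the invert step; the gain and threshold code is simply not there.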
Only for size compos. Like 4 kB demos. Or 256 byte, the "new 4k".
For retro systems (like C64, speccy, Amiga OCS) speed is generally king. Of course there are still sizecoding compos for them as well. My oooold Amiga demo effects were full of code generation and SMC (self-modifying code).