It's not "executed" per se. It consumes space in the cache hierarchy, and a slot...

It's not "executed" per se. It consumes space in the cache hierarchy, and a slot in the front-end decoder. It won't ever be issued, but depending on the microarchitecture in question it might result in an issue cycle having less occupancy than it might have had in the case where the subsequent instruction was available.

With that said, the first few instructions of a called function often stall due to stack pointer dependencies, etc. so the true execution cost is likely to be even smaller than the above might suggest.