It's more high-level than it used to be, but the GPU is the main problem. The graphics APIs that console games use usually expose various hardware specifics so games can be fine-tuned to get the most out of that particular hardware. It takes a lot of work to translate these hardware-specific API calls into OpenGL/Direct3D/Vulkan/Metal.
It helps that many console devs don't bother with low-level optimisations, because they can get away with slightly worse performance: just cap the framerate at 30 like most other console games do.
It wasn't an option during the NES era. Every clock cycle counted back then.
NES-era games also ran on bare metal. Modern consoles run proper modern operating systems with everything you'd expect to find in one: preemptive multitasking, virtual memory, hardware abstraction, kernel and user modes with syscalls, etc. The PlayStation's OS is a Unix system (FreeBSD-derived), Xbox's is obviously based on Windows NT, and the Switch runs an OS Nintendo built in-house.
Cycle-counting doesn't make much sense for code running under a preemptive-multitasking OS: no matter how carefully you count cycles, the actual execution time of a piece of code is non-deterministic. That, and I suspect most modern games are bottlenecked by the GPU, not the CPU.
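To illustrate the non-determinism point (a rough sketch, nothing console-specific): time the exact same fixed workload a few times on any preemptive OS and the wall-clock numbers wander, because the scheduler, other processes, caches, and CPU frequency scaling all sit between your instruction stream and real time.

```python
import time

def workload():
    # A fixed amount of work: the same instructions execute every call.
    return sum(i * i for i in range(100_000))

samples = []
for _ in range(5):
    t0 = time.perf_counter()
    workload()
    samples.append(time.perf_counter() - t0)

# Identical code, yet the measured durations vary run to run; a static
# cycle count can't predict any single one of them.
print(samples)
```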
fwiw, the Switch actually uses cooperative multitasking. This causes emulation headaches of its own: retail games are sometimes buggy, but because the scheduling is relatively deterministic nobody ever noticed (e.g. race conditions that are reliably won on real hardware will suddenly start breaking in emulation).
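You can see why the scheduling determinism hides bugs with a toy model (pure illustration, nothing Switch-specific; `task` and `run` are made-up names): two tasks share an unguarded read-modify-write on a counter, and the race only bites when the scheduler is allowed to switch in the middle of an update.

```python
from collections import deque

def task(state, steps):
    # Unguarded read-modify-write on a shared counter.
    # Each `yield` marks a point where the scheduler MAY switch tasks.
    for _ in range(steps):
        v = state["n"]       # read
        yield                # switch point inside the update
        state["n"] = v + 1   # write back
        yield                # switch point after the update completes

def run(tasks, switch_every):
    # Round-robin scheduler: advance the current task `switch_every`
    # switch points before moving on. switch_every=2 never splits an
    # update (the deterministic, cooperative-style case), while
    # switch_every=1 can preempt between the read and the write.
    queue = deque(tasks)
    while queue:
        g = queue.popleft()
        try:
            for _ in range(switch_every):
                next(g)
        except StopIteration:
            continue  # task finished, drop it
        queue.append(g)

coop = {"n": 0}
run([task(coop, 1000), task(coop, 1000)], switch_every=2)
print(coop["n"])     # 2000: every increment survives, the bug never fires

preempt = {"n": 0}
run([task(preempt, 1000), task(preempt, 1000)], switch_every=1)
print(preempt["n"])  # fewer than 2000: updates get lost once switches
                     # can land mid-update
```

Same buggy code both times; only the switching granularity changed. That's the emulator's problem in a nutshell: reproduce the original scheduler's behaviour closely enough and the game "works", deviate and latent races surface.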