REP MOVS is still implemented as a microcode loop, but on CPUs with fast-string support it will copy entire cache lines (usually 64 bytes) at once when the size and alignment of the copy allow it. Because it is a tiny instruction (2 bytes for REP MOVSB, 3 with a REX.W prefix for REP MOVSQ) and the loop itself runs in microcode, it doesn't consume instruction fetch bandwidth while it's running, and it occupies almost no space in the instruction cache.
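To make the "tiny instruction" point concrete, here is a minimal sketch of a copy routine whose whole loop is a single REP MOVSB. It assumes x86-64 and GCC/Clang extended inline assembly. The names `copy_rep_movsb` and `self_check` are made up for illustration and do not come from any library. It is not a tuned memcpy replacement, just a way to see the instruction in use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst with REP MOVSB.
   Assumes x86-64 and GCC/Clang inline asm. As with memcpy, the
   regions must not overlap. Relies on the ABI guarantee that the
   direction flag is clear, so the copy runs forward. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
    /* RDI = destination, RSI = source, RCX = byte count.
       The instruction updates all three, hence the "+" constraints.
       The "memory" clobber tells the compiler memory was written. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return ret;
}

/* Small sanity check: copy 256 bytes and compare against the source. */
static void self_check(void)
{
    unsigned char src[256], dst[256];
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)i;
    memset(dst, 0, sizeof dst);
    copy_rep_movsb(dst, src, sizeof src);
    assert(memcmp(dst, src, sizeof src) == 0);
}
```

The entire copy loop is the two-byte encoding `F3 A4`. Everything else in the function is register setup that the compiler generates from the constraints. Whether this beats a vectorized copy depends on the microarchitecture and the copy size, so measure before relying on it.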