Requiring the caller to put all arguments on the stack isn't "zero cost." For a non-variadic call on ARM64, the first eight parameters (or more, if some are floats) will be passed in registers without ever touching the stack.
On x86-64, the caller also has to set %al to the number of vector registers used for the call, and the compilers I've seen always check %al and conditionally save those registers as part of the function prologue. Cheap, but not "zero cost."
You'll notice how `normal` takes all of its arguments out of registers `x0` through `x7` and places them on the stack for the call to `printf`. And you'll notice how `vararg` plays a bunch of games with the stack and never touches registers `x1` through `x7`. (It still uses `x0` because the first argument is not variadic.)
On the caller side, observe how `call_normal` places its values into `x0` through `x7` sequentially and then invokes the target function, while `call_vararg` places one value into `x0` and places everything else on the stack.
So, no, it looks to me like varargs very much change the calling convention.
Citing the bastard architecture of iOS isn't really making the case for "usually".