Listening for a single word is much easier than doing the rest of the voice-recognition work. Uploading all of the audio all the time would be a huge waste, so these systems usually do the one-word detection on the device. They keep a rolling buffer of the last few seconds of audio, so that when the device detects the hotword it can send that buffered audio to the cloud. That helps with noise removal, but not everything.
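
To make the rolling-buffer idea concrete, here is a minimal Python sketch. It is not how any particular assistant does it. `HotwordGate`, `detect`, `upload`, and `max_frames` are all made-up names for illustration, the detector is a stand-in for a real on-device model, and the "audio frames" are just short byte strings.

```python
from collections import deque
from typing import Callable, Deque, List

Frame = bytes  # assumption: one frame is a short chunk of raw audio


class HotwordGate:
    """Keeps only recent audio on the device and uploads it on a hotword hit."""

    def __init__(
        self,
        detect: Callable[[Frame], bool],        # hypothetical on-device detector
        upload: Callable[[List[Frame]], None],  # hypothetical cloud upload hook
        max_frames: int = 30,                   # assumed size of "a few seconds"
    ) -> None:
        self.detect = detect
        self.upload = upload
        # deque with maxlen drops the oldest frame once the buffer is full
        self.buffer: Deque[Frame] = deque(maxlen=max_frames)

    def feed(self, frame: Frame) -> bool:
        """Buffer one frame; upload the buffered audio if the hotword fired."""
        self.buffer.append(frame)
        if not self.detect(frame):
            return False  # nothing leaves the device
        self.upload(list(self.buffer))
        self.buffer.clear()
        return True


# Usage with fake frames: only the last 3 frames are kept at any time.
sent: List[List[Frame]] = []
gate = HotwordGate(detect=lambda f: f == b"hot", upload=sent.append, max_frames=3)
for frame in [b"a", b"b", b"c", b"d", b"hot", b"e"]:
    gate.feed(frame)
# sent == [[b"c", b"d", b"hot"]]
```

The point of the `maxlen` deque is that old audio falls off the end on its own, so nothing older than the buffer length can ever be uploaded.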