I've been thinking about wiring up Whisper[0], Mozilla's TTS[1] and GPT-3 together to make a voice assistant of sorts. It wouldn't have access to device hardware and there'd be no guarantee of correct answers, but it should blow Siri etc. out of the water in terms of understanding context.
[0] https://github.com/openai/whisper
[1] https://github.com/mozilla/TTS
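The glue could be pretty thin. Below is a minimal sketch of the loop, assuming each stage is just a function: audio to text, prompt to reply, reply to audio. The three callables are hypothetical placeholders. In a real build the transcriber might wrap Whisper's `whisper.load_model(...)` and `model.transcribe(...)` (which returns a dict with a "text" key), the completer a GPT-3 completion request, and the synthesizer Mozilla TTS. The "User:/Assistant:" prompt format and the `max_turns` cutoff are also my own assumptions. The context part comes from replaying recent turns into every prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stage signatures. Each stage is a plain function, so the real
# Whisper / GPT-3 / TTS calls can be swapped in without touching the loop.
Transcriber = Callable[[bytes], str]   # audio in -> text (e.g. Whisper)
Completer = Callable[[str], str]       # prompt in -> reply text (e.g. GPT-3)
Synthesizer = Callable[[str], bytes]   # reply text -> audio (e.g. Mozilla TTS)


@dataclass
class VoiceAssistant:
    transcribe: Transcriber
    complete: Completer
    synthesize: Synthesizer
    max_turns: int = 10  # assumed cap on past exchanges replayed into the prompt
    history: List[Tuple[str, str]] = field(default_factory=list)

    def build_prompt(self, user_text: str) -> str:
        # Replay the most recent turns so the model sees the conversation so far.
        lines = []
        for user, assistant in self.history[-self.max_turns:]:
            lines.append(f"User: {user}")
            lines.append(f"Assistant: {assistant}")
        lines.append(f"User: {user_text}")
        lines.append("Assistant:")
        return "\n".join(lines)

    def handle(self, audio: bytes) -> bytes:
        # One round trip: speech -> text -> reply -> speech.
        user_text = self.transcribe(audio).strip()
        reply = self.complete(self.build_prompt(user_text)).strip()
        self.history.append((user_text, reply))
        return self.synthesize(reply)


if __name__ == "__main__":
    # Stub stages for a dry run; replace with real model calls.
    assistant = VoiceAssistant(
        transcribe=lambda audio: audio.decode("utf-8"),
        complete=lambda prompt: f"(saw {prompt.count('User:')} user turns)",
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(assistant.handle(b"what's the weather?"))  # b'(saw 1 user turns)'
    print(assistant.handle(b"and tomorrow?"))        # b'(saw 2 user turns)'
```

The second question only makes sense because the first one is still in the prompt. That's exactly the bit Siri-style intent matching tends to drop.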