Now, bring on those multimodal LLMs with voice input and output please!
Some backends allow tool calling.