Thanks for digging that out. Yes, that makes sense to me as someone who made a fully local speech-to-speech prototype with Electron, including VAD and AEC. It was responsive but taxing. I had to use a mix of specialty models over ONNX/WASM in the renderer and llama.cpp in the main process. One day, a multimodal model will just do it all.
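
A minimal sketch of that kind of split, assuming llama.cpp is running as a local `llama-server` exposing its OpenAI-compatible HTTP API on port 8080, and a single-input VAD model file (`vad.onnx`) loaded via onnxruntime-web's WASM backend in the renderer. The IPC channel name, model file, and threshold are illustrative, not from the original comment; real VAD models such as Silero take extra state and sample-rate inputs.

```typescript
// ---- main.ts (Electron main process): bridge renderer -> local llama.cpp server ----
import { app, BrowserWindow, ipcMain } from 'electron';

ipcMain.handle('llm:reply', async (_event, userText: string) => {
  // Forward the recognized utterance to the local llama-server (assumed running).
  const res = await fetch('http://127.0.0.1:8080/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [{ role: 'user', content: userText }],
      max_tokens: 128,
    }),
  });
  const json = await res.json();
  return json.choices?.[0]?.message?.content ?? '';
});

app.whenReady().then(() => {
  // Preload script (not shown) would expose ipcRenderer.invoke via contextBridge.
  const win = new BrowserWindow({
    webPreferences: { preload: `${__dirname}/preload.js` },
  });
  win.loadFile('index.html');
});
```

```typescript
// ---- renderer.ts: run a small VAD model over ONNX/WASM to gate audio frames ----
import * as ort from 'onnxruntime-web';

// 'vad.onnx' and its single-input/single-output shape are assumptions for this sketch.
const vad = await ort.InferenceSession.create('vad.onnx', {
  executionProviders: ['wasm'],
});

// Returns true when the frame likely contains speech; only then would the
// audio be sent on to the ASR model and, via IPC, to the LLM in the main process.
async function isSpeech(frame: Float32Array): Promise<boolean> {
  const input = new ort.Tensor('float32', frame, [1, frame.length]);
  const out = await vad.run({ [vad.inputNames[0]]: input });
  const prob = (out[vad.outputNames[0]].data as Float32Array)[0];
  return prob > 0.5;
}
```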

