Speaking as someone who's deaf and uses these services a lot: for speech to text, the AI stuff is getting rather good.
I'm not saying it's perfect for every situation, but I have a very high success rate using InnoCaption[0] for captioned phone calls, including to places like restaurants with a lot of noise going on in the background. InnoCaption does both live person and AI-based captioning; since they started offering the AI-based option I've left that on, and I've never had to switch to human operators to continue a conversation.
That said - I'm not deaf from birth (lost my hearing in elementary school), so I voice for myself and that does simplify the process. I have used the old-school text-only relay services, and that was always such a miserable experience for me that I would crawl over broken glass to avoid making phone calls, especially going through phone trees. That's one area where relay operators still have a major advantage. IIRC, Google's Pixel phones are supposed to be able to navigate phone trees for you, but since I use iOS I have no personal experience there.
I can't really understand speech these days without the captions to go with it. But I encounter discrepancies with AI generated captions very often. As in, I heard something and from context I know I'm right and the AI is wrong. Whisper and other deep-learning-based speech systems in particular can generate very plausible misinterpretations - something that sounds similar and is grammatically plausible, but is not what was said. They are errors of a kind that a person with semantic understanding of what's going on would not make. So I am a little leery of these systems for that reason. I rely on them every day for generating captions for video and so on, but I don't find any iteration I've tried reliable or comfortable for interactive use.
> I encounter discrepancies with AI generated captions very often. As in, I heard something and from context I know I'm right and the AI is wrong.
I've been noticing this as well. It's becoming a common problem. Also, many times I've noticed that if I hadn't heard the speech being captioned and only had the captioning to go by, I would have had little chance of correctly understanding what was actually said.
[Applause] on YouTube transcripts, short two- or three-syllable sentence fragments, and absolute nonsense are the only errors I’d be able to reliably detect sans audio. But I doubt YouTube captions are state of the art, given how poor they are.
The phone tree stuff on Pixel is decent but nowhere near 100% reliable or robust.
If it hears and understands an automated system speaking out a phone tree, it will start to list the options and you can tap on them. Usually works but often doesn't recognize that a phone tree is happening. Other times it recognizes the phone tree, but mistranscribes the options.
As a non-deaf person, it's a handy UX improvement. But I wouldn't recommend that anyone rely on it.
These services are indeed great for those who need them. I received one or two such calls years ago, when I worked at a computer shop. Unfortunately they were always scammers, abusing the system.
[0] https://www.innocaption.com/