I'm curious how you differentiate yourself from the built-in speech APIs on iOS and Android? https://developer.apple.com/library/mac/documentation/Cocoa/... http://developer.android.com/reference/android/speech/Speech... http://developer.android.com/reference/android/speech/Recogn...
I'm a little biased (I used to work on the Google speech team), but it seems very hard for a startup to compete on the basis of accuracy, and for wearables like watches it's pretty clear both Google and Apple are putting third-party APIs for voice interfaces (including command-like syntaxes) front and center. A lot of earlier speech/NLP startups have struggled with this dynamic--although an aggressive, well-executing team can get a year or so ahead of the platform, if you do something too close to its core competency, eventually Google/Apple will build the same feature directly into the operating system, and then you're stuck competing with a team of 100+ PhDs with a 1000x distribution advantage. At least, that's what would give me hesitation about building a speech/NLP API startup in 2014.
I also noticed you're running a conference on voice interfaces (http://listen.ai/). I'm not sure if you're well connected to the speech folks at Google/Microsoft/Apple, but if you decide you want somebody from Google to speak, I'd be happy to ping some of my former colleagues on your behalf. Looking at the agenda, I think the area where they could provide coverage is the core technology--acoustic modeling, deep learning, hotword detection, or embedded recognition.
We differentiate ourselves from the Android Speech API in several ways:
1) As a developer, Google gives you no way to customize the speech engine by providing your own language model; if your app's domain is specific and you cannot tell the engine what kind of input to expect, accuracy will be poor, especially in noisy environments. Wit.ai builds a customized language model for each app automatically and in real time (it is updated every time Wit.ai learns new examples for your app), and queries several speech engines in parallel. To do that it uses not only your data, but also relevant data from the community. This is the core of our value proposition and not something Google provides today.
2) Google keeps its Natural Language Understanding layer (the part that translates text into structured, actionable data) to itself. Developers cannot access it; they're left with free text when what they often need is actionable data.
3) Wit.ai is cross-platform. We have SDKs for iOS, Android, Linux, etc. [1], or you can just stream raw audio to our API. The Android Speech API is only available on Android (well, you could hack around that and use it from elsewhere, but you're not supposed to, and you can be shut down at any time). More and more wearables and smart devices will run Linux; for instance, hundreds of developers use Wit.ai on the Raspberry Pi (a rough sketch of what calling such an API looks like follows this list).
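To make points (1)-(3) a bit more concrete, here is a minimal sketch of posting a short audio clip to a speech + NLU endpoint from any Linux box and getting structured data back instead of free text. The endpoint URL, auth header, and response fields below are illustrative assumptions for this example, not the documented Wit.ai API; see the SDKs and HTTP docs for the real contract.

    # Minimal sketch: send a short WAV clip to a hypothetical speech + NLU
    # endpoint and read back structured intent/entity data. The URL, auth
    # header, and JSON shape are illustrative, not a documented API.
    import requests

    API_URL = "https://api.example-speech.ai/v1/speech"  # placeholder endpoint
    ACCESS_TOKEN = "YOUR_APP_TOKEN"                       # per-app token

    def recognize(wav_path):
        """Stream raw audio and return the structured interpretation."""
        with open(wav_path, "rb") as audio:
            resp = requests.post(
                API_URL,
                headers={
                    "Authorization": "Bearer " + ACCESS_TOKEN,
                    "Content-Type": "audio/wav",
                },
                data=audio,  # request body is the raw audio stream
                timeout=30,
            )
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        result = recognize("turn_on_the_lights.wav")
        print(result.get("text"))      # e.g. "turn on the kitchen lights"
        print(result.get("intent"))    # e.g. "lights_on"
        print(result.get("entities"))  # e.g. {"room": "kitchen"}

The point of (2) is the last three lines: the developer gets back an intent and entities they can act on, not just a transcript.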
As for the Apple doc you linked, it's Mac-only (no iOS), and it only recognizes a small set of phrases you provide in advance. I think it's a very old API that's still around :)
Regarding listen.ai: yes please, we would love to have Google (especially the Google Now team) there. We have the Siri founder, the top Cortana guy, the former CEO of Nuance... but nobody from Google yet.
Having had RSI for a while, I can't tell you how much I've wished for interfaces with a point-and-speak (or look-and-speak) UI. I literally haven't found a single case in which I couldn't quickly dream up a superior version of an existing UI.
Overall I came away with the conclusion that look-and-speak is probably the most deeply ingrained user interface there is--perhaps the only one you could argue is truly intuitive, since it seems to be genetically hard-wired.
On top of that, modelling UIs as hierarchical state machines is an astoundingly simple and elegant approach; it even lets you leverage persistent data structures to do amazing things. I've explored that to some degree in https://speakerdeck.com/mtrimpe/graphel-the-meaning-of-an-im...
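To illustrate what I mean (this is a toy sketch of the general idea, not code from the linked deck): a small voice UI modelled as a hierarchical state machine, where each spoken command maps the current state to a new one instead of mutating it, so earlier states remain available for undo or replay, in the spirit of persistent data structures.

    # Toy sketch (not from the linked deck): a voice UI as a small
    # hierarchical state machine. Each state is a (screen, substate) pair,
    # and transitions return new states instead of mutating old ones,
    # so the whole history stays usable -- the persistent-data-structure angle.
    from typing import NamedTuple, Optional

    class UIState(NamedTuple):
        screen: str                      # top-level state, e.g. "home" or "music"
        substate: Optional[str] = None   # nested state within that screen

    TRANSITIONS = {
        ("home", None, "open music"): UIState("music", "browsing"),
        ("music", "browsing", "play"): UIState("music", "playing"),
        ("music", "playing", "pause"): UIState("music", "paused"),
        ("music", "paused", "go home"): UIState("home"),
    }

    def handle(state, command):
        """Return the next state for a spoken command; unknown commands are ignored."""
        return TRANSITIONS.get((state.screen, state.substate, command), state)

    if __name__ == "__main__":
        history = [UIState("home")]
        for cmd in ["open music", "play", "pause", "go home"]:
            history.append(handle(history[-1], cmd))
        for state in history:  # every intermediate state is still there (undo/replay)
            print(state)

Substates can themselves be state machines, so the hierarchy goes as deep as you like, and because nothing is mutated, things like "go back" or replaying a session fall out almost for free.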
Any chance listen.ai will either be livestreamed or have videos made available later (a la Confreaks or similar)? I can't make it, but I'm super interested in ALL of this and really, really want to learn.