I'm curious how you differentiate yourself from the built-in speech APIs on iOS and Android? https://developer.apple.com/library/mac/documentation/Cocoa/... http://developer.android.com/reference/android/speech/Speech... http://developer.android.com/reference/android/speech/Recogn...
I'm a little biased (I used to work on the Google speech team), but it seems very hard for a startup to compete on the basis of accuracy, and for wearables like watches it's pretty clear both Google and Apple are putting third-party APIs for voice interfaces (including command-like syntaxes) front and center. A lot of earlier speech/NLP startups have struggled with this dynamic--although an aggressive, well-executing team can get a year or so ahead of the platform, if you do something too close to its core competency, eventually Google/Apple will build the same feature directly into the operating system, and then you're stuck competing with a team of 100+ PhDs with a 1000x distribution advantage. At least, that's what would give me hesitation about building a speech/NLP API startup in 2014.
I also noticed you're running a conference on voice interfaces (http://listen.ai/). I'm not sure if you're well connected to the speech folks at Google/Microsoft/Apple, but if you decide you want somebody from Google to speak, I'd be happy to ping some of my former colleagues on your behalf. Looking at the agenda, I think the area where they could provide coverage is the core technology--acoustic modeling, deep learning, hotword detection, or embedded recognition.
We differentiate ourselves from the Android Speech API in several ways:
1) As a developer, Google gives you no way to customize the speech engine by providing your own language model; if your app's domain is specific and you cannot tell the engine what kind of input to expect, accuracy will be poor, especially in noisy environments. Wit.ai builds a customized language model for each app automatically and in real time (it is updated every time Wit.ai learns new examples for your app), and queries several speech engines in parallel. To do that it uses not only your data, but also relevant data from the community. This is the core of our value proposition and not something Google provides today.
2) Google keeps its Natural Language Understanding layer (the part that translates text into structured, actionable data) to itself. Developers cannot access it; they're left with free text when what they often need is actionable data.
3) Wit.ai is cross-platform. We have SDKs for iOS, Android, Linux, etc. [1], or you can just stream raw audio to our API. The Android Speech API is only available on Android (well, you could hack around that and use it from elsewhere, but you're not supposed to, and you can be shut down at any time). More and more wearables and smart devices will run Linux; for instance, hundreds of developers use Wit.ai on the Raspberry Pi (a rough sketch of what calling such an API looks like follows this list).
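To make points (1)-(3) a bit more concrete, here is a minimal sketch of posting a short audio clip to a speech + NLU endpoint from any Linux box and getting structured data back instead of free text. The endpoint URL, auth header, and response fields below are illustrative assumptions for this example, not the documented Wit.ai API; see the SDKs and HTTP docs for the real contract.

    # Minimal sketch: send a short WAV clip to a hypothetical speech + NLU
    # endpoint and read back structured intent/entity data. The URL, auth
    # header, and JSON shape are illustrative, not a documented API.
    import requests

    API_URL = "https://api.example-speech.ai/v1/speech"  # placeholder endpoint
    ACCESS_TOKEN = "YOUR_APP_TOKEN"                       # per-app token

    def recognize(wav_path):
        """Stream raw audio and return the structured interpretation."""
        with open(wav_path, "rb") as audio:
            resp = requests.post(
                API_URL,
                headers={
                    "Authorization": "Bearer " + ACCESS_TOKEN,
                    "Content-Type": "audio/wav",
                },
                data=audio,  # request body is the raw audio stream
                timeout=30,
            )
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        result = recognize("turn_on_the_lights.wav")
        print(result.get("text"))      # e.g. "turn on the kitchen lights"
        print(result.get("intent"))    # e.g. "lights_on"
        print(result.get("entities"))  # e.g. {"room": "kitchen"}

The point of (2) is the last three lines: the developer gets back an intent and entities they can act on, not just a transcript.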
As for the Apple doc you linked, it's Mac-only (no iOS), and it only recognizes a small set of phrases you provide in advance. I think it's a very old API that's still around :)
Regarding listen.ai: yes please, we would love to have Google (especially the Google Now team) there. We have the Siri founder, the top Cortana guy, the former CEO of Nuance... but nobody from Google yet.
Having had RSI for a while, I can't tell you how much I've wished for interfaces with a point-and-speak (or look-and-speak) UI. I literally haven't found a single case in which I couldn't quickly dream up a superior version of an existing UI.
Overall I came away with the conclusion that look-and-speak is probably the most deeply ingrained user interface there is--perhaps the only one you could argue is truly intuitive, since it seems to be genetically hard-wired.
On top of that, modelling UIs as hierarchical state machines is an astoundingly simple and elegant approach; it even lets you leverage persistent data structures to do amazing things. I've explored that to some degree in https://speakerdeck.com/mtrimpe/graphel-the-meaning-of-an-im...
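To illustrate what I mean (this is a toy sketch of the general idea, not code from the linked deck): a small voice UI modelled as a hierarchical state machine, where each spoken command maps the current state to a new one instead of mutating it, so earlier states remain available for undo or replay, in the spirit of persistent data structures.

    # Toy sketch (not from the linked deck): a voice UI as a small
    # hierarchical state machine. Each state is a (screen, substate) pair,
    # and transitions return new states instead of mutating old ones,
    # so the whole history stays usable -- the persistent-data-structure angle.
    from typing import NamedTuple, Optional

    class UIState(NamedTuple):
        screen: str                      # top-level state, e.g. "home" or "music"
        substate: Optional[str] = None   # nested state within that screen

    TRANSITIONS = {
        ("home", None, "open music"): UIState("music", "browsing"),
        ("music", "browsing", "play"): UIState("music", "playing"),
        ("music", "playing", "pause"): UIState("music", "paused"),
        ("music", "paused", "go home"): UIState("home"),
    }

    def handle(state, command):
        """Return the next state for a spoken command; unknown commands are ignored."""
        return TRANSITIONS.get((state.screen, state.substate, command), state)

    if __name__ == "__main__":
        history = [UIState("home")]
        for cmd in ["open music", "play", "pause", "go home"]:
            history.append(handle(history[-1], cmd))
        for state in history:  # every intermediate state is still there (undo/replay)
            print(state)

Substates can themselves be state machines, so the hierarchy goes as deep as you like, and because nothing is mutated, things like "go back" or replaying a session fall out almost for free.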
Any chance listen.ai will either be livestreamed or have videos made available later (a la Confreaks or similar)? I can't make it, but I'm super interested in ALL of this and really, really want to learn.