It's multiple gigabytes for a single language and accent. I wasn't even talking about the full dataset across languages.
You don't seem to understand the size requirements for getting a good dataset. Don't you think Microsoft would have loaded up the dataset if it was easy and cheap? They didn't have a desktop cloud-based recognition service until literally yesterday, so they had many, many years to include this magical dataset that solves all your problems without cannibalizing another one of their products. They didn't, because it's not feasible right now. In the future? Maybe, hell, probably.
>It's multiple gigabytes for a single language and accent.
I have 602 GB free on my first hard drive, 519 GB free on my second, 699 GB on my third, 1.06 TB free on my fourth, 405 GB free on my fifth and 46 GB free on my sixth.
If Microsoft would be kind enough to release it to me, I think I can probably find a corner to squeeze it into.
>Don't you think Microsoft would have loaded up the dataset if it was easy and cheap?
No, I don't. Microsoft wants our voice data; it's extremely valuable to them. They've figured out that there are gullible people like you who will swallow the "it can't be moved onto a local computer" tale hook, line and sinker, and thus give it to them for free.
> That dataset contains languages and accents that are not relevant to me.
You are assuming that they have a different model for each language and region, which I don't think is true, since Cortana understands my foreign accent despite my using the USA as the region (the Canadian version works really well too).
> I have 602 GB free on my first hard drive, 519 GB free on my second, 699 GB on my third, 1.06 TB free on my fourth, 405 GB free on my fifth and 46 GB free on my sixth.
Good for you, but I don't have that much free space. Gee, I only have 20 GB free on my laptop. I think you might be biased by your own situation, but not everyone has 1+ TB of free space waiting to be used for voice commands.