This is very impressive. But scrolling through the preprint, I wouldn't call any of it elegant.
I'm not blaming the model here, but Python is much easier to read and more universal than math notation in most cases (especially for whatever's going on at the bottom of page four). I guess I'll have one translate the PDF.
I just tested it on a very difficult Raven matrix that the old version of DeepThink, as well as GPT 5.2 Pro, Claude Opus 4.6, and pretty much every other model, failed at.
This version of DeepSeek got it on the first try. Thinking time was 2 or 3 minutes.
The visual reasoning of this class of Gemini models is incredibly impressive.
2. The interactions were often strained. Not every edit/change is easy to articulate with your voice.
If 1 had been our only problem, we might have had a hit. In reality, I think focusing on optimizing away model errors allowed us to ignore some fundamental awkwardness in the experience. We've tried to rectify this with v2 by putting less emphasis on streaming for every interaction and less emphasis on commands, replacing them with context.
I'm a paying customer and I signed onto Aqua Voice shortly after your demo on HN.
My experience with it has been overall positive but mixed. I enjoy using it for dictation, but I found that issuing editing commands and having them recognized/executed often took a lot longer than making an edit myself (which I can't do while in dictation mode).
But as a paying customer, seeing you go in this direction is somewhat sad/frustrating. You're abandoning the product I use, and you're saying that if I want to see my platform supported, I or someone from the community has to provide it, for a fully proprietary paid application.
I understand that I'm a minority user, but it's a bit disappointing to read this.
Totally understand, thanks for being a customer. I'm sorry we weren't able to make the web version as smooth as we wanted to.
We do plan to support Linux. This was probably a bit of a blind spot for us - not realizing that anyone running a Linux desktop doesn't even have a system voice tool to fall back on.
I share the same sentiment. I remember thinking in college how annoying it was that I was reading low-resolution, marked-up, skewed, b&w scans of a book using Adobe Acrobat while CS concentrators were doing everything in VS Code (then brand new).
But we do think voice is actually great with Cursor. It's also really useful in the terminal for certain things - checking out or creating branches, for example.
I was excited to try this out because I've had a lot of trouble getting the Supabase integrations to work on Lovable and Bolt.new.
Sorry to say that Firebase Studio did an awful job. It did not successfully build even the first view of the app I asked for. It feels like I'm stepping back to the release day of GPT-4.
Am I missing a switch to use the good Gemini 2.5 somewhere? I could tell from their response speed that I was not using a thinking model.