I use a two-pass approach: a first pass with ASR (OpenAI Whisper) and a second pass with an LLM.
I ask users to provide context upfront and use that as the "initial_prompt" parameter in Whisper: https://github.com/openai/whisper/discussions/963#discussion...
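Here is a minimal sketch of the two-pass idea, assuming the `openai-whisper` package for pass 1. It is not the author's actual pipeline. The function names, the correction-prompt wording, and the `"base"` model size are illustrative assumptions. The LLM is passed in as a plain callable so that any provider (OpenAI, Gemini, etc.) can be plugged in without this sketch depending on a particular client library.

```python
from typing import Callable

# Hypothetical callable shapes (not from any library):
# ASR takes (audio_path, context_prompt) and returns text; LLM takes a prompt and returns text.
AsrFn = Callable[[str, str], str]
LlmFn = Callable[[str], str]


def build_correction_prompt(context: str, transcript: str) -> str:
    """Second-pass prompt: ask the LLM to fix ASR errors using the user's context."""
    return (
        "The following transcript was produced by speech recognition.\n"
        f"Context supplied by the user: {context}\n"
        "Correct misrecognized names and terms using the context. "
        "Do not add, remove, or summarize content.\n\n"
        f"Transcript:\n{transcript}"
    )


def two_pass_transcribe(audio_path: str, context: str, asr: AsrFn, llm: LlmFn) -> str:
    # Pass 1: ASR, biased toward the user's vocabulary by the context prompt.
    raw = asr(audio_path, context)
    # Pass 2: LLM cleanup of the raw transcript, guided by the same context.
    return llm(build_correction_prompt(context, raw))


def whisper_asr(audio_path: str, prompt: str) -> str:
    """Pass 1 backed by openai-whisper; `initial_prompt` conditions decoding on the context."""
    import whisper  # pip install openai-whisper; imported lazily so the rest runs without it

    model = whisper.load_model("base")  # model size is an arbitrary choice for this sketch
    result = model.transcribe(audio_path, initial_prompt=prompt)
    return result["text"]
```

In practice you would call `two_pass_transcribe(path, user_context, whisper_asr, my_llm)`, where `my_llm` is a hypothetical wrapper around whichever chat API you use. Keeping the context in both passes lets Whisper bias toward the expected vocabulary up front, while the LLM catches the names it still gets wrong.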
Gemini might have similar capabilities for custom vocabulary, though I'm not certain about their specific implementation. The two-pass ASR+LLM approach could work with Gemini's output as well.
Working on https://videotobe.com, an audio/video transcription service.
VideoToBe started as a user-friendly Whisper wrapper, but it is evolving into a full pipeline that extracts, summarizes, and structures insights from multimedia content.
Claude Code is more user-friendly than Cursor thanks to its CLI-like interface. File modifications are easy to review, and it automatically runs psql, cd, ls, and grep commands, showing their output in a readable way. Agents and MCPs are easy to organize and use.
I feel just the opposite. I think Cursor's output is actually in the realm of "beautiful." It's well formatted and shows the user snippets of code and reasoning that help the user learn. Claude is stuck in a terminal window, so its output is reduced to monospaced bullet lines. Its verbose mode spits out lines of file listings and other context irrelevant to the user.
What are you working on? In my industry it fails half the time and comes up with absolute nonsense. The data just doesn’t exist for our problems; it only works when you guide it and ask for a few functions at most.
This sounds like my experiences with it. I'm writing embedded firmware in C and Rust. I'd describe further, but Claude seems incompetent at all aspects of this space.
This. Every "AI is great" response seems to come from someone doing web development, something I've intentionally avoided since I got tired of it around 2001 and hope never to do again.
We write C++ code in a very customized internal idiom to drive our hardware. Claude is great at filling in debugging statements / iterating over standard data structures to dump their contents, but not much else.
That seems to be a great example of precisely the sort of program an AI would be good at. A small focused product that only does one thing. Mainly gluing together other people's code. It's a polished greenfield project that does one tiny bit of focused functionality.
Interestingly, this guy has been making pretty much the same app as you, and live-streamed making it on YouTube:
Looks like he's now pivoted to selling access to his discord server for vibe coding tips as I can't find a link to his product.
But if we're honest here, it's not going to take a ton of code to make that. All the functionality to do it is well documented.
Many people here could build a competitor in a week without agentic AI, just using AI as a supercharged Stack Overflow. The limiter pre-AI (aside from AI transcribing it) would have been reading and implementing/debugging all the documentation of the libraries you're using, which AI is great at circumventing.
Your product looks really good, and is an excellent example of what vibe coded AI is great at. I hope you're getting good traction.
Ah, I’ve tried that one, but I must be doing something wrong. I give it a fully specified working program, and oftentimes it gives me back one that only works 50% of the time!
Does Claude Code provide some kind of "global memory" the LLM refers to, or is this just a request you make within the LLM's context window? Just curious; I hadn't heard the term used before.
EDIT: I see, you're asking Claude to modify CLAUDE.md to track your preference there, right?
What does the playwright MCP accomplish for you? Is it basically a way for Claude to play with your app in the browser without having to write playwright tests?
Started as Pay-per-use service (upload the file and receive an email with transcript). Now we also offer subscription based service for media library, AI summaries, chat Q&A, insights/highlights.
Makes between $200 and $600 per month.
https://videotobe.com
Any tips for subscription growth in the transcription space?