As clever as this is, it seems like the names are fairly straightforward (as you'd want!) – did you try using the on-device Apple Foundation model at all? That's actually pretty powerful for a use case like this, and if you're happy to require the user has Apple Intelligence turned on already, your shipped app can end up being tiny. The biggest concern for an app like this is how much RAM you end up using trying to run it. Especially if we end up with lots of different apps all doing the same thing.
Being able to super-power apps with on-device models is a lot of fun. I recently did the same building my own dictation app using small local models, and I still can't believe how effective it is. The download is just 20 MB, though it will then download Parakeet (~475 MB) for audio. It can use the on-device model as the second-pass LLM, which works pretty well (though better models are available to download and use, e.g. Llama 3.2 4bit and Qwen 2.5 7B 4bit).
I'm currently building a little tool for a professional photographer friend to go through and classify images in their photoshoots, so I can build a searchable db for them to quickly find very specific images in the future. I simply don't think it would have been possible for me to build a tool like that just a couple years ago at any price.
Thanks for the feedback. I did not know my Mac had an on-device Apple Foundation model. Is it multimodal? I'll be checking it out and comparing it with Google Gemma 4. I thought Apple was out of the AI model race.
The idea is to ship more powerful lightweight free models as they become available. I'm looking forward to Gemma 5!
> The biggest concern for an app like this is how much RAM you end up using trying to run it
You are totally right. A new feature for a future version would be to unload the model when the app is idle and only load it again the next time the user takes a screenshot. It is a trade-off between the latency of generating the names and RAM usage.
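To make that trade-off concrete, here is a minimal sketch of the idle-unload idea in Python. It is not the app's actual code: the `IdleUnloadingModel` class, the `loader` callable, and the `idle_seconds` threshold are all hypothetical names for illustration.

```python
import time
from typing import Callable, Optional


class IdleUnloadingModel:
    """Loads a model lazily and drops it after a period of inactivity."""

    def __init__(self, loader: Callable[[], object], idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader              # hypothetical: returns the loaded model
        self._idle_seconds = idle_seconds  # how long to keep an unused model in RAM
        self._clock = clock                # injectable so the logic can be tested
        self._model: Optional[object] = None
        self._last_used = 0.0
        self.load_count = 0                # number of (slow) loads so far

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> object:
        """Return the model, loading it first if needed (the latency cost)."""
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        self._last_used = self._clock()
        return self._model

    def unload_if_idle(self) -> bool:
        """Release the model if it has been idle too long (the memory saving)."""
        if self._model is not None and self._clock() - self._last_used >= self._idle_seconds:
            self._model = None
            return True
        return False
```

The app would call `get()` whenever a screenshot arrives and `unload_if_idle()` from a periodic timer. A longer `idle_seconds` favours latency, a shorter one favours memory.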
It's not as powerful as Gemma 4, but I think they likened it to GPT-3. It's perfectly capable of looking at images and classifying them at the level you'll need for this app. And it runs everything on the Apple Neural Engine, so it's decently quick. Of course, this assumes that your users are on Apple Silicon (I believe that's the limitation) and that they have enabled Apple Intelligence, which downloads the model at that point.
I had a go at building both a Mac and iOS dictation app the other day (dictator.robgough.net) thinking that with Claude's input, this probably wouldn't work... but it was a real problem I had, and I wanted to see how far we could get. Best way to learn the tools, right? I'd already spent the day playing with alternative apps that didn't quite do what I wanted.
The app itself is fairly straightforward, but it included some intermediate complexity in terms of audio capture and calling local models. Both are things I'd never done and, as a not-a-Mac dev, things I probably wouldn't have attempted for a side project while I'm meant to be bootstrapping my own thing.
I didn't touch a line of code, and I was blown away. I'm so impressed in fact that I'm predicting we'll see a resurgence in native apps in the near future. By far the worst (and slowest) part of the process is having to deal with the App Store, and the ridiculous hoops you have to jump through to get past review.
That looks great! It's so clearly going to be the dominant way of working with your computer. I was such a cynic, so I was completely shocked at how quickly it became my favourite way to interact with my devices.
I'm fully expecting to be completely sherlocked during Apple's WWDC event in the coming weeks. In fact, I'm hoping we will be. However, I fear that they might not be quite ready for that yet. This functionality really should sit at the top OS level, equal to Keyboard/Mouse/Trackpad etc.
You need to have an Apple developer account. Then you need to submit your app to Apple for review. Then you need to comply with a list of sometimes arbitrary corrections/requirements that they send back (there is a document that specifies what you need to do, but it is not uniformly enforced in my experience). Then, eventually, you can list your app on the app store.
It’s not super onerous, but it is much more annoying than the theoretical alternative of allowing people to install software of their choosing on their hardware (i.e. download the binary and run it)
For example, the iOS app failed the first time as I accidentally used "Free" in the app name, and the app declared support for UIBackgroundModes but they were "unable to locate any features that require persistent audio". The dictation keyboard switched you to the app, which would then keep recording if you left... fairly basic and obvious stuff. I could have either gone back and argued the case or simply ripped it out, which is what I opted for.
It's now failed again because: The keyboard extension does not provide any functionality when the "Full Access" setting is toggled off.
Well no, hardly my fault you've locked down the usefulness of third-party keyboards, but now I've added a full keyboard in there so it's a bit more useful without that access. I don't expect any users to ever see this. Admittedly this was more frustrating when a fix like that would be a couple of days' work, not just a quick prompt.
Good luck with your app. Don't worry too much, you can generally work through their issues... but it can be a slow process. Make sure you leave plenty of time between your submission and when you want to launch!
Do you want to release it to the general public or just a circle of friends and family? TestFlight lets you have up to a hundred users forever without ever truly releasing. I have a couple of home-relevant apps I am managing that way.
I think that's a fair question. For this app I didn't see the harm in releasing it. You have to pay for the account etc. to get it onto TestFlight anyway, so I might as well just put it out there. And avoids me having to resubmit every 90 days. There also wasn't a lot of custom "me" stuff in this one, but I could see going that route for other apps in the future.
I mean, in general I dislike some of the more extreme app store gating.. but if apps are getting vibe coded with little effort I think gatekeeping is more important than ever. I think "is the author willing to put in the work to pass review" might be a useful heuristic, and it could also prevent things like vulnerable software being published. Plus it amuses me to imagine big tech having to deal with the slop apocalypse they've created!
Review times have already spiked, apparently, although I've never found them particularly good. There used to be a third-party website that tracked this via user submissions, though it was shut down several years ago, after Apple made genuine efforts to improve timings, as it was deemed no longer necessary.
I think it's a fair criticism that there will be a lot of people putting "lower-quality" apps on there. I do think Claude did a better job of this than I would have managed. It certainly seems to work well enough.
I do believe Apple will have to rethink exactly what they want to gatekeep and why. If nothing else, in the context of allowing people to get more out of their powerful devices. For example, I wish my iPad Pro could actually do more useful things with its M4 processor. Hopefully that's something the new CEO has on his list.
I do all my dev inside Docker/OrbStack environments. I've been using a Tailscale sidecar for each, which has let me easily spin up a second (and third!) copy of each repo without having to worry about them interfering with each other (the same open ports etc.). I've not extended this to using worktrees, as right now I prefer entirely separate clones of a repo, but that may well change, and I suspect this would work well for that too.
Also has the handy effect of making it super easy to share my dev environment with anyone else on my tailnet, though this could be locked down if needed.
I do yeah, Tailscale is generous with the "device" counts so I'm not worried about using them up -- especially as I spin them up as ephemeral, so as soon as I shutdown the stack they're gone, but the "random" name persists across shutdowns as I store it in a file that stays out of git.
The subdomain routing then works by pointing to that ephemeral machine's IP, and my site in dev mode populates the sidebar with active links for this, so it's not like I have to keep updating bookmarks etc. Super convenient. It's probably the weakest part of the setup (no HTTPS) but works fine for my needs.
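For anyone curious about the "random name that persists across shutdowns" part, a rough sketch of how that could look in Python is below. This is an illustration under assumptions rather than my actual setup: the `load_or_create_name` helper, the `dev` prefix, and the file location are all made up for the example, and the file would need to be listed in `.gitignore`.

```python
import secrets
from pathlib import Path


def load_or_create_name(path: Path, prefix: str = "dev") -> str:
    """Return the name stored at `path`, generating and saving one if absent."""
    if path.exists():
        name = path.read_text(encoding="utf-8").strip()
        if name:
            return name  # reuse the name from a previous run
    # First run (or empty file): pick a short random suffix and persist it.
    name = f"{prefix}-{secrets.token_hex(3)}"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(name + "\n", encoding="utf-8")
    return name
```

The returned value can then be passed to the sidecar as its hostname, so the ephemeral node comes back under the same name each time the stack starts.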
The Reddit app is so very bad. So bad in fact that I nearly considered paying for Narwhal, but then realised I should probably be trying to use Reddit less so paying for a subscription to access it would be madness.
I do miss Apollo, which was lost when they started charging crazy money for their API. I do wonder how much that has affected their usage overall. But I don't think I've ever seen any long term reviews of the impact of those changes?
Based on how much rubbish is in the App Store, it is shocking to me – but getting that first approval can be really brutal. Took me well over a month last October/November as it kept getting rejected for things that I considered very much outside the purview of App Store review†, and this was for a simple and straightforward client portal for our company. If I didn't know better, I'd think they don't want you building free apps.
†my favourite was that on one of our menu screens the bottom button was slightly obscured by the tab bar at the bottom, so of course that was a failure. I'm not sure when they started this sort of semi-QA type of service, but I suspect no one asked for it.
ok update: mine's been pretty decent besides the initial setup. now it's getting reviewed and approved almost every 24 hrs. i feel like it won't be that hard to ship fast on ios either. i just have to do it once a day which sucks but it is what it is.
It seems like managed pg improvements aren’t high up on the priority list (there’s a mention of outsourcing it to a third-party like supabase[1]), so I’d just like to +1 this request — for personal projects the snapshots are fine, but I’m starting a new role where I wanted to use fly but without easier backup/restore it’s likely a non-starter.
A small request: it would also be useful if `fly volumes snapshots list vol_123…` included the time they were taken, not just “n days ago”. If I’m having to rollback, it would be good to tell my team & users exactly when I’m rolling back to!
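To illustrate the kind of output I mean, here is a small Python sketch that prints the exact UTC time alongside the relative age. This is not flyctl's code or its real output format: `describe_snapshot` and its `created_at` argument are hypothetical, and it assumes the snapshot's creation time is available as a timezone-aware timestamp.

```python
from datetime import datetime, timezone


def describe_snapshot(created_at: datetime, now: datetime) -> str:
    """Render an exact UTC timestamp followed by the coarse relative age."""
    created_utc = created_at.astimezone(timezone.utc)
    days = (now - created_at).days  # whole days elapsed
    age = "today" if days == 0 else f"{days} day{'s' if days != 1 else ''} ago"
    return f"{created_utc:%Y-%m-%d %H:%M:%S} UTC ({age})"
```

Keeping the relative age in brackets preserves the quick-glance readability while still giving an exact time to quote to the team.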
I'm really interested in fly.io after their Postgres post the other day, but I've not seen anywhere what their recommended solution is for ActiveStorage – is it still writing to S3 (or equivalent), or would it be somehow using their volumes? Are there any published examples (blog posts etc)?
Using volumes as some sort of S3 would require you to basically build S3 from scratch. You would still need to create an auth system, a server to handle uploads, a way to manage uploaded content while keeping track of its content type somehow (maybe using another db), a static server, etc.
Check out Filebase [0]. It's powered by Web3 technologies including decentralized and distributed storage networks, effectively creating one giant "global" S3 region. You could almost think of it as the storage-equivalent to Fly.io. It also works with ActiveStorage out of the box since Filebase has an S3 compatible API.
Wejo (https://wejo.com) | Various Roles | Full-time | UK-based Remote and/or Manchester, UK
We’re shaping the future of mobility. We innovate with transformative volumes of connected car data to create cutting-edge, actionable insights that will transform the way we live, work and travel. We’re looking for creative, inquisitive and driven individuals to join our team.
We do have offices in Manchester and Chester, for occasional in-person meetings (in-person attendance optional).