I find this useful as a productivity tool. For example, this can give me my standup update summary. It knows what I worked on and can summarize it for me.
AI/LLMs are great at staying organized over huge amounts of data and this is the perfect application.
disclosure:
I am the founder of Perfect Memory AI https://www.perfectmemory.ai/ that does something very similar today.
Until the day it submits something to standup that you didn't want it to (and don't tell me you will always carefully filter it). Then, in the best case, you get fired. Worst case, you get criminally prosecuted.
One needs to follow the money to find the true direction. I think the ideal setup is that such a product is owned by a public figure/org who has no vested interest in making money or using it in a way.
It's encrypted (on top of Bitlocker) and local.
There's all this competition over who makes the best, most articulate LLM. But the truth is that off-the-shelf 7B models can put sentences together with no problem. It's the context they're missing.
I feel like the storage requirements are really going to be the issue for these apps/services that run on "take screenshots and OCR them" functionality with LLMs. If you're using something like this, a huge part of the value proposition is in the long term, but until something has a more efficient way to function, even a 1-year history is impractical for a lot of people.
For example, consider the classic situation of accidentally giving someone the same Christmas gift that you did a few years back. A sufficiently powerful personal LLM that 'remembers everything' could absolutely help with that (maybe even give you a nice table of the gifts you've purchased online, who they were for, and what categories of items would complement a previous gift), but only if it can practically store that memory for a multi-year time period.
It's not that bad. With Perfect Memory AI I see ~9 GB a month. That's 108 GB/year. HDDs/SSDs are growing by more than that every year. The storage also varies with what you do, your workflow, and your display resolution. Here's an article I wrote on my findings about storage requirements. https://www.perfectmemory.ai/support/storage-resources/stora...
And if you want to use the data for LLM only, then you don't need to store the screenshots at all. Then it's ~ 15MB a month
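For anyone who wants to play with those numbers, here is a rough back-of-the-envelope sketch. It only uses the two monthly figures quoted above (~9 GB with screenshots, ~15 MB text only) and assumes usage stays constant from month to month, which real usage probably won't. The function name is just for illustration.

```python
# Back-of-the-envelope storage projection from the monthly figures quoted above.
GB_PER_MONTH_WITH_SCREENSHOTS = 9.0   # ~9 GB/month, as reported in the comment
MB_PER_MONTH_TEXT_ONLY = 15.0         # ~15 MB/month when screenshots are not kept


def projected_gb(per_month_gb: float, years: float) -> float:
    """Linear projection; assumes the monthly rate never changes."""
    return per_month_gb * 12 * years


for years in (1, 3, 5):
    with_images = projected_gb(GB_PER_MONTH_WITH_SCREENSHOTS, years)
    text_only = projected_gb(MB_PER_MONTH_TEXT_ONLY / 1000, years)
    print(f"{years} yr: {with_images:.0f} GB with screenshots, "
          f"{text_only:.2f} GB text only")
```

On these assumptions even a multi-year text-only history stays under a gigabyte, so the long-term cost is almost entirely the screenshots themselves.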
The funny thing is Apple even have a support article on how to do this (and actually say in it "may improve your performance"). I literally followed it step by step; it was very easy and I had no issues.
Shipping to the UK added a bit to the overall price, with shipping and import duty, but it was still better value for money and a hugely more reliable brand than anything I could have bought domestically.
Except that Rewind uses ChatGPT whereas this runs entirely locally. I would like to note though that Anonymous Analytics are enabled as well as auto-updates, both of which I disabled for privacy reasons. Encryption is also disabled by default. I just blocked everything with my firewall for peace of mind :)
Most screenshots are of the application window in the foreground, so unless your application spans all monitors, there is no significant overhead with multiple monitors. DPI on the other hand has a significant impact. The text is finer, taking more pixels...
I’m not sure if the above product does this, but you could use a multimodal model to extract descriptions of the screenshots and store those in a vector database with embeddings.
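To make that idea concrete, here is a toy sketch of the "descriptions plus embeddings" half. Big caveats: the multimodal model is not included (descriptions are assumed to already exist as strings), and `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, so it only matches on shared words. `DescriptionStore` is a hypothetical in-memory substitute for a vector database, not any real product's API.

```python
import math
import zlib

DIM = 1024  # size of the toy vectors; a real embedding model defines its own


def toy_embed(text: str) -> list[float]:
    """Hashed bag-of-words vector, L2-normalised. A stand-in, not a real embedding."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class DescriptionStore:
    """Minimal in-memory 'vector database': brute-force cosine similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, screenshot_id: str, description: str) -> None:
        self._items.append((screenshot_id, toy_embed(description)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = toy_embed(query)
        # Vectors are normalised, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), sid) for sid, v in self._items]
        scored.sort(reverse=True)
        return [sid for _, sid in scored[:k]]


store = DescriptionStore()
store.add("shot-1", "browser showing a wikipedia article about sqlite")
store.add("shot-2", "terminal running a python script")
print(store.search("wikipedia article", k=1))
```

Swapping `toy_embed` for a real model and the list for a real vector index would be the obvious next step; the shape of `add` and `search` would stay about the same.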
Two years ago I set up a cron job to take a screenshot every minute.
Just did the second phase, using ocrmac (vision kit cli on GitHub) to extract the text and dump it into a SQLite database with FTS5.
It’s simplistic but does the job for now.
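For readers curious what the SQLite side of a setup like this might look like, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the commenter's actual schema: the table name `shots` and the columns `captured_at`, `path`, and `text` are assumptions for illustration, the OCR step is left out (the text is assumed to be extracted already), and it needs an SQLite build with FTS5 enabled.

```python
import sqlite3


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a full-text index of OCR'd screenshot text."""
    conn = sqlite3.connect(db_path)
    # Only `text` is indexed; timestamp and file path are stored for lookup.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS shots "
        "USING fts5(captured_at UNINDEXED, path UNINDEXED, text)"
    )
    return conn


def add_shot(conn: sqlite3.Connection, captured_at: str, path: str, text: str) -> None:
    """Store the OCR output for one screenshot."""
    conn.execute(
        "INSERT INTO shots (captured_at, path, text) VALUES (?, ?, ?)",
        (captured_at, path, text),
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[tuple[str, str]]:
    """Return (captured_at, path) for the best full-text matches."""
    rows = conn.execute(
        "SELECT captured_at, path FROM shots WHERE shots MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


conn = open_index()
add_shot(conn, "2024-01-01T10:00:00", "/tmp/a.png", "Order confirmation for your purchase")
add_shot(conn, "2024-01-01T10:01:00", "/tmp/b.png", "Weather forecast for tomorrow")
print(search(conn, "confirmation"))
```

The query string is passed to FTS5 as-is, so plain words work, but punctuation-heavy input would need quoting first.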
I looked at reducing storage requirements by using ImageMagick to only store the difference between images (some 5-minute sequences are essentially the same screen) but let that one go.
/using image magik to only store the difference between images/
Well, that's basically how video codecs work... So might as well just find some codec params which work well with screen capture, and use an existing encoder.
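A much cruder cousin of that idea, for anyone who doesn't want to tune an encoder: just skip frames that barely differ from the last one you kept. To be clear, this is not a video codec and stores no deltas; it is a simple near-duplicate filter. The sketch assumes frames arrive as equal-sized raw byte buffers (for example, decoded grayscale pixels), and the 2% threshold is an arbitrary guess.

```python
from typing import Iterable


def changed_fraction(prev: bytes, cur: bytes) -> float:
    """Fraction of byte positions that differ between two raw frames."""
    if len(prev) != len(cur):
        return 1.0  # different size: treat as a completely new frame
    if not cur:
        return 0.0
    differing = sum(a != b for a, b in zip(prev, cur))
    return differing / len(cur)


def keep_keyframes(frames: Iterable[bytes], threshold: float = 0.02) -> list[int]:
    """Return indices of frames that differ enough from the last kept frame."""
    kept: list[int] = []
    last = None
    for i, frame in enumerate(frames):
        if last is None or changed_fraction(last, frame) > threshold:
            kept.append(i)
            last = frame
    return kept


blank = bytes(100)
half = bytes([255] * 50 + [0] * 50)
almost_half = bytes([255] * 50 + [1] + [0] * 49)  # one byte changed
print(keep_keyframes([blank, blank, half, almost_half]))
```

Comparing against the last kept frame, rather than the immediately preceding one, keeps slow drift from slipping under the threshold unnoticed.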
I’m loose with my memory and I’d often recall reading or looking at something and could never find it in Safari history etc. With info spread across WhatsApp, emails, files, and web history, it has helped nudge me in the right direction here and there. It saved me once when I made an online purchase and never got an email confirmation.
This is where Microsoft (and Apple) have a leg up -- they can hook the UI at the draw level and parse the interface far more reliably + efficiently than screenshot + OCR.
This reminds me of how Sherlock, Spotlight and its iterations came to be. It was very resource intensive to index everything and keep a live db, until it was not.
Your website and blog are very low on details on how this is working. Downloading and installing an MSI directly feels unsafe imo, especially when I don't know how this software is working. Is it recording a video, performing OCR continuously, or just taking screenshots?
No mention of using any LLMs in there at all which is how you are presenting it in your comment here.
Feedback taken. I'll add more details on how this works for us technical people. LLM integration is in progress and coming soon.
Any idea what would make you feel safe? 3rd party verification? I had it verified and published by the Microsoft Store. I feel eventually it all comes down to me being a decent person.
welp. this pretty much convinces me that it's time I get out of tech. lean into the tradework I do in my spare time.
because I'm sure you and people like you will succeed in your endeavors, naively thinking you're doing good. and you or someone like you will sell out, the most ruthless investor will take what you've built and use it as one more cudgel of power to beat the rest of us with.
If you want to help, use your knowledge to help shape policy. Because it is coming/already happening, and it will shape your life even if you are just living a simple life. I guarantee you that your city and state governments are passing legislation to incorporate AI to affect your life if they can be sold on it in the name of "good".
I live next to the Amish, trust me my township isn't passing anything related to AI.
For a reality check, name one instance of policy that has stopped the amoral march of tech being a tool of power to the hands of the few? Last one I can name is when they broke up Ma Bell. Now of course you can pick Verizon or AT&T, so that worked. /s
I installed it and kept it open for a full day but apparently it hasn't "saved" anything, and even if I open a Wiki page and a few minutes later search for that page, it returns nothing. Tried reading the Support FAQs on the website to no avail. Screen recording is on.
This seems very very interesting. I'm still learning Python so probably can't build on this. But like a cheap man's version of this would be to take a screenshot every couple of minutes, OCR it and send it to GPT for some kind of processing (or not, just keep it as a log). Right? Or am I missing something?
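Roughly, yes. Here is a bare-bones sketch of that loop in Python, keeping it as a log only. The screenshot and OCR steps are passed in as functions (`capture` and `ocr` are placeholders you would have to supply yourself with whatever screenshot and OCR tools you pick), so nothing here depends on a specific library. The JSON-lines log format is just one possible choice.

```python
import json
import time
from datetime import datetime, timezone


def run_logger(capture, ocr, log_path, interval_s=120.0,
               max_iterations=None, sleep=time.sleep):
    """Capture -> OCR -> append to a JSON-lines log, every `interval_s` seconds.

    `capture()` must return an image object and `ocr(image)` must return a
    string; both are supplied by the caller. Runs forever unless
    `max_iterations` is set. Returns the number of entries written.
    """
    count = 0
    while max_iterations is None or count < max_iterations:
        text = ocr(capture())
        entry = {
            "captured_at": datetime.now(timezone.utc).isoformat(),
            "text": text,
        }
        with open(log_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")
        count += 1
        # Skip the final sleep when a fixed number of iterations was requested.
        if max_iterations is None or count < max_iterations:
            sleep(interval_s)
    return count
```

Sending each entry to an LLM would slot in right after the `ocr` call, but plain text in a log is already searchable with grep.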