Hacker News | DariusKocar's comments

I find this useful as a productivity tool. For example, this can give me my standup update summary. It knows what I worked on and can summarize it for me.

AI/LLMs are great at staying organized over huge amounts of data and this is the perfect application.

Disclosure: I am the founder of Perfect Memory AI (https://www.perfectmemory.ai/), which does something very similar today.


Until the day it submits something to standup that you didn't want it to (and don't tell me you will always carefully filter it). Then, in the best case, you get fired. Worst case, you get criminally prosecuted.

Jesus, you people never learn.


One needs to follow the money to find the true direction. I think the ideal setup is for such a product to be owned by a public figure or organization with no vested interest in making money from it or in using it in any particular way.


I'm working on this! https://www.perfectmemory.ai/

It's encrypted (on top of BitLocker) and local. There's all this competition over who makes the best, most articulate LLM. But the truth is that off-the-shelf 7B models can put sentences together with no problem. It's the context they're missing.


I feel like the storage requirements are really going to be the issue for these apps/services that run on "take screenshots and OCR them" functionality with LLMs. If you're using something like this, a huge part of the value proposition is in the long term, but until something has a more efficient way to function, even a 1-year history is impractical for a lot of people.

For example, consider the classic situation of accidentally giving someone the same Christmas gift that you did a few years back. A sufficiently powerful personal LLM that 'remembers everything' could absolutely help with that (maybe even give you a nice table of the gifts you've purchased online, who they were for, and what categories of items would complement a previous gift), but only if it can practically store that memory for a multi-year time period.


It's not that bad. With Perfect Memory AI I see ~9 GB a month. That's 108 GB/year. HDDs/SSDs are getting bigger by more than that every year. The storage also varies with what you do, your workflow, and your display resolution. Here's an article I wrote on my findings about storage requirements: https://www.perfectmemory.ai/support/storage-resources/stora...

And if you want to use the data for the LLM only, then you don't need to store the screenshots at all. Then it's ~15 MB a month.
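The arithmetic behind those figures is easy to extend to a multi-year history. This is only a rough sketch that assumes the monthly rate stays constant, which the comment itself says it will not (it varies with workflow and resolution); the function name is made up for illustration.

```python
def projected_storage(monthly_amount: int, years: int = 1) -> int:
    """Project storage use, assuming a constant monthly rate (a simplification)."""
    return monthly_amount * 12 * years


# Figures quoted in the comment above: ~9 GB/month with screenshots,
# ~15 MB/month when only the text is kept.
gb_per_year = projected_storage(9)
mb_per_year = projected_storage(15)

print(f"with screenshots: {gb_per_year} GB/year, {projected_storage(9, years=5)} GB over 5 years")
print(f"text only: {mb_per_year} MB/year, {projected_storage(15, years=5)} MB over 5 years")
```

At these rates the text-only history stays under 1 GB even after five years, while the screenshot history is what drives the storage question.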


> That's 108 GB/year. HDD/SSDs are getting bigger than that every year.

Cries in MacBook Pro


Outboard TB 3/4 storage only seems expensive until you price it against Apple's native storage. Is it slower? Of course! Is it fast enough? Probably.


I recently moved my macOS installation to an external Thunderbolt drive - it's faster than the internal SSD.


Considering storage is a wasting asset and what Apple charges, this makes perfect sense to me.


The funny thing is that Apple even has a support article on how to do this (and actually says in it "may improve your performance"). I literally followed it step by step; it was very easy and I had no issues.


Can you share the Thunderbolt drive you got?


https://glyphtech.com/products/atom-pro?variant=321211999191...

Getting it shipped to the UK added a bit to the overall price with shipping and import duty, but it was still better value for money and a far more reliable brand than anything I could have bought domestically.


It's Windows only so it won't run on your Mac anyway :-)


PerfectMemory is only available on Windows at the moment.


https://Rewind.ai is the macOS equivalent


Except that Rewind uses ChatGPT, whereas this runs entirely locally. I would like to note, though, that anonymous analytics and auto-updates are enabled by default; I disabled both for privacy reasons. Encryption is also disabled by default. I just blocked everything with my firewall for peace of mind :)


Does storage use scale linearly with the number of connected monitors (assuming each monitor uses the same resolution)?


Most screenshots are of the application window in the foreground, so unless your application spans all monitors, there is no significant overhead with multiple monitors. DPI on the other hand has a significant impact. The text is finer, taking more pixels...


Why should DPI matter if the app is taking screenshots?


Because screenshots are in pixels, not inches.
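To put a number on that: the same window, measured in logical units, covers more physical pixels as the display scale factor goes up. A minimal sketch, assuming a hypothetical 1280x720 logical window; actual file sizes also depend on image compression, which this ignores.

```python
def physical_pixels(logical_width: int, logical_height: int, scale: float) -> int:
    """Pixel count of a window captured at a given display scale factor."""
    return round(logical_width * scale) * round(logical_height * scale)


# Hypothetical window size, chosen only for illustration.
at_100_percent = physical_pixels(1280, 720, 1.0)
at_200_percent = physical_pixels(1280, 720, 2.0)

print(at_100_percent, at_200_percent, at_200_percent / at_100_percent)
```

Doubling the scale factor quadruples the number of pixels in each screenshot of the same window.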


Is the 15 MB basically embeddings from the video screenshots? What would it recall if the screenshots aren't saved?


I’m not sure if the above product does this, but you could use a multimodal model to extract descriptions of the screenshots and store those in a vector database with embeddings.
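A rough sketch of that idea, with heavy assumptions: the descriptions are hand-written stand-ins for what a multimodal model might produce, `embed` is a toy word-count vector rather than a real embedding model, and `DescriptionStore` is an in-memory list rather than an actual vector database. Only the store-then-search-by-similarity shape is the point.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class DescriptionStore:
    """Hypothetical in-memory replacement for a vector database."""

    def __init__(self) -> None:
        self._items = []  # list of (description, vector) pairs

    def add(self, description: str) -> None:
        self._items.append((description, embed(description)))

    def search(self, query: str, k: int = 1) -> list:
        query_vector = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vector, item[1]),
            reverse=True,
        )
        return [description for description, _ in ranked[:k]]


store = DescriptionStore()
# Invented descriptions, standing in for model output.
store.add("browser showing a wikipedia article about video codecs")
store.add("terminal running a cron job that takes screenshots")
store.add("checkout page for an online sock purchase")

print(store.search("online purchase checkout"))
```

With only descriptions and vectors stored, recall is limited to whatever the description captured; anything the model left out of the text is gone once the screenshot is discarded.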


Two years ago I set up a cron job to take a screenshot every minute.

I just did the second phase: using ocrmac (a Vision framework CLI on GitHub) to extract the text and dump it into a SQLite database with FTS5.

It’s simplistic but does the job for now.
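For anyone curious what the SQLite side of such a setup might look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table name, column names, and sample rows are all invented for illustration, the OCR step is assumed to have already produced the text, and it requires a SQLite build with FTS5 compiled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path to persist the index

# One row per screenshot; the timestamp is stored but not tokenized.
conn.execute(
    "CREATE VIRTUAL TABLE shots USING fts5(captured_at UNINDEXED, ocr_text)"
)

# Invented sample rows standing in for OCR output.
sample_rows = [
    ("2024-04-12T17:04", "Order confirmed: socks, total $10"),
    ("2024-04-12T17:05", "Discussion thread about screenshot OCR tools"),
]
conn.executemany(
    "INSERT INTO shots (captured_at, ocr_text) VALUES (?, ?)", sample_rows
)


def search(query: str) -> list:
    """Full-text search over the OCR text, best matches first."""
    cursor = conn.execute(
        "SELECT captured_at, ocr_text FROM shots WHERE shots MATCH ? ORDER BY rank",
        (query,),
    )
    return cursor.fetchall()


print(search("socks"))
```

The returned timestamp is what lets you jump back to the matching screenshot file on disk.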

I looked at reducing storage requirements by using ImageMagick to store only the difference between images (some 5-minute sequences are essentially the same screen) but let that one go.


> using ImageMagick to only store the difference between images

Well, that's basically how video codecs work... So might as well just find some codec params which work well with screen capture, and use an existing encoder.
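As a much cruder illustration of the same intuition (this is neither ImageMagick nor a real codec, just near-duplicate skipping), one could keep a frame only when it differs enough from the last frame that was kept. Frames here are raw byte strings and the 1% threshold is an arbitrary assumption.

```python
def changed_fraction(previous: bytes, current: bytes) -> float:
    """Fraction of byte positions that differ between two equal-sized frames."""
    if len(previous) != len(current):
        return 1.0  # treat a size change (e.g. new resolution) as a full change
    if not current:
        return 0.0
    differing = sum(a != b for a, b in zip(previous, current))
    return differing / len(current)


def frames_to_keep(frames: list, threshold: float = 0.01) -> list:
    """Indices of frames that differ from the last kept frame by more than threshold."""
    kept = []
    last_kept = None
    for index, frame in enumerate(frames):
        if last_kept is None or changed_fraction(last_kept, frame) > threshold:
            kept.append(index)
            last_kept = frame
    return kept


# Synthetic 1000-byte "frames": two identical blank screens, then two identical changed ones.
blank = bytes(1000)
changed = bytes([255] * 500 + [0] * 500)
print(frames_to_keep([blank, blank, changed, changed]))
```

A real video encoder goes further by storing only the changed regions between frames, which is why reusing an existing encoder is the more practical route.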


Thanks for sharing. Curious, what main value adds have you gotten out of this data?


I’m loose with my memory, and I’d often recall reading or looking at something and could never find it in Safari history etc. With info spread across WhatsApp, emails, files, and web history, it has helped nudge me in the right direction here and there. It also saved me once when I made an online purchase and never got an email confirmation.


I think ultimately you’d want it to summarize that down to something like:

“Purchased socks from Amazon for $10 on 12/4/2024 at 5:04PM, shipped to Mom, 1600 Pennsylvania Av NW, Washington DC 20500, order number 1463355337”

Probably stored in a vector DB for RAG.


Maybe. Until we find there’s a better way to encode the information and need the unfiltered, original context so it can be used with that new method.


This is where Microsoft (and Apple) have a leg up -- they can hook the UI at the draw level and parse the interface far more reliably and efficiently than screenshot + OCR.


Google too, for all practical purposes, since presumably this is mostly just watching you use Chrome 90% of the time.


All the more reason not to use Chrome...


This reminds me of how Sherlock, Spotlight and its iterations came to be. It was very resource intensive to index everything and keep a live db, until it was not.


Your website and blog are very low on details about how this works. Downloading and installing an MSI directly feels unsafe IMO, especially when I don't know how this software works. Is it recording a video, performing OCR continuously, or just taking screenshots?

There's no mention of using any LLMs there at all, which is how you are presenting it in your comment here.


Feedback taken. I'll add more details on how this works for us technical people. LLM integration is in progress and coming soon.

Any idea what would make you feel safe? 3rd party verification? I had it verified and published by the Microsoft Store. I feel eventually it all comes down to me being a decent person.


I'd consider installing it if it had:

* In-depth technical explanation with architecture diagrams

* Open-source and self-hosted version

Also I didn't understand if it talks to a remote server or not. Because that's a big blocker for me.


welp. this pretty much convinces me that it's time I get out of tech. lean into the tradework I do in my spare time.

because I'm sure you and people like you will succeed in your endeavors, naively thinking you're doing good. and you or someone like you will sell out, the most ruthless investor will take what you've built and use it as one more cudgel of power to beat the rest of us with.


If you want to help, use your knowledge to help shape policy. Because it is coming/already happening, and it will shape your life even if you are just living a simple life. I guarantee you that your city and state governments are passing legislation to incorporate AI to affect your life if they can be sold on it in the name of "good".


I live next to the Amish, trust me my township isn't passing anything related to AI.

For a reality check, name one instance of policy that has stopped the amoral march of tech being a tool of power to the hands of the few? Last one I can name is when they broke up Ma Bell. Now of course you can pick Verizon or AT&T, so that worked. /s

And that was 42 years ago.


Basically looks like rewind.ai but for the PC?


exactly. the UI is shockingly similar


I installed it and kept it open for a full day but apparently it hasn't "saved" anything, and even if I open a Wiki page and a few minutes later search for that page, it returns nothing. Tried reading the Support FAQs on the website to no avail. Screen recording is on.


This looks cool, I hope you support macOS at some point in the future


Any plan to implement this on macOS or Linux?


I got 90% of this built on Linux (around KDE Wayland) before other interests/priorities took over:

https://github.com/Zetaphor/screendiary/


This seems very, very interesting. I'm still learning Python so I probably can't build on this. But a poor man's version of this would be to take a screenshot every couple of minutes, OCR it, and send it to GPT for some kind of processing (or not, and just keep it as a log). Right? Or am I missing something?


Yes, that's exactly what's happening here, minus the sending it off to a third-party.

I didn't see the benefit when the OCR content is fully searchable, in addition to not wanting to pay OpenAI to spy on me.
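For the Python learner above, the basic loop might look something like this sketch. It is not the code from the linked repository: `capture` and `ocr` are placeholders you would swap for a real screenshot call and a real OCR library, and the "skip if the text is unchanged" rule is just one assumed way to keep the log small.

```python
import time
from typing import Callable, List, Tuple

LogEntry = Tuple[float, str]  # (unix timestamp, OCR text)


def run_capture_loop(
    capture: Callable[[], bytes],
    ocr: Callable[[bytes], str],
    iterations: int,
    interval_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[LogEntry]:
    """Capture, OCR, and log locally; nothing is sent to a third party."""
    log: List[LogEntry] = []
    last_text = None
    for i in range(iterations):
        text = ocr(capture())
        if text != last_text:  # skip consecutive captures with identical text
            log.append((time.time(), text))
            last_text = text
        if i < iterations - 1:
            sleep(interval_seconds)
    return log


# Stand-ins so the sketch runs without a screen or an OCR engine.
fake_screens = iter([b"inbox", b"inbox", b"editor"])
entries = run_capture_loop(
    capture=lambda: next(fake_screens),
    ocr=lambda image: image.decode(),
    iterations=3,
    sleep=lambda seconds: None,  # do not actually wait in the demo
)
print([text for _, text in entries])
```

Once the text is in a local log or database, plain search already covers a lot of the use cases discussed here.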



macOS: https://screenmemory.app/

This is my application, it does not have AI running on top.


statistics about the usage would be cool

