As far as I can tell so far, its functionality isn't much beyond a short script that Claude could generate for me in 30 seconds or that I could write myself in 20 minutes.
I use Claude to write short, isolated scripts, like something to sort photos based on EXIF data, but I never just trust anything it does. I read and debug every line.
I'd never let a junior dev use one of these models, but I've been coding for 30 years and know how to catch mistakes. It saves a huge amount of time.
I've done this multiple times and Claude explicitly provided a config parameter in the script to make it "read only" by default.
Regardless, why wouldn't you test it on a small subset of your photos before trying it on the full collection? You would do that with a script you wrote yourself, and you should do it with an LLM script as well.
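For anyone wondering what a "read only by default" switch plus a small test run looks like, here is a rough sketch, not the actual script Claude gave me. The names `DRY_RUN`, `plan_moves`, `sort_photos` and `limit` are made up for illustration, and to keep it standard-library only it uses the file's modification time as a stand-in for the EXIF capture date.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Hypothetical setting: nothing is moved unless this is switched off on purpose.
DRY_RUN = True


def plan_moves(src, dest, limit=None):
    """Return (source, target) pairs, filing each photo under dest/YYYY/MM."""
    photos = sorted(
        p for p in Path(src).iterdir() if p.suffix.lower() in {".jpg", ".jpeg"}
    )
    if limit is not None:
        # Only look at the first few files, for a trial run on a small subset.
        photos = photos[:limit]
    moves = []
    for photo in photos:
        # Assumption: modification time stands in for the EXIF capture date.
        taken = datetime.fromtimestamp(photo.stat().st_mtime)
        target = Path(dest) / f"{taken:%Y}" / f"{taken:%m}" / photo.name
        moves.append((photo, target))
    return moves


def sort_photos(src, dest, limit=None, dry_run=DRY_RUN):
    """Print the planned moves; only touch files when dry_run is False."""
    moves = plan_moves(src, dest, limit)
    for photo, target in moves:
        if dry_run:
            print(f"would move {photo} -> {target}")
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(photo), str(target))
    return moves
```

Something like `sort_photos("photos", "sorted", limit=20)` only prints what it would do for the first 20 files. Once the output looks right, the same call with `dry_run=False` does the move for real.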