
Deduplication is the killer feature, especially if it can handle edited files, even partially.
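
For anyone wondering why byte-level checksums aren't enough here: a rough sketch of a perceptual (difference) hash, which still matches after a re-encode or resize. The 64-bit hash size, the Hamming threshold, and the filenames are all made-up illustration values, not anyone's production settings.

    # Difference hash: compare adjacent pixels of a tiny grayscale thumbnail.
    # An edited or re-encoded copy usually differs in only a few bits.
    from PIL import Image

    def dhash(path: str, hash_size: int = 8) -> int:
        img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
        pixels = list(img.getdata())
        bits = 0
        for row in range(hash_size):
            for col in range(hash_size):
                left = pixels[row * (hash_size + 1) + col]
                right = pixels[row * (hash_size + 1) + col + 1]
                bits = (bits << 1) | (1 if left > right else 0)
        return bits

    def hamming(a: int, b: int) -> int:
        return bin(a ^ b).count("1")

    # "original.jpg" / "edited.jpg" are hypothetical filenames.
    if hamming(dhash("original.jpg"), dhash("edited.jpg")) <= 6:
        print("likely duplicates, despite different bytes")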

EXIF tag management is a nightmare: Dublin Core on steroids. Handling dates and times when the capture time is only approximately known kills many systems.

A tool has to make choices about what importing a photo implies for the file path, for the file's atime and mtime, for the multiple EXIF times, and for private tags.
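
As one concrete illustration of that choice (not how any particular product does it): pick a priority order over exiftool's date tags and fall back to filesystem mtime. The tag order, date format, and "photo.jpg" path are assumptions for the sketch.

    # Read candidate timestamps via exiftool's JSON output and pick one.
    import json
    import os
    import subprocess
    from datetime import datetime

    # Most-trusted first: camera capture time, then creation, then modification.
    CANDIDATE_TAGS = ["DateTimeOriginal", "CreateDate", "ModifyDate"]

    def captured_at(path: str) -> datetime:
        out = subprocess.run(
            ["exiftool", "-j", "-d", "%Y-%m-%dT%H:%M:%S", path],
            capture_output=True, text=True, check=True,
        )
        tags = json.loads(out.stdout)[0]
        for tag in CANDIDATE_TAGS:
            value = tags.get(tag)
            if value:
                try:
                    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
                except ValueError:
                    pass  # malformed or zeroed-out dates are common in the wild
        # Last resort: filesystem mtime, which imports and copies often clobber.
        return datetime.fromtimestamp(os.path.getmtime(path))

    print(captured_at("photo.jpg"))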

Google honours a ridiculously small set of tags, and never rereads them. Google uses sidecar files so that file changes don't break hash values.

All decisions have consequences. The PhotoPrism and exiftool forums abound with special cases. A million of them.



Deduplication is a hairy problem, and it was my first priority when I started writing PhotoStructure to get my own mess of photos together.

I'm on the fifth major iteration of image hashing at this point: an L*a*b mean hash, plus a k-means-gathered set of dominant colors, plus dynamic thresholds that take into account differing MIME types, fuzzy captured-at times, and monochromatic images.
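
Roughly the shape of those ingredients, as a sketch rather than PhotoStructure's actual code (assumes Pillow, scikit-image, scikit-learn, and numpy; the hash size, k, and thresholds are placeholders):

    import numpy as np
    from PIL import Image
    from skimage.color import rgb2lab
    from sklearn.cluster import KMeans

    def lab_mean_hash(path: str, size: int = 8) -> np.ndarray:
        # 1 bit per cell: is this cell's L (lightness) above the image mean?
        img = Image.open(path).convert("RGB").resize((size, size))
        lab = rgb2lab(np.asarray(img) / 255.0)
        lightness = lab[:, :, 0]
        return (lightness > lightness.mean()).flatten()

    def dominant_colors(path: str, k: int = 5) -> np.ndarray:
        # k-means cluster centers in Lab space, ordered by cluster size.
        img = Image.open(path).convert("RGB").resize((64, 64))
        pixels = rgb2lab(np.asarray(img) / 255.0).reshape(-1, 3)
        km = KMeans(n_clusters=k, n_init=10).fit(pixels)
        order = np.argsort(-np.bincount(km.labels_))
        return km.cluster_centers_[order]

    def probably_same(a: str, b: str, max_bit_diff: int = 6) -> bool:
        # Combine both signals; a real system would also vary these thresholds
        # by file type, capture-time fuzziness, and monochromatic content.
        bit_diff = int(np.sum(lab_mean_hash(a) != lab_mean_hash(b)))
        color_dist = float(np.linalg.norm(dominant_colors(a) - dominant_colors(b)))
        return bit_diff <= max_bit_diff and color_dist < 30.0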

This explains a bunch of the issues and tradeoffs I made while assembling the heuristics in PhotoStructure: https://photostructure.com/faq/what-do-you-mean-by-deduplica...


Mylio has pretty good dedup in my experience. It's also extra careful, and lets you verify each match individually or just do it all at once: https://community.mylio.com/posts/video-introducing-deduplic...



