Deduplication is a hairy problem, and it was the first thing I set out to solve when I started writing PhotoStructure to get my own mess of photos together.
I'm on the fifth major iteration of image hashing at this point: a L*a*b* mean hash, plus a k-means-gathered set of dominant colors, plus dynamic thresholds that take into account differing mimetypes, fuzzy captured-at times, and monochromatic images.
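For anyone unfamiliar with the idea, here is a rough sketch of what a mean hash over L*a*b* channels can look like. This is not PhotoStructure's actual code: the function names (`srgb_to_lab`, `mean_hash`, `lab_mean_hash`, `hash_distance`) are made up for illustration, and it assumes the image has already been downscaled to a small, flat list of 8-bit sRGB pixels (say 8x8) by some other tool. The conversion uses the standard sRGB to CIELAB formulas with a D65 white point.

```python
def _linearize(c8):
    """Undo the sRGB gamma curve for one 8-bit channel value."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    """CIELAB companding function."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to (L*, a*, b*), D65 white point."""
    rl, gl, bl = _linearize(r), _linearize(g), _linearize(b)
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    fx, fy, fz = _f(x / 0.95047), _f(y / 1.0), _f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def mean_hash(values):
    """One bit per value: 1 if above the mean, else 0 (first value is the MSB)."""
    mean = sum(values) / len(values)
    bits = 0
    for v in values:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def lab_mean_hash(pixels):
    """Hash a flat list of (r, g, b) pixels into one mean hash per Lab channel."""
    lab = [srgb_to_lab(*p) for p in pixels]
    return tuple(mean_hash([px[i] for px in lab]) for i in range(3))


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def hash_distance(h1, h2):
    """Total Hamming distance across the L*, a*, and b* hashes."""
    return sum(hamming(a, b) for a, b in zip(h1, h2))
```

Two images would then be treated as likely duplicates when `hash_distance` falls under some threshold. Note that for grayscale input the a* and b* values all sit near zero, so their bits are mostly rounding noise, which is one reason a fixed threshold tends not to be enough.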
EXIF tag management is a nightmare: Dublin Core on steroids. Handling dates and times that are only approximately known kills many systems.
An importer has to make choices about what a photo import implies for the file path, for the file's atime and mtime, for the multiple EXIF timestamps, and for private tags.
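To make the timestamp part concrete, here is a minimal sketch of one possible policy: prefer the EXIF timestamps in a fixed order and fall back to the file's mtime. The ordering and the function names (`parse_exif_time`, `pick_capture_time`) are assumptions for illustration, not what any particular tool does. `DateTimeOriginal`, `DateTimeDigitized`, and `DateTime` are the standard EXIF tag names, and `YYYY:MM:DD HH:MM:SS` is the standard EXIF date format. The `tags` dict is assumed to have been read already by whatever EXIF library you use.

```python
from datetime import datetime, timezone

# Assumed priority order: most trustworthy capture time first.
EXIF_TIME_TAGS = ("DateTimeOriginal", "DateTimeDigitized", "DateTime")
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_time(value):
    """Parse an EXIF timestamp string; return None if missing or invalid."""
    try:
        return datetime.strptime(value.strip(), EXIF_FORMAT)
    except (AttributeError, ValueError):
        # None, non-string values, blanks, and "0000:00:00 00:00:00" land here.
        return None


def pick_capture_time(tags, file_mtime):
    """Return (naive datetime, source name) for the best available timestamp.

    EXIF times carry no zone, so they come back as naive local-to-camera
    times. The mtime fallback is returned as naive UTC, which is a lossy
    choice a real importer would need to make deliberately.
    """
    for name in EXIF_TIME_TAGS:
        parsed = parse_exif_time(tags.get(name))
        if parsed is not None:
            return parsed, name
    fallback = datetime.fromtimestamp(file_mtime, tz=timezone.utc)
    return fallback.replace(tzinfo=None), "mtime"
```

Returning the source name alongside the time lets later stages treat an mtime-derived date as less reliable than a `DateTimeOriginal` one, for example by widening the matching window during deduplication.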
Google honours a ridiculously small set of tags, and never rereads them. Google uses sidecar files so that metadata changes don't alter the file and break its hash.
All decisions have consequences. The PhotoPrism and ExifTool forums abound with special cases. A million of them.