Can you go into any detail on what technologies you used? Is there enough differentiating data in their attire to actually match agents? None of them are showing their faces so I wonder how many false positives would occur
I'm using the YOLO-World-XL object detection model, which lets me detect objects from text prompts. That's the initial filter that scans for agents. Once agents are detected and outlined with bounding boxes, the full image and each cropped bounding box are sent to ChatGPT to confirm the image looks legit. Once an image passes those checks, I create an image embedding for each agent using CLIP, store those in a vector DB, and compare each new agent against the DB to find a match.
The matching system isn't perfect, but I think it's good enough to get the point across, and it can easily be tuned with more data. Happy to take suggestions here, I just spun this up over the weekend!
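Roughly, the matching step works like this (a minimal sketch, not the actual code: the `match_agent` name and threshold are placeholders, a plain NumPy array stands in for the vector DB, and the CLIP embedding call is assumed to happen upstream):

```python
import numpy as np

def match_agent(query_emb, db_embs, threshold=0.85):
    """Compare one agent crop's embedding against all stored embeddings.

    query_emb: (d,) CLIP embedding of the new crop.
    db_embs:   (n, d) stored embeddings, one row per known agent.
    Returns (best_index, score) if the best cosine similarity clears the
    threshold, else (None, score).
    """
    # L2-normalize so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q                 # cosine similarity to every stored agent
    best = int(np.argmax(sims))
    score = float(sims[best])
    return (best, score) if score >= threshold else (None, score)
```

Raising the threshold trades false positives for more misses, which is where the "tuned with more data" part comes in: a held-out set of known same/different agent pairs tells you where to put it.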