As I pointed out in other comments, the current implementation does not address the scale problem. However, using the 'scores' argument of the 'find_duplicates' function, one can obtain the Hamming distance or cosine similarity for each match and then sort on it. For more details, please refer to the docs.
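
To illustrate the sorting step, here is a minimal sketch. It assumes that the result returned with scores enabled is a dict mapping each filename to a list of `(duplicate_filename, score)` tuples; please check the docs for the exact shape. The helper name `sort_duplicates_by_score` and the sample data are made up for illustration and are not part of the library.

```python
def sort_duplicates_by_score(duplicates, descending=False):
    """Return a copy of `duplicates` with each match list sorted by score.

    Assumed input shape (not confirmed here, see the docs):
        {filename: [(duplicate_filename, score), ...], ...}
    """
    return {
        filename: sorted(matches, key=lambda match: match[1], reverse=descending)
        for filename, matches in duplicates.items()
    }


# Hypothetical data in the assumed shape, with Hamming distances as scores.
example = {
    "a.jpg": [("b.jpg", 8), ("c.jpg", 2)],
    "b.jpg": [("a.jpg", 8)],
    "d.jpg": [],
}

# Hamming distance: smaller means more similar, so sort ascending.
ranked = sort_duplicates_by_score(example)
```

For cosine similarity a larger value means a closer match, so pass `descending=True` to get the closest matches first.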