This system uses a perceptual hashing function, similar to Apple's CSAM Detection. Instead of generating image hashes with NeuralHash, it uses a difference hash (dHash), which is simpler and less computationally intensive because it does not require a neural network. Since we don't have the same privacy constraints as Apple, we use nearest-neighbor search over the hashes to identify duplicate images.
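As a rough illustration of the nearest-neighbor lookup, here is a minimal sketch that assumes hashes are stored as Python integers and compared by Hamming distance (the number of differing bits), which is the usual metric for perceptual hashes. The names `hamming_distance` and `find_nearest`, the dictionary layout, and the `max_distance` threshold are all hypothetical and not taken from this system.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


def find_nearest(query_hash, known_hashes, max_distance=10):
    """Return (name, distance) of the closest known hash, or None.

    known_hashes maps an image name to its integer hash.
    max_distance is an illustrative threshold (an assumption, not a
    recommended value): matches farther away than this are ignored.
    """
    best = None
    for name, known in known_hashes.items():
        distance = hamming_distance(query_hash, known)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (name, distance)
    return best
```

This is a linear scan, which is fine for small collections; a larger collection would probably need an index built for Hamming-space search.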
dHash is a perceptual hashing function whose hash values are resilient to image scaling and to changes in color, brightness, and aspect ratio [1]. Creating a difference hash for an image takes four main steps:
- Convert to greyscale*
- Resize the image to (hash_size+1, hash_size), given as (width, height)
- Calculate the horizontal gradient (the difference between horizontally adjacent pixels), reducing the image size to (hash_size, hash_size)
- Assign one bit per gradient value based on its sign
*We convert the image to greyscale before resizing, rather than after, for optimal performance.
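The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not this system's actual implementation: the function names are hypothetical, the bit convention (1 when the left pixel is brighter than its right neighbor) is one common choice, and the file-loading helper assumes the Pillow library is installed.

```python
def dhash_bits(pixels):
    """Steps 3 and 4: compare each pixel with its right-hand neighbor.

    pixels is a list of hash_size rows, each holding hash_size + 1
    greyscale values. Assumed convention: bit = 1 if left > right.
    """
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits


def bits_to_int(bits):
    """Pack a list of bits (most significant first) into one integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def dhash_from_file(path, hash_size=8):
    """Steps 1 and 2 via Pillow (assumed dependency), then steps 3 and 4."""
    from PIL import Image  # imported lazily so the helpers above work without Pillow

    image = Image.open(path).convert("L")              # step 1: greyscale
    image = image.resize((hash_size + 1, hash_size))   # step 2: (width, height)
    width, height = image.size
    pixels = [[image.getpixel((x, y)) for x in range(width)]
              for y in range(height)]
    return bits_to_int(dhash_bits(pixels))
```

Because each bit depends only on whether one pixel is brighter than its neighbor, adding the same constant to every pixel leaves the hash unchanged, which is where the resilience to brightness changes comes from. With `hash_size=8` the result is a 64-bit hash.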
[1] https://www.hackerfactor.com/blog/?/archives/529-Kind-of-Like-That.html