Duplicate Image Detection

This system detects duplicate images with a perceptual hashing function, similar to Apple's CSAM Detection. Instead of generating image hashes with NeuralHash, it uses a difference hash (dHash), which is simpler and less computationally intensive because it does not require a neural network. Since we don't have the same privacy constraints as Apple, we use nearest-neighbor search over the hashes to identify duplicate images.
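As a rough illustration of the nearest-neighbor idea, the sketch below compares integer hashes by Hamming distance (the number of differing bits) and returns every stored image within a threshold. The README does not say which distance metric or index the project uses, so the Hamming metric, the brute-force scan, the function names, and the `max_distance` default are all assumptions made for this example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def find_duplicates(query: int, known_hashes: dict, max_distance: int = 10) -> list:
    """Return (distance, name) pairs for stored hashes close to `query`.

    `known_hashes` maps an image name to its integer hash. This is a
    brute-force scan (an assumption, not the project's actual index);
    results are sorted from nearest to farthest.
    """
    matches = []
    for name, stored in known_hashes.items():
        distance = hamming_distance(query, stored)
        if distance <= max_distance:
            matches.append((distance, name))
    return sorted(matches)
```

A linear scan like this is fine for small collections. For larger ones, a metric index such as a BK-tree could replace the loop without changing the interface.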

Difference Hash

dHash is a perceptual hashing function whose hash values are resilient to image scaling, as well as to changes in color, brightness, and aspect ratio [1]. Creating a difference hash for an image takes four main steps:

1. Convert the image to greyscale*
2. Resize the image to (hash_size+1, hash_size)
3. Calculate the horizontal gradient, reducing the image size to (hash_size, hash_size)
4. Assign bits based on the horizontal gradient values

*We convert the image to greyscale before resizing for optimal performance.
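The four steps above can be sketched in plain Python. This is a minimal, dependency-free illustration and not the project's implementation: it takes the image as rows of (r, g, b) tuples, uses BT.601 luma weights for greyscale, resizes with nearest-neighbor sampling, and sets a bit when the left pixel is brighter than the right one. The README specifies none of these choices, so each is an assumption, as are the function names. A real implementation would more likely rely on an imaging library for the greyscale and resize steps.

```python
def to_greyscale(pixels):
    """Step 1: turn rows of (r, g, b) tuples into rows of luma values.

    Uses integer BT.601 weights (an assumption; any greyscale formula works).
    """
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in pixels]


def resize_nearest(grey, width, height):
    """Step 2: shrink a 2-D list using nearest-neighbor sampling (assumed filter)."""
    src_h, src_w = len(grey), len(grey[0])
    return [[grey[y * src_h // height][x * src_w // width] for x in range(width)]
            for y in range(height)]


def dhash(pixels, hash_size=8):
    """Return the difference hash of an image as an integer of hash_size**2 bits."""
    grey = to_greyscale(pixels)
    # hash_size + 1 columns, so each row yields hash_size adjacent pairs.
    small = resize_nearest(grey, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        # Step 3: compare each pixel with its right-hand neighbor.
        for left, right in zip(row, row[1:]):
            # Step 4: one bit per comparison (1 when left is brighter; assumed convention).
            bits = (bits << 1) | (1 if left > right else 0)
    return bits
```

Because only the sign of each difference is kept, adding a constant brightness offset to every pixel leaves the hash unchanged, and scaling the image changes it little or not at all.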

API

References

[1] https://www.hackerfactor.com/blog/?/archives/529-Kind-of-Like-That.html

asalvi0/DuplicateImageDetection
