Image Deduplicator

imagededup is a Python package that simplifies the task of finding exact and near duplicates in an image collection.

About

This package provides hashing algorithms, which are particularly good at finding exact duplicates, and convolutional neural networks, which are also adept at finding near duplicates. It also provides an evaluation framework to judge the quality of deduplication for a given dataset.
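
To illustrate why hashing suits exact and near-exact duplicates, here is a minimal sketch of average hashing in pure Python. It is not imagededup's implementation: it assumes the image has already been reduced to a tiny grayscale grid given as nested lists, and the names `average_hash` and `hamming_distance` are hypothetical helpers chosen for this example.

```python
from typing import List


def average_hash(pixels: List[List[int]]) -> int:
    """Build an integer hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Toy 2x2 grayscale "images" (assumed data, for illustration only).
original = [[10, 200], [220, 30]]
brightened = [[20, 210], [230, 40]]   # same picture, uniformly brighter
different = [[200, 10], [30, 220]]    # light and dark areas swapped

near = hamming_distance(average_hash(original), average_hash(brightened))
far = hamming_distance(average_hash(original), average_hash(different))
```

A small Hamming distance (`near`) suggests a duplicate, while a large one (`far`) suggests unrelated images. A uniform brightness change leaves the hash unchanged because each pixel is compared against the image's own mean.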

The package provides the following functionality:

  • Finding duplicates in a directory using one of the following algorithms:
    • Convolutional Neural Network (CNN)
    • Perceptual hashing (PHash)
    • Difference hashing (DHash)
    • Wavelet hashing (WHash)
    • Average hashing (AHash)
  • Generating encodings for images using one of the algorithms listed above.
  • A framework to evaluate the effectiveness of deduplication given a ground truth mapping.
  • Plotting duplicates found for a given image file.

Detailed documentation for the package can be found at: https://idealo.github.io/imagededup/
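
The directory-level duplicate search listed above can be sketched as follows, using perceptual hashing. This is a hedged example based on the package's documented usage (`PHash`, `encode_images`, `find_duplicates`); the wrapper function `find_duplicates_with_phash` is a hypothetical name, and exact signatures may differ between versions, so check the documentation for the release you install.

```python
from typing import Dict, List


def find_duplicates_with_phash(image_dir: str) -> Dict[str, List[str]]:
    """Map each image file name in image_dir to the file names of its duplicates."""
    # Imported here so the sketch only requires imagededup when it is actually called.
    from imagededup.methods import PHash

    hasher = PHash()
    # Generate one perceptual-hash encoding per image in the directory.
    encodings = hasher.encode_images(image_dir=image_dir)
    # Compare the encodings and collect duplicates for every image.
    return hasher.find_duplicates(encoding_map=encodings)
```

Calling `find_duplicates_with_phash("path/to/image/directory")` (a placeholder path) should return a dictionary keyed by file name. The package's `plot_duplicates` utility in `imagededup.utils` can then display the duplicates found for a given file.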

imagededup is compatible with Python 3.6 and is distributed under the Apache 2.0 license.
