Python library for perceptual image hashing with multiple algorithms including average, perceptual, difference, wavelet, color, and crop-resistant hashing
npx @tessl/cli install tessl/pypi-imagehash@4.3.0

ImageHash is a Python library for perceptual image hashing. It provides average, perceptual, difference, wavelet, HSV color, and crop-resistant hashing. Unlike cryptographic hashes, perceptual hashes are designed to produce similar outputs for visually similar images, which makes them well suited to image deduplication, similarity detection, and reverse image search.
pip install imagehash

import imagehash

Working with PIL/Pillow Image objects:
from PIL import Image
import imagehash
# Load images
image1 = Image.open('image1.jpg')
image2 = Image.open('image2.jpg')
# Generate hashes using different algorithms
ahash = imagehash.average_hash(image1)
phash = imagehash.phash(image1)
dhash = imagehash.dhash(image1)
# Compare images by calculating Hamming distance
distance = ahash - imagehash.average_hash(image2)
print(f"Hamming distance: {distance}")
# Check if images are similar (distance of 0 means identical hashes)
similar = distance < 10 # threshold depends on your needs
# Convert hash to string for storage
hash_string = str(ahash)
print(f"Hash: {hash_string}")
# Restore hash from string
restored_hash = imagehash.hex_to_hash(hash_string)
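assert restored_hash == ahash

ImageHash provides two main classes for hash representation: ImageHash, which holds a single hash, and ImageMultiHash, which holds the set of segment hashes produced by crop-resistant hashing.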
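The library supports multiple perceptual hashing algorithms, each with different strengths: average hashing (average_hash), perceptual hashing (phash, phash_simple), difference hashing (dhash, dhash_vertical), wavelet hashing (whash), HSV color hashing (colorhash), and crop-resistant hashing (crop_resistant_hash).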
All hash functions accept PIL/Pillow Image objects and return ImageHash objects that support comparison operations and string serialization.
Core perceptual hashing functions including average, perceptual, difference, wavelet, and color hashing algorithms. Each algorithm has different strengths for various image comparison scenarios.
def average_hash(image, hash_size=8, mean=numpy.mean): ...
def phash(image, hash_size=8, highfreq_factor=4): ...
def phash_simple(image, hash_size=8, highfreq_factor=4): ...
def dhash(image, hash_size=8): ...
def dhash_vertical(image, hash_size=8): ...
def whash(image, hash_size=8, image_scale=None, mode='haar', remove_max_haar_ll=True): ...
def colorhash(image, binbits=3): ...

Advanced hashing technique that segments images into regions to provide resistance to cropping. It uses a watershed-like algorithm to partition images into bright and dark segments, then hashes each segment individually.
def crop_resistant_hash(
    image,
    hash_func=dhash,
    limit_segments=None,
    segment_threshold=128,
    min_segment_size=500,
    segmentation_image_size=300
): ...

Functions for converting between hash objects and string representations, supporting both single hashes and multi-hashes. Includes compatibility functions for older hash formats.
def hex_to_hash(hexstr): ...
def hex_to_flathash(hexstr, hashsize): ...
def hex_to_multihash(hexstr): ...
def old_hex_to_hash(hexstr, hash_size=8): ...

Hash container classes that provide comparison operations, string conversion, and mathematical operations for computing similarity between images.
class ImageHash:
    def __init__(self, binary_array): ...
    def __sub__(self, other): ...  # Hamming distance
    def __eq__(self, other): ...  # Equality comparison
    # ... other methods

class ImageMultiHash:
    def __init__(self, hashes): ...
    def matches(self, other_hash, region_cutoff=1, hamming_cutoff=None, bit_error_rate=None): ...
    def best_match(self, other_hashes, hamming_cutoff=None, bit_error_rate=None): ...
    # ... other methods

# Type aliases for better type hints
NDArray = numpy.typing.NDArray[numpy.bool_] # Boolean numpy array
WhashMode = Literal['haar', 'db4'] # Wavelet modes
MeanFunc = Callable[[NDArray], float] # Mean function type
HashFunc = Callable[[Image.Image], ImageHash] # Hash function type

__version__ = '4.3.2' # Library version
ANTIALIAS = Image.Resampling.LANCZOS # PIL resampling method

The package includes a command-line utility script, find_similar_images.py, for finding similar images in directories.
def find_similar_images(userpaths, hashfunc=imagehash.average_hash):
    """
    Find similar images in specified directories using various hashing algorithms.

    Args:
        userpaths: List of directory paths to scan for images
        hashfunc: Hash function to use (default: average_hash)
    """

Command-line usage:
# Find similar images using average hash
python find_similar_images.py ahash /path/to/images
# Available algorithms:
# ahash - Average hash
# phash - Perceptual hash
# dhash - Difference hash
# whash-haar - Haar wavelet hash
# whash-db4 - Daubechies wavelet hash
# colorhash - HSV color hash
# crop-resistant - Crop-resistant hash