tessl/pypi-imagehash

Python library for perceptual image hashing with multiple algorithms including average, perceptual, difference, wavelet, color, and crop-resistant hashing

Hash Generation

The core perceptual hashing functions analyze image structure and content to produce compact fingerprints. Each algorithm has different strengths and suits particular kinds of image comparison and transformation.

Capabilities

Average Hash

Computes hash based on average pixel luminance. Fast and effective for detecting basic transformations like scaling and format changes.

def average_hash(image, hash_size=8, mean=numpy.mean):
    """
    Average Hash computation following hackerfactor algorithm.
    
    Args:
        image (PIL.Image.Image): Input image to hash
        hash_size (int): Hash size, must be >= 2 (default: 8)
        mean (callable): Function used to compute the average luminance (default: numpy.mean)
    
    Returns:
        ImageHash: Hash object representing the image
    
    Raises:
        ValueError: If hash_size < 2
    """

Usage Example:

from PIL import Image
import imagehash
import numpy as np

image = Image.open('photo.jpg')

# Standard average hash
hash1 = imagehash.average_hash(image)

# Custom hash size for more precision
hash2 = imagehash.average_hash(image, hash_size=16)

# Using median instead of mean
hash3 = imagehash.average_hash(image, mean=np.median)
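
The thresholding idea behind average hashing can be sketched in pure Python. This is an illustration only, not the library's implementation: `average_hash_bits` is a hypothetical helper, and the nested list stands in for an image that has already been reduced to a small grayscale grid.

```python
from statistics import mean, median

def average_hash_bits(pixels, average=mean):
    """Return a flat list of bits: 1 where a pixel is brighter than the average."""
    flat = [value for row in pixels for value in row]
    threshold = average(flat)
    return [1 if value > threshold else 0 for value in flat]

# Hypothetical 2x2 grayscale grid (one very bright outlier pixel).
grid = [[10, 20],
        [30, 200]]

mean_bits = average_hash_bits(grid)                    # threshold is the mean
median_bits = average_hash_bits(grid, average=median)  # threshold is the median
print(mean_bits, median_bits)
```

The outlier pulls the mean upward, so the two thresholds produce different bits for the same grid. That is the kind of effect to expect when swapping `mean=np.median` into `average_hash`.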

Perceptual Hash (pHash)

Uses the Discrete Cosine Transform (DCT) to analyze the image in the frequency domain. Robust to scaling, minor modifications, and gamma adjustments.

def phash(image, hash_size=8, highfreq_factor=4):
    """
    Perceptual Hash computation using DCT.
    
    Args:
        image (PIL.Image.Image): Input image to hash
        hash_size (int): Hash size, must be >= 2 (default: 8)
        highfreq_factor (int): High frequency scaling factor (default: 4)
    
    Returns:
        ImageHash: Hash object representing the image
    
    Raises:
        ValueError: If hash_size < 2
    """
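
To make the DCT idea concrete, here is a pure-Python sketch of the general approach: transform a grayscale grid, keep the low-frequency corner, and threshold it. The helpers `dct_1d`, `dct_2d`, and `dct_hash_bits` are hypothetical names, the median threshold is an assumption, and the library's exact processing may differ.

```python
import math
from statistics import median

def dct_1d(values):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi / n * (i + 0.5) * k) for i, v in enumerate(values))
        for k in range(n)
    ]

def dct_2d(grid):
    """Apply the 1-D DCT to every row, then to every column."""
    rows = [dct_1d(row) for row in grid]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major

def dct_hash_bits(grid, hash_size):
    """Threshold the top-left hash_size x hash_size coefficients at their median."""
    coeffs = dct_2d(grid)
    low = [value for row in coeffs[:hash_size] for value in row[:hash_size]]
    threshold = median(low)
    return [1 if value > threshold else 0 for value in low]

# Hypothetical 4x4 grayscale grid: dark left half, bright right half.
grid = [[0, 0, 255, 255] for _ in range(4)]
bits = dct_hash_bits(grid, hash_size=2)
print(len(bits))
```

In this sketch the grid is larger than `hash_size` in each dimension, which mirrors the role `highfreq_factor` appears to play: the transform sees more detail than the final hash keeps.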

Simplified Perceptual Hash

Simplified version of perceptual hash with different DCT processing.

def phash_simple(image, hash_size=8, highfreq_factor=4):
    """
    Simplified Perceptual Hash computation.
    
    Args:
        image (PIL.Image.Image): Input image to hash
        hash_size (int): Hash size (default: 8)
        highfreq_factor (int): High frequency scaling factor (default: 4)
    
    Returns:
        ImageHash: Hash object representing the image
    """

Difference Hash (dHash)

Computes differences between adjacent pixels. Sensitive to rotation but good for detecting structural changes.

def dhash(image, hash_size=8):
    """
    Difference Hash computation using horizontal pixel differences.
    
    Args:
        image (PIL.Image.Image): Input image to hash
        hash_size (int): Hash size, must be >= 2 (default: 8)
    
    Returns:
        ImageHash: Hash object representing the image
    
    Raises:
        ValueError: If hash_size < 2
    """
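
The horizontal comparison can be illustrated without any image library. This is a sketch of the idea only; `horizontal_difference_bits` is a hypothetical helper, and the grid stands in for an image that has already been shrunk to grayscale.

```python
def horizontal_difference_bits(pixels):
    """Return 1 where a pixel is brighter than its left neighbor, else 0."""
    return [
        1 if right > left else 0
        for row in pixels
        for left, right in zip(row, row[1:])
    ]

# Hypothetical grid: 2 rows of 3 pixels yield 2 comparisons per row.
grid = [[10, 20, 15],
        [50, 40, 60]]
bits = horizontal_difference_bits(grid)
print(bits)  # [1, 0, 0, 1]
```

Because each bit depends only on the ordering of neighboring pixels, a uniform brightness shift leaves the bits unchanged, while rotating the image changes which pixels are neighbors.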

Vertical Difference Hash

Variant of difference hash that computes vertical pixel differences instead of horizontal.

def dhash_vertical(image, hash_size=8):
    """
    Difference Hash computation using vertical pixel differences.
    
    Args:
        image (PIL.Image.Image): Input image to hash
        hash_size (int): Hash size (default: 8)
    
    Returns:
        ImageHash: Hash object representing the image
    """
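
The vertical variant compares each pixel with the one directly above it instead of the one to its left. The following is a sketch under the same assumptions as any toy example: `vertical_difference_bits` is a hypothetical helper operating on a nested list, not the library's code.

```python
def vertical_difference_bits(pixels):
    """Return 1 where a pixel is brighter than the pixel directly above it, else 0."""
    return [
        1 if lower > upper else 0
        for upper_row, lower_row in zip(pixels, pixels[1:])
        for upper, lower in zip(upper_row, lower_row)
    ]

# Hypothetical grid: 3 rows of 2 pixels yield 2 comparisons per column.
grid = [[10, 20],
        [50, 15],
        [40, 60]]
bits = vertical_difference_bits(grid)
print(bits)  # [1, 0, 0, 1]
```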

Wavelet Hash (wHash)

Uses wavelet transforms for frequency analysis. Configurable wavelet modes and scale parameters.

def whash(image, hash_size=8, image_scale=None, mode='haar', remove_max_haar_ll=True):
    """
    Wavelet Hash computation using PyWavelets.
    
    Args:
        image (PIL.Image.Image): Input image to hash
        hash_size (int): Hash size, must be power of 2 (default: 8)
        image_scale (int, optional): Image scale, must be power of 2. Auto-calculated if None
        mode (str): Wavelet mode - 'haar' or 'db4' (default: 'haar')
        remove_max_haar_ll (bool): Remove lowest frequency using Haar wavelet (default: True)
    
    Returns:
        ImageHash: Hash object representing the image
    
    Raises:
        AssertionError: If hash_size or image_scale are not powers of 2
        AssertionError: If hash_size is in wrong range relative to image_scale
    """

Usage Example:

from PIL import Image
import imagehash

image = Image.open('photo.jpg')

# Standard wavelet hash with Haar wavelets
hash1 = imagehash.whash(image)

# Using Daubechies wavelets
hash2 = imagehash.whash(image, mode='db4')

# Custom hash size and image scale
hash3 = imagehash.whash(image, hash_size=16, image_scale=128)

# Disable low frequency removal
hash4 = imagehash.whash(image, remove_max_haar_ll=False)
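
Since `whash` raises `AssertionError` for invalid sizes, it can help to validate arguments before calling it. The sketch below is a hypothetical pre-check, not part of the library. The power-of-2 tests follow the docstring above, but the exact range rule is not spelled out there, so the `hash_size <= image_scale` comparison is an assumption.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... using the single-set-bit trick."""
    return n > 0 and n & (n - 1) == 0

def check_whash_args(hash_size, image_scale):
    """Return a list of human-readable problems; an empty list means the sizes look valid."""
    problems = []
    if not is_power_of_two(hash_size):
        problems.append(f"hash_size={hash_size} is not a power of 2")
    if not is_power_of_two(image_scale):
        problems.append(f"image_scale={image_scale} is not a power of 2")
    # Assumed range rule: the hash cannot be larger than the scaled image.
    if not problems and hash_size > image_scale:
        problems.append("hash_size exceeds image_scale")
    return problems

print(check_whash_args(16, 128))  # []
print(check_whash_args(12, 128))
```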

Color Hash

Analyzes color distribution in HSV space rather than structural features. Effective for detecting color-based similarity.

def colorhash(image, binbits=3):
    """
    Color Hash computation based on HSV color distribution.
    
    Computes fractions of image in intensity, hue and saturation bins:
    - First binbits encode black fraction of image
    - Next binbits encode gray fraction (low saturation)
    - Next 6*binbits encode highly saturated parts in 6 hue bins
    - Next 6*binbits encode mildly saturated parts in 6 hue bins
    
    Args:
        image (PIL.Image.Image): Input image to hash
        binbits (int): Number of bits for encoding pixel fractions (default: 3)
    
    Returns:
        ImageHash: Hash object representing the color distribution
    """

Usage Example:

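from PIL import Image
import imagehash

image = Image.open('photo.jpg')

# Standard color hash
color_hash = imagehash.colorhash(image)

# Higher precision color analysis
detailed_hash = imagehash.colorhash(image, binbits=4)

# Compare color similarity regardless of structure
# ('first.jpg' and 'second.jpg' are placeholder paths)
image1 = Image.open('first.jpg')
image2 = Image.open('second.jpg')
image1_color = imagehash.colorhash(image1)
image2_color = imagehash.colorhash(image2)
color_distance = image1_color - image2_color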
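
The bin layout described in the `colorhash` docstring implies a total hash length of 14 × `binbits` bits. The sketch below simply adds up the documented segments; `colorhash_bit_layout` and its dictionary keys are hypothetical names used for illustration.

```python
def colorhash_bit_layout(binbits=3):
    """Return (segment sizes in bits, total bits) for the documented layout."""
    layout = {
        "black": binbits,
        "gray": binbits,
        "high_saturation_hues": 6 * binbits,
        "mild_saturation_hues": 6 * binbits,
    }
    return layout, sum(layout.values())

layout, total = colorhash_bit_layout()
print(total)  # 42
```

Raising `binbits` to 4 gives 56 bits, so color hashes built with different `binbits` values have different lengths and should not be compared with each other.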

Algorithm Selection Guide

  • Average Hash: Best for basic duplicate detection and format conversions
  • Perceptual Hash: Ideal for detecting scaled or slightly modified images
  • Difference Hash: Good for structural changes, sensitive to rotation
  • Wavelet Hash: Configurable frequency analysis, good for detailed comparisons
  • Color Hash: Focus on color similarity rather than structure
  • Crop-Resistant Hash: When images may be cropped or partially occluded
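
The guide above can be captured as a small lookup table. This is a sketch: the use-case keys and the `recommend` helper are hypothetical, while the values are the function names documented in this package (`crop_resistant_hash` is covered in crop-resistant-hashing.md).

```python
# Hypothetical use-case labels mapped to imagehash function names.
RECOMMENDED = {
    "format_conversion": "average_hash",
    "scaled_or_slightly_modified": "phash",
    "structural_change": "dhash",
    "detailed_comparison": "whash",
    "color_similarity": "colorhash",
    "cropped_or_occluded": "crop_resistant_hash",
}

def recommend(use_case):
    """Return the suggested function name, or raise ValueError for unknown labels."""
    try:
        return RECOMMENDED[use_case]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}") from None

print(recommend("color_similarity"))  # colorhash
```

With the library installed, the name can be resolved with `getattr(imagehash, recommend("color_similarity"))` and called on a PIL image.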

Performance Considerations

Hash computation performance (fastest to slowest):

  1. Average Hash - Simple pixel averaging
  2. Difference Hash - Pixel difference computation
  3. Color Hash - HSV conversion and binning
  4. Perceptual Hash - DCT transformation
  5. Wavelet Hash - Wavelet decomposition
  6. Crop-Resistant Hash - Image segmentation + multiple hashes

Choose the algorithm that best balances speed against robustness for your use case.
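
Relative speed depends on image size, parameters, and hardware, so it is worth measuring on your own data. The harness below is a minimal sketch: `rank_by_speed` is a hypothetical helper, and the demo uses stand-in callables so that it runs without any image files.

```python
import time

def rank_by_speed(functions, argument, repeats=5):
    """Return (name, best_seconds) pairs sorted fastest first."""
    timings = []
    for name, func in functions.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(argument)
            best = min(best, time.perf_counter() - start)
        timings.append((name, best))
    return sorted(timings, key=lambda pair: pair[1])

# Stand-in workload; replace with hash functions and a PIL image.
data = list(range(10_000))
ranking = rank_by_speed({"length": len, "sorted_copy": sorted}, data)
for name, seconds in ranking:
    print(f"{name}: {seconds:.6f}s")
```

To time the hashes themselves, pass a dictionary such as `{"average_hash": imagehash.average_hash, "phash": imagehash.phash}` together with an opened image.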
