tessl/pypi-imagehash

Python library for perceptual image hashing with multiple algorithms including average, perceptual, difference, wavelet, color, and crop-resistant hashing

Crop-Resistant Hashing

Advanced hashing technique that provides resistance to image cropping by segmenting images into regions and hashing each segment individually. Based on the algorithm described in "Efficient Cropping-Resistant Robust Image Hashing" (DOI 10.1109/ARES.2014.85).

Capabilities

Crop-Resistant Hash Generation

Creates a multi-hash by partitioning the image into bright and dark segments using a watershed-like algorithm, then applying a hash function to each segment's bounding box.

def crop_resistant_hash(
    image,
    hash_func=dhash,
    limit_segments=None,
    segment_threshold=128,
    min_segment_size=500,
    segmentation_image_size=300
):
    """
    Creates crop-resistant hash using image segmentation.
    
    This algorithm partitions the image into bright and dark segments using a 
    watershed-like algorithm, then hashes each segment. Provides resistance to 
    up to 50% cropping according to the research paper.
    
    Args:
        image (PIL.Image.Image): Input image to hash
        hash_func (callable): Hash function to apply to segments (default: dhash)
        limit_segments (int, optional): If set, hash only this many of the largest segments
        segment_threshold (int): Brightness threshold between hills and valleys (default: 128)
        min_segment_size (int): Minimum pixels for a hashable segment (default: 500)
        segmentation_image_size (int): Size for segmentation processing (default: 300)
    
    Returns:
        ImageMultiHash: Multi-hash object containing segment hashes
    """

Usage Example:

from PIL import Image
import imagehash

# Load images
full_image = Image.open('full_photo.jpg')
cropped_image = Image.open('cropped_photo.jpg')  # 30% cropped version

# Generate crop-resistant hashes
full_hash = imagehash.crop_resistant_hash(full_image)
crop_hash = imagehash.crop_resistant_hash(cropped_image)

# Check if images match despite cropping
matches = full_hash.matches(crop_hash, region_cutoff=1)
print(f"Images match: {matches}")

# Get detailed comparison metrics
num_matches, sum_distance = full_hash.hash_diff(crop_hash)
print(f"Matching segments: {num_matches}, Total distance: {sum_distance}")

# Calculate an overall difference score (lower means more similar)
difference_score = full_hash - crop_hash
print(f"Difference score: {difference_score}")

Advanced Configuration

Custom Hash Functions:

# `image` is any PIL.Image.Image, e.g. image = Image.open('photo.jpg')
# Use different hash algorithms for segments
ahash_segments = imagehash.crop_resistant_hash(image, imagehash.average_hash)
phash_segments = imagehash.crop_resistant_hash(image, imagehash.phash)

# Custom hash function with parameters
def custom_hash(img):
    return imagehash.whash(img, mode='db4')

custom_segments = imagehash.crop_resistant_hash(image, custom_hash)

Segmentation Parameters:

# Finer segmentation (tends to yield more, smaller segments)
fine_hash = imagehash.crop_resistant_hash(
    image,
    segment_threshold=64,     # Lower threshold: more pixels count as bright "hills"
    min_segment_size=200,    # Smaller minimum size keeps more small segments
    segmentation_image_size=500  # Higher resolution processing
)

# Coarse segmentation (tends to yield fewer, larger segments)
coarse_hash = imagehash.crop_resistant_hash(
    image,
    segment_threshold=200,   # Higher threshold: more pixels count as dark "valleys"
    min_segment_size=1000,  # Larger minimum size discards more small segments
    limit_segments=5        # Only hash 5 largest segments
)

Performance Optimization

Segment Limiting:

# Limit to top 3 segments for faster processing and storage
limited_hash = imagehash.crop_resistant_hash(
    image,
    limit_segments=3,
    min_segment_size=1000
)

Processing Size Control:

# Balance between accuracy and speed
fast_hash = imagehash.crop_resistant_hash(
    image,
    segmentation_image_size=200  # Faster processing
)

accurate_hash = imagehash.crop_resistant_hash(
    image,
    segmentation_image_size=600  # More accurate segmentation
)

Algorithm Details

The crop-resistant hashing process involves several steps:

  1. Image Preprocessing: Convert to grayscale and resize to segmentation size
  2. Filtering: Apply Gaussian blur and median filter to reduce noise
  3. Thresholding: Separate pixels into "hills" (bright) and "valleys" (dark)
  4. Region Growing: Use watershed-like algorithm to find connected regions
  5. Segment Filtering: Remove segments smaller than minimum size
  6. Bounding Box Creation: Compute each segment's bounding box and scale it to the original image's coordinates
  7. Individual Hashing: Apply hash function to each segment's bounding box
  8. Multi-Hash Assembly: Combine individual hashes into ImageMultiHash object
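
Step 6 can be sketched in isolation. The snippet below is a minimal illustration, not the library's code: the helper name segment_bounding_box and the (row, column) coordinate convention are assumptions made for this example. It maps a segment found in the resized segmentation image to a crop box in the original image.

```python
def segment_bounding_box(segment, orig_size, segmentation_image_size=300):
    """Map a segment to a crop box in the original image (hypothetical helper).

    segment: set of (row, col) pixel coordinates in the square segmentation image.
    orig_size: (width, height) of the original image.
    Returns (left, upper, right, lower), the order PIL's Image.crop expects.
    """
    orig_w, orig_h = orig_size
    scale_w = orig_w / segmentation_image_size
    scale_h = orig_h / segmentation_image_size
    rows = [row for row, _ in segment]
    cols = [col for _, col in segment]
    # +1 on the max side so the box includes the last row and column.
    return (
        min(cols) * scale_w,
        min(rows) * scale_h,
        (max(cols) + 1) * scale_w,
        (max(rows) + 1) * scale_h,
    )


# A segment spanning rows 30..89 and columns 60..149 of a 300x300 segmentation image.
segment = {(30, 60), (30, 61), (89, 149)}
box = segment_bounding_box(segment, orig_size=(1200, 900))
print(box)
```

Because only the bounding box is hashed, two segments with different shapes but the same extent would produce the same crop in this sketch.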

Comparison and Matching

The ImageMultiHash class provides several methods for comparing crop-resistant hashes:

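  • Equality: hash1 == hash2 is shorthand for hash1.matches(hash2) with default cutoffs, so it tests for a match rather than bit-for-bit identity
  • Flexible Matching: hash1.matches(hash2, region_cutoff=..., hamming_cutoff=..., bit_error_rate=...) sets how many segments must match and how close each match must be
  • Distance Scoring: hash1 - hash2 returns a difference score (lower means more similar)
  • Best Match: hash1.best_match(candidates) returns the closest hash from a list of candidates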
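
To make the comparison semantics concrete, here is a simplified pure-Python model of segment-wise matching. It is a sketch under stated assumptions rather than the ImageMultiHash implementation: segment hashes are modeled as equal-length bit strings, the helper names are hypothetical, and the 25% default bit error rate is assumed for illustration.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def hash_diff(segments_a, segments_b, bit_error_rate=0.25):
    """Return (matching segment count, summed distance of those matches)."""
    cutoff = len(segments_a[0]) * bit_error_rate
    distances = []
    for seg in segments_a:
        # Each segment is paired with its closest counterpart in the other hash.
        best = min(hamming(seg, other) for other in segments_b)
        if best <= cutoff:
            distances.append(best)
    return len(distances), sum(distances)


def matches(segments_a, segments_b, region_cutoff=1, bit_error_rate=0.25):
    """True if at least `region_cutoff` segments find a close counterpart."""
    count, _ = hash_diff(segments_a, segments_b, bit_error_rate)
    return count >= region_cutoff


def difference_score(segments_a, segments_b, bit_error_rate=0.25):
    """Lower is more similar; the summed distance acts as a tie-breaker."""
    count, total = hash_diff(segments_a, segments_b, bit_error_rate)
    if count == 0:
        return float(len(segments_a))
    max_distance = count * len(segments_a[0])
    return len(segments_a) - (count - total / max_distance)


# Toy 8-bit segment hashes: the cropped image keeps one recognizable segment.
full = ["11110000", "10101010", "00001111"]
crop = ["11110001", "01010101"]

print(hash_diff(full, crop))
print(matches(full, crop, region_cutoff=1))
print(matches(full, crop, region_cutoff=2))
print(difference_score(full, crop))
```

In this model a single surviving segment is enough for a match at region_cutoff=1, which is what makes the scheme tolerant of cropping; raising region_cutoff trades recall for fewer false positives.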

Internal Functions

The crop-resistant hashing algorithm uses several internal functions for image segmentation:

def _find_region(remaining_pixels, segmented_pixels):
    """
    Internal function to find connected regions in segmented image.
    
    Args:
        remaining_pixels (NDArray): Boolean array of unsegmented pixels
        segmented_pixels (set): Set of already segmented pixel coordinates
    
    Returns:
        set: Set of pixel coordinates forming a connected region
    """

def _find_all_segments(pixels, segment_threshold, min_segment_size):
    """
    Internal function to find all segments in an image.
    
    Args:
        pixels (NDArray): Grayscale pixel array
        segment_threshold (int): Brightness threshold for segmentation
        min_segment_size (int): Minimum pixels required for a segment
    
    Returns:
        list: List of segments, each segment is a set of pixel coordinates
    """

Limitations and Considerations

  • Processing Time: Significantly slower than basic hash functions due to segmentation
  • Memory Usage: Stores multiple hash objects (one per segment)
  • Parameter Sensitivity: Segmentation parameters affect matching performance
  • Version Compatibility: Results may vary slightly between Pillow versions due to grayscale conversion changes

Use Cases

Crop-resistant hashing is ideal for:

  • Reverse Image Search: Finding images even when cropped or partially visible
  • Social Media Monitoring: Detecting shared images with various crops/frames
  • Copyright Detection: Identifying copyrighted content despite cropping
  • Duplicate Detection: Finding similar images with different aspect ratios
  • Content Moderation: Detecting prohibited content regardless of cropping
