Python library for perceptual image hashing with multiple algorithms including average, perceptual, difference, wavelet, color, and crop-resistant hashing
npx @tessl/cli install tessl/pypi-imagehash@4.3.00
# ImageHash

A Python library for perceptual image hashing. It provides average, perceptual (DCT-based), difference, wavelet, HSV color, and crop-resistant hashing. Unlike cryptographic hashes, perceptual hashes are designed to produce similar outputs for visually similar images, which makes them well suited to image deduplication, similarity detection, and reverse image search.

## Package Information

- **Package Name**: imagehash
- **Language**: Python
- **Installation**: `pip install imagehash`
- **Dependencies**: numpy, scipy, pillow, PyWavelets

## Core Imports

```python
import imagehash
```

Working with PIL/Pillow Image objects:

```python
from PIL import Image
import imagehash
```

## Basic Usage

```python
from PIL import Image
import imagehash

# Load images
image1 = Image.open('image1.jpg')
image2 = Image.open('image2.jpg')

# Generate hashes using different algorithms
ahash = imagehash.average_hash(image1)
phash = imagehash.phash(image1)
dhash = imagehash.dhash(image1)

# Compare images by calculating Hamming distance
distance = ahash - imagehash.average_hash(image2)
print(f"Hamming distance: {distance}")

# Check if images are similar (distance of 0 means identical hashes)
similar = distance < 10  # threshold depends on your needs

# Convert hash to string for storage
hash_string = str(ahash)
print(f"Hash: {hash_string}")

# Restore hash from string
restored_hash = imagehash.hex_to_hash(hash_string)
assert restored_hash == ahash
```

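To make the comparison step concrete, the sketch below computes a Hamming distance directly from two hex strings of the kind `str(hash)` produces. It assumes both strings encode hashes of the same size. `hamming_from_hex` is a hypothetical helper, not part of imagehash, and the sample hex values are made up.

```python
def hamming_from_hex(hex_a: str, hex_b: str) -> int:
    """Count differing bits between two equal-length hex-encoded hashes."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 wherever the two bit patterns disagree.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Made-up 64-bit hashes that differ only in the last hex digit (f vs. c).
hash_a = "ffd8c0c0e0f0f8ff"
hash_b = "ffd8c0c0e0f0f8fc"
distance = hamming_from_hex(hash_a, hash_b)

# Dividing by the bit count gives a size-independent error rate.
bit_error_rate = distance / (4 * len(hash_a))
```

Normalizing by the number of bits can make a threshold easier to carry over between hash sizes, since a fixed distance such as 10 means something different for a 64-bit hash than for a 256-bit one.
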
## Architecture

ImageHash provides two main classes for hash representation:

- **ImageHash**: Encapsulates a single perceptual hash with comparison operations
- **ImageMultiHash**: Container for multiple hashes, used in crop-resistant hashing

The library supports multiple perceptual hashing algorithms, each with different strengths:
- **Average Hash**: Fast, good for detecting basic transformations
- **Perceptual Hash**: Uses DCT, robust to scaling and minor modifications
- **Difference Hash**: Tracks gradient changes, sensitive to rotation
- **Wavelet Hash**: Uses wavelets, configurable frequency analysis
- **Color Hash**: Analyzes color distribution rather than structure
- **Crop-Resistant Hash**: Segments image for crop tolerance

All hash functions accept PIL/Pillow Image objects. The single-hash functions return ImageHash objects, which support comparison operations and string serialization; crop-resistant hashing returns an ImageMultiHash.

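As a rough illustration of the simplest algorithm in the list, the sketch below thresholds a small grayscale grid against its mean brightness. It is a simplified, pure-Python picture of the average-hash idea and skips the resizing and grayscale conversion a real implementation needs. `average_hash_bits` is a hypothetical name, not the library's code.

```python
from statistics import mean


def average_hash_bits(pixels: list[list[int]]) -> list[list[bool]]:
    """Mark each pixel True when it is brighter than the grid's mean."""
    avg = mean(value for row in pixels for value in row)
    return [[value > avg for value in row] for row in pixels]


# A 2x2 grayscale grid whose mean brightness is 125.
grid = [[200, 50], [50, 200]]
bits = average_hash_bits(grid)  # [[True, False], [False, True]]
```

Because every bit depends only on whether a pixel is above or below the mean, uniform brightness shifts leave the bits unchanged, while structural changes flip them.
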
## Capabilities

### Hash Generation

Core perceptual hashing functions including average, perceptual, difference, wavelet, and color hashing algorithms. Each algorithm has different strengths for various image comparison scenarios.

```python { .api }
def average_hash(image, hash_size=8, mean=numpy.mean): ...
def phash(image, hash_size=8, highfreq_factor=4): ...
def phash_simple(image, hash_size=8, highfreq_factor=4): ...
def dhash(image, hash_size=8): ...
def dhash_vertical(image, hash_size=8): ...
def whash(image, hash_size=8, image_scale=None, mode='haar', remove_max_haar_ll=True): ...
def colorhash(image, binbits=3): ...
```

[Hash Generation](./hash-generation.md)

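The gradient idea behind difference hashing can be sketched in a few lines of plain Python. The example below compares each pixel with its right-hand neighbour on a tiny grayscale grid; it is an illustration under simplifying assumptions (no resizing, no grayscale conversion), and `dhash_bits` is a hypothetical helper rather than the library's implementation.

```python
def dhash_bits(pixels: list[list[int]]) -> list[list[bool]]:
    """Compare each pixel with its right-hand neighbour, row by row."""
    return [
        [right > left for left, right in zip(row, row[1:])]
        for row in pixels
    ]


# Two rows of three pixels give two rows of two bits.
grid = [[10, 20, 15], [30, 30, 40]]
bits = dhash_bits(grid)  # [[True, False], [False, True]]
```

A row of n + 1 pixels yields n bits, so a horizontal difference hash needs one more column of input than the number of bits per row it produces.
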
### Crop-Resistant Hashing

Advanced hashing technique that segments images into regions to provide resistance to cropping. It uses a watershed-like algorithm to partition an image into bright and dark segments, then hashes each segment individually.

```python { .api }
def crop_resistant_hash(
    image,
    hash_func=dhash,
    limit_segments=None,
    segment_threshold=128,
    min_segment_size=500,
    segmentation_image_size=300
): ...
```

[Crop-Resistant Hashing](./crop-resistant-hashing.md)

### Hash Conversion and Serialization

Functions for converting between hash objects and string representations, supporting both single hashes and multi-hashes. Includes compatibility functions for older hash formats.

```python { .api }
def hex_to_hash(hexstr): ...
def hex_to_flathash(hexstr, hashsize): ...
def hex_to_multihash(hexstr): ...
def old_hex_to_hash(hexstr, hash_size=8): ...
```

[Hash Conversion](./hash-conversion.md)

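To show what a hex round trip involves, the sketch below packs a flat list of bits into a zero-padded hex string and unpacks it again. It is a minimal pure-Python illustration, assuming most-significant-bit-first ordering; `bits_to_hex` and `hex_to_bits` are hypothetical helpers, not the library's functions.

```python
def bits_to_hex(bits: list[bool]) -> str:
    """Pack a non-empty flat bit list into a zero-padded hex string."""
    width = -(-len(bits) // 4)  # ceiling division: 4 bits per hex digit
    value = int("".join("1" if b else "0" for b in bits), 2)
    return f"{value:0{width}x}"


def hex_to_bits(hexstr: str, n_bits: int) -> list[bool]:
    """Unpack a hex string into n_bits booleans, most significant bit first."""
    return [c == "1" for c in f"{int(hexstr, 16):0{n_bits}b}"]


bits = [False, False, False, False, True, False, True, False]
encoded = bits_to_hex(bits)        # "0a"
decoded = hex_to_bits(encoded, 8)  # same as `bits`
```

The zero padding matters: without it, leading `False` bits would be lost and the decoded hash would be shorter than the original. The decoder also needs to know the bit count, which a hex string alone does not always determine.
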
### Core Classes

Hash container classes that provide comparison operations, string conversion, and mathematical operations for computing similarity between images.

```python { .api }
class ImageHash:
    def __init__(self, binary_array): ...
    def __sub__(self, other): ...  # Hamming distance
    def __eq__(self, other): ...  # Equality comparison
    # ... other methods

class ImageMultiHash:
    def __init__(self, hashes): ...
    def matches(self, other_hash, region_cutoff=1, hamming_cutoff=None, bit_error_rate=None): ...
    def best_match(self, other_hashes, hamming_cutoff=None, bit_error_rate=None): ...
    # ... other methods
```

[Core Classes](./core-classes.md)

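The `region_cutoff` and `hamming_cutoff` parameters suggest a region-by-region comparison. The sketch below is one plausible reading of that idea, not the library's implementation: each segment hash is modelled as a plain integer, and two multi-hashes match when enough segments have a close partner. The helper names and sample values are hypothetical.

```python
def count_matching_regions(ours: list[int], theirs: list[int], hamming_cutoff: int) -> int:
    """Count hashes in `ours` with a close-enough partner in `theirs`."""
    matched = 0
    for own in ours:
        # Closest partner by Hamming distance (bit count of the XOR).
        best = min(bin(own ^ other).count("1") for other in theirs)
        if best <= hamming_cutoff:
            matched += 1
    return matched


def regions_match(ours, theirs, hamming_cutoff, region_cutoff=1):
    """True when at least `region_cutoff` regions have a close partner."""
    return count_matching_regions(ours, theirs, hamming_cutoff) >= region_cutoff


# Made-up 8-bit segment hashes for a full image and a cropped copy.
full_image = [0b11110000, 0b00001111, 0b10101010]
cropped = [0b11110001, 0b01010101]
matched = count_matching_regions(full_image, cropped, hamming_cutoff=2)
```

Requiring only some regions to match is what gives tolerance to cropping: segments that fall outside the crop simply find no partner, while the surviving ones still do.
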
## Types

```python { .api }
# Type aliases for better type hints
NDArray = numpy.typing.NDArray[numpy.bool_]  # Boolean numpy array
WhashMode = Literal['haar', 'db4']  # Wavelet modes
MeanFunc = Callable[[NDArray], float]  # Mean function type
HashFunc = Callable[[Image.Image], ImageHash]  # Hash function type
```

## Constants

```python { .api }
__version__ = '4.3.2'  # Library version
ANTIALIAS = Image.Resampling.LANCZOS  # PIL resampling method
```

## Utilities

### Command-Line Image Similarity Tool

The package includes a command-line utility script `find_similar_images.py` for finding similar images in directories.

```python { .api }
def find_similar_images(userpaths, hashfunc=imagehash.average_hash):
    """
    Find similar images in specified directories using various hashing algorithms.

    Args:
        userpaths: List of directory paths to scan for images
        hashfunc: Hash function to use (default: average_hash)
    """
```

**Command-line usage:**

```bash
# Find similar images using average hash
python find_similar_images.py ahash /path/to/images

# Available algorithms:
# ahash          - Average hash
# phash          - Perceptual hash
# dhash          - Difference hash
# whash-haar     - Haar wavelet hash
# whash-db4      - Daubechies wavelet hash
# colorhash      - HSV color hash
# crop-resistant - Crop-resistant hash
```
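
The script's internals are not shown here, so the following is only a sketch of one common approach to this kind of scan: bucket items by the string form of their hash and report buckets with more than one member. `group_by_hash` is a hypothetical helper, and the example uses strings with `str.lower` as a stand-in for images and a hash function.

```python
from collections import defaultdict


def group_by_hash(items, hashfunc):
    """Return only the groups of items whose hash keys collide."""
    groups = defaultdict(list)
    for item in items:
        groups[str(hashfunc(item))].append(item)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Stand-in data: strings instead of images, lower-casing instead of hashing.
names = ["Cat.JPG", "cat.jpg", "dog.png"]
duplicates = group_by_hash(names, str.lower)
```

Exact-key grouping only finds items whose hashes are identical. Catching near duplicates would instead require comparing hashes pairwise against a Hamming distance threshold.
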