# Deep Hashing

DeepHash is a content-based hashing system that generates consistent hash values for Python objects. By default the hash ignores dictionary key order and iterable order, and options can additionally make it ignore numeric type, string case, and similar differences. Because the hashes are stable and reproducible, they can be used for caching, deduplication, and change detection across complex data structures.

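At its core, content-based hashing is deterministic serialization followed by a cryptographic hash. The sketch below illustrates that idea with the standard library only. It assumes a simplified `type:value` serialization with dictionary entries sorted; `content_hash` is a hypothetical helper, not part of deepdiff, and unlike DeepHash's default it keeps list order significant.

```python
import hashlib
from typing import Any


def content_hash(obj: Any) -> str:
    """Hypothetical helper: deterministic serialization, then SHA-256."""

    def serialize(o: Any) -> str:
        if isinstance(o, dict):
            # Sort entries so insertion order does not affect the result
            items = sorted(f"{serialize(k)}:{serialize(v)}" for k, v in o.items())
            return "dict:{" + ";".join(items) + "}"
        if isinstance(o, (list, tuple)):
            # The container type name is part of the text, so [1] and (1,) differ
            return f"{type(o).__name__}:[" + ",".join(serialize(i) for i in o) + "]"
        return f"{type(o).__name__}:{o!r}"

    return hashlib.sha256(serialize(obj).encode("utf-8")).hexdigest()


print(content_hash({"name": "John", "age": 30}) == content_hash({"age": 30, "name": "John"}))  # True
print(content_hash({"age": 30}) == content_hash({"age": 31}))  # False
```

Putting the type name into the serialized text is what lets a hasher tell `1` from `"1"`; DeepHash's options relax exactly these kinds of distinctions.
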
## Capabilities

### Object Hashing

Generate deep hash values for any Python object, with extensive options for controlling how different data types and structures are hashed.

```python { .api }
class DeepHash:
    def __init__(
        self,
        obj: Any,
        *,
        apply_hash: bool = True,
        custom_operators: Optional[List[Any]] = None,
        default_timezone: Union[datetime.timezone, "BaseTzInfo"] = datetime.timezone.utc,
        encodings: Optional[List[str]] = None,
        exclude_obj_callback: Optional[Callable[[Any, str], bool]] = None,
        exclude_paths: Optional[PathType] = None,
        exclude_regex_paths: Optional[RegexType] = None,
        exclude_types: Optional[Union[List[type], Set[type], Tuple[type, ...]]] = None,
        hasher: Optional[Callable[[Union[str, bytes]], str]] = None,
        hashes: Optional[Union[Dict[Any, Any], "DeepHash"]] = None,
        ignore_encoding_errors: bool = False,
        ignore_iterable_order: bool = True,
        ignore_numeric_type_changes: bool = False,
        ignore_private_variables: bool = True,
        ignore_repetition: bool = True,
        ignore_string_case: bool = False,
        ignore_string_type_changes: bool = False,
        ignore_type_in_groups: Any = None,
        ignore_type_subclasses: bool = False,
        ignore_uuid_types: bool = False,
        include_paths: Optional[PathType] = None,
        number_format_notation: str = "f",
        number_to_string_func: Optional[NumberToStringFunc] = None,
        parent: str = "root",
        significant_digits: Optional[int] = None,
        truncate_datetime: Optional[str] = None,
        use_enum_value: bool = False,
        **kwargs
    ):
        """
        Deep hash of objects based on their content.

        Parameters:
        - obj: Object to hash
        - apply_hash: Whether to apply the hash function to the serialized result
          (False returns the serialized string instead of its hash)
        - encodings: List of encodings to try when converting bytes to strings
        - exclude_obj_callback: Callback (obj, path) -> bool; objects for which it
          returns True are excluded from hashing
        - exclude_paths: List of paths to exclude from hashing
        - exclude_regex_paths: List of regex patterns for paths to exclude
        - exclude_types: List of types to exclude from hashing
        - hasher: Hash function to use (default: SHA256)
        - hashes: Precomputed hashes (dict or DeepHash) to start from
        - ignore_encoding_errors: Ignore encoding errors when converting bytes to strings
        - ignore_iterable_order: Ignore the order of items in iterables
        - ignore_numeric_type_changes: Ignore type differences between numeric types
        - ignore_private_variables: Ignore private attributes (names starting with __)
        - ignore_repetition: Do not let repeated items change the hash of an iterable
        - ignore_string_case: Ignore letter case when hashing strings
        - ignore_string_type_changes: Ignore differences between str and bytes
        - ignore_type_in_groups: List of type groups to treat as equivalent for hashing
        - ignore_type_subclasses: Ignore type differences between subclasses
        - include_paths: Only hash the given paths
        - number_format_notation: Notation for number formatting in hashes
          ("f" for fixed-point, "e" for scientific)
        - number_to_string_func: Custom function for number-to-string conversion
        - parent: Name used as the root of generated paths (default: "root")
        - significant_digits: Number of digits after the decimal point used when
          hashing numbers
        - truncate_datetime: Truncate datetimes before hashing ("second", "minute",
          "hour", or "day")
        - use_enum_value: Hash Enum members by their value instead of the member itself
        """
```

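Options such as `ignore_iterable_order` and `ignore_repetition` change how the items of a list are combined into a single hash. The sketch below shows one plausible way to do that with the standard library. `hash_iterable` is a hypothetical helper that only takes strings, and its serialization is simplified, not DeepHash's actual implementation.

```python
import hashlib
from typing import Iterable


def sha256hex(text: str) -> str:
    """Hex SHA-256 of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hash_iterable(items: Iterable[str], ignore_iterable_order: bool = True,
                  ignore_repetition: bool = True) -> str:
    """Hypothetical helper: combine item hashes into one iterable hash."""
    item_hashes = [sha256hex(item) for item in items]
    if ignore_repetition:
        # Collapse duplicates (keeping first-seen order) so [a, a, b] hashes like [a, b]
        item_hashes = list(dict.fromkeys(item_hashes))
    if ignore_iterable_order:
        # Sort so the position of each item no longer matters
        item_hashes.sort()
    return sha256hex("list:" + ",".join(item_hashes))


print(hash_iterable(["a", "b"]) == hash_iterable(["b", "a", "a"]))  # True
print(hash_iterable(["a", "b"], ignore_iterable_order=False)
      == hash_iterable(["b", "a"], ignore_iterable_order=False))  # False
```

Sorting item hashes rather than the items themselves keeps the approach independent of whether the items are mutually comparable.
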
### Hash Retrieval

Methods for retrieving hash values that have already been calculated. DeepHash hashes the top-level object and every object nested inside it, so lookups take the object itself (or one of its nested objects) as the key rather than a path string.

```python { .api }
def get(self, key: Any, default: Any = None, extract_index: Optional[int] = 0) -> Any:
    """
    Get the hash value for an object that has already been hashed.

    Parameters:
    - key: The hashed object, or any object nested inside it
    - default: Value to return if key has not been hashed
    - extract_index: 0 returns the hash, 1 returns the count of items used to
      build it, None returns both

    Returns:
    Hash string for the object, or default if it was not found.
    """
```

### Dictionary-like Access

DeepHash objects support dictionary-like access for retrieving hash values. The keys are the hashed objects themselves, or their `id()` for unhashable objects such as dicts and lists.

```python { .api }
def __getitem__(self, obj: Any, extract_index: Optional[int] = 0) -> Any:
    """
    Get the hash value for an object using dictionary-like access.

    Parameters:
    - obj: Object to get the hash for (the hashed object or any object nested inside it)
    - extract_index: 0 returns the hash, 1 returns the count of items used to
      build it, None returns both

    Returns:
    Hash value for the object. Raises KeyError if the object has not been hashed.
    """

def __contains__(self, obj: Any) -> bool:
    """
    Check whether an object has been hashed.

    Parameters:
    - obj: Object to check

    Returns:
    True if the object has been hashed, False otherwise
    """

def keys(self) -> Any:
    """
    Get all keys of the hash table.

    Returns:
    The hashed objects, or their id() for unhashable objects
    """

def values(self) -> Generator[Any, None, None]:
    """
    Get all hash values.

    Returns:
    Generator yielding hash values
    """

def items(self) -> Generator[Tuple[Any, Any], None, None]:
    """
    Get all (object, hash) pairs.

    Returns:
    Generator yielding (object, hash) tuples
    """

def __eq__(self, other: Any) -> bool:
    """
    Compare two DeepHash objects for equality.

    Two DeepHash objects are equal when their entire hash tables are equal.
    Because unhashable objects are keyed by id(), compare DeepHash(a)[a] with
    DeepHash(b)[b] to check whether two distinct objects have the same content.

    Parameters:
    - other: Other DeepHash object to compare

    Returns:
    True if the hash tables are equal, False otherwise
    """

def __bool__(self) -> bool:
    """
    Check whether the DeepHash object contains any hashes.

    Returns:
    True if hashes exist, False otherwise
    """
```

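Because every nested object gets its own entry, the table needs a key even for unhashable objects. The sketch below shows such a table. It assumes hashable objects are keyed by themselves and unhashable ones by `id()`, and that each entry holds a `(hash, count)` pair that `extract_index` selects from. `build_table` and `lookup` are hypothetical helpers. The count here is simply the number of leaf values, which may differ from DeepHash's own counting.

```python
import hashlib
from typing import Any, Dict, Optional, Tuple

# Hypothetical table layout: key -> (hash, count of leaf values)
Table = Dict[Any, Tuple[str, int]]


def build_table(obj: Any, table: Table) -> Tuple[str, int]:
    """Hash obj and every nested dict/list, recording each one in table."""
    if isinstance(obj, dict):
        parts, count = [], 0
        for key, value in obj.items():
            child_hash, child_count = build_table(value, table)
            parts.append(f"{key!r}:{child_hash}")
            count += child_count
        text = "dict:{" + ",".join(sorted(parts)) + "}"
    elif isinstance(obj, list):
        children = [build_table(value, table) for value in obj]
        count = sum(c for _, c in children)
        # Sorting child hashes makes list order irrelevant
        text = "list:[" + ",".join(sorted(h for h, _ in children)) + "]"
    else:
        count = 1
        text = f"{type(obj).__name__}:{obj!r}"
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    try:
        table[obj] = (digest, count)  # hashable objects are keyed by themselves
    except TypeError:
        table[id(obj)] = (digest, count)  # unhashable ones (dict, list) by id()
    return digest, count


def lookup(table: Table, obj: Any, extract_index: Optional[int] = 0) -> Any:
    """Try obj as a key first, then fall back to id(obj)."""
    try:
        entry = table[obj]
    except (TypeError, KeyError):
        entry = table[id(obj)]  # raises KeyError if obj was never hashed
    return entry if extract_index is None else entry[extract_index]


data = {"a": [1, 2], "b": "x"}
table: Table = {}
top_hash, total = build_table(data, table)
print(lookup(table, data) == top_hash)  # True
print(lookup(table, data["a"], extract_index=1))  # 2
```

Keying by `id()` only works while the original objects are alive: a different list with equal content has a different `id()`, which is why lookups use the object that was actually hashed.
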
### Static Hash Functions

Built-in static hash functions available for use with DeepHash, along with static helpers for looking up values in a hashes dictionary.

```python { .api }
@staticmethod
def sha256hex(obj: Union[str, bytes]) -> str:
    """
    Generate a SHA256 hash in hexadecimal format.

    Parameters:
    - obj: Data to hash

    Returns:
    SHA256 hash as a hexadecimal string.
    """

@staticmethod
def sha1hex(obj: Union[str, bytes]) -> str:
    """
    Generate a SHA1 hash in hexadecimal format.

    Parameters:
    - obj: Data to hash

    Returns:
    SHA1 hash as a hexadecimal string.
    """

@staticmethod
def _getitem(hashes: Dict[Any, Any], obj: Any, extract_index: Optional[int] = 0, use_enum_value: bool = False) -> Any:
    """
    Static method to get an item from a hashes dictionary.

    Parameters:
    - hashes: Dictionary of hashes
    - obj: Object to get the hash for
    - extract_index: 0 returns the hash, 1 returns the count, None returns both
    - use_enum_value: Whether Enum members are looked up by their value

    Returns:
    Hash value for the object
    """

@staticmethod
def get_key(hashes: Dict[Any, Any], key: Any, default: Any = None, extract_index: Optional[int] = 0, use_enum_value: bool = False) -> Any:
    """
    Static method to get a key from a hashes dictionary, with a default.

    Parameters:
    - hashes: Dictionary of hashes
    - key: Key to get the hash for
    - default: Default value if the key is not found
    - extract_index: 0 returns the hash, 1 returns the count, None returns both
    - use_enum_value: Whether Enum members are looked up by their value

    Returns:
    Hash value for the key, or the default value
    """
```

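For reference, the two built-in hashers can be reproduced with `hashlib`. The sketch below is a standalone equivalent. It assumes `str` input is UTF-8 encoded before hashing, and these module-level functions are illustrations rather than the deepdiff source.

```python
import hashlib
from typing import Union


def sha256hex(obj: Union[str, bytes]) -> str:
    """Hex SHA-256 digest; str input is encoded as UTF-8 first."""
    if isinstance(obj, str):
        obj = obj.encode("utf-8")
    return hashlib.sha256(obj).hexdigest()


def sha1hex(obj: Union[str, bytes]) -> str:
    """Hex SHA-1 digest; str input is encoded as UTF-8 first."""
    if isinstance(obj, str):
        obj = obj.encode("utf-8")
    return hashlib.sha1(obj).hexdigest()


print(sha256hex("abc"))  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(sha1hex(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

Accepting both `str` and `bytes` matches the `Callable[[Union[str, bytes]], str]` type of the `hasher` parameter, so a custom hasher written the same way can be swapped in.
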
## Usage Examples

### Basic Hashing

```python
from deepdiff import DeepHash

# Hash simple objects
obj1 = {"name": "John", "age": 30}
obj2 = {"age": 30, "name": "John"}  # Same content, different key order

hash1 = DeepHash(obj1)
hash2 = DeepHash(obj2)

# Look up each object's hash by using the object itself as the key
print(hash1[obj1] == hash2[obj2])  # True - same content produces the same hash
print(hash1[obj1])  # 64-character SHA256 hex string
```

### Ignoring Type Differences

```python
from deepdiff import DeepHash

data1 = {"value": 42}
data2 = {"value": 42.0}

# Without type normalization: int and float produce different hashes
hash1 = DeepHash(data1)[data1]
hash2 = DeepHash(data2)[data2]
print(hash1 == hash2)  # False

# With type normalization: same hashes
hash1 = DeepHash(data1, ignore_numeric_type_changes=True)[data1]
hash2 = DeepHash(data2, ignore_numeric_type_changes=True)[data2]
print(hash1 == hash2)  # True
```

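The difference comes from the type name being part of what gets hashed. The sketch below illustrates this. It assumes numbers are serialized as a `type:value` string, and that ignoring numeric types swaps in a shared label plus a fixed-point format. `prep_number` and `hash_number` are hypothetical helpers, and the 12-digit precision is an assumption of the sketch.

```python
import hashlib
from typing import Union

Number = Union[int, float]


def prep_number(value: Number, ignore_numeric_type_changes: bool = False) -> str:
    """Hypothetical: serialize a number as 'type:value' before hashing."""
    if ignore_numeric_type_changes:
        # Shared label and fixed-point text, so 42 and 42.0 serialize identically
        return f"number:{value:.12f}"
    return f"{type(value).__name__}:{value!r}"


def hash_number(value: Number, ignore_numeric_type_changes: bool = False) -> str:
    text = prep_number(value, ignore_numeric_type_changes)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(prep_number(42), prep_number(42.0))  # int:42 float:42.0
print(prep_number(42, True) == prep_number(42.0, True))  # True
```

Formatting both values the same way matters as much as the shared label: `"number:42"` and `"number:42.0"` would still hash differently.
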
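### Custom Hash Functions

```python
import hashlib

from deepdiff import DeepHash

data = {"name": "John", "tags": ["a", "b"]}


# Custom hasher: receives the serialized object as str (or bytes)
# and must return a string
def custom_hasher(serialized):
    if isinstance(serialized, str):
        serialized = serialized.encode("utf-8")
    return hashlib.md5(serialized).hexdigest()


hash_obj = DeepHash(data, hasher=custom_hasher)
print(hash_obj[data])  # 32-character MD5 hex string
```
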
### Excluding Paths

```python
from deepdiff import DeepHash

# Hash while excluding certain paths
data = {
    "user_id": 123,
    "name": "John",
    "timestamp": "2023-01-01T00:00:00Z",
    "metadata": {"created_by": "system"},
}

# Exclude volatile fields so the hash stays stable
hash_obj = DeepHash(
    data,
    exclude_paths=["root['timestamp']", "root['metadata']['created_by']"],
)
print(hash_obj[data])
```

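Exclusion works on the paths built while walking the object, written in the same `root['key']` notation as `exclude_paths`. The sketch below shows path-based exclusion for nested dicts. `hash_excluding` is a hypothetical helper that only handles dicts and scalar leaves.

```python
import hashlib
from typing import Any, Set


def hash_excluding(obj: Any, exclude_paths: Set[str], path: str = "root") -> str:
    """Hypothetical: hash nested dicts, skipping subtrees whose path is excluded."""
    if isinstance(obj, dict):
        parts = []
        for key in sorted(obj, key=repr):
            child_path = f"{path}[{key!r}]"  # e.g. root['metadata']['created_by']
            if child_path in exclude_paths:
                continue  # an excluded subtree contributes nothing to the hash
            parts.append(f"{key!r}:{hash_excluding(obj[key], exclude_paths, child_path)}")
        text = "dict:{" + ",".join(parts) + "}"
    else:
        text = f"{type(obj).__name__}:{obj!r}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


excluded = {"root['timestamp']", "root['metadata']['created_by']"}
a = {"name": "John", "timestamp": "2023-01-01T00:00:00Z", "metadata": {"created_by": "system"}}
b = {"name": "John", "timestamp": "2024-06-30T12:00:00Z", "metadata": {"created_by": "cron"}}
print(hash_excluding(a, excluded) == hash_excluding(b, excluded))  # True
```

Checking the path before recursing means an excluded subtree is never visited at all, which also saves work on large ignored fields.
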
### Precision Control

```python
from deepdiff import DeepHash

# Control floating point precision
data = {"pi": 3.141592653589793}

# significant_digits sets the number of digits after the decimal point
hash1 = DeepHash(data, significant_digits=2)  # pi is hashed as 3.14
hash2 = DeepHash(data, significant_digits=4)  # pi is hashed as 3.1416

print(hash1[data])
print(hash2[data])
```

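Rounding before serialization is what makes numbers that agree to the chosen number of decimal places produce the same text, and therefore the same hash. The sketch below shows this. It assumes round-then-format behavior using Python's `round` (which rounds exact halves to even). `round_to_text` is a hypothetical helper, not a deepdiff function.

```python
def round_to_text(value: float, significant_digits: int) -> str:
    """Round to a fixed number of digits after the decimal point, then format."""
    return f"{round(value, significant_digits):.{significant_digits}f}"


print(round_to_text(3.141592653589793, 2))  # 3.14
print(round_to_text(3.141592653589793, 4))  # 3.1416
# Values that agree after rounding produce the same text, hence the same hash
print(round_to_text(3.14159, 3) == round_to_text(3.1416, 3))  # True
```
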
### Retrieving Hashes of Nested Objects

```python
from deepdiff import DeepHash

nested_data = {
    "level1": {
        "level2": {
            "data": [1, 2, 3],
            "other": "value",
        }
    }
}

hash_obj = DeepHash(nested_data)

# Get the hash of the entire object
full_hash = hash_obj[nested_data]

# Get the hash of a nested object: DeepHash stores a hash for every object
# it visits, so the nested object itself is the lookup key
partial_hash = hash_obj[nested_data["level1"]["level2"]["data"]]

print(f"Full object hash: {full_hash}")
print(f"List hash: {partial_hash}")
```

### Working with Custom Objects

```python
from deepdiff import DeepHash


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        self._private = "secret"


person1 = Person("John", 30)
person2 = Person("John", 30)

# Hash custom objects and look each one up by the object itself
hash1 = DeepHash(person1)[person1]
hash2 = DeepHash(person2)[person2]

print(hash1 == hash2)  # True - same attribute values

# Also hash attributes whose names start with "__"
# (single-underscore names such as _private are hashed either way)
hash1 = DeepHash(person1, ignore_private_variables=False)[person1]
hash2 = DeepHash(person2, ignore_private_variables=False)[person2]
```

### Handling Complex Data Types

```python
import numpy as np
import pandas as pd

from deepdiff import DeepHash

# Hash NumPy arrays
arr = np.array([1, 2, 3, 4, 5])
hash_arr = DeepHash(arr)

# Hash pandas DataFrames
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
hash_df = DeepHash(df)

# Hash with type exclusions
mixed_data = {
    "numpy_array": arr,
    "dataframe": df,
    "regular_list": [1, 2, 3],
}

# Exclude NumPy arrays from the hash
hash_obj = DeepHash(mixed_data, exclude_types=[np.ndarray])
```

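Type exclusion skips values by type rather than by path. The sketch below uses only the standard library, with `bytes` standing in for a heavy type such as `np.ndarray`. `hash_excluding_types` is a hypothetical helper. It uses `isinstance`, so subclasses of an excluded type are skipped too, which may differ from DeepHash's own type check.

```python
import hashlib
from typing import Any, Tuple


def hash_excluding_types(obj: Any, exclude_types: Tuple[type, ...] = ()) -> str:
    """Hypothetical: hash dicts/lists recursively, skipping values of excluded types."""
    if isinstance(obj, dict):
        parts = sorted(
            f"{k!r}:{hash_excluding_types(v, exclude_types)}"
            for k, v in obj.items()
            if not isinstance(v, exclude_types)  # isinstance(v, ()) is always False
        )
        text = "dict:{" + ",".join(parts) + "}"
    elif isinstance(obj, list):
        parts = [hash_excluding_types(v, exclude_types)
                 for v in obj if not isinstance(v, exclude_types)]
        text = "list:[" + ",".join(parts) + "]"
    else:
        text = f"{type(obj).__name__}:{obj!r}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


left = {"payload": b"\x00\x01", "items": [1, 2, 3]}
right = {"payload": b"\xff", "items": [1, 2, 3]}
print(hash_excluding_types(left, (bytes,)) == hash_excluding_types(right, (bytes,)))  # True
```
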
## Types

```python { .api }
from typing import Any, Callable, Iterator, KeysView, Union

# Hash function type: receives the serialized object as str (or bytes)
# and returns a hash string, matching the `hasher` parameter
HashFunction = Callable[[Union[str, bytes]], str]

# Built-in hash functions, exposed as DeepHash static methods
SHA256_HASHER: HashFunction  # DeepHash.sha256hex (the default)
SHA1_HASHER: HashFunction  # DeepHash.sha1hex

# DeepHash result access patterns; keys are the hashed objects,
# or their id() when the object is unhashable
class DeepHashResult:
    def __getitem__(self, obj: Any) -> str: ...  # Hash of a hashed or nested object
    def __eq__(self, other: Any) -> bool: ...  # Compare full hash tables
    def keys(self) -> KeysView[Any]: ...  # All hashed objects or ids
    def values(self) -> Iterator[str]: ...  # All hash values

# Type groups for equivalent hashing
NumericTypes = (int, float, complex)
StringTypes = (str, bytes, bytearray)
```

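A type group makes its member types indistinguishable in the type part of the serialized text. The sketch below shows one way to do that, assuming a type is labeled by joining the names of every type in its group. `type_label` is a hypothetical helper. The value part must also serialize identically (as with `1` versus `1.0`) for two objects to hash the same.

```python
from typing import Any, Sequence, Tuple

NumericTypes = (int, float, complex)
StringTypes = (str, bytes, bytearray)


def type_label(obj: Any, groups: Sequence[Tuple[type, ...]] = ()) -> str:
    """Hypothetical: use the group's joined type names if obj's type is in a group."""
    for group in groups:
        if type(obj) in group:
            return ",".join(t.__name__ for t in group)
    return type(obj).__name__


print(type_label(1), type_label(1.0))  # int float
print(type_label(1, [NumericTypes]) == type_label(1.0, [NumericTypes]))  # True
```
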