
# Deep Hashing

Content-based hashing system that generates consistent hash values for Python objects. Dictionary key order never affects the hash, and with the default settings neither does the order or repetition of items in iterables. DeepHash provides stable, reproducible hashes that can be used for caching, deduplication, and change detection across complex data structures.
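
To make "content-based" concrete, here is a deliberately simplified model in plain Python. It is not DeepHash's algorithm: `toy_content_hash` is a hypothetical helper that tags each leaf with its type name, sorts dictionary entries so insertion order cannot matter, and hashes the resulting string with SHA-256. It does no escaping of separators, so treat it as an illustration only.

```python
import hashlib
from typing import Any


def _serialize(obj: Any) -> str:
    """Turn obj into a canonical string (toy model, no escaping)."""
    if isinstance(obj, dict):
        # Sort the serialized "key:value" pairs so insertion order is irrelevant.
        items = sorted(f"{_serialize(k)}:{_serialize(v)}" for k, v in obj.items())
        return "dict:{" + ",".join(items) + "}"
    if isinstance(obj, (list, tuple)):
        # Order is kept in this toy; DeepHash ignores it by default (ignore_iterable_order).
        return f"{type(obj).__name__}:[" + ",".join(_serialize(i) for i in obj) + "]"
    # Leaves are tagged with their type name, so 42 and 42.0 serialize differently.
    return f"{type(obj).__name__}:{obj!r}"


def toy_content_hash(obj: Any) -> str:
    """Hypothetical content hash: SHA-256 over the canonical serialization."""
    return hashlib.sha256(_serialize(obj).encode("utf-8")).hexdigest()


print(toy_content_hash({"name": "John", "age": 30}) == toy_content_hash({"age": 30, "name": "John"}))  # True
```

Tagging leaves with their type name is why `42` and `42.0` hash differently in this toy, the same kind of distinction that `ignore_numeric_type_changes` exists to remove.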

## Capabilities

### Object Hashing

Generate deep hash values for any Python object with extensive customization options for handling different data types and structures.

```python { .api }
class DeepHash:
    def __init__(
        self,
        obj: Any,
        *,
        apply_hash: bool = True,
        custom_operators: Optional[List[Any]] = None,
        default_timezone: Union[datetime.timezone, "BaseTzInfo"] = datetime.timezone.utc,
        encodings: Optional[List[str]] = None,
        exclude_obj_callback: Optional[Callable[[Any, str], bool]] = None,
        exclude_paths: Optional[PathType] = None,
        exclude_regex_paths: Optional[RegexType] = None,
        exclude_types: Optional[Union[List[type], Set[type], Tuple[type, ...]]] = None,
        hasher: Optional[Callable[[Union[str, bytes]], str]] = None,
        hashes: Optional[Union[Dict[Any, Any], "DeepHash"]] = None,
        ignore_encoding_errors: bool = False,
        ignore_iterable_order: bool = True,
        ignore_numeric_type_changes: bool = False,
        ignore_private_variables: bool = True,
        ignore_repetition: bool = True,
        ignore_string_case: bool = False,
        ignore_string_type_changes: bool = False,
        ignore_type_in_groups: Any = None,
        ignore_type_subclasses: bool = False,
        ignore_uuid_types: bool = False,
        include_paths: Optional[PathType] = None,
        number_format_notation: str = "f",
        number_to_string_func: Optional[NumberToStringFunc] = None,
        parent: str = "root",
        significant_digits: Optional[int] = None,
        truncate_datetime: Optional[str] = None,
        use_enum_value: bool = False,
        **kwargs
    ):
        """
        Deep hash of objects based on their content.

        Parameters:
        - obj: Object to hash
        - hasher: Hash function to use (default: SHA256)
        - ignore_type_in_groups: List of type groups to treat as equivalent for hashing
        - ignore_encoding_errors: Ignore encoding errors when hashing strings
        - ignore_numeric_type_changes: Ignore type differences between numeric types
        - ignore_type_subclasses: Ignore type differences between subclasses
        - ignore_string_type_changes: Ignore differences between string types (str vs bytes)
        - exclude_types: Types to exclude from hashing
        - exclude_paths: Path or list of paths to exclude from hashing
        - exclude_regex_paths: Regex pattern or list of patterns for paths to exclude
        - significant_digits: Number of digits after the decimal point to keep when hashing numbers
        - apply_hash: Whether to apply the hash function to the result
        - encodings: List of encodings to try when converting bytes to strings
        - ignore_private_variables: Ignore attributes whose names start with a double underscore (__)
        - parent: Name of the root used when building paths (default: "root")
        - ignore_repetition: Treat repeated items in an iterable as a single occurrence
        - number_format_notation: Notation for converting numbers to strings: "f" (fixed-point) or "e" (scientific)
        - exclude_obj_callback: Callable(obj, path) that returns True for objects to exclude from hashing
        - exclude_obj_callback_strict: Strict mode for exclude callback
        - number_to_string_func: Custom function for number to string conversion
        - ignore_nan_equality: Ignore NaN values when hashing
        - math_epsilon: Epsilon for floating point hash comparisons
        - ignore_iterable_order: Hash iterables without regard to item order
        - ignore_string_case: Hash strings case-insensitively
        - include_paths: Path or list of paths to restrict hashing to
        - truncate_datetime: Truncate datetimes to "second", "minute", "hour", or "day" before hashing
        - use_enum_value: Hash Enum members by their value
        """
```
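
Two defaults interact for iterables: with `ignore_iterable_order=True` and `ignore_repetition=True`, `[1, 2, 2]` and `[2, 1]` should produce the same hash. The standard-library model below shows what each flag removes from consideration; `toy_iterable_hash` is hypothetical, not DeepHash's code, and assumes elements have already been serialized to strings such as `"int:1"`.

```python
import hashlib
from typing import Iterable


def toy_iterable_hash(
    items: Iterable[str],
    ignore_iterable_order: bool = True,
    ignore_repetition: bool = True,
) -> str:
    """Hypothetical model of the two iterable flags (not DeepHash's real code).

    `items` are element strings that are assumed to be serialized already.
    """
    parts = list(items)
    if ignore_repetition:
        # Keep each distinct element once, in first-seen order.
        parts = list(dict.fromkeys(parts))
    if ignore_iterable_order:
        # Sort into a canonical order so position no longer matters.
        parts = sorted(parts)
    return hashlib.sha256(",".join(parts).encode("utf-8")).hexdigest()


print(toy_iterable_hash(["int:1", "int:2", "int:2"]) == toy_iterable_hash(["int:2", "int:1"]))  # True
```

Turning either flag off makes the corresponding difference (multiplicity or position) visible in the hash again.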

### Hash Retrieval

Methods for retrieving the hash of the root object or of any object visited while hashing it.

```python { .api }
def get(self, key: Any, default: Any = None, extract_index: Optional[int] = 0) -> Any:
    """
    Get the hash for an object that was hashed, or a default if it was not.

    Parameters:
    - key: The root object or any nested object to look up
    - default: Value returned when the object has no hash
    - extract_index: 0 for the hash, 1 for the count, None for both

    Returns:
    Hash string for the object, or default.
    """

def hexdigest(self) -> str:
    """
    Get hexadecimal representation of the object's hash.

    Returns:
    Hexadecimal hash string.
    """
```

### Dictionary-like Access

DeepHash objects support dictionary-like access for retrieving hash values, keyed by the hashed objects themselves.

```python { .api }
def __getitem__(self, obj: Any, extract_index: Optional[int] = 0) -> Any:
    """
    Get hash value for an object using dictionary-like access.

    Parameters:
    - obj: The root object or any object visited while hashing it
    - extract_index: 0 for the hash, 1 for the count of items that went into it, None for both

    Returns:
    Hash value for the object (raises KeyError if the object was not hashed)
    """

def __contains__(self, obj: Any) -> bool:
    """
    Check if an object has been hashed.

    Parameters:
    - obj: Object to check

    Returns:
    True if object has been hashed, False otherwise
    """

def keys(self) -> Any:
    """
    Get all keys (objects) that have been hashed.

    Returns:
    Keys from the hash result
    """

def values(self) -> Generator[Any, None, None]:
    """
    Get all hash values.

    Returns:
    Generator yielding hash values
    """

def items(self) -> Generator[Tuple[Any, Any], None, None]:
    """
    Get all (object, hash) pairs.

    Returns:
    Generator yielding (object, hash) tuples
    """

def __eq__(self, other: Any) -> bool:
    """
    Compare two DeepHash objects by their complete hashes mappings.

    Unhashable objects such as dicts and lists are keyed by id(), so two
    instances built from distinct but equal dicts usually compare unequal.
    Compare root hashes instead: DeepHash(a)[a] == DeepHash(b)[b].

    Parameters:
    - other: Other DeepHash object to compare

    Returns:
    True if the hashes mappings are equal, False otherwise
    """

def __bool__(self) -> bool:
    """
    Check if the DeepHash object contains any hashes.

    Returns:
    True if hashes exist, False otherwise
    """
```
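
The lookup and equality rules above can be modeled with plain dictionaries. In this hypothetical sketch, `toy_store` and `toy_getitem` are illustrative stand-ins rather than DeepHash functions: entries are `(hash, count)` pairs (the count here is simply the number of keys), and unhashable objects such as dicts are assumed to be stored under `id()`. That assumption is what makes two whole stores compare unequal even when their root hashes match.

```python
import hashlib
from typing import Any, Dict, Optional, Tuple

Entry = Tuple[str, int]  # assumed (hash, count) pair


def toy_store(obj: Dict[str, Any]) -> Dict[Any, Entry]:
    """Hypothetical stand-in for a DeepHash result over a flat dict."""
    canonical = repr(sorted(obj.items()))  # sorting removes key-order effects
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # A dict is unhashable, so this model stores it under id(obj).
    return {id(obj): (digest, len(obj))}


def toy_getitem(store: Dict[Any, Entry], obj: Any,
                extract_index: Optional[int] = 0) -> Any:
    """0 -> hash, 1 -> count, None -> (hash, count); KeyError if absent."""
    try:
        entry = store[obj]
    except (TypeError, KeyError):  # TypeError: obj is unhashable
        entry = store[id(obj)]
    return entry if extract_index is None else entry[extract_index]


a = {"name": "John", "age": 30}
b = {"age": 30, "name": "John"}
store_a, store_b = toy_store(a), toy_store(b)

print(store_a == store_b)                                  # False: id() keys differ
print(toy_getitem(store_a, a) == toy_getitem(store_b, b))  # True: same root hash
print(toy_getitem(store_a, a, extract_index=1))            # 2
```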

### Static Hash Functions

Static methods: two built-in hash functions that can be passed as `hasher`, plus two lookup helpers that operate on a raw hashes dictionary.

```python { .api }
@staticmethod
def sha256hex(obj: Union[str, bytes]) -> str:
    """
    Generate SHA256 hash in hexadecimal format.

    Parameters:
    - obj: Data to hash

    Returns:
    SHA256 hash as hexadecimal string.
    """

@staticmethod
def sha1hex(obj: Union[str, bytes]) -> str:
    """
    Generate SHA1 hash in hexadecimal format.

    Parameters:
    - obj: Data to hash

    Returns:
    SHA1 hash as hexadecimal string.
    """

@staticmethod
def _getitem(hashes: Dict[Any, Any], obj: Any, extract_index: Optional[int] = 0, use_enum_value: bool = False) -> Any:
    """
    Static method to get item from hashes dictionary.

    Parameters:
    - hashes: Dictionary of hashes
    - obj: Object to get hash for
    - extract_index: 0 for the hash, 1 for the count, None for both
    - use_enum_value: Look up Enum members by their value

    Returns:
    Hash value for the object (raises KeyError if it is missing)
    """

@staticmethod
def get_key(hashes: Dict[Any, Any], key: Any, default: Any = None, extract_index: Optional[int] = 0, use_enum_value: bool = False) -> Any:
    """
    Static method to get key from hashes dictionary with default.

    Parameters:
    - hashes: Dictionary of hashes
    - key: Key to get hash for
    - default: Default value if key not found
    - extract_index: 0 for the hash, 1 for the count, None for both
    - use_enum_value: Look up Enum members by their value

    Returns:
    Hash value for the key, or default if it is missing
    """
```
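
Going by the `Callable[[Union[str, bytes]], str]` annotation on `hasher`, a custom hash function should accept `str` or `bytes` and return a string, as `sha256hex` does. Here is a sketch of such a hasher built on BLAKE2b from `hashlib`. The name `blake2b_hex` is hypothetical, and encoding strings as UTF-8 first is an assumption about what a str/bytes hasher needs to do.

```python
import hashlib
from typing import Union


def blake2b_hex(obj: Union[str, bytes]) -> str:
    """Hex digest with the same str/bytes -> str shape as sha256hex.

    Strings are encoded as UTF-8 first; bytes are hashed as-is.
    """
    if isinstance(obj, str):
        obj = obj.encode("utf-8")
    # digest_size=32 gives a 256-bit digest (64 hex characters).
    return hashlib.blake2b(obj, digest_size=32).hexdigest()


print(blake2b_hex("hello") == blake2b_hex(b"hello"))  # True
```

It could then be passed as `DeepHash(obj, hasher=blake2b_hex)`.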

## Usage Examples

### Basic Hashing

```python
from deepdiff import DeepHash

# Hash simple objects
obj1 = {"name": "John", "age": 30}
obj2 = {"age": 30, "name": "John"}  # Same content, different key order

hash1 = DeepHash(obj1)
hash2 = DeepHash(obj2)

# Look up each root object's hash; key order does not affect it
print(hash1[obj1] == hash2[obj2])  # True - same content produces same hash
print(hash1[obj1])  # Hex string from the default SHA256 hasher
```

### Ignoring Type Differences

```python
from deepdiff import DeepHash

# Hash with type normalization
data1 = {"value": 42}
data2 = {"value": 42.0}

# Without type normalization - int and float hash differently
hash1 = DeepHash(data1)
hash2 = DeepHash(data2)
print(hash1[data1] == hash2[data2])  # False

# With type normalization - both are treated as numbers
hash1 = DeepHash(data1, ignore_numeric_type_changes=True)
hash2 = DeepHash(data2, ignore_numeric_type_changes=True)
print(hash1[data1] == hash2[data2])  # True
```

### Custom Hash Functions

```python
import hashlib

from deepdiff import DeepHash


# Use a custom hash function; DeepHash calls it with a str (or bytes)
def custom_hasher(data):
    if isinstance(data, str):
        data = data.encode("utf-8")
    return hashlib.md5(data).hexdigest()


data = {"name": "John", "age": 30}
hash_obj = DeepHash(data, hasher=custom_hasher)
print(hash_obj[data])
```

### Excluding Paths

```python
from deepdiff import DeepHash

# Hash while excluding certain paths
data = {
    "user_id": 123,
    "name": "John",
    "timestamp": "2023-01-01T00:00:00Z",
    "metadata": {"created_by": "system"},
}

# Exclude the timestamp and metadata author for stable hashing
hash_obj = DeepHash(
    data,
    exclude_paths=["root['timestamp']", "root['metadata']['created_by']"],
)
print(hash_obj[data])
```
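
Path strings are easy to mistype by hand. A small builder that produces the `root['key'][0]` style used above is sketched below; `make_path` is a hypothetical helper, not part of DeepDiff, and its `root` argument simply mirrors the `parent="root"` default.

```python
from typing import Union

Key = Union[str, int]


def make_path(*keys: Key, root: str = "root") -> str:
    """Build a path string such as root['metadata']['created_by'] (hypothetical)."""
    # repr() quotes string keys ('timestamp') and leaves integer indexes bare (0).
    return root + "".join(f"[{key!r}]" for key in keys)


exclude = [make_path("timestamp"), make_path("metadata", "created_by")]
print(exclude)  # ["root['timestamp']", "root['metadata']['created_by']"]
```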

### Precision Control

```python
from deepdiff import DeepHash

# Control floating point precision
data = {"pi": 3.141592653589793}

# Hash with different precision levels (digits after the decimal point)
hash1 = DeepHash(data, significant_digits=2)  # 3.14
hash2 = DeepHash(data, significant_digits=4)  # 3.1416

print(hash1[data])
print(hash2[data])
```
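
Under the default `"f"` notation, `significant_digits` keeps that many digits after the decimal point, so 4 digits gives 3.1416 rather than 3.142. Python's fixed-point formatting approximates that rounding step; this is an illustration, not DeepHash's own number-to-string code.

```python
# Fixed-point formatting keeps a set number of digits after the decimal point.
pi = 3.141592653589793
print(f"{pi:.2f}")  # 3.14
print(f"{pi:.4f}")  # 3.1416

# Values that differ only beyond the kept digits become indistinguishable,
# which is the trade-off of lowering precision.
print(f"{3.141:.2f}" == f"{3.144:.2f}")  # True
```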

### Hash Retrieval for Nested Objects

```python
from deepdiff import DeepHash

nested_data = {
    "level1": {
        "level2": {
            "data": [1, 2, 3],
            "other": "value",
        }
    }
}

hash_obj = DeepHash(nested_data)

# Get hash for entire object
full_hash = hash_obj[nested_data]

# Every object visited while hashing has its own entry; look it up by the object
inner_list = nested_data["level1"]["level2"]["data"]
partial_hash = hash_obj[inner_list]

print(f"Full object hash: {full_hash}")
print(f"Array hash: {partial_hash}")
```

### Working with Custom Objects

```python
from deepdiff import DeepHash


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        self._private = "secret"  # single underscore: hashed even by default


person1 = Person("John", 30)
person2 = Person("John", 30)

# Hash custom objects by their attributes
hash1 = DeepHash(person1)
hash2 = DeepHash(person2)

print(hash1[person1] == hash2[person2])  # True - same attribute values

# Also hash attributes whose names start with "__"
hash1 = DeepHash(person1, ignore_private_variables=False)
hash2 = DeepHash(person2, ignore_private_variables=False)
```
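
Since `ignore_private_variables` targets names beginning with `__`, Python's name mangling matters: an attribute written as `self.__token` inside a class body is stored as `_Person__token`. The sketch below only inspects `vars()` with plain Python; that a `__`-prefix filter would therefore keep the mangled name is an inference from the prefix rule, not something verified against DeepHash here.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self._note = "single underscore"    # stored as "_note"
        self.__token = "double underscore"  # name-mangled to "_Person__token"


attrs = vars(Person("John"))
print(sorted(attrs))  # ['_Person__token', '_note', 'name']

# A filter that drops only names starting with "__" keeps all three keys.
print([k for k in attrs if not k.startswith("__")])  # ['name', '_note', '_Person__token']
```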

### Handling Complex Data Types

```python
import numpy as np
import pandas as pd
from deepdiff import DeepHash

# Hash NumPy arrays
arr = np.array([1, 2, 3, 4, 5])
hash_arr = DeepHash(arr)

# Hash pandas DataFrames
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
hash_df = DeepHash(df)

# Hash with type exclusions
mixed_data = {
    "numpy_array": arr,
    "dataframe": df,
    "regular_list": [1, 2, 3],
}

# Exclude NumPy arrays from the hash
hash_obj = DeepHash(mixed_data, exclude_types=[np.ndarray])
```

## Types

```python { .api }
# Hash function type
HashFunction = Callable[[Any], str]

# Common hash functions available
SHA256_HASHER: HashFunction
SHA1_HASHER: HashFunction

# DeepHash result access patterns
class DeepHashResult:
    def __getitem__(self, key: str) -> str: ...  # Access by path
    def __eq__(self, other: 'DeepHashResult') -> bool: ...  # Compare hashes
    def keys(self) -> List[str]: ...  # Get all hashed paths
    def values(self) -> List[str]: ...  # Get all hash values

# Type groups for equivalent hashing
NumericTypes = (int, float, complex)
StringTypes = (str, bytes, bytearray)
```