# Streaming Hashers

hashlib-compatible hasher classes for incremental hashing of large datasets and streaming operations. These classes allow you to hash data incrementally, making them ideal for processing large files, network streams, or situations where you don't have all data available at once.

## Capabilities

### Base Hasher Interface

Abstract base class defining the common interface for all streaming hashers.

```python { .api }
class Hasher:
    """Base class for streaming MurmurHash3 hashers."""

    def __init__(self, seed: int = 0) -> None:
        """
        Initialize hasher with optional seed.

        Args:
            seed: Seed value for hash computation (default: 0)
        """

    def update(self, input: Hashable) -> None:
        """
        Update hasher with new data.

        Args:
            input: Data to add to hash (bytes, bytearray, memoryview, or array-like)

        Raises:
            TypeError: If input is a string (strings must be encoded first)
        """

    def digest(self) -> bytes:
        """
        Get hash digest as bytes.

        Returns:
            Hash digest as bytes
        """

    def sintdigest(self) -> int:
        """
        Get hash as signed integer.

        Returns:
            Hash value as signed integer
        """

    def uintdigest(self) -> int:
        """
        Get hash as unsigned integer.

        Returns:
            Hash value as unsigned integer
        """

    def copy(self) -> "Hasher":
        """
        Create a copy of the hasher's current state.

        Returns:
            New hasher instance with identical state
        """

    @property
    def digest_size(self) -> int:
        """
        Get digest size in bytes.

        Returns:
            Number of bytes in digest output
        """

    @property
    def block_size(self) -> int:
        """
        Get internal block size in bytes.

        Returns:
            Number of bytes processed in each internal block
        """

    @property
    def name(self) -> str:
        """
        Get hasher algorithm name.

        Returns:
            String identifying the hash algorithm
        """
```
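
Because this interface mirrors hashlib, the same call pattern works with any hashlib hasher. A quick sketch of the shared protocol using the standard library's `sha256` (used here purely for illustration; the mmh3 hashers expose the same methods and properties):

```python
import hashlib

# hashlib hashers follow the same protocol the Hasher base class defines:
# update(), digest(), copy(), plus digest_size, block_size, and name.
h = hashlib.sha256()
h.update(b"foo")
h.update(b"bar")            # incremental updates accumulate state

snapshot = h.copy()         # independent copy of the current state
assert snapshot.digest() == h.digest()

print(h.name)               # "sha256"
print(h.digest_size)        # 32 bytes in the digest
print(len(h.digest()))      # matches digest_size
```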

### 32-bit Hasher

Streaming hasher for 32-bit MurmurHash3 computation.

```python { .api }
class mmh3_32(Hasher):
    """32-bit MurmurHash3 streaming hasher."""
```

Properties:

- `digest_size`: 4 bytes
- `block_size`: 12 bytes
- `name`: "mmh3_32"

Example usage:

```python
import mmh3

# Basic streaming hashing
hasher = mmh3.mmh3_32()
hasher.update(b"foo")
hasher.update(b"bar")

# Get results in different formats
digest = hasher.digest()            # b'\x8d\x8f\xe7\xfd'
signed_int = hasher.sintdigest()    # -156908512
unsigned_int = hasher.uintdigest()  # 4138058784

# With custom seed
hasher = mmh3.mmh3_32(seed=42)
hasher.update(b"Hello, world!")
result = hasher.sintdigest()

# Copy hasher state
hasher1 = mmh3.mmh3_32()
hasher1.update(b"partial data")
hasher2 = hasher1.copy()        # Copy current state
hasher2.update(b" more data")   # Continue from the copied state
```

### 128-bit x64 Hasher

Streaming hasher for 128-bit MurmurHash3 optimized for x64 architectures.

```python { .api }
class mmh3_x64_128(Hasher):
    """128-bit MurmurHash3 streaming hasher optimized for x64 architectures."""

    def stupledigest(self) -> tuple[int, int]:
        """
        Get hash as tuple of two signed 64-bit integers.

        Returns:
            Tuple of two signed 64-bit integers representing the 128-bit hash
        """

    def utupledigest(self) -> tuple[int, int]:
        """
        Get hash as tuple of two unsigned 64-bit integers.

        Returns:
            Tuple of two unsigned 64-bit integers representing the 128-bit hash
        """
```

Properties:

- `digest_size`: 16 bytes
- `block_size`: 32 bytes
- `name`: "mmh3_x64_128"

Example usage:

```python
import mmh3

# 128-bit streaming hashing (x64 optimized)
hasher = mmh3.mmh3_x64_128(seed=42)
hasher.update(b"foo")
hasher.update(b"bar")

# Get results in various formats
digest = hasher.digest()            # 16-byte digest
signed_int = hasher.sintdigest()    # 128-bit signed integer
unsigned_int = hasher.uintdigest()  # 128-bit unsigned integer

# Get as tuples of 64-bit integers
signed_tuple = hasher.stupledigest()    # (7689522670935629698, -159584473158936081)
unsigned_tuple = hasher.utupledigest()  # (7689522670935629698, 18287159600550615535)

# Process large streaming data (data_stream: any iterable of bytes chunks)
hasher = mmh3.mmh3_x64_128()
for chunk in data_stream:
    hasher.update(chunk)
final_hash = hasher.digest()
```

### 128-bit x86 Hasher

Streaming hasher for 128-bit MurmurHash3 optimized for x86 architectures.

```python { .api }
class mmh3_x86_128(Hasher):
    """128-bit MurmurHash3 streaming hasher optimized for x86 architectures."""

    def stupledigest(self) -> tuple[int, int]:
        """
        Get hash as tuple of two signed 64-bit integers.

        Returns:
            Tuple of two signed 64-bit integers representing the 128-bit hash
        """

    def utupledigest(self) -> tuple[int, int]:
        """
        Get hash as tuple of two unsigned 64-bit integers.

        Returns:
            Tuple of two unsigned 64-bit integers representing the 128-bit hash
        """
```

Properties:

- `digest_size`: 16 bytes
- `block_size`: 32 bytes
- `name`: "mmh3_x86_128"

Example usage:

```python
import mmh3

# 128-bit streaming hashing (x86 optimized)
hasher = mmh3.mmh3_x86_128(seed=123)
hasher.update(b"data chunk 1")
hasher.update(b"data chunk 2")

# Get results (same interface as the x64 version)
digest = hasher.digest()
signed_tuple = hasher.stupledigest()
unsigned_tuple = hasher.utupledigest()

# Architecture-specific optimization:
# use mmh3_x86_128 on 32-bit systems and mmh3_x64_128 on 64-bit systems
# for the best performance.
```

## Hasher Usage Patterns

### Incremental File Hashing

```python
import mmh3

def hash_large_file(filename):
    hasher = mmh3.mmh3_x64_128()
    with open(filename, 'rb') as f:
        while chunk := f.read(8192):  # 8 KB chunks
            hasher.update(chunk)
    return hasher.digest()

file_hash = hash_large_file('large_dataset.bin')
```
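
The same chunked-read pattern works with any hashlib-style hasher. The self-contained sketch below uses the standard library's `sha256` and a temporary file (the sample data and chunk size are arbitrary demo choices) to verify that chunked hashing matches hashing the whole file at once:

```python
import hashlib
import os
import tempfile

# Write some sample data to a temporary file for the demo.
data = b"x" * 100_000
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)

def hash_file_chunked(filename, chunk_size=8192):
    hasher = hashlib.sha256()
    with open(filename, "rb") as f:
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.digest()

# Chunked hashing is equivalent to hashing all the data at once.
assert hash_file_chunked(path) == hashlib.sha256(data).digest()
os.remove(path)
```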

### Stream Processing

```python
import mmh3

def process_data_stream(data_stream):
    hasher = mmh3.mmh3_32(seed=42)
    for data_chunk in data_stream:
        # Transform the chunk (process_chunk is a placeholder for your logic)
        processed = process_chunk(data_chunk)
        # Update hash incrementally
        hasher.update(processed)
    return hasher.uintdigest()
```

### State Copying for Parallel Processing

```python
import mmh3

# Process the common prefix once
base_hasher = mmh3.mmh3_x64_128()
base_hasher.update(b"common prefix data")

# Branch processing from the shared state
hasher1 = base_hasher.copy()
hasher1.update(b"branch 1 data")
result1 = hasher1.digest()

hasher2 = base_hasher.copy()
hasher2.update(b"branch 2 data")
result2 = hasher2.digest()
```
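
The invariant `copy()` guarantees is that a branched hasher produces the same digest as hashing the concatenated data from scratch. A runnable check of that invariant using the standard library's `sha256`, whose copy semantics the mmh3 hashers mirror:

```python
import hashlib

prefix = b"common prefix data"
branch1 = b"branch 1 data"
branch2 = b"branch 2 data"

base = hashlib.sha256()
base.update(prefix)        # hash the shared prefix once

h1 = base.copy()           # branch without re-hashing the prefix
h1.update(branch1)
h2 = base.copy()
h2.update(branch2)

# Each branch matches a from-scratch hash of prefix + branch data.
assert h1.digest() == hashlib.sha256(prefix + branch1).digest()
assert h2.digest() == hashlib.sha256(prefix + branch2).digest()
```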

## Error Handling and Constraints

### Input Type Restrictions

Hashers only accept binary data types:

- ✅ `bytes`, `bytearray`, `memoryview`, array-like objects
- ❌ `str` - strings must be encoded first

```python
import mmh3

hasher = mmh3.mmh3_32()

# Correct usage
hasher.update(b"binary data")
hasher.update("text data".encode('utf-8'))  # Encode strings first

# This will raise TypeError
# hasher.update("raw string")  # TypeError: Strings must be encoded before hashing
```
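
The same restriction applies across the hashlib ecosystem, so the behavior can be demonstrated with the standard library's `sha256`: binary buffer types are accepted, while raw strings raise `TypeError`:

```python
import hashlib

h = hashlib.sha256()
h.update(b"bytes ok")
h.update(bytearray(b"bytearray ok"))
h.update(memoryview(b"memoryview ok"))  # zero-copy buffer view is fine

try:
    h.update("raw string")              # not encoded: rejected
except TypeError as e:
    print("rejected:", e)
```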

### Thread Safety

Hasher instances are independent, so different threads can safely use different instances. A single hasher object, however, is not safe for concurrent use: do not share one instance between threads without external synchronization.
312
313
### Performance Considerations
314
315
- **x64 vs x86**: Choose the appropriate hasher for your target architecture
316
- **Memory efficiency**: Hashers maintain minimal internal state
317
- **Chunk size**: Optimal chunk sizes are typically 4KB-64KB for file processing
318
- **State copying**: `copy()` is lightweight and creates minimal overhead
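
Chunk size affects throughput but never the result: any chunking of the same byte stream yields an identical digest. A small check of that invariant using the standard library's `sha256` as a stand-in (timings omitted; measure your own workload with `timeit`):

```python
import hashlib

data = b"payload " * 50_000  # ~400 KB of sample data

def digest_with_chunks(data, chunk_size):
    h = hashlib.sha256()
    for i in range(0, len(data), chunk_size):
        h.update(data[i:i + chunk_size])
    return h.digest()

# 4 KB, 8 KB, and 64 KB chunking all agree with the one-shot digest.
one_shot = hashlib.sha256(data).digest()
assert digest_with_chunks(data, 4096) == one_shot
assert digest_with_chunks(data, 8192) == one_shot
assert digest_with_chunks(data, 65536) == one_shot
```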

### Hasher Lifecycle

1. **Initialize**: Create a hasher with an optional seed
2. **Update**: Add data incrementally with `update()`
3. **Finalize**: Get results with `digest()`, `sintdigest()`, or `uintdigest()`
4. **Copy**: Branch processing with `copy()` if needed
5. **Reuse**: Hashers can keep accepting updates after a digest is read
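
Step 5 deserves a demonstration: reading a digest does not finalize or reset the hasher, so updates can continue afterwards. A sketch with the standard library's `sha256`, which follows the same lifecycle:

```python
import hashlib

h = hashlib.sha256()
h.update(b"first part")
d1 = h.digest()                # intermediate digest; state is untouched

h.update(b", second part")     # keep feeding data after reading a digest
d2 = h.digest()

# d2 reflects all data seen so far, as if hashed in one shot.
assert d2 == hashlib.sha256(b"first part, second part").digest()
assert d1 != d2
```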