Python extension for MurmurHash (MurmurHash3), a set of fast and robust hash functions.
# mmh3

mmh3 is a Python extension that provides MurmurHash3, a family of fast non-cryptographic hash functions. It offers 32-bit, 64-bit, and 128-bit hashes with signed or unsigned integer output (or raw bytes). It also exposes both the x64 and x86 variants of the 128-bit algorithm, which are tuned for 64-bit and 32-bit platforms respectively.

## Package Information

- **Package Name**: mmh3
- **Language**: Python
- **Installation**: `pip install mmh3`

## Core Imports

```python
import mmh3
```

All functions and classes are available directly from the `mmh3` module.

## Basic Usage

```python
import mmh3

# 32-bit hashing (signed int by default)
hash_value = mmh3.hash("foo")  # -156908512
hash_with_seed = mmh3.hash("foo", seed=42)  # -1322301282
unsigned_hash = mmh3.hash("foo", signed=False)  # 4138058784

# 64-bit hashing (returns a tuple of two signed 64-bit integers)
hash64_result = mmh3.hash64("foo")  # (-2129773440516405919, 9128664383759220103)

# 128-bit hashing (returns an unsigned 128-bit integer by default)
hash128_result = mmh3.hash128("foo", seed=42)  # 215966891540331383248189432718888555506

# 128-bit hash as 16 raw bytes
hash_bytes = mmh3.hash_bytes("foo")  # b'aE\xf5\x01W\x86q\xe2\x87}\xba+\xe4\x87\xaf~'

# Streaming hasher for incremental input (bytes-like data only, not str)
hasher = mmh3.mmh3_32(seed=42)
hasher.update(b"foo")
hasher.update(b"bar")
digest = hasher.digest()  # bytes
sint_digest = hasher.sintdigest()  # signed int
uint_digest = hasher.uintdigest()  # unsigned int
```

## Architecture

mmh3 provides two complementary interfaces:

- **Simple Functions**: hash a complete key in a single call.
- **Hasher Classes**: a streaming interface that accepts data incrementally through `update()`, useful when the input is large or arrives in chunks.
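
The 128-bit algorithm, which backs `hash64`, `hash128`, `hash_bytes`, and the `mmh3_x64_128` / `mmh3_x86_128` hashers, comes in two variants that return different values for the same input:

- **x64** (`x64arch=True`, the default): optimized for 64-bit architectures
- **x86** (`x64arch=False`): optimized for 32-bit architectures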
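
Every function accepts a `seed`. The integer-returning functions (`hash`, `hash_from_buffer`, `hash64`, `hash128`) also accept a `signed` flag that selects signed (two's complement) or unsigned output; `hash_bytes` always returns raw bytes.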

## Capabilities

### Simple Hash Functions

One-shot functions that hash a complete key. `hash` and `hash_from_buffer` return a 32-bit value (`hash_from_buffer` is intended for large buffers such as memory views); `hash64` returns the 128-bit hash as two 64-bit integers; `hash128` returns it as a single integer, unsigned by default; and `hash_bytes` returns it as 16 bytes.

```python { .api }
def hash(key: StrHashable, seed: int = 0, signed: bool = True) -> int: ...
def hash_from_buffer(key: StrHashable, seed: int = 0, signed: bool = True) -> int: ...
def hash64(key: StrHashable, seed: int = 0, x64arch: bool = True, signed: bool = True) -> tuple[int, int]: ...
def hash128(key: StrHashable, seed: int = 0, x64arch: bool = True, signed: bool = False) -> int: ...
def hash_bytes(key: StrHashable, seed: int = 0, x64arch: bool = True) -> bytes: ...
```

[Simple Hash Functions](./simple-functions.md)

### Streaming Hashers

hashlib-compatible hasher classes for incremental hashing of large or streamed data. They follow the familiar `update()` / `digest()` / `copy()` pattern and expose `digest_size`, `block_size`, and `name`. Unlike the simple functions, `update()` accepts only bytes-like input (`Hashable`), so encode `str` values first. `mmh3_32` computes the 32-bit hash; `mmh3_x64_128` and `mmh3_x86_128` compute the two 128-bit variants and can also return the result as two 64-bit integers via `stupledigest()` (signed) and `utupledigest()` (unsigned).

```python { .api }
class Hasher:
    def __init__(self, seed: int = 0) -> None: ...
    def update(self, input: Hashable) -> None: ...
    def digest(self) -> bytes: ...
    def sintdigest(self) -> int: ...
    def uintdigest(self) -> int: ...
    def copy(self) -> Hasher: ...
    @property
    def digest_size(self) -> int: ...
    @property
    def block_size(self) -> int: ...
    @property
    def name(self) -> str: ...

class mmh3_32(Hasher): ...

class mmh3_x64_128(Hasher):
    def stupledigest(self) -> tuple[int, int]: ...
    def utupledigest(self) -> tuple[int, int]: ...

class mmh3_x86_128(Hasher):
    def stupledigest(self) -> tuple[int, int]: ...
    def utupledigest(self) -> tuple[int, int]: ...
```

[Streaming Hashers](./hashers.md)

## Types

`Hashable` covers the bytes-like inputs accepted by `Hasher.update()`; `StrHashable`, accepted by the simple functions, additionally allows `str`.

```python { .api }
from typing import Protocol, Union

class IntArrayLike(Protocol):
    def __getitem__(self, index) -> int: ...

Hashable = Union[bytes, bytearray, memoryview, IntArrayLike]
StrHashable = Union[str, Hashable]
```
118
119
## Common Use Cases
120
121
- **Data Mining & Machine Learning**: Feature hashing and dimensionality reduction
122
- **Bloom Filters**: Fast set membership testing with probabilistic data structures
123
- **MinHash Algorithms**: Document similarity and near-duplicate detection
124
- **Natural Language Processing**: Text fingerprinting and similarity matching
125
- **IoT Security Research**: Shodan favicon hash calculations
126
- **Distributed Systems**: Consistent hashing and data partitioning
127
- **Caching**: Cache key generation with collision resistance