# mmh3

mmh3 is a Python extension for MurmurHash3, a family of fast, robust non-cryptographic hash functions. It provides 32-bit and 128-bit hashes, with the 128-bit result also available as a pair of 64-bit integers or as bytes, in signed and unsigned forms. The 128-bit hash comes in variants optimized for x86 and x64 platforms.

## Package Information

- **Package Name**: mmh3
- **Language**: Python
- **Installation**: `pip install mmh3`

## Core Imports

```python
import mmh3
```

All functions and classes are available directly from the mmh3 module.

## Basic Usage

```python
import mmh3

# Basic 32-bit hashing
hash_value = mmh3.hash("foo")  # -156908512
hash_with_seed = mmh3.hash("foo", seed=42)  # -1322301282
unsigned_hash = mmh3.hash("foo", signed=False)  # 4138058784

# 64-bit hashing (returns tuple of two 64-bit integers)
hash64_result = mmh3.hash64("foo")  # (-2129773440516405919, 9128664383759220103)

# 128-bit hashing
hash128_result = mmh3.hash128("foo", seed=42)  # 215966891540331383248189432718888555506

# Hash as bytes
hash_bytes = mmh3.hash_bytes("foo")  # b'aE\xf5\x01W\x86q\xe2\x87}\xba+\xe4\x87\xaf~'

# Streaming hasher for large data
hasher = mmh3.mmh3_32(seed=42)
hasher.update(b"foo")
hasher.update(b"bar")
digest = hasher.digest()  # bytes
sint_digest = hasher.sintdigest()  # signed int
uint_digest = hasher.uintdigest()  # unsigned int
```
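
The signed and unsigned 32-bit results above are two readings of the same 32 bits. A minimal sketch of the conversion, using only the values shown in the example; the helper names `to_unsigned32` and `to_signed32` are illustrative and are not part of mmh3:

```python
MASK32 = 0xFFFFFFFF


def to_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & MASK32


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as signed (two's complement)."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


signed_foo = -156908512    # mmh3.hash("foo"), as documented above
unsigned_foo = 4138058784  # mmh3.hash("foo", signed=False), as documented above

print(to_unsigned32(signed_foo) == unsigned_foo)
print(to_signed32(unsigned_foo) == signed_foo)
```

This can be handy when a stored value was produced with one setting and has to be compared against output produced with the other.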

## Architecture

mmh3 provides two complementary interfaces:

- **Simple Functions**: Direct hash computation for immediate results
- **Hasher Classes**: Streaming interface for incremental hashing of large datasets
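
The two interfaces are meant to agree: feeding data to a hasher piece by piece should give the same result as hashing the concatenated data in one call. A rough sketch of that relationship, assuming `mmh3` is installed; the function names `hash32_incremental` and `hash32_one_shot` are hypothetical wrappers written for this illustration:

```python
from typing import Iterable


def hash32_incremental(chunks: Iterable[bytes], seed: int = 0) -> int:
    """Feed byte chunks to a streaming hasher and return the signed digest."""
    import mmh3  # third-party; imported here so the sketch loads without it

    hasher = mmh3.mmh3_32(seed=seed)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.sintdigest()


def hash32_one_shot(data: bytes, seed: int = 0) -> int:
    """Hash the whole input at once with the simple function."""
    import mmh3  # third-party; imported here so the sketch loads without it

    return mmh3.hash(data, seed=seed)
```

Under this sketch, `hash32_incremental([b"foo", b"bar"], seed=42)` and `hash32_one_shot(b"foobar", seed=42)` are expected to return the same integer, so the choice between interfaces comes down to whether the data is available all at once.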

The library implements MurmurHash3 algorithms with architecture-specific optimizations:

- **x64 optimization**: For 64-bit architectures (the default, `x64arch=True`)
- **x86 optimization**: For 32-bit architectures (`x64arch=False`)

Every hash function accepts a configurable seed, and the functions that return integers also offer a choice of signed or unsigned output.

## Capabilities

### Simple Hash Functions

Direct hash computation functions for immediate results with various output formats and architecture optimizations.

```python { .api }
def hash(key: StrHashable, seed: int = 0, signed: bool = True) -> int: ...
def hash_from_buffer(key: StrHashable, seed: int = 0, signed: bool = True) -> int: ...
def hash64(key: StrHashable, seed: int = 0, x64arch: bool = True, signed: bool = True) -> tuple[int, int]: ...
def hash128(key: StrHashable, seed: int = 0, x64arch: bool = True, signed: bool = False) -> int: ...
def hash_bytes(key: StrHashable, seed: int = 0, x64arch: bool = True) -> bytes: ...
```

[Simple Hash Functions](./simple-functions.md)
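
The output formats are closely related. For the `"foo"` example in Basic Usage, the 16 bytes returned by `hash_bytes` decode, as little-endian values, into the two signed 64-bit integers returned by `hash64`. The sketch below uses only those documented values. `split_digest` is an illustrative helper, not an mmh3 function, and whether the same layout holds for the x86 variant is an assumption not checked here:

```python
def split_digest(digest: bytes) -> tuple[int, int]:
    """Decode a 16-byte digest into two signed 64-bit integers, low half first."""
    if len(digest) != 16:
        raise ValueError("expected a 16-byte digest")
    low = int.from_bytes(digest[:8], "little", signed=True)
    high = int.from_bytes(digest[8:], "little", signed=True)
    return low, high


# Documented result of mmh3.hash_bytes("foo")
foo_digest = b"aE\xf5\x01W\x86q\xe2\x87}\xba+\xe4\x87\xaf~"

print(split_digest(foo_digest))
# (-2129773440516405919, 9128664383759220103), the documented mmh3.hash64("foo")
```

Storing the compact bytes form and decoding on demand is one way to keep a single canonical representation.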

### Streaming Hashers

hashlib-compatible hasher classes for incremental hashing of large datasets and streaming operations.

```python { .api }
class Hasher:
    def __init__(self, seed: int = 0) -> None: ...
    def update(self, input: Hashable) -> None: ...
    def digest(self) -> bytes: ...
    def sintdigest(self) -> int: ...
    def uintdigest(self) -> int: ...
    def copy(self) -> Hasher: ...
    @property
    def digest_size(self) -> int: ...
    @property
    def block_size(self) -> int: ...
    @property
    def name(self) -> str: ...

class mmh3_32(Hasher): ...

class mmh3_x64_128(Hasher):
    def stupledigest(self) -> tuple[int, int]: ...
    def utupledigest(self) -> tuple[int, int]: ...

class mmh3_x86_128(Hasher):
    def stupledigest(self) -> tuple[int, int]: ...
    def utupledigest(self) -> tuple[int, int]: ...
```

[Streaming Hashers](./hashers.md)
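
As an illustration of the streaming interface, the sketch below hashes a binary stream in fixed-size chunks and uses `copy()` to reuse the state of a shared prefix. It assumes `mmh3` is installed. The function names, the 64 KiB default chunk size, and the choice of `mmh3_x64_128` are assumptions made for this example, not recommendations from the library:

```python
import io
from typing import BinaryIO


def hash_stream(stream: BinaryIO, chunk_size: int = 65536, seed: int = 0) -> bytes:
    """Hash a binary stream chunk by chunk with the 128-bit x64 hasher."""
    import mmh3  # third-party; imported here so the sketch loads without it

    hasher = mmh3.mmh3_x64_128(seed=seed)
    while chunk := stream.read(chunk_size):
        hasher.update(chunk)
    return hasher.digest()


def digests_with_shared_prefix(prefix: bytes, suffixes: list[bytes]) -> list[bytes]:
    """Hash prefix + suffix for each suffix, processing the prefix only once."""
    import mmh3  # third-party; imported here so the sketch loads without it

    base = mmh3.mmh3_x64_128()
    base.update(prefix)
    results = []
    for suffix in suffixes:
        forked = base.copy()  # independent copy of the state after the prefix
        forked.update(suffix)
        results.append(forked.digest())
    return results
```

A call such as `hash_stream(io.BytesIO(b"foobar"))` works the same way for a file opened in binary mode, and the result should not depend on the chunk size chosen.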

## Types

```python { .api }
from typing import Protocol, Union

class IntArrayLike(Protocol):
    def __getitem__(self, index) -> int: ...

Hashable = Union[bytes, bytearray, memoryview, IntArrayLike]
StrHashable = Union[str, Hashable]
```
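
The difference between the two aliases matters in practice: per the signatures in this document, the simple functions take `StrHashable`, while `Hasher.update` is typed to take `Hashable`, which excludes `str`. A small sketch of normalizing input before passing it to a hasher follows. `to_bytes_like` is a hypothetical helper, and the UTF-8 default is an assumption about which encoding matches the simple functions' handling of `str`:

```python
from typing import Union

BytesLike = Union[bytes, bytearray, memoryview]


def to_bytes_like(value: Union[str, BytesLike], encoding: str = "utf-8") -> BytesLike:
    """Encode str input; pass bytes-like input through unchanged."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


print(to_bytes_like("foo"))
print(to_bytes_like(bytearray(b"bar")))
```

With such a helper, a call like `hasher.update(to_bytes_like(text))` accepts text and binary input alike.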

## Common Use Cases

- **Data Mining & Machine Learning**: Feature hashing and dimensionality reduction
- **Bloom Filters**: Fast set membership testing with probabilistic data structures
- **MinHash Algorithms**: Document similarity and near-duplicate detection
- **Natural Language Processing**: Text fingerprinting and similarity matching
- **IoT Security Research**: Shodan favicon hash calculations
- **Distributed Systems**: Consistent hashing and data partitioning
- **Caching**: Cache key generation with a low rate of accidental collisions (MurmurHash3 is non-cryptographic and does not resist deliberately crafted collisions)
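
To make one of these use cases concrete, here is a minimal Bloom filter sketch. It derives all bit positions from the two 64-bit values of a single `hash64` call, combined by double hashing; that is a common technique chosen for this illustration, not something mmh3 prescribes. The class, its parameters, and `mmh3_hash_pair` are hypothetical names, and the hash function is injectable so the structure can be tried without mmh3 installed:

```python
from typing import Callable, Iterator, Tuple

HashPair = Callable[[bytes], Tuple[int, int]]


def mmh3_hash_pair(key: bytes) -> Tuple[int, int]:
    """Two unsigned 64-bit hashes from a single mmh3.hash64 call."""
    import mmh3  # third-party; imported here so the sketch loads without it

    return mmh3.hash64(key, signed=False)


class BloomFilter:
    """Set-membership sketch: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int,
                 hash_pair: HashPair = mmh3_hash_pair) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self._hash_pair = hash_pair
        self._bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Double hashing: position_i = (h1 + i * h2) mod num_bits
        h1, h2 = self._hash_pair(key)
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: bytes) -> bool:
        return all(self._bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

With mmh3 installed, `bf = BloomFilter(num_bits=8192, num_hashes=5)` followed by `bf.add(b"apple")` makes `b"apple" in bf` true. Sizing the filter for a target false-positive rate is left out of this sketch.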