or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

bloom-filters.mdfrequency-counters.mdhash-maps.mdindex.md

index.mddocs/

0

# Preshed

1

2

A high-performance Cython-based hash table library for Python that assumes keys come pre-hashed. Inspired by Jeff Preshing's hash table design, preshed provides efficient data structures optimized for use cases where keys are already randomized, such as natural language processing pipelines, caching systems, and other performance-critical applications.

3

4

## Package Information

5

6

- **Package Name**: preshed

7

- **Language**: Python (with Cython extensions for performance)

8

- **Dependencies**: `cymem>=2.0.2,<2.1.0`, `murmurhash>=0.28.0,<1.1.0`

9

- **Installation**: `pip install preshed`

10

11

## Core Imports

12

13

```python

14

from preshed.maps import PreshMap

15

from preshed.counter import PreshCounter

16

from preshed.bloom import BloomFilter, calculate_size_and_hash_count

17

```

18

19

Package metadata:

20

21

```python

22

from preshed.about import __version__, __author__, __license__

23

print(__version__) # "4.0.0"

24

print(__author__) # "Explosion"

25

print(__license__) # "MIT"

26

27

# Or access via main module

28

import preshed

29

print(preshed.__version__) # "4.0.0"

30

```

31

32

## Basic Usage

33

34

```python

35

from preshed.maps import PreshMap

36

from preshed.counter import PreshCounter

37

from preshed.bloom import BloomFilter

38

39

# Hash map for pre-hashed keys

40

hash_map = PreshMap()

41

hash_map[12345] = 67890

42

print(hash_map[12345]) # 67890

43

44

# Counter with frequency tracking

45

counter = PreshCounter()

46

counter.inc(42, 5) # Increment key 42 by 5

47

print(counter[42]) # 5

48

print(counter.total) # 5

49

50

# Bloom filter for membership testing

51

bloom = BloomFilter(size=1024, hash_funcs=3)

52

bloom.add(12345)

53

print(12345 in bloom) # True

54

print(54321 in bloom) # False (probably)

55

```

56

57

## Architecture

58

59

Preshed provides three main high-performance data structures:

60

61

- **PreshMap**: Hash table with open addressing and linear probing, optimized for pre-hashed 64-bit keys

62

- **PreshCounter**: Frequency counter with optional Good-Turing smoothing for probability estimation

63

- **BloomFilter**: Space-efficient probabilistic membership testing with configurable error rates

64

65

All data structures are implemented in Cython for maximum performance and designed to work with pre-hashed keys to minimize hash computation overhead. The library compiles to native C extensions, providing near-C performance while maintaining a Python interface.

66

67

## Capabilities

68

69

### Hash Maps

70

71

High-performance hash table that maps pre-hashed uint64_t keys to uint64_t values using open addressing with linear probing. Provides dictionary-like interface with O(1) average-case performance.

72

73

```python { .api }

74

class PreshMap:

75

def __init__(self, initial_size: int = 8): ...

76

def __getitem__(self, key: int) -> int | None: ... # Returns None if key not found

77

def __setitem__(self, key: int, value: int) -> None: ...

78

def __delitem__(self, key: int) -> None: ... # Raises KeyError if key not found

79

def __len__(self) -> int: ...

80

def __contains__(self, key: int) -> bool: ...

81

def pop(self, key: int, default=None) -> int | None: ...

82

def items(self) -> Iterator[tuple[int, int]]: ...

83

def keys(self) -> Iterator[int]: ...

84

def values(self) -> Iterator[int]: ...

85

@property

86

def capacity(self) -> int: ...

87

```

88

89

[Hash Maps](./hash-maps.md)

90

91

### Frequency Counters

92

93

Efficient frequency counting for uint64-valued keys with optional Good-Turing smoothing for probability estimation. Supports statistical operations and probability calculations.

94

95

```python { .api }

96

class PreshCounter:

97

def __init__(self, initial_size: int = 8): ...

98

def __getitem__(self, key: int) -> int: ...

99

def __len__(self) -> int: ...

100

def __iter__(self) -> Iterator[tuple[int, int]]: ... # Yields (key, count) pairs

101

def inc(self, key: int, inc: int) -> int: ...

102

def prob(self, key: int) -> float: ...

103

def smooth(self) -> None: ...

104

@property

105

def total(self) -> int: ...

106

@property

107

def length(self) -> int: ...

108

```

109

110

[Frequency Counters](./frequency-counters.md)

111

112

### Bloom Filters

113

114

Space-efficient probabilistic data structure for membership testing. Provides configurable error rates and serialization capabilities with no false negatives but possible false positives.

115

116

```python { .api }

117

class BloomFilter:

118

def __init__(self, size: int = 1024, hash_funcs: int = 23, seed: int = 0): ...

119

@classmethod

120

def from_error_rate(cls, members: int, error_rate: float = 1e-4): ...

121

def add(self, item: int) -> None: ...

122

def __contains__(self, item: int) -> bool: ...

123

def to_bytes(self) -> bytes: ...

124

def from_bytes(self, byte_string: bytes): ...

125

126

def calculate_size_and_hash_count(members: int, error_rate: float) -> tuple[int, int]: ...

127

```

128

129

[Bloom Filters](./bloom-filters.md)

130

131

## Types

132

133

```python { .api }

134

# All keys are expected to be 64-bit unsigned integers (pre-hashed)

135

Key = int # uint64_t

136

Value = int # uint64_t for maps, int64_t for counters

137

Count = int # int64_t

138

```