or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

bloom-filters.mdfrequency-counters.mdhash-maps.mdindex.md

frequency-counters.mddocs/

0

# Frequency Counters

1

2

Efficient frequency counting for uint64-valued keys with optional Good-Turing smoothing for probability estimation. Designed for statistical applications and natural language processing tasks.

3

4

## Capabilities

5

6

### PreshCounter Class

7

8

Frequency counter with statistical operations and probability estimation capabilities.

9

10

```python { .api }

11

class PreshCounter:

12

"""

13

Count occurrences of uint64-valued keys with optional smoothing.

14

"""

15

def __init__(self, initial_size: int = 8):

16

"""

17

Create a new frequency counter.

18

19

Parameters:

20

- initial_size: Initial capacity (must be power of 2, defaults to 8)

21

"""

22

23

def __getitem__(self, key: int) -> int:

24

"""

25

Get count for key.

26

27

Parameters:

28

- key: 64-bit unsigned integer key

29

30

Returns:

31

- Count for key (0 if key not seen)

32

"""

33

34

def __len__(self) -> int:

35

"""

36

Get capacity of underlying map.

37

38

Returns:

39

- Capacity of counter

40

"""

41

42

def __iter__(self) -> Iterator[tuple[int, int]]:

43

"""

44

Iterate over (key, count) pairs.

45

46

Returns:

47

- Iterator yielding (key, count) tuples

48

"""

49

```

50

51

### Counting Operations

52

53

Core counting functionality for frequency tracking.

54

55

```python { .api }

56

def inc(self, key: int, inc: int) -> int:

57

"""

58

Increment counter for key.

59

60

Parameters:

61

- key: 64-bit unsigned integer key

62

- inc: Amount to increment (can be negative)

63

64

Returns:

65

- New count for key

66

67

Raises:

68

- Exception: On error

69

"""

70

```

71

72

### Statistical Operations

73

74

Probability estimation and smoothing operations.

75

76

```python { .api }

77

def prob(self, key: int) -> float:

78

"""

79

Get probability for key.

80

81

Parameters:

82

- key: 64-bit unsigned integer key

83

84

Returns:

85

- Probability (count/total) or smoothed probability if smoother is set

86

"""

87

88

def smooth(self) -> None:

89

"""

90

Apply Good-Turing smoothing to the frequency distribution.

91

Creates a GaleSmoother instance for probability estimation.

92

93

Raises:

94

- AssertionError: If data cannot be smoothed (need items with count 1 and 2)

95

"""

96

```

97

98

### Properties

99

100

```python { .api }

101

@property

102

def total(self) -> int:

103

"""

104

Get total count across all keys.

105

106

Returns:

107

- Sum of all counts

108

"""

109

110

@property

111

def length(self) -> int:

112

"""

113

Get capacity of underlying map.

114

115

Returns:

116

- Capacity of counter

117

"""

118

119

@property

120

def smoother(self) -> GaleSmoother | None:

121

"""

122

Get smoother instance (None until smooth() is called).

123

124

Returns:

125

- GaleSmoother instance or None

126

"""

127

```

128

129

### GaleSmoother Class

130

131

Good-Turing smoother for probability estimation (created by smooth() method).

132

133

```python { .api }

134

class GaleSmoother:

135

"""

136

Good-Turing smoother for frequency distributions.

137

"""

138

def __call__(self, count: int) -> float:

139

"""

140

Get smoothed count for raw count value.

141

142

Parameters:

143

- count: Raw frequency count

144

145

Returns:

146

- Smoothed count value

147

"""

148

149

@property

150

def cutoff(self) -> int:

151

"""

152

Get smoothing cutoff value.

153

154

Returns:

155

- Cutoff value for smoothing

156

"""

157

158

@property

159

def total(self) -> float:

160

"""

161

Get total smoothed count.

162

163

Returns:

164

- Total of all smoothed counts

165

"""

166

167

def count_count(self, r: int) -> int:

168

"""

169

Get count of items with specific frequency.

170

171

Parameters:

172

- r: Frequency value to query

173

174

Returns:

175

- Number of items that occur exactly r times

176

"""

177

```

178

179

## Usage Examples

180

181

### Basic Counting

182

183

```python

184

from preshed.counter import PreshCounter

185

186

# Create counter

187

counter = PreshCounter()

188

189

# Count occurrences

190

counter.inc(42, 1) # Increment key 42 by 1

191

counter.inc(42, 4) # Increment key 42 by 4 more

192

counter.inc(100, 10) # Increment key 100 by 10

193

194

print(counter[42]) # 5

195

print(counter[100]) # 10

196

print(counter[999]) # 0 (never seen)

197

198

print(counter.total) # 15

199

```

200

201

### Probability Estimation

202

203

```python

204

from preshed.counter import PreshCounter

205

206

counter = PreshCounter()

207

208

# Add some data

209

counter.inc(1, 10) # 10 occurrences

210

counter.inc(2, 20) # 20 occurrences

211

counter.inc(3, 5) # 5 occurrences

212

213

# Get raw probabilities

214

print(counter.prob(1)) # 10/35 ≈ 0.286

215

print(counter.prob(2)) # 20/35 ≈ 0.571

216

print(counter.prob(3)) # 5/35 ≈ 0.143

217

print(counter.prob(999)) # 0.0 (unseen)

218

```

219

220

### Good-Turing Smoothing

221

222

```python

223

from preshed.counter import PreshCounter

224

225

# Create frequency distribution

226

counter = PreshCounter()

227

228

# Add data (need items with count 1 and 2 for smoothing)

229

for i in range(10):

230

counter.inc(100 + i, 1) # 10 items with frequency 1

231

232

for i in range(6):

233

counter.inc(200 + i, 2) # 6 items with frequency 2

234

235

for i in range(4):

236

counter.inc(300 + i, 3) # 4 items with frequency 3

237

238

print(f"Total before smoothing: {counter.total}")

239

240

# Apply smoothing

241

counter.smooth()

242

243

# Now probabilities are smoothed

244

print(counter.prob(100)) # Smoothed probability for freq-1 item

245

print(counter.prob(999)) # Non-zero probability for unseen item

246

247

# Access smoother directly

248

smoother = counter.smoother

249

print(smoother(1)) # Smoothed count for frequency 1

250

print(smoother.total) # Total smoothed count

251

```

252

253

### Iteration

254

255

```python

256

from preshed.counter import PreshCounter

257

258

counter = PreshCounter()

259

counter.inc(1, 5)

260

counter.inc(2, 3)

261

counter.inc(3, 8)

262

263

# Iterate over (key, count) pairs

264

for key, count in counter:

265

print(f"Key {key}: {count} occurrences")

266

267

# Sort by frequency

268

sorted_items = sorted(counter, key=lambda x: x[1], reverse=True)

269

for key, count in sorted_items:

270

print(f"Key {key}: {count} occurrences")

271

```

272

273

### Statistical Analysis

274

275

```python

276

from preshed.counter import PreshCounter

277

278

# Build frequency distribution

279

counter = PreshCounter()

280

281

# Add data points

282

data = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5]

283

for item in data:

284

counter.inc(item, 1)

285

286

print(f"Total items: {counter.total}") # 15

287

print(f"Unique items: {len([k for k, c in counter if c > 0])}")

288

289

# Get frequency statistics

290

frequencies = [count for key, count in counter if count > 0]

291

print(f"Max frequency: {max(frequencies)}")

292

print(f"Min frequency: {min(frequencies)}")

293

294

# Apply smoothing for better probability estimates

295

if len(set(frequencies)) >= 2: # Need variety for smoothing

296

counter.smooth()

297

print(f"Smoothed probability for unseen item: {counter.prob(999)}")

298

```

299

300

## Performance Characteristics

301

302

- **Counting**: O(1) average case for increment operations

303

- **Probability**: O(1) for probability calculations

304

- **Smoothing**: O(n) where n is number of unique frequencies

305

- **Memory**: Efficient storage using underlying hash map

306

- **Statistical accuracy**: Good-Turing smoothing provides improved probability estimates