0
# Frequency Counters
1
2
Efficient frequency counting for uint64-valued keys with optional Good-Turing smoothing for probability estimation. Designed for statistical applications and natural language processing tasks.
3
4
## Capabilities
5
6
### PreshCounter Class
7
8
Frequency counter with statistical operations and probability estimation capabilities.
9
10
```python { .api }
11
class PreshCounter:
12
"""
13
Count occurrences of uint64-valued keys with optional smoothing.
14
"""
15
def __init__(self, initial_size: int = 8):
16
"""
17
Create a new frequency counter.
18
19
Parameters:
20
- initial_size: Initial capacity (must be power of 2, defaults to 8)
21
"""
22
23
def __getitem__(self, key: int) -> int:
24
"""
25
Get count for key.
26
27
Parameters:
28
- key: 64-bit unsigned integer key
29
30
Returns:
31
- Count for key (0 if key not seen)
32
"""
33
34
def __len__(self) -> int:
35
"""
36
Get capacity of underlying map.
37
38
Returns:
39
- Capacity of counter
40
"""
41
42
def __iter__(self) -> Iterator[tuple[int, int]]:
43
"""
44
Iterate over (key, count) pairs.
45
46
Returns:
47
- Iterator yielding (key, count) tuples
48
"""
49
```
50
51
### Counting Operations
52
53
Core counting functionality for frequency tracking.
54
55
```python { .api }
56
def inc(self, key: int, inc: int) -> int:
57
"""
58
Increment counter for key.
59
60
Parameters:
61
- key: 64-bit unsigned integer key
62
- inc: Amount to increment (can be negative)
63
64
Returns:
65
- New count for key
66
67
Raises:
68
- Exception: On error
69
"""
70
```
71
72
### Statistical Operations
73
74
Probability estimation and smoothing operations.
75
76
```python { .api }
77
def prob(self, key: int) -> float:
78
"""
79
Get probability for key.
80
81
Parameters:
82
- key: 64-bit unsigned integer key
83
84
Returns:
85
- Probability (count/total) or smoothed probability if smoother is set
86
"""
87
88
def smooth(self) -> None:
89
"""
90
Apply Good-Turing smoothing to the frequency distribution.
91
Creates a GaleSmoother instance for probability estimation.
92
93
Raises:
94
- AssertionError: If data cannot be smoothed (need items with count 1 and 2)
95
"""
96
```
97
98
### Properties
99
100
```python { .api }
101
@property
102
def total(self) -> int:
103
"""
104
Get total count across all keys.
105
106
Returns:
107
- Sum of all counts
108
"""
109
110
@property
111
def length(self) -> int:
112
"""
113
Get capacity of underlying map.
114
115
Returns:
116
- Capacity of counter
117
"""
118
119
@property
120
def smoother(self) -> GaleSmoother | None:
121
"""
122
Get smoother instance (None until smooth() is called).
123
124
Returns:
125
- GaleSmoother instance or None
126
"""
127
```
128
129
### GaleSmoother Class
130
131
Good-Turing smoother for probability estimation (created by smooth() method).
132
133
```python { .api }
134
class GaleSmoother:
135
"""
136
Good-Turing smoother for frequency distributions.
137
"""
138
def __call__(self, count: int) -> float:
139
"""
140
Get smoothed count for raw count value.
141
142
Parameters:
143
- count: Raw frequency count
144
145
Returns:
146
- Smoothed count value
147
"""
148
149
@property
150
def cutoff(self) -> int:
151
"""
152
Get smoothing cutoff value.
153
154
Returns:
155
- Cutoff value for smoothing
156
"""
157
158
@property
159
def total(self) -> float:
160
"""
161
Get total smoothed count.
162
163
Returns:
164
- Total of all smoothed counts
165
"""
166
167
def count_count(self, r: int) -> int:
168
"""
169
Get count of items with specific frequency.
170
171
Parameters:
172
- r: Frequency value to query
173
174
Returns:
175
- Number of items that occur exactly r times
176
"""
177
```
178
179
## Usage Examples
180
181
### Basic Counting
182
183
```python
184
from preshed.counter import PreshCounter
185
186
# Create counter
187
counter = PreshCounter()
188
189
# Count occurrences
190
counter.inc(42, 1) # Increment key 42 by 1
191
counter.inc(42, 4) # Increment key 42 by 4 more
192
counter.inc(100, 10) # Increment key 100 by 10
193
194
print(counter[42]) # 5
195
print(counter[100]) # 10
196
print(counter[999]) # 0 (never seen)
197
198
print(counter.total) # 15
199
```
200
201
### Probability Estimation
202
203
```python
204
from preshed.counter import PreshCounter
205
206
counter = PreshCounter()
207
208
# Add some data
209
counter.inc(1, 10) # 10 occurrences
210
counter.inc(2, 20) # 20 occurrences
211
counter.inc(3, 5) # 5 occurrences
212
213
# Get raw probabilities
214
print(counter.prob(1)) # 10/35 ≈ 0.286
215
print(counter.prob(2)) # 20/35 ≈ 0.571
216
print(counter.prob(3)) # 5/35 ≈ 0.143
217
print(counter.prob(999)) # 0.0 (unseen)
218
```
219
220
### Good-Turing Smoothing
221
222
```python
223
from preshed.counter import PreshCounter
224
225
# Create frequency distribution
226
counter = PreshCounter()
227
228
# Add data (need items with count 1 and 2 for smoothing)
229
for i in range(10):
230
counter.inc(100 + i, 1) # 10 items with frequency 1
231
232
for i in range(6):
233
counter.inc(200 + i, 2) # 6 items with frequency 2
234
235
for i in range(4):
236
counter.inc(300 + i, 3) # 4 items with frequency 3
237
238
print(f"Total before smoothing: {counter.total}")
239
240
# Apply smoothing
241
counter.smooth()
242
243
# Now probabilities are smoothed
244
print(counter.prob(100)) # Smoothed probability for freq-1 item
245
print(counter.prob(999)) # Non-zero probability for unseen item
246
247
# Access smoother directly
248
smoother = counter.smoother
249
print(smoother(1)) # Smoothed count for frequency 1
250
print(smoother.total) # Total smoothed count
251
```
252
253
### Iteration
254
255
```python
256
from preshed.counter import PreshCounter
257
258
counter = PreshCounter()
259
counter.inc(1, 5)
260
counter.inc(2, 3)
261
counter.inc(3, 8)
262
263
# Iterate over (key, count) pairs
264
for key, count in counter:
265
print(f"Key {key}: {count} occurrences")
266
267
# Sort by frequency
268
sorted_items = sorted(counter, key=lambda x: x[1], reverse=True)
269
for key, count in sorted_items:
270
print(f"Key {key}: {count} occurrences")
271
```
272
273
### Statistical Analysis
274
275
```python
276
from preshed.counter import PreshCounter
277
278
# Build frequency distribution
279
counter = PreshCounter()
280
281
# Add data points
282
data = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5]
283
for item in data:
284
counter.inc(item, 1)
285
286
print(f"Total items: {counter.total}") # 15
287
print(f"Unique items: {len([k for k, c in counter if c > 0])}")
288
289
# Get frequency statistics
290
frequencies = [count for key, count in counter if count > 0]
291
print(f"Max frequency: {max(frequencies)}")
292
print(f"Min frequency: {min(frequencies)}")
293
294
# Apply smoothing for better probability estimates
295
if len(set(frequencies)) >= 2: # Need variety for smoothing
296
counter.smooth()
297
print(f"Smoothed probability for unseen item: {counter.prob(999)}")
298
```
299
300
## Performance Characteristics
301
302
- **Counting**: O(1) average case for increment operations
303
- **Probability**: O(1) for probability calculations
304
- **Smoothing**: O(n) where n is number of unique frequencies
305
- **Memory**: Efficient storage using underlying hash map
306
- **Statistical accuracy**: Good-Turing smoothing provides improved probability estimates