0
# Count-Min Sketch Operations
1
2
Probabilistic data structure for frequency estimation and cardinality counting using sub-linear space. Count-Min sketches provide approximate frequency estimates with bounded error guarantees, making them ideal for streaming data analysis and heavy hitters detection in large-scale systems.
3
4
## Capabilities
5
6
### Creating Count-Min Sketches
7
8
Factory methods for creating Count-Min sketches with different configuration approaches.
9
10
```java { .api }
11
/**
12
* Creates a Count-Min sketch with specified error parameters and random seed
13
* @param eps relative error (must be positive)
14
* @param confidence confidence level (must be between 0.0 and 1.0)
15
* @param seed random seed for hash functions
16
* @return new CountMinSketch instance
17
*/
18
public static CountMinSketch create(double eps, double confidence, int seed);
19
20
/**
21
* Creates a Count-Min sketch with explicit dimensions and random seed
22
* @param depth depth of the 2D array (number of hash functions)
23
* @param width width of the 2D array (number of buckets per hash function)
24
* @param seed random seed for hash functions
25
* @return new CountMinSketch instance
26
*/
27
public static CountMinSketch create(int depth, int width, int seed);
28
```
29
30
**Usage Examples:**
31
32
```java
33
// Create with error parameters (recommended approach)
34
CountMinSketch sketch1 = CountMinSketch.create(0.01, 0.99, 42); // 1% error, 99% confidence
35
36
// Create with explicit dimensions
37
CountMinSketch sketch2 = CountMinSketch.create(8, 200, 42); // 8 hash functions, 200 buckets each
38
```
39
40
### Adding Items and Counts
41
42
Methods for incrementing item frequencies in the sketch.
43
44
```java { .api }
45
/**
46
* Increments an item's count by 1, supporting String, byte[], and integral types
47
* @param item item to increment (String, byte[], Byte, Short, Integer, Long)
48
*/
49
public abstract void add(Object item);
50
51
/**
52
* Increments an item's count by specified amount
53
* @param item item to increment
54
* @param count count to add (must be non-negative)
55
*/
56
public abstract void add(Object item, long count);
57
58
/**
59
* Specialized method for incrementing String items by 1
60
* @param item string item to increment
61
*/
62
public abstract void addString(String item);
63
64
/**
65
* Specialized method for incrementing String items by count
66
* @param item string item to increment
67
* @param count count to add
68
*/
69
public abstract void addString(String item, long count);
70
71
/**
72
* Specialized method for incrementing long values by 1
73
* @param item long value to increment
74
*/
75
public abstract void addLong(long item);
76
77
/**
78
* Specialized method for incrementing long values by count
79
* @param item long value to increment
80
* @param count count to add
81
*/
82
public abstract void addLong(long item, long count);
83
84
/**
85
* Specialized method for incrementing byte arrays by 1
86
* @param item byte array to increment
87
*/
88
public abstract void addBinary(byte[] item);
89
90
/**
91
* Specialized method for incrementing byte arrays by count
92
* @param item byte array to increment
93
* @param count count to add
94
*/
95
public abstract void addBinary(byte[] item, long count);
96
```
97
98
**Usage Examples:**
99
100
```java
101
CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);
102
103
// Add items with count 1
104
sketch.add("user123");
105
sketch.add("user456");
106
sketch.add("user123"); // user123 now has count 2
107
108
// Add items with specific counts
109
sketch.add("product_a", 5);
110
sketch.addString("event_type_click", 10);
111
sketch.addLong(42L, 3);
112
sketch.addBinary("session_id".getBytes(), 7);
113
```
114
115
### Frequency Estimation
116
117
Method for retrieving estimated frequencies of items.
118
119
```java { .api }
120
/**
121
* Returns the estimated frequency of an item
122
* @param item item to estimate (String, byte[], Byte, Short, Integer, Long)
123
* @return estimated count (guaranteed to be >= actual count)
124
*/
125
public abstract long estimateCount(Object item);
126
```
127
128
**Usage Examples:**
129
130
```java
131
CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);
132
133
// Add some items
134
sketch.add("popular_item", 100);
135
sketch.add("rare_item", 1);
136
sketch.add("medium_item", 25);
137
138
// Estimate frequencies
139
long popular = sketch.estimateCount("popular_item"); // >= 100
140
long rare = sketch.estimateCount("rare_item"); // >= 1
141
long medium = sketch.estimateCount("medium_item"); // >= 25
142
long unknown = sketch.estimateCount("never_seen"); // likely 0, but could be higher due to hash collisions
143
```
144
145
### Sketch Properties
146
147
Methods for accessing Count-Min sketch configuration and state.
148
149
```java { .api }
150
/**
151
* Returns the relative error (epsilon) parameter
152
* @return relative error bound
153
*/
154
public abstract double relativeError();
155
156
/**
157
* Returns the confidence (delta) parameter
158
* @return confidence level
159
*/
160
public abstract double confidence();
161
162
/**
163
* Returns the depth of the 2D array (number of hash functions)
164
* @return depth dimension
165
*/
166
public abstract int depth();
167
168
/**
169
* Returns the width of the 2D array (buckets per hash function)
170
* @return width dimension
171
*/
172
public abstract int width();
173
174
/**
175
* Returns the total count of all items added to the sketch
176
* @return sum of all counts added
177
*/
178
public abstract long totalCount();
179
```
180
181
**Usage Examples:**
182
183
```java
184
CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);
185
186
// Add some data
187
sketch.add("item1", 10);
188
sketch.add("item2", 20);
189
190
// Check properties
191
double eps = sketch.relativeError(); // 0.01
192
double conf = sketch.confidence(); // 0.99
193
int d = sketch.depth(); // calculated from confidence
194
int w = sketch.width(); // calculated from eps
195
long total = sketch.totalCount(); // 30
196
```
197
198
### Sketch Merging
199
200
Operation for combining multiple Count-Min sketches.
201
202
```java { .api }
203
/**
204
* Merges another Count-Min sketch into this one
205
* Note: Only sketches with same depth, width, and random seed can be merged
206
* @param other the other Count-Min sketch to merge
207
* @return this sketch after merging
208
* @throws IncompatibleMergeException if sketches are incompatible
209
*/
210
public abstract CountMinSketch mergeInPlace(CountMinSketch other) throws IncompatibleMergeException;
211
```
212
213
**Usage Examples:**
214
215
```java
216
// Create compatible sketches (same parameters)
217
CountMinSketch sketch1 = CountMinSketch.create(0.01, 0.99, 42);
218
CountMinSketch sketch2 = CountMinSketch.create(0.01, 0.99, 42);
219
220
// Add different data to each
221
sketch1.add("user_a", 10);
222
sketch1.add("user_b", 5);
223
224
sketch2.add("user_b", 3); // user_b appears in both sketches
225
sketch2.add("user_c", 7);
226
227
// Merge sketch2 into sketch1
228
sketch1.mergeInPlace(sketch2);
229
230
// Now sketch1 contains combined frequencies
231
long userA = sketch1.estimateCount("user_a"); // >= 10
232
long userB = sketch1.estimateCount("user_b"); // >= 8 (5 + 3)
233
long userC = sketch1.estimateCount("user_c"); // >= 7
234
```
235
236
## Error Guarantees
237
238
Count-Min sketches provide probabilistic guarantees about estimation accuracy:
239
240
- **Overestimation Only**: `estimateCount(item) >= actualCount(item)` (never underestimates)
241
- **Bounded Error**: With probability `confidence`, the error is at most `relativeError * totalCount`
242
- **Example**: With `eps=0.01` and `confidence=0.99`, estimates are within 1% of total count with 99% probability
243
244
**Practical Example:**
245
246
```java
247
CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);
248
249
// Add 1 million total items
250
for (int i = 0; i < 1000000; i++) {
251
sketch.add("item_" + (i % 10000)); // 10k unique items, 100 occurrences each
252
}
253
254
// For any item that appeared 100 times:
255
// - Actual count: 100
256
// - Estimated count: between 100 and 110 (100 + 0.01 * 1000000) with 99% probability
257
long estimate = sketch.estimateCount("item_5000"); // likely between 100-110
258
```
259
260
## Types
261
262
```java { .api }
263
public enum CountMinSketch.Version {
264
/** Version 1 binary format for serialization */
265
V1(1);
266
}
267
```
268
269
## Error Handling
270
271
- `IllegalArgumentException`: Thrown for invalid parameters:
272
- Negative relative error
273
- Confidence not in range (0.0, 1.0)
274
- Non-positive depth or width
275
- Negative count increments
276
- `IncompatibleMergeException`: Thrown when merging incompatible sketches (different dimensions or random seeds)