or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

bloom-filter.mdcount-min-sketch.mdindex.mdserialization.md

count-min-sketch.mddocs/

0

# Count-Min Sketch Operations

1

2

Probabilistic data structure for frequency estimation and cardinality counting using sub-linear space. Count-Min sketches provide approximate frequency estimates with bounded error guarantees, making them ideal for streaming data analysis and heavy hitters detection in large-scale systems.

3

4

## Capabilities

5

6

### Creating Count-Min Sketches

7

8

Factory methods for creating Count-Min sketches with different configuration approaches.

9

10

```java { .api }

11

/**

12

* Creates a Count-Min sketch with specified error parameters and random seed

13

* @param eps relative error (must be positive)

14

* @param confidence confidence level (must be between 0.0 and 1.0)

15

* @param seed random seed for hash functions

16

* @return new CountMinSketch instance

17

*/

18

public static CountMinSketch create(double eps, double confidence, int seed);

19

20

/**

21

* Creates a Count-Min sketch with explicit dimensions and random seed

22

* @param depth depth of the 2D array (number of hash functions)

23

* @param width width of the 2D array (number of buckets per hash function)

24

* @param seed random seed for hash functions

25

* @return new CountMinSketch instance

26

*/

27

public static CountMinSketch create(int depth, int width, int seed);

28

```

29

30

**Usage Examples:**

31

32

```java

33

// Create with error parameters (recommended approach)

34

CountMinSketch sketch1 = CountMinSketch.create(0.01, 0.99, 42); // 1% error, 99% confidence

35

36

// Create with explicit dimensions

37

CountMinSketch sketch2 = CountMinSketch.create(8, 200, 42); // 8 hash functions, 200 buckets each

38

```

39

40

### Adding Items and Counts

41

42

Methods for incrementing item frequencies in the sketch.

43

44

```java { .api }

45

/**

46

* Increments an item's count by 1, supporting String, byte[], and integral types

47

* @param item item to increment (String, byte[], Byte, Short, Integer, Long)

48

*/

49

public abstract void add(Object item);

50

51

/**

52

* Increments an item's count by specified amount

53

* @param item item to increment

54

* @param count count to add (must be non-negative)

55

*/

56

public abstract void add(Object item, long count);

57

58

/**

59

* Specialized method for incrementing String items by 1

60

* @param item string item to increment

61

*/

62

public abstract void addString(String item);

63

64

/**

65

* Specialized method for incrementing String items by count

66

* @param item string item to increment

67

* @param count count to add

68

*/

69

public abstract void addString(String item, long count);

70

71

/**

72

* Specialized method for incrementing long values by 1

73

* @param item long value to increment

74

*/

75

public abstract void addLong(long item);

76

77

/**

78

* Specialized method for incrementing long values by count

79

* @param item long value to increment

80

* @param count count to add

81

*/

82

public abstract void addLong(long item, long count);

83

84

/**

85

* Specialized method for incrementing byte arrays by 1

86

* @param item byte array to increment

87

*/

88

public abstract void addBinary(byte[] item);

89

90

/**

91

* Specialized method for incrementing byte arrays by count

92

* @param item byte array to increment

93

* @param count count to add

94

*/

95

public abstract void addBinary(byte[] item, long count);

96

```

97

98

**Usage Examples:**

99

100

```java

101

CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);

102

103

// Add items with count 1

104

sketch.add("user123");

105

sketch.add("user456");

106

sketch.add("user123"); // user123 now has count 2

107

108

// Add items with specific counts

109

sketch.add("product_a", 5);

110

sketch.addString("event_type_click", 10);

111

sketch.addLong(42L, 3);

112

sketch.addBinary("session_id".getBytes(), 7);

113

```

114

115

### Frequency Estimation

116

117

Method for retrieving estimated frequencies of items.

118

119

```java { .api }

120

/**

121

* Returns the estimated frequency of an item

122

* @param item item to estimate (String, byte[], Byte, Short, Integer, Long)

123

* @return estimated count (guaranteed to be >= actual count)

124

*/

125

public abstract long estimateCount(Object item);

126

```

127

128

**Usage Examples:**

129

130

```java

131

CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);

132

133

// Add some items

134

sketch.add("popular_item", 100);

135

sketch.add("rare_item", 1);

136

sketch.add("medium_item", 25);

137

138

// Estimate frequencies

139

long popular = sketch.estimateCount("popular_item"); // >= 100

140

long rare = sketch.estimateCount("rare_item"); // >= 1

141

long medium = sketch.estimateCount("medium_item"); // >= 25

142

long unknown = sketch.estimateCount("never_seen"); // likely 0, but could be higher due to hash collisions

143

```

144

145

### Sketch Properties

146

147

Methods for accessing Count-Min sketch configuration and state.

148

149

```java { .api }

150

/**

151

* Returns the relative error (epsilon) parameter

152

* @return relative error bound

153

*/

154

public abstract double relativeError();

155

156

/**

157

* Returns the confidence (delta) parameter

158

* @return confidence level

159

*/

160

public abstract double confidence();

161

162

/**

163

* Returns the depth of the 2D array (number of hash functions)

164

* @return depth dimension

165

*/

166

public abstract int depth();

167

168

/**

169

* Returns the width of the 2D array (buckets per hash function)

170

* @return width dimension

171

*/

172

public abstract int width();

173

174

/**

175

* Returns the total count of all items added to the sketch

176

* @return sum of all counts added

177

*/

178

public abstract long totalCount();

179

```

180

181

**Usage Examples:**

182

183

```java

184

CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);

185

186

// Add some data

187

sketch.add("item1", 10);

188

sketch.add("item2", 20);

189

190

// Check properties

191

double eps = sketch.relativeError(); // 0.01

192

double conf = sketch.confidence(); // 0.99

193

int d = sketch.depth(); // calculated from confidence

194

int w = sketch.width(); // calculated from eps

195

long total = sketch.totalCount(); // 30

196

```

197

198

### Sketch Merging

199

200

Operation for combining multiple Count-Min sketches.

201

202

```java { .api }

203

/**

204

* Merges another Count-Min sketch into this one

205

* Note: Only sketches with same depth, width, and random seed can be merged

206

* @param other the other Count-Min sketch to merge

207

* @return this sketch after merging

208

* @throws IncompatibleMergeException if sketches are incompatible

209

*/

210

public abstract CountMinSketch mergeInPlace(CountMinSketch other) throws IncompatibleMergeException;

211

```

212

213

**Usage Examples:**

214

215

```java

216

// Create compatible sketches (same parameters)

217

CountMinSketch sketch1 = CountMinSketch.create(0.01, 0.99, 42);

218

CountMinSketch sketch2 = CountMinSketch.create(0.01, 0.99, 42);

219

220

// Add different data to each

221

sketch1.add("user_a", 10);

222

sketch1.add("user_b", 5);

223

224

sketch2.add("user_b", 3); // user_b appears in both sketches

225

sketch2.add("user_c", 7);

226

227

// Merge sketch2 into sketch1

228

sketch1.mergeInPlace(sketch2);

229

230

// Now sketch1 contains combined frequencies

231

long userA = sketch1.estimateCount("user_a"); // >= 10

232

long userB = sketch1.estimateCount("user_b"); // >= 8 (5 + 3)

233

long userC = sketch1.estimateCount("user_c"); // >= 7

234

```

235

236

## Error Guarantees

237

238

Count-Min sketches provide probabilistic guarantees about estimation accuracy:

239

240

- **Overestimation Only**: `estimateCount(item) >= actualCount(item)` (never underestimates)

241

- **Bounded Error**: With probability `confidence`, the error is at most `relativeError * totalCount`

242

- **Example**: With `eps=0.01` and `confidence=0.99`, estimates are within 1% of total count with 99% probability

243

244

**Practical Example:**

245

246

```java

247

CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);

248

249

// Add 1 million total items

250

for (int i = 0; i < 1000000; i++) {

251

sketch.add("item_" + (i % 10000)); // 10k unique items, 100 occurrences each

252

}

253

254

// For any item that appeared 100 times:

255

// - Actual count: 100

256

// - Estimated count: between 100 and 110 (100 + 0.01 * 1000000) with 99% probability

257

long estimate = sketch.estimateCount("item_5000"); // likely between 100-110

258

```

259

260

## Types

261

262

```java { .api }

263

public enum CountMinSketch.Version {

264

/** Version 1 binary format for serialization */

265

V1(1);

266

}

267

```

268

269

## Error Handling

270

271

- `IllegalArgumentException`: Thrown for invalid parameters:

272

- Negative relative error

273

- Confidence not in range (0.0, 1.0)

274

- Non-positive depth or width

275

- Negative count increments

276

- `IncompatibleMergeException`: Thrown when merging incompatible sketches (different dimensions or random seeds)