Probabilistic data structure for frequency estimation and cardinality counting using sub-linear space. Count-Min sketches provide approximate frequency estimates with bounded error guarantees, making them ideal for streaming data analysis and heavy hitters detection in large-scale systems.
Factory methods for creating Count-Min sketches with different configuration approaches.
/**
* Creates a Count-Min sketch with specified error parameters and random seed
* @param eps relative error (must be positive)
* @param confidence confidence level (must be between 0.0 and 1.0)
* @param seed random seed for hash functions
* @return new CountMinSketch instance
*/
public static CountMinSketch create(double eps, double confidence, int seed);
/**
* Creates a Count-Min sketch with explicit dimensions and random seed
* @param depth depth of the 2D array (number of hash functions)
* @param width width of the 2D array (number of buckets per hash function)
* @param seed random seed for hash functions
* @return new CountMinSketch instance
*/
public static CountMinSketch create(int depth, int width, int seed);Usage Examples:
// Create with error parameters (recommended approach)
CountMinSketch sketch1 = CountMinSketch.create(0.01, 0.99, 42); // 1% error, 99% confidence
// Create with explicit dimensions
CountMinSketch sketch2 = CountMinSketch.create(8, 200, 42); // 8 hash functions, 200 buckets eachMethods for incrementing item frequencies in the sketch.
/**
* Increments an item's count by 1, supporting String, byte[], and integral types
* @param item item to increment (String, byte[], Byte, Short, Integer, Long)
*/
public abstract void add(Object item);
/**
* Increments an item's count by specified amount
* @param item item to increment
* @param count count to add (must be non-negative)
*/
public abstract void add(Object item, long count);
/**
* Specialized method for incrementing String items by 1
* @param item string item to increment
*/
public abstract void addString(String item);
/**
* Specialized method for incrementing String items by count
* @param item string item to increment
* @param count count to add
*/
public abstract void addString(String item, long count);
/**
* Specialized method for incrementing long values by 1
* @param item long value to increment
*/
public abstract void addLong(long item);
/**
* Specialized method for incrementing long values by count
* @param item long value to increment
* @param count count to add
*/
public abstract void addLong(long item, long count);
/**
* Specialized method for incrementing byte arrays by 1
* @param item byte array to increment
*/
public abstract void addBinary(byte[] item);
/**
* Specialized method for incrementing byte arrays by count
* @param item byte array to increment
* @param count count to add
*/
public abstract void addBinary(byte[] item, long count);Usage Examples:
CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);
// Add items with count 1
sketch.add("user123");
sketch.add("user456");
sketch.add("user123"); // user123 now has count 2
// Add items with specific counts
sketch.add("product_a", 5);
sketch.addString("event_type_click", 10);
sketch.addLong(42L, 3);
sketch.addBinary("session_id".getBytes(), 7);Method for retrieving estimated frequencies of items.
/**
* Returns the estimated frequency of an item
* @param item item to estimate (String, byte[], Byte, Short, Integer, Long)
* @return estimated count (guaranteed to be >= actual count)
*/
public abstract long estimateCount(Object item);Usage Examples:
CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);
// Add some items
sketch.add("popular_item", 100);
sketch.add("rare_item", 1);
sketch.add("medium_item", 25);
// Estimate frequencies
long popular = sketch.estimateCount("popular_item"); // >= 100
long rare = sketch.estimateCount("rare_item"); // >= 1
long medium = sketch.estimateCount("medium_item"); // >= 25
long unknown = sketch.estimateCount("never_seen"); // likely 0, but could be higher due to hash collisionsMethods for accessing Count-Min sketch configuration and state.
/**
* Returns the relative error (epsilon) parameter
* @return relative error bound
*/
public abstract double relativeError();
/**
* Returns the confidence (delta) parameter
* @return confidence level
*/
public abstract double confidence();
/**
* Returns the depth of the 2D array (number of hash functions)
* @return depth dimension
*/
public abstract int depth();
/**
* Returns the width of the 2D array (buckets per hash function)
* @return width dimension
*/
public abstract int width();
/**
* Returns the total count of all items added to the sketch
* @return sum of all counts added
*/
public abstract long totalCount();Usage Examples:
CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);
// Add some data
sketch.add("item1", 10);
sketch.add("item2", 20);
// Check properties
double eps = sketch.relativeError(); // 0.01
double conf = sketch.confidence(); // 0.99
int d = sketch.depth(); // calculated from confidence
int w = sketch.width(); // calculated from eps
long total = sketch.totalCount(); // 30Operation for combining multiple Count-Min sketches.
/**
* Merges another Count-Min sketch into this one
* Note: Only sketches with same depth, width, and random seed can be merged
* @param other the other Count-Min sketch to merge
* @return this sketch after merging
* @throws IncompatibleMergeException if sketches are incompatible
*/
public abstract CountMinSketch mergeInPlace(CountMinSketch other) throws IncompatibleMergeException;Usage Examples:
// Create compatible sketches (same parameters)
CountMinSketch sketch1 = CountMinSketch.create(0.01, 0.99, 42);
CountMinSketch sketch2 = CountMinSketch.create(0.01, 0.99, 42);
// Add different data to each
sketch1.add("user_a", 10);
sketch1.add("user_b", 5);
sketch2.add("user_b", 3); // user_b appears in both sketches
sketch2.add("user_c", 7);
// Merge sketch2 into sketch1
sketch1.mergeInPlace(sketch2);
// Now sketch1 contains combined frequencies
long userA = sketch1.estimateCount("user_a"); // >= 10
long userB = sketch1.estimateCount("user_b"); // >= 8 (5 + 3)
long userC = sketch1.estimateCount("user_c"); // >= 7Count-Min sketches provide probabilistic guarantees about estimation accuracy:
estimateCount(item) >= actualCount(item) (never underestimates)confidence, the error is at most relativeError * totalCounteps=0.01 and confidence=0.99, estimates are within 1% of total count with 99% probabilityPractical Example:
CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);
// Add 1 million total items
for (int i = 0; i < 1000000; i++) {
sketch.add("item_" + (i % 10000)); // 10k unique items, 100 occurrences each
}
// For any item that appeared 100 times:
// - Actual count: 100
// - Estimated count: between 100 and 110 (100 + 0.01 * 1000000) with 99% probability
long estimate = sketch.estimateCount("item_5000"); // likely between 100-110public enum CountMinSketch.Version {
/** Version 1 binary format for serialization */
V1(1);
}IllegalArgumentException: Thrown for invalid parameters:
IncompatibleMergeException: Thrown when merging incompatible sketches (different dimensions or random seeds)