or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

docs

bloom-filter.mdcount-min-sketch.mdindex.mdserialization.md
tile.json

count-min-sketch.mddocs/

Count-Min Sketch Operations

Probabilistic data structure for frequency estimation and cardinality counting using sub-linear space. Count-Min sketches provide approximate frequency estimates with bounded error guarantees, making them ideal for streaming data analysis and heavy hitters detection in large-scale systems.

Capabilities

Creating Count-Min Sketches

Factory methods for creating Count-Min sketches with different configuration approaches.

/**
 * Creates a Count-Min sketch with specified error parameters and random seed
 * @param eps relative error (must be positive)
 * @param confidence confidence level (must be between 0.0 and 1.0)
 * @param seed random seed for hash functions
 * @return new CountMinSketch instance
 */
public static CountMinSketch create(double eps, double confidence, int seed);

/**
 * Creates a Count-Min sketch with explicit dimensions and random seed
 * @param depth depth of the 2D array (number of hash functions)
 * @param width width of the 2D array (number of buckets per hash function)
 * @param seed random seed for hash functions
 * @return new CountMinSketch instance
 */
public static CountMinSketch create(int depth, int width, int seed);

Usage Examples:

// Create with error parameters (recommended approach)
CountMinSketch sketch1 = CountMinSketch.create(0.01, 0.99, 42); // 1% error, 99% confidence

// Create with explicit dimensions
CountMinSketch sketch2 = CountMinSketch.create(8, 200, 42); // 8 hash functions, 200 buckets each

Adding Items and Counts

Methods for incrementing item frequencies in the sketch.

/**
 * Increments an item's count by 1, supporting String, byte[], and integral types
 * @param item item to increment (String, byte[], Byte, Short, Integer, Long)
 */
public abstract void add(Object item);

/**
 * Increments an item's count by specified amount
 * @param item item to increment
 * @param count count to add (must be non-negative)
 */
public abstract void add(Object item, long count);

/**
 * Specialized method for incrementing String items by 1
 * @param item string item to increment
 */
public abstract void addString(String item);

/**
 * Specialized method for incrementing String items by count
 * @param item string item to increment
 * @param count count to add
 */
public abstract void addString(String item, long count);

/**
 * Specialized method for incrementing long values by 1
 * @param item long value to increment
 */
public abstract void addLong(long item);

/**
 * Specialized method for incrementing long values by count
 * @param item long value to increment
 * @param count count to add
 */
public abstract void addLong(long item, long count);

/**
 * Specialized method for incrementing byte arrays by 1
 * @param item byte array to increment
 */
public abstract void addBinary(byte[] item);

/**
 * Specialized method for incrementing byte arrays by count
 * @param item byte array to increment
 * @param count count to add
 */
public abstract void addBinary(byte[] item, long count);

Usage Examples:

CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);

// Add items with count 1
sketch.add("user123");
sketch.add("user456");
sketch.add("user123");  // user123 now has count 2

// Add items with specific counts
sketch.add("product_a", 5);
sketch.addString("event_type_click", 10);
sketch.addLong(42L, 3);
sketch.addBinary("session_id".getBytes(), 7);

Frequency Estimation

Method for retrieving estimated frequencies of items.

/**
 * Returns the estimated frequency of an item
 * @param item item to estimate (String, byte[], Byte, Short, Integer, Long)
 * @return estimated count (guaranteed to be >= actual count)
 */
public abstract long estimateCount(Object item);

Usage Examples:

CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);

// Add some items
sketch.add("popular_item", 100);
sketch.add("rare_item", 1);
sketch.add("medium_item", 25);

// Estimate frequencies
long popular = sketch.estimateCount("popular_item");  // >= 100
long rare = sketch.estimateCount("rare_item");        // >= 1  
long medium = sketch.estimateCount("medium_item");    // >= 25
long unknown = sketch.estimateCount("never_seen");    // likely 0, but could be higher due to hash collisions

Sketch Properties

Methods for accessing Count-Min sketch configuration and state.

/**
 * Returns the relative error (epsilon) parameter
 * @return relative error bound
 */
public abstract double relativeError();

/**
 * Returns the confidence (delta) parameter  
 * @return confidence level
 */
public abstract double confidence();

/**
 * Returns the depth of the 2D array (number of hash functions)
 * @return depth dimension
 */
public abstract int depth();

/**
 * Returns the width of the 2D array (buckets per hash function)
 * @return width dimension
 */
public abstract int width();

/**
 * Returns the total count of all items added to the sketch
 * @return sum of all counts added
 */
public abstract long totalCount();

Usage Examples:

CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);

// Add some data
sketch.add("item1", 10);
sketch.add("item2", 20);

// Check properties
double eps = sketch.relativeError();     // 0.01
double conf = sketch.confidence();       // 0.99
int d = sketch.depth();                  // calculated from confidence
int w = sketch.width();                  // calculated from eps
long total = sketch.totalCount();        // 30

Sketch Merging

Operation for combining multiple Count-Min sketches.

/**
 * Merges another Count-Min sketch into this one
 * Note: Only sketches with same depth, width, and random seed can be merged
 * @param other the other Count-Min sketch to merge
 * @return this sketch after merging
 * @throws IncompatibleMergeException if sketches are incompatible
 */
public abstract CountMinSketch mergeInPlace(CountMinSketch other) throws IncompatibleMergeException;

Usage Examples:

// Create compatible sketches (same parameters)
CountMinSketch sketch1 = CountMinSketch.create(0.01, 0.99, 42);
CountMinSketch sketch2 = CountMinSketch.create(0.01, 0.99, 42);

// Add different data to each
sketch1.add("user_a", 10);
sketch1.add("user_b", 5);

sketch2.add("user_b", 3);  // user_b appears in both sketches
sketch2.add("user_c", 7);

// Merge sketch2 into sketch1
sketch1.mergeInPlace(sketch2);

// Now sketch1 contains combined frequencies
long userA = sketch1.estimateCount("user_a");  // >= 10
long userB = sketch1.estimateCount("user_b");  // >= 8 (5 + 3)
long userC = sketch1.estimateCount("user_c");  // >= 7

Error Guarantees

Count-Min sketches provide probabilistic guarantees about estimation accuracy:

  • Overestimation Only: estimateCount(item) >= actualCount(item) (never underestimates)
  • Bounded Error: With probability confidence, the error is at most relativeError * totalCount
  • Example: With eps=0.01 and confidence=0.99, estimates are within 1% of total count with 99% probability

Practical Example:

CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);

// Add 1 million total items
for (int i = 0; i < 1000000; i++) {
    sketch.add("item_" + (i % 10000)); // 10k unique items, 100 occurrences each
}

// For any item that appeared 100 times:
// - Actual count: 100
// - Estimated count: between 100 and 110 (100 + 0.01 * 1000000) with 99% probability
long estimate = sketch.estimateCount("item_5000"); // likely between 100-110

Types

public enum CountMinSketch.Version {
    /** Version 1 binary format for serialization */
    V1(1);
}

Error Handling

  • IllegalArgumentException: Thrown for invalid parameters:
    • Negative relative error
    • Confidence not in range (0.0, 1.0)
    • Non-positive depth or width
    • Negative count increments
  • IncompatibleMergeException: Thrown when merging incompatible sketches (different dimensions or random seeds)