or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

docs

cardinality.mdfrequency.mdhash.mdindex.mdmembership.mdquantile.mdstream-summary.md
tile.json

tessl/maven-com-clearspring-analytics--stream

A library for summarizing data in streams for which it is infeasible to store all events

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
mavenpkg:maven/com.clearspring.analytics/stream@2.9.x

To install, run

npx @tessl/cli install tessl/maven-com-clearspring-analytics--stream@2.9.0

index.mddocs/

Stream-lib

Stream-lib is a Java library for summarizing data in streams where it is infeasible to store all events. It provides probabilistic data structures and algorithms for cardinality estimation (counting unique elements), frequency estimation (counting occurrences), quantile estimation (computing percentiles), set membership testing (Bloom filters), and top-K element tracking. A key feature is that cardinality estimators with compatible configurations can be safely merged, making it ideal for distributed computing scenarios.

Package Information

  • Package Name: stream
  • Package Type: Maven
  • Group ID: com.clearspring.analytics
  • Artifact ID: stream
  • Language: Java
  • Installation:
    <dependency>
      <groupId>com.clearspring.analytics</groupId>
      <artifactId>stream</artifactId>
      <version>2.9.8</version>
    </dependency>

Core Imports

import com.clearspring.analytics.stream.cardinality.*;
import com.clearspring.analytics.stream.frequency.*;
import com.clearspring.analytics.stream.quantile.*;
import com.clearspring.analytics.stream.membership.*;
import com.clearspring.analytics.stream.*;

Basic Usage

import com.clearspring.analytics.stream.cardinality.HyperLogLog;
import com.clearspring.analytics.stream.frequency.CountMinSketch;
import com.clearspring.analytics.stream.StreamSummary;

// Cardinality estimation
HyperLogLog hll = new HyperLogLog(0.1); // 10% relative standard deviation
hll.offer("item1");
hll.offer("item2");
hll.offer("item1"); // duplicate
long uniqueCount = hll.cardinality(); // approximately 2

// Frequency estimation
CountMinSketch cms = new CountMinSketch(0.01, 0.99, 1); // 1% error, 99% confidence
cms.add("apple", 5);
cms.add("banana", 3);
long appleCount = cms.estimateCount("apple"); // approximately 5

// Top-K tracking
StreamSummary<String> topK = new StreamSummary<>(10); // track top 10
topK.offer("apple");
topK.offer("banana");
topK.offer("apple");
List<String> top3 = topK.peek(3); // ["apple", "banana"]

Architecture

Stream-lib is organized into several key functional areas:

  • Hash Functions: Fast, well-distributed hash functions (MurmurHash, Lookup3Hash) used by probabilistic data structures
  • Cardinality Estimation: Algorithms for counting unique elements (HyperLogLog, LogLog, LinearCounting)
  • Frequency Estimation: Data structures for tracking item frequencies (Count-Min Sketch, Conservative Add Sketch)
  • Set Membership: Bloom filters for testing whether an element is in a set
  • Quantile Estimation: Algorithms for computing percentiles and quantiles (QDigest, TDigest)
  • Stream Summarization: Top-K tracking and stream summary algorithms
  • Utilities: Supporting data structures and helper functions

Capabilities

Cardinality Estimation

Probabilistic algorithms for estimating the number of unique elements in a stream. All estimators can be merged if they have compatible configurations.

public interface ICardinality {
    boolean offer(Object o);
    boolean offerHashed(long hashedLong);
    boolean offerHashed(int hashedInt);
    long cardinality();
    int sizeof();
    byte[] getBytes() throws IOException;
    ICardinality merge(ICardinality... estimators) throws CardinalityMergeException;
}

Cardinality Estimation

Frequency Estimation

Data structures for estimating the frequency of items in a stream with configurable error bounds and confidence levels.

public interface IFrequency {
    void add(long item, long count);
    void add(String item, long count);
    long estimateCount(long item);
    long estimateCount(String item);
    long size();
}

Frequency Estimation

Set Membership Testing

Bloom filters for probabilistic set membership testing with configurable false positive rates but no false negatives.

public class BloomFilter extends Filter {
    public BloomFilter(int numElements, double maxFalsePosProbability);
    public boolean isPresent(String key);
    public boolean isPresent(byte[] key);
    public void add(String key);
    public void add(byte[] key);
    public void addAll(BloomFilter other);
}

Membership Testing

Quantile Estimation

Algorithms for computing quantiles and percentiles in streaming data with memory-efficient approximation techniques.

public interface IQuantileEstimator {
    void offer(long value);
    long getQuantile(double q);
}

Quantile Estimation

Stream Summarization and Top-K

Algorithms for tracking the most frequent items and maintaining stream summaries with error bounds.

public interface ITopK<T> {
    boolean offer(T element);
    boolean offer(T element, int incrementCount);
    List<T> peek(int k);
}

Stream Summarization

Hash Functions

Fast, well-distributed hash functions optimized for use in probabilistic data structures and general hash-based lookup.

public class MurmurHash {
    public static int hash(Object o);
    public static int hash(byte[] data, int seed);
    public static long hash64(Object o);
    public static long hash64(byte[] data, int length, int seed);
}

Hash Functions

Types

Common Exceptions

public abstract class CardinalityMergeException extends Exception {
    public CardinalityMergeException(String message);
}

public abstract class FrequencyMergeException extends Exception {
}

Utility Types

public interface IBuilder<T> {
    T build();
    int sizeof();
}

public class Pair<A, B> {
    public final A left;
    public final B right;
    
    public Pair(A left, B right);
}

public class Counter<T> implements Externalizable {
    public T getItem();
    public long getCount();
    public long getError();
}