or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

cardinality.mdfrequency.mdhash.mdindex.mdmembership.mdquantile.mdstream-summary.md

index.mddocs/

0

# Stream-lib

1

2

Stream-lib is a Java library for summarizing data in streams where it is infeasible to store all events. It provides probabilistic data structures and algorithms for cardinality estimation (counting unique elements), frequency estimation (counting occurrences), quantile estimation (computing percentiles), set membership testing (Bloom filters), and top-K element tracking. A key feature is that cardinality estimators with compatible configurations can be safely merged, making it ideal for distributed computing scenarios.

3

4

## Package Information

5

6

- **Package Name**: stream

7

- **Package Type**: Maven

8

- **Group ID**: com.clearspring.analytics

9

- **Artifact ID**: stream

10

- **Language**: Java

11

- **Installation**:

12

```xml

13

<dependency>

14

<groupId>com.clearspring.analytics</groupId>

15

<artifactId>stream</artifactId>

16

<version>2.9.8</version>

17

</dependency>

18

```

19

20

## Core Imports

21

22

```java

23

import com.clearspring.analytics.stream.cardinality.*;

24

import com.clearspring.analytics.stream.frequency.*;

25

import com.clearspring.analytics.stream.quantile.*;

26

import com.clearspring.analytics.stream.membership.*;

27

import com.clearspring.analytics.stream.*;

28

```

29

30

## Basic Usage

31

32

```java

33

import com.clearspring.analytics.stream.cardinality.HyperLogLog;

34

import com.clearspring.analytics.stream.frequency.CountMinSketch;

35

import com.clearspring.analytics.stream.StreamSummary;

36

37

// Cardinality estimation

38

HyperLogLog hll = new HyperLogLog(0.1); // 10% relative standard deviation

39

hll.offer("item1");

40

hll.offer("item2");

41

hll.offer("item1"); // duplicate

42

long uniqueCount = hll.cardinality(); // approximately 2

43

44

// Frequency estimation

45

CountMinSketch cms = new CountMinSketch(0.01, 0.99, 1); // 1% error, 99% confidence

46

cms.add("apple", 5);

47

cms.add("banana", 3);

48

long appleCount = cms.estimateCount("apple"); // approximately 5

49

50

// Top-K tracking

51

StreamSummary<String> topK = new StreamSummary<>(10); // track top 10

52

topK.offer("apple");

53

topK.offer("banana");

54

topK.offer("apple");

55

List<String> top3 = topK.peek(3); // ["apple", "banana"]

56

```

57

58

## Architecture

59

60

Stream-lib is organized into several key functional areas:

61

62

- **Hash Functions**: Fast, well-distributed hash functions (MurmurHash, Lookup3Hash) used by probabilistic data structures

63

- **Cardinality Estimation**: Algorithms for counting unique elements (HyperLogLog, LogLog, LinearCounting)

64

- **Frequency Estimation**: Data structures for tracking item frequencies (Count-Min Sketch, Conservative Add Sketch)

65

- **Set Membership**: Bloom filters for testing whether an element is in a set

66

- **Quantile Estimation**: Algorithms for computing percentiles and quantiles (QDigest, TDigest)

67

- **Stream Summarization**: Top-K tracking and stream summary algorithms

68

- **Utilities**: Supporting data structures and helper functions

69

70

## Capabilities

71

72

### Cardinality Estimation

73

74

Probabilistic algorithms for estimating the number of unique elements in a stream. All estimators can be merged if they have compatible configurations.

75

76

```java { .api }

77

public interface ICardinality {

78

boolean offer(Object o);

79

boolean offerHashed(long hashedLong);

80

boolean offerHashed(int hashedInt);

81

long cardinality();

82

int sizeof();

83

byte[] getBytes() throws IOException;

84

ICardinality merge(ICardinality... estimators) throws CardinalityMergeException;

85

}

86

```

87

88

[Cardinality Estimation](./cardinality.md)

89

90

### Frequency Estimation

91

92

Data structures for estimating the frequency of items in a stream with configurable error bounds and confidence levels.

93

94

```java { .api }

95

public interface IFrequency {

96

void add(long item, long count);

97

void add(String item, long count);

98

long estimateCount(long item);

99

long estimateCount(String item);

100

long size();

101

}

102

```

103

104

[Frequency Estimation](./frequency.md)

105

106

### Set Membership Testing

107

108

Bloom filters for probabilistic set membership testing with configurable false positive rates but no false negatives.

109

110

```java { .api }

111

public class BloomFilter extends Filter {

112

public BloomFilter(int numElements, double maxFalsePosProbability);

113

public boolean isPresent(String key);

114

public boolean isPresent(byte[] key);

115

public void add(String key);

116

public void add(byte[] key);

117

public void addAll(BloomFilter other);

118

}

119

```

120

121

[Membership Testing](./membership.md)

122

123

### Quantile Estimation

124

125

Algorithms for computing quantiles and percentiles in streaming data with memory-efficient approximation techniques.

126

127

```java { .api }

128

public interface IQuantileEstimator {

129

void offer(long value);

130

long getQuantile(double q);

131

}

132

```

133

134

[Quantile Estimation](./quantile.md)

135

136

### Stream Summarization and Top-K

137

138

Algorithms for tracking the most frequent items and maintaining stream summaries with error bounds.

139

140

```java { .api }

141

public interface ITopK<T> {

142

boolean offer(T element);

143

boolean offer(T element, int incrementCount);

144

List<T> peek(int k);

145

}

146

```

147

148

[Stream Summarization](./stream-summary.md)

149

150

### Hash Functions

151

152

Fast, well-distributed hash functions optimized for use in probabilistic data structures and general hash-based lookup.

153

154

```java { .api }

155

public class MurmurHash {

156

public static int hash(Object o);

157

public static int hash(byte[] data, int seed);

158

public static long hash64(Object o);

159

public static long hash64(byte[] data, int length, int seed);

160

}

161

```

162

163

[Hash Functions](./hash.md)

164

165

## Types

166

167

### Common Exceptions

168

169

```java { .api }

170

public abstract class CardinalityMergeException extends Exception {

171

public CardinalityMergeException(String message);

172

}

173

174

public abstract class FrequencyMergeException extends Exception {

175

}

176

```

177

178

### Utility Types

179

180

```java { .api }

181

public interface IBuilder<T> {

182

T build();

183

int sizeof();

184

}

185

186

public class Pair<A, B> {

187

public final A left;

188

public final B right;

189

190

public Pair(A left, B right);

191

}

192

193

public class Counter<T> implements Externalizable {

194

public T getItem();

195

public long getCount();

196

public long getError();

197

}

198

```