Tessl Tile for maven/com.clearspring.analytics/stream@2.9.0

or run

npx @tessl/cli init

Version

Tile

Overview

Evals

Files

docs

cardinality.md frequency.md hash.md index.md membership.md quantile.md stream-summary.md

quantile.mddocs/

0
# Quantile Estimation
1

2
Algorithms for computing quantiles and percentiles in streaming data with memory-efficient approximation techniques. These data structures provide approximate quantile queries with configurable accuracy guarantees.
3

4
## Capabilities
5

6
### IQuantileEstimator Interface
7

8
Common interface implemented by all quantile estimation algorithms.
9

10
```java { .api }
11
/**
12
 * Interface for quantile estimation algorithms
13
 */
14
public interface IQuantileEstimator {
15
    /**
16
     * Add value to the estimator
17
     * @param value long value to add to the data stream
18
     */
19
    void offer(long value);
20
    
21
    /**
22
     * Get quantile estimate
23
     * @param q quantile to compute (0.0 to 1.0, e.g., 0.5 for median)
24
     * @return estimated value at the specified quantile
25
     */
26
    long getQuantile(double q);
27
}
28
```
29

30
### QDigest
31

32
Q-Digest data structure for quantile estimation with configurable compression factor that determines accuracy vs memory trade-off.
33

34
```java { .api }
35
/**
36
 * Q-Digest quantile estimator
37
 * 
38
 * Answers approximate quantile queries where actual rank of query result
39
 * is in q-eps .. q+eps, where eps = log(sigma)/compressionFactor
40
 * and log(sigma) is ceiling of binary log of largest value inserted.
41
 */
42
public class QDigest implements IQuantileEstimator {
43
    /**
44
     * Create Q-Digest with specified compression factor
45
     * @param compressionFactor higher values give better accuracy but use more memory
46
     */
47
    public QDigest(double compressionFactor);
48
    
49
    /**
50
     * Get current size (number of nodes in the digest)
51
     * @return size of the digest structure
52
     */
53
    public long size();
54
    
55
    /**
56
     * Get compression factor
57
     * @return compression factor used
58
     */
59
    public double getCompressionFactor();
60
    
61
    /**
62
     * Manually trigger compression of the digest
63
     */
64
    public void compress();
65
    
66
    /**
67
     * Create union of multiple Q-Digests
68
     * @param digests Q-Digests to union (must have same compression factor)
69
     * @return new Q-Digest representing union of input digests
70
     */
71
    public static QDigest unionOf(QDigest... digests);
72
    
73
    /**
74
     * Serialize Q-Digest to byte array
75
     * @param digest digest to serialize
76
     * @return serialized bytes
77
     * @throws IOException if serialization fails
78
     */
79
    public static byte[] serialize(QDigest digest) throws IOException;
80
    
81
    /**
82
     * Deserialize Q-Digest from byte array
83
     * @param data serialized digest data
84
     * @return deserialized Q-Digest
85
     * @throws IOException if deserialization fails
86
     */
87
    public static QDigest deserialize(byte[] data) throws IOException;
88
}
89
```
90

91
**Usage Examples:**
92

93
```java
94
import com.clearspring.analytics.stream.quantile.QDigest;
95

96
// Create Q-Digest with compression factor 100
97
QDigest qd = new QDigest(100.0);
98

99
// Add values from data stream
100
qd.offer(10);
101
qd.offer(20);
102
qd.offer(15);
103
qd.offer(30);
104
qd.offer(25);
105

106
// Query quantiles
107
long median = qd.getQuantile(0.5);      // 50th percentile (median)
108
long p90 = qd.getQuantile(0.9);         // 90th percentile
109
long p99 = qd.getQuantile(0.99);        // 99th percentile
110
long min = qd.getQuantile(0.0);         // minimum value
111
long max = qd.getQuantile(1.0);         // maximum value
112

113
System.out.println("Median: " + median);
114
System.out.println("90th percentile: " + p90);
115
```
116

117
### TDigest
118

119
T-Digest quantile estimator with better accuracy at extreme quantiles (near 0.0 and 1.0) compared to uniform accuracy across all quantiles.
120

121
```java { .api }
122
/**
123
 * T-Digest quantile estimator with improved accuracy at extremes
124
 * 
125
 * Particularly good for computing accurate tail quantiles like 99th, 99.9th percentile
126
 */
127
public class TDigest implements IQuantileEstimator {
128
    /**
129
     * Create T-Digest with specified compression parameter
130
     * @param compression compression parameter (higher = more accurate but more memory)
131
     */
132
    public TDigest(double compression);
133
    
134
    /**
135
     * Get compression parameter
136
     * @return compression parameter used
137
     */
138
    public double getCompression();
139
    
140
    /**
141
     * Manually trigger compression of the digest
142
     */
143
    public void compress();
144
    
145
    /**
146
     * Merge another T-Digest into this one
147
     * @param other T-Digest to merge
148
     */
149
    public void add(TDigest other);
150
    
151
    /**
152
     * Serialize T-Digest to byte array
153
     * @param digest digest to serialize
154
     * @return serialized bytes
155
     * @throws IOException if serialization fails
156
     */
157
    public static byte[] serialize(TDigest digest) throws IOException;
158
    
159
    /**
160
     * Deserialize T-Digest from byte array
161
     * @param data serialized digest data
162
     * @return deserialized T-Digest
163
     * @throws IOException if deserialization fails
164
     */
165
    public static TDigest deserialize(byte[] data) throws IOException;
166
}
167
```
168

169
**Usage Examples:**
170

171
```java
172
import com.clearspring.analytics.stream.quantile.TDigest;
173

174
// Create T-Digest with compression 100
175
TDigest td = new TDigest(100.0);
176

177
// Process response times (in milliseconds)
178
for (long responseTime : responseTimesStream) {
179
    td.offer(responseTime);
180
}
181

182
// Query tail latencies (T-Digest excels at these)
183
long p95 = td.getQuantile(0.95);      // 95th percentile
184
long p99 = td.getQuantile(0.99);      // 99th percentile  
185
long p999 = td.getQuantile(0.999);    // 99.9th percentile
186
long p9999 = td.getQuantile(0.9999);  // 99.99th percentile
187

188
System.out.println("P95 latency: " + p95 + "ms");
189
System.out.println("P99 latency: " + p99 + "ms");
190
System.out.println("P99.9 latency: " + p999 + "ms");
191
```
192

193
### GroupTree
194

195
Supporting data structure used internally by T-Digest for organizing centroids.
196

197
```java { .api }
198
/**
199
 * Internal data structure for T-Digest implementation
200
 * Organizes centroids in a tree structure for efficient quantile queries
201
 */
202
public class GroupTree {
203
    /**
204
     * Create empty group tree
205
     */
206
    public GroupTree();
207
    
208
    /**
209
     * Add weighted value to the tree
210
     * @param x value to add
211
     * @param w weight of the value
212
     */
213
    public void add(double x, long w);
214
    
215
    /**
216
     * Get size of the tree (total weight)
217
     * @return total weight of all values
218
     */
219
    public long size();
220
    
221
    /**
222
     * Compute quantile from the tree
223
     * @param q quantile to compute (0.0 to 1.0)
224
     * @return estimated quantile value
225
     */
226
    public double quantile(double q);
227
    
228
    /**
229
     * Compress the tree by merging nearby centroids
230
     */
231
    public void compress();
232
}
233
```
234

235
## Usage Patterns
236

237
### Basic Quantile Computation
238

239
```java
240
// For general-purpose quantile estimation
241
QDigest qd = new QDigest(100.0);
242

243
// Process data stream
244
for (long value : dataStream) {
245
    qd.offer(value);
246
}
247

248
// Get common quantiles
249
long q25 = qd.getQuantile(0.25);  // 25th percentile
250
long median = qd.getQuantile(0.5); // 50th percentile (median)
251
long q75 = qd.getQuantile(0.75);  // 75th percentile
252

253
System.out.println("IQR: [" + q25 + ", " + q75 + "]");
254
System.out.println("Median: " + median);
255
```
256

257
### Tail Latency Monitoring
258

259
```java
260
// T-Digest is better for extreme quantiles
261
TDigest latencyDigest = new TDigest(200.0);
262

263
// Process request latencies
264
for (long latency : requestLatencies) {
265
    latencyDigest.offer(latency);
266
}
267

268
// Monitor SLA compliance (T-Digest excels at tail quantiles)
269
long p99 = latencyDigest.getQuantile(0.99);
270
long p999 = latencyDigest.getQuantile(0.999);
271

272
if (p99 > SLA_P99_THRESHOLD) {
273
    System.out.println("P99 SLA violation: " + p99 + "ms");
274
}
275

276
if (p999 > SLA_P999_THRESHOLD) {
277
    System.out.println("P99.9 SLA violation: " + p999 + "ms");
278
}
279
```
280

281
### Distributed Quantile Estimation
282

283
```java
284
// Create compatible digests on different nodes
285
QDigest digest1 = new QDigest(100.0);
286
QDigest digest2 = new QDigest(100.0);
287

288
// Process data on each node
289
for (long value : node1Data) {
290
    digest1.offer(value);
291
}
292

293
for (long value : node2Data) {
294
    digest2.offer(value);
295
}
296

297
// Combine digests to get global quantiles
298
QDigest globalDigest = QDigest.unionOf(digest1, digest2);
299

300
// Query global quantiles
301
long globalMedian = globalDigest.getQuantile(0.5);
302
long globalP95 = globalDigest.getQuantile(0.95);
303
```
304

305
### Persistent Quantile Tracking
306

307
```java
308
QDigest digest = new QDigest(100.0);
309

310
// Process current batch
311
for (long value : currentBatch) {
312
    digest.offer(value);
313
}
314

315
// Serialize for storage
316
byte[] serialized = QDigest.serialize(digest);
317
saveToStorage(serialized);
318

319
// Later, restore and continue processing
320
byte[] restored = loadFromStorage();
321
QDigest restoredDigest = QDigest.deserialize(restored);
322

323
// Continue adding data
324
for (long value : nextBatch) {
325
    restoredDigest.offer(value);
326
}
327
```
328

329
### Performance Monitoring Dashboard
330

331
```java
332
// Different digests for different metrics
333
TDigest responseTimeDigest = new TDigest(100.0);
334
QDigest requestSizeDigest = new QDigest(50.0);
335
QDigest errorRateDigest = new QDigest(50.0);
336

337
// Process metrics
338
for (MetricEvent event : metricsStream) {
339
    responseTimeDigest.offer(event.responseTime);
340
    requestSizeDigest.offer(event.requestSize);
341
    errorRateDigest.offer(event.errorCount);
342
}
343

344
// Generate dashboard metrics
345
Map<String, Long> dashboardMetrics = new HashMap<>();
346

347
// Response time percentiles (use T-Digest for tail accuracy)
348
dashboardMetrics.put("response_p50", responseTimeDigest.getQuantile(0.5));
349
dashboardMetrics.put("response_p95", responseTimeDigest.getQuantile(0.95));
350
dashboardMetrics.put("response_p99", responseTimeDigest.getQuantile(0.99));
351

352
// Request size percentiles
353
dashboardMetrics.put("request_size_p50", requestSizeDigest.getQuantile(0.5));
354
dashboardMetrics.put("request_size_p90", requestSizeDigest.getQuantile(0.9));
355

356
// Error rate percentiles
357
dashboardMetrics.put("error_rate_p95", errorRateDigest.getQuantile(0.95));
358
```
359

360
## Algorithm Selection Guidelines
361

362
### QDigest vs TDigest
363

364
**Use QDigest when**:
365
- Uniform accuracy across all quantiles is needed
366
- Working with integer values
367
- Memory usage must be tightly controlled
368
- Need precise control over accuracy guarantees
369

370
**Use TDigest when**:
371
- Extreme quantiles (P95, P99, P99.9) are most important
372
- Working with continuous/floating-point distributions
373
- Tail latency monitoring is the primary use case
374
- Better accuracy at extremes is worth slightly higher memory usage
375

376
### Parameter Selection
377

378
**QDigest Compression Factor**:
379
- Higher compression = better accuracy but more memory
380
- 50-200 is typical range
381
- Error bound: ε = log₂(max_value) / compression_factor
382

383
**TDigest Compression**:
384
- Higher compression = more centroids, better accuracy
385
- 100-400 is typical range
386
- Memory usage roughly proportional to compression parameter
387

388
## Memory Usage and Performance
389

390
**QDigest**:
391
- Memory: O(compression_factor × log(max_value))
392
- Typical: 1-10 KB for most applications
393
- Insert: O(log n) amortized
394
- Query: O(log n)
395

396
**TDigest**:
397
- Memory: O(compression)  
398
- Typical: 2-20 KB for most applications
399
- Insert: O(1) amortized
400
- Query: O(compression)
401

402
Both structures support efficient merging for distributed scenarios and provide serialization for persistence.

Version

Tile

Files

quantile.md.css-3qkkll{font-size:var(--chakra-font-sizes-sm);font-weight:var(--chakra-font-weights-normal);color:var(--chakra-colors-gray-300);}docs/

quantile.mddocs/