0
# Reservoirs and Sampling
1
2
Reservoirs are the sampling strategies used by histograms and timers to manage memory usage while preserving statistical accuracy. They determine which values are kept for statistical analysis and which are discarded, implementing various algorithms to maintain representative samples of potentially unlimited data streams.
3
4
## Reservoir Interface
5
6
All reservoir implementations conform to the `Reservoir` interface, providing a consistent API for value storage and statistical snapshot generation.
7
8
```java { .api }
9
public interface Reservoir {
10
// Current number of values stored
11
int size();
12
13
// Add a new value to the reservoir
14
void update(long value);
15
16
// Get statistical snapshot of current values
17
Snapshot getSnapshot();
18
}
19
```
20
21
## UniformReservoir
22
23
`UniformReservoir` implements uniform random sampling using Vitter's Algorithm R, ensuring that all values have an equal probability of being retained regardless of when they were recorded. This provides an unbiased sample of the entire data stream.
24
25
```java { .api }
26
public class UniformReservoir implements Reservoir {
27
// Constructors
28
public UniformReservoir(); // Default size: 1028 samples
29
public UniformReservoir(int size); // Custom sample size
30
31
// Reservoir interface implementation
32
public int size();
33
public void update(long value);
34
public Snapshot getSnapshot();
35
}
36
```
37
38
### Usage Examples
39
40
**Basic Uniform Sampling:**
41
```java
42
// Default uniform reservoir (1028 samples)
43
Histogram responseTimeHistogram = new Histogram(new UniformReservoir());
44
registry.register("api.response.time", responseTimeHistogram);
45
46
// Custom size uniform reservoir
47
Histogram customHistogram = new Histogram(new UniformReservoir(500));
48
registry.register("custom.measurements", customHistogram);
49
```
50
51
**When to Use UniformReservoir:**
52
- When you need unbiased sampling across the entire data stream
53
- For long-running applications where historical data is as important as recent data
54
- When statistical accuracy across all time periods is critical
55
- For baseline measurements where time-based weighting isn't relevant
56
57
```java
58
// Example: Measuring file sizes across entire application lifetime
59
UniformReservoir filesSizes = new UniformReservoir(2000);
60
Histogram fileSizeHistogram = new Histogram(filesSizes);
61
62
// All file sizes have equal probability of being sampled
63
for (File file : getAllApplicationFiles()) {
64
fileSizeHistogram.update(file.length());
65
}
66
```
67
68
## ExponentiallyDecayingReservoir
69
70
`ExponentiallyDecayingReservoir` implements time-biased sampling that favors recent values over older ones using an exponential decay algorithm. This reservoir is ideal for applications where recent behavior is more indicative of current system state.
71
72
```java { .api }
73
public class ExponentiallyDecayingReservoir implements Reservoir {
74
// Constructors
75
public ExponentiallyDecayingReservoir(); // Default: 1028 samples, α=0.015
76
public ExponentiallyDecayingReservoir(int size, double alpha); // Custom size and decay factor
77
public ExponentiallyDecayingReservoir(int size, double alpha, Clock clock); // Custom clock for testing
78
79
// Reservoir interface implementation
80
public int size();
81
public void update(long value);
82
public Snapshot getSnapshot();
83
84
// Extended functionality
85
public void update(long value, long timestamp); // Update with explicit timestamp
86
}
87
```
88
89
### Usage Examples
90
91
**Default Time-Biased Sampling:**
92
```java
93
// Default exponentially decaying reservoir
94
Histogram requestLatency = new Histogram(new ExponentiallyDecayingReservoir());
95
Timer requestTimer = new Timer(new ExponentiallyDecayingReservoir());
96
97
// Recent values are more likely to be included in statistics
98
requestLatency.update(responseTime);
99
requestTimer.update(processingTime, TimeUnit.MILLISECONDS);
100
```
101
102
**Custom Decay Parameters:**
103
```java
104
// Faster decay (α=0.1) - emphasizes very recent values
105
ExponentiallyDecayingReservoir fastDecay = new ExponentiallyDecayingReservoir(1000, 0.1);
106
Histogram recentErrorRates = new Histogram(fastDecay);
107
108
// Slower decay (α=0.005) - maintains more historical influence
109
ExponentiallyDecayingReservoir slowDecay = new ExponentiallyDecayingReservoir(1000, 0.005);
110
Histogram backgroundProcessingTimes = new Histogram(slowDecay);
111
```
112
113
**Explicit Timestamp Updates:**
114
```java
115
ExponentiallyDecayingReservoir timestampedReservoir = new ExponentiallyDecayingReservoir();
116
117
// Update with explicit timestamp (useful for processing historical data)
118
long eventTimestamp = getEventTimestamp();
119
long eventValue = getEventValue();
120
timestampedReservoir.update(eventValue, eventTimestamp);
121
```
122
123
**Understanding Alpha (Decay Factor):**
124
```java
125
// α = 0.015 (default): Values lose ~50% weight after ~46 samples
126
// α = 0.1: Values lose ~50% weight after ~7 samples
127
// α = 0.001: Values lose ~50% weight after ~693 samples
128
129
// For high-frequency metrics (many updates per second)
130
ExponentiallyDecayingReservoir highFreq = new ExponentiallyDecayingReservoir(1000, 0.05);
131
132
// For low-frequency metrics (few updates per minute)
133
ExponentiallyDecayingReservoir lowFreq = new ExponentiallyDecayingReservoir(1000, 0.001);
134
```
135
136
## SlidingWindowReservoir
137
138
`SlidingWindowReservoir` maintains a fixed-size window of the most recent N values, providing a simple "last N samples" sampling strategy. When full, new values replace the oldest values in the window.
139
140
```java { .api }
141
public class SlidingWindowReservoir implements Reservoir {
142
// Constructor
143
public SlidingWindowReservoir(int size);
144
145
// Reservoir interface implementation
146
public int size();
147
public void update(long value);
148
public Snapshot getSnapshot();
149
}
150
```
151
152
### Usage Examples
153
154
**Fixed-Size Recent Sample Window:**
155
```java
156
// Keep last 100 response times
157
SlidingWindowReservoir recentResponses = new SlidingWindowReservoir(100);
158
Histogram responseHistogram = new Histogram(recentResponses);
159
160
// Keep last 50 error counts
161
SlidingWindowReservoir recentErrors = new SlidingWindowReservoir(50);
162
Histogram errorHistogram = new Histogram(recentErrors);
163
```
164
165
**When to Use SlidingWindowReservoir:**
166
- When you need statistics for exactly the last N values
167
- For debugging or development environments where recent behavior is most relevant
168
- When memory usage needs to be precisely controlled
169
- For systems with predictable, steady-state behavior
170
171
```java
172
// Example: Monitoring last 200 database query times
173
SlidingWindowReservoir queryTimes = new SlidingWindowReservoir(200);
174
Timer dbQueryTimer = new Timer(queryTimes);
175
176
// Statistics always represent the last 200 queries
177
for (Query query : queries) {
178
Timer.Context context = dbQueryTimer.time();
179
executeQuery(query);
180
context.stop();
181
}
182
183
// Get statistics for last 200 queries only
184
Snapshot snapshot = dbQueryTimer.getSnapshot();
185
double avgLast200 = snapshot.getMean();
186
```
187
188
## SlidingTimeWindowReservoir
189
190
`SlidingTimeWindowReservoir` maintains all values recorded within a specific time window (e.g., last 5 minutes), automatically discarding values that fall outside the time window. This provides time-based rather than count-based sampling.
191
192
```java { .api }
193
public class SlidingTimeWindowReservoir implements Reservoir {
194
// Constructors
195
public SlidingTimeWindowReservoir(long window, TimeUnit windowUnit);
196
public SlidingTimeWindowReservoir(long window, TimeUnit windowUnit, Clock clock);
197
198
// Reservoir interface implementation
199
public int size();
200
public void update(long value);
201
public Snapshot getSnapshot();
202
}
203
```
204
205
### Usage Examples
206
207
**Time-Based Sampling Windows:**
208
```java
209
// Keep all values from last 5 minutes
210
SlidingTimeWindowReservoir last5Minutes = new SlidingTimeWindowReservoir(5, TimeUnit.MINUTES);
211
Histogram recentActivity = new Histogram(last5Minutes);
212
213
// Keep all values from last 30 seconds
214
SlidingTimeWindowReservoir last30Seconds = new SlidingTimeWindowReservoir(30, TimeUnit.SECONDS);
215
Timer recentRequests = new Timer(last30Seconds);
216
```
217
218
**Real-Time Monitoring:**
219
```java
220
// Monitor error rates in real-time (last 1 minute)
221
SlidingTimeWindowReservoir errorWindow = new SlidingTimeWindowReservoir(1, TimeUnit.MINUTES);
222
Histogram errorRateHistogram = new Histogram(errorWindow);
223
224
// All statistics reflect only the last minute of activity
225
errorRateHistogram.update(errorCount);
226
Snapshot recentErrors = errorRateHistogram.getSnapshot();
227
System.out.println("Errors in last minute: " + recentErrors.size());
228
```
229
230
**Custom Clock for Testing:**
231
```java
232
// Controllable clock for testing time-based sampling
233
Clock testClock = new Clock() {
234
private long currentTime = 0;
235
public long getTick() { return currentTime * 1_000_000; }
236
public long getTime() { return currentTime; }
237
public void advance(long millis) { currentTime += millis; }
238
};
239
240
SlidingTimeWindowReservoir testReservoir =
241
new SlidingTimeWindowReservoir(10, TimeUnit.SECONDS, testClock);
242
243
// Add values and advance time for deterministic testing
244
testReservoir.update(100);
245
testClock.advance(5000); // Advance 5 seconds
246
testReservoir.update(200);
247
testClock.advance(6000); // Advance 6 more seconds (11 total)
248
249
// First value should be expired, only second value remains
250
assertEquals(1, testReservoir.size());
251
```
252
253
## Statistical Snapshots
254
255
All reservoirs provide statistical snapshots through the `Snapshot` class, which offers comprehensive statistical analysis of the sampled values.
256
257
```java { .api }
258
public abstract class Snapshot {
259
// Quantile access
260
public abstract double getValue(double quantile); // 0.0 to 1.0
261
public double getMedian(); // 50th percentile
262
public double get75thPercentile();
263
public double get95thPercentile();
264
public double get98thPercentile();
265
public double get99thPercentile();
266
public double get999thPercentile();
267
268
// Descriptive statistics
269
public abstract long[] getValues(); // All sampled values
270
public abstract int size(); // Number of values
271
public abstract long getMax();
272
public abstract long getMin();
273
public abstract double getMean();
274
public abstract double getStdDev();
275
276
// Utility
277
public abstract void dump(OutputStream output); // Export values
278
}
279
```
280
281
### Snapshot Usage Examples
282
283
```java
284
Histogram histogram = new Histogram(new ExponentiallyDecayingReservoir());
285
286
// Record some values
287
for (int i = 0; i < 1000; i++) {
288
histogram.update(random.nextInt(1000));
289
}
290
291
// Get comprehensive statistics
292
Snapshot snapshot = histogram.getSnapshot();
293
294
System.out.printf("Count: %d%n", snapshot.size());
295
System.out.printf("Min: %d, Max: %d%n", snapshot.getMin(), snapshot.getMax());
296
System.out.printf("Mean: %.2f, StdDev: %.2f%n", snapshot.getMean(), snapshot.getStdDev());
297
System.out.printf("Median: %.2f%n", snapshot.getMedian());
298
System.out.printf("95th percentile: %.2f%n", snapshot.get95thPercentile());
299
System.out.printf("99th percentile: %.2f%n", snapshot.get99thPercentile());
300
301
// Custom quantiles
302
double q90 = snapshot.getValue(0.90); // 90th percentile
303
double q999 = snapshot.getValue(0.999); // 99.9th percentile
304
305
// Export all values to file
306
try (FileOutputStream fos = new FileOutputStream("histogram-values.txt")) {
307
snapshot.dump(fos);
308
}
309
```
310
311
## Choosing the Right Reservoir
312
313
### Decision Matrix
314
315
| Use Case | Recommended Reservoir | Reason |
316
|----------|----------------------|---------|
317
| Long-running application metrics | `ExponentiallyDecayingReservoir` | Emphasizes recent behavior while maintaining historical context |
318
| Real-time monitoring dashboard | `SlidingTimeWindowReservoir` | Provides exact time-based windows for real-time analysis |
319
| Debug/development environments | `SlidingWindowReservoir` | Simple, predictable sampling of recent values |
320
| Baseline/benchmark measurements | `UniformReservoir` | Unbiased sampling across entire measurement period |
321
| High-frequency metrics | `ExponentiallyDecayingReservoir` with higher α | Faster decay to emphasize very recent values |
322
| Low-frequency metrics | `ExponentiallyDecayingReservoir` with lower α | Slower decay to maintain longer historical influence |
323
324
### Memory Usage Considerations
325
326
```java
327
// Memory usage comparison for different reservoirs:
328
329
// Fixed memory: 1028 * 8 bytes = ~8KB (plus small overhead)
330
UniformReservoir uniform = new UniformReservoir();
331
332
// Fixed memory: 1028 * (8 + 8) bytes = ~16KB (value + weight)
333
ExponentiallyDecayingReservoir exponential = new ExponentiallyDecayingReservoir();
334
335
// Fixed memory: 100 * 8 bytes = ~800 bytes
336
SlidingWindowReservoir sliding = new SlidingWindowReservoir(100);
337
338
// Variable memory: depends on activity in time window
339
SlidingTimeWindowReservoir timeWindow = new SlidingTimeWindowReservoir(1, TimeUnit.MINUTES);
340
// Could be 0 bytes (no activity) to unbounded (high activity)
341
```
342
343
### Performance Characteristics
344
345
```java
346
// Performance comparison (approximate):
347
348
// Fastest updates, good for high-frequency metrics
349
SlidingWindowReservoir fastest = new SlidingWindowReservoir(1000);
350
351
// Fast updates with periodic cleanup, good balance
352
ExponentiallyDecayingReservoir balanced = new ExponentiallyDecayingReservoir();
353
354
// Moderate performance, random access for updates
355
UniformReservoir moderate = new UniformReservoir();
356
357
// Slowest updates due to time-based cleanup, but accurate time windows
358
SlidingTimeWindowReservoir precise = new SlidingTimeWindowReservoir(5, TimeUnit.MINUTES);
359
```
360
361
## Advanced Usage
362
363
### Custom Reservoir Selection Strategy
364
365
```java
366
public class AdaptiveReservoirFactory {
367
public static Reservoir createReservoir(String metricName, MetricType type) {
368
if (metricName.contains("error") || metricName.contains("exception")) {
369
// For error metrics, emphasize recent events
370
return new ExponentiallyDecayingReservoir(500, 0.1);
371
} else if (metricName.contains("response.time")) {
372
// For response times, use time-based window for real-time monitoring
373
return new SlidingTimeWindowReservoir(2, TimeUnit.MINUTES);
374
} else if (metricName.contains("throughput")) {
375
// For throughput, use uniform sampling for unbiased measurement
376
return new UniformReservoir(2000);
377
} else {
378
// Default case
379
return new ExponentiallyDecayingReservoir();
380
}
381
}
382
}
383
384
// Usage
385
Histogram errorHistogram = new Histogram(
386
AdaptiveReservoirFactory.createReservoir("api.errors", MetricType.HISTOGRAM));
387
```
388
389
### Reservoir Monitoring
390
391
```java
392
// Monitor reservoir efficiency
393
public void analyzeReservoirEfficiency(Reservoir reservoir) {
394
Snapshot snapshot = reservoir.getSnapshot();
395
396
System.out.printf("Reservoir size: %d samples%n", reservoir.size());
397
System.out.printf("Statistical range: %d - %d%n",
398
snapshot.getMin(), snapshot.getMax());
399
System.out.printf("Distribution spread (std dev): %.2f%n",
400
snapshot.getStdDev());
401
402
// Check if reservoir is providing good statistical coverage
403
double range = snapshot.getMax() - snapshot.getMin();
404
double stdDevRatio = snapshot.getStdDev() / snapshot.getMean();
405
406
if (stdDevRatio > 1.0) {
407
System.out.println("High variability - consider larger reservoir");
408
}
409
if (range < snapshot.getMean() * 0.1) {
410
System.out.println("Low variability - smaller reservoir may suffice");
411
}
412
}
413
```
414
415
## Best Practices
416
417
### Reservoir Configuration
418
- Start with `ExponentiallyDecayingReservoir` as the default choice
419
- Use `SlidingTimeWindowReservoir` for real-time dashboards and alerting
420
- Choose reservoir size based on expected data variability and accuracy requirements
421
- Consider memory constraints when selecting reservoir types and sizes
422
423
### Performance Optimization
424
- Pre-allocate reservoirs during application startup
425
- Avoid creating new reservoirs frequently during runtime
426
- Monitor reservoir sizes in time-windowed reservoirs to detect memory issues
427
- Use appropriate α values for exponential decay based on update frequency
428
429
### Testing Strategies
430
- Use custom `Clock` implementations for deterministic testing of time-based reservoirs
431
- Test reservoir behavior under various load patterns (high frequency, bursts, sparse)
432
- Verify statistical accuracy by comparing reservoir snapshots with expected distributions
433
- Test memory usage patterns, especially with time-windowed reservoirs
434
435
### Monitoring and Debugging
436
- Log reservoir sizes periodically to understand sampling behavior
437
- Compare statistics from different reservoir types on the same data stream
438
- Monitor reservoir performance under production load patterns
439
- Use snapshot dumps for detailed analysis of sampled value distributions