# Serialization and I/O

Binary serialization support for Bloom filters and Count-Min sketches, enabling persistent storage and distributed computing scenarios. Each serialized form begins with a format version number and stores all values in big-endian byte order, so data written on one platform can be read on any other.

## Capabilities

### Bloom Filter Serialization

Methods for serializing and deserializing Bloom filters.

```java { .api }
/**
 * Writes the Bloom filter to an output stream in binary format
 * Caller is responsible for closing the stream
 * @param out output stream to write to
 * @throws IOException if an I/O error occurs
 */
public abstract void writeTo(OutputStream out) throws IOException;

/**
 * Reads a Bloom filter from an input stream
 * Caller is responsible for closing the stream
 * @param in input stream to read from
 * @return deserialized BloomFilter instance
 * @throws IOException if an I/O error occurs or format is invalid
 */
public static BloomFilter readFrom(InputStream in) throws IOException;
```

**Usage Examples:**

```java
import java.io.*;
import org.apache.spark.util.sketch.BloomFilter;

// Create and populate a Bloom filter (1,000 expected items, 1% false-positive rate)
BloomFilter filter = BloomFilter.create(1000, 0.01);
filter.put("item1");
filter.put("item2");
filter.put(12345L);

// Serialize to file; try-with-resources closes the stream, since writeTo does not
try (FileOutputStream fos = new FileOutputStream("bloomfilter.dat")) {
    filter.writeTo(fos);
}

// Deserialize from file
BloomFilter loadedFilter;
try (FileInputStream fis = new FileInputStream("bloomfilter.dat")) {
    loadedFilter = BloomFilter.readFrom(fis);
}

// Verify the loaded filter works correctly
boolean test1 = loadedFilter.mightContain("item1");   // true: Bloom filters have no false negatives
boolean test2 = loadedFilter.mightContain("missing"); // usually false; true only on a false positive (~1%)
```
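
The Bloom filter API above is stream-based only, while some transports (message queues, key-value stores, RPC payloads) expect a `byte[]`. A small adapter over `ByteArrayOutputStream` bridges the two. This is a sketch: `StreamWriter` and `SketchBytes.toBytes` are hypothetical names, not part of the library, and the only assumption is that `writeTo` writes its complete payload to the stream it is given.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical callback with the same shape as BloomFilter.writeTo(OutputStream). */
@FunctionalInterface
interface StreamWriter {
    void writeTo(OutputStream out) throws IOException;
}

/** Hypothetical helper that captures stream-based serialization in a byte array. */
final class SketchBytes {
    private SketchBytes() {}

    static byte[] toBytes(StreamWriter writer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writer.writeTo(buffer);
        // ByteArrayOutputStream holds no OS resources, so there is nothing to close here.
        return buffer.toByteArray();
    }
}
```

With a populated filter, `byte[] bytes = SketchBytes.toBytes(filter::writeTo);` captures the serialized form, and `BloomFilter.readFrom(new ByteArrayInputStream(bytes))` should restore it.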

### Count-Min Sketch Serialization

Methods for serializing and deserializing Count-Min sketches with both stream and byte array support.

```java { .api }
/**
 * Writes the Count-Min sketch to an output stream in binary format
 * Caller is responsible for closing the stream
 * @param out output stream to write to
 * @throws IOException if an I/O error occurs
 */
public abstract void writeTo(OutputStream out) throws IOException;

/**
 * Serializes the Count-Min sketch to a byte array
 * @return byte array containing serialized sketch data
 * @throws IOException if serialization fails
 */
public abstract byte[] toByteArray() throws IOException;

/**
 * Reads a Count-Min sketch from an input stream
 * Caller is responsible for closing the stream
 * @param in input stream to read from
 * @return deserialized CountMinSketch instance
 * @throws IOException if an I/O error occurs or format is invalid
 */
public static CountMinSketch readFrom(InputStream in) throws IOException;

/**
 * Reads a Count-Min sketch from a byte array
 * @param bytes byte array containing serialized sketch data
 * @return deserialized CountMinSketch instance
 * @throws IOException if deserialization fails
 */
public static CountMinSketch readFrom(byte[] bytes) throws IOException;
```

**Usage Examples:**

```java
import java.io.*;
import org.apache.spark.util.sketch.CountMinSketch;

// Create and populate a Count-Min sketch (eps = 0.01, confidence = 0.99, seed = 42)
CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);
sketch.add("user123", 10);
sketch.add("user456", 5);
sketch.addLong(999L, 3);

// Serialize to file using a stream; the caller closes it
try (FileOutputStream fos = new FileOutputStream("sketch.dat")) {
    sketch.writeTo(fos);
}

// Serialize to a byte array
byte[] sketchBytes = sketch.toByteArray();

// Deserialize from file
CountMinSketch loadedSketch;
try (FileInputStream fis = new FileInputStream("sketch.dat")) {
    loadedSketch = CountMinSketch.readFrom(fis);
}

// Deserialize from byte array
CountMinSketch sketchFromBytes = CountMinSketch.readFrom(sketchBytes);

// Verify loaded sketches work correctly; estimates may overcount but never undercount
long count1 = loadedSketch.estimateCount("user123");    // >= 10
long count2 = sketchFromBytes.estimateCount("user456"); // >= 5
long total = loadedSketch.totalCount();                 // 18 (10 + 5 + 3), stored exactly
```
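
Writing directly to `sketch.dat` leaves a truncated file behind if the process dies mid-write, and a later `readFrom` on that file would fail. A common remedy is to write to a temporary file in the same directory, force it to disk, and rename it over the target. The sketch below assumes a local file system where an atomic rename replaces an existing file (typical on POSIX systems; the `Files.move` documentation leaves replacement under `ATOMIC_MOVE` implementation-specific). `AtomicFiles` and `Payload` are hypothetical names.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: replace a file only after the new contents are completely written. */
final class AtomicFiles {
    /** Callback with the same shape as CountMinSketch.writeTo(OutputStream). */
    @FunctionalInterface
    interface Payload {
        void writeTo(OutputStream out) throws IOException;
    }

    private AtomicFiles() {}

    static void write(Path target, Payload payload) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Same directory as the target, so the final rename stays on one file system.
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            try (FileOutputStream fos = new FileOutputStream(tmp.toFile())) {
                payload.writeTo(fos);
                fos.getFD().sync(); // push the bytes to the storage device before renaming
            }
            // Readers see either the old file or the complete new one, never a partial write.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // cleans up after a failure; no-op after a successful move
        }
    }
}
```

Persisting the sketch then becomes `AtomicFiles.write(Paths.get("sketch.dat"), sketch::writeTo);`, and reading it back is unchanged.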

### Binary Format Specifications

Both formats start with a 32-bit version number and encode every field as a fixed-width, big-endian integer, with no field names, delimiters, or padding.

#### Bloom Filter Binary Format (Version 1)

```java { .api }
// All values written in big-endian order:
// - Version number, always 1 (32 bit)
// - Number of hash functions (32 bit)
// - Total number of words of the underlying bit array (32 bit)
// - The words/longs (numWords * 64 bit)
```
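
Because every field in the V1 layout is a fixed-width big-endian integer, `DataOutputStream` and `DataInputStream` (which use big-endian order) are enough to write and read it. The sketch below is a hypothetical standalone codec for that layout, not the library's own class. It rejects unknown version numbers, and because it consumes exactly one payload's bytes, several filters can sit back to back in a single stream that the caller closes afterwards.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical standalone codec for the Bloom filter V1 layout (not the library's class). */
final class BloomV1 {
    static final int VERSION = 1;

    final int numHashFunctions;
    final long[] words; // the bit array, 64 bits per word

    BloomV1(int numHashFunctions, long[] words) {
        this.numHashFunctions = numHashFunctions;
        this.words = words;
    }

    /** Writes version, hash count, word count, then each word; leaves the stream open. */
    void writeTo(OutputStream out) throws IOException {
        DataOutputStream dos = new DataOutputStream(out); // big-endian; adds no buffering
        dos.writeInt(VERSION);
        dos.writeInt(numHashFunctions);
        dos.writeInt(words.length);
        for (long word : words) {
            dos.writeLong(word);
        }
    }

    /** Reads exactly one payload, so several can be stored back to back in one stream. */
    static BloomV1 readFrom(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        int version = dis.readInt();
        if (version != VERSION) {
            throw new IOException("Unsupported Bloom filter format version: " + version);
        }
        int numHashFunctions = dis.readInt();
        int numWords = dis.readInt();
        if (numWords < 0) {
            throw new IOException("Corrupt Bloom filter: negative word count " + numWords);
        }
        long[] words = new long[numWords];
        for (int i = 0; i < numWords; i++) {
            words[i] = dis.readLong(); // EOFException if the data is truncated
        }
        return new BloomV1(numHashFunctions, words);
    }

    /** Three 32-bit header fields (12 bytes) plus 8 bytes per word. */
    static long serializedSize(int numWords) {
        return 3L * Integer.BYTES + (long) numWords * Long.BYTES;
    }
}
```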

#### Count-Min Sketch Binary Format (Version 1)

```java { .api }
// All values written in big-endian order:
// - Version number, always 1 (32 bit)
// - Total count of added items (64 bit)
// - Depth (32 bit)
// - Width (32 bit)
// - Hash functions (depth * 64 bit)
// - Count table:
//   - Row 0 (width * 64 bit)
//   - Row 1 (width * 64 bit)
//   - ...
//   - Row depth-1 (width * 64 bit)
```
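
The Count-Min layout fixes the payload size from depth and width alone: a 20-byte header (4 + 8 + 4 + 4), 8 bytes per row for the hash functions, and 8 bytes per table cell, so a 7 × 200 table takes 20 + 56 + 11,200 = 11,276 bytes. The sketch below is a hypothetical `CountMinV1` helper working on raw arrays; it encodes the fields in the documented order and checks that arithmetic. It treats each row's hash function as an opaque 64-bit value, which is an assumption about what that field holds.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical writer/reader for the Count-Min sketch V1 layout, working on raw fields. */
final class CountMinV1 {
    private CountMinV1() {}

    /** 20-byte header (version, total count, depth, width) + depth hash values + depth x width counts. */
    static long serializedSize(int depth, int width) {
        long header = Integer.BYTES + Long.BYTES + Integer.BYTES + Integer.BYTES; // 4 + 8 + 4 + 4
        return header + (long) depth * Long.BYTES + (long) depth * width * Long.BYTES;
    }

    /** Encodes the fields in the documented order; hashValues holds one 64-bit value per row. */
    static byte[] write(long totalCount, long[] hashValues, long[][] table) throws IOException {
        int depth = table.length;
        int width = depth == 0 ? 0 : table[0].length;
        if (hashValues.length != depth) {
            throw new IllegalArgumentException("expected one hash value per row");
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(buffer);
        dos.writeInt(1);           // version
        dos.writeLong(totalCount); // total count of added items
        dos.writeInt(depth);
        dos.writeInt(width);
        for (long h : hashValues) {
            dos.writeLong(h);
        }
        for (long[] row : table) { // rows 0 .. depth-1
            if (row.length != width) {
                throw new IllegalArgumentException("all rows must have the same width");
            }
            for (long count : row) {
                dos.writeLong(count);
            }
        }
        return buffer.toByteArray();
    }

    /** Decodes only the header: {version, totalCount, depth, width}. */
    static long[] readHeader(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        return new long[] {dis.readInt(), dis.readLong(), dis.readInt(), dis.readInt()};
    }
}
```

Assuming the sketch exposes `depth()` and `width()` accessors (not shown in this section), `CountMinV1.serializedSize(sketch.depth(), sketch.width())` should match `sketch.toByteArray().length`.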

### Network and Distributed Computing Examples

Common patterns for using serialization in distributed environments.

**Network Transfer Example:**

```java
import java.io.*;
import java.net.*;
import org.apache.spark.util.sketch.BloomFilter;

// Server (one process): send a Bloom filter to the first client that connects
try (ServerSocket serverSocket = new ServerSocket(8080);
     Socket clientSocket = serverSocket.accept();
     OutputStream out = clientSocket.getOutputStream()) {
    BloomFilter filter = BloomFilter.create(10000, 0.01);
    filter.put("shared_data");
    filter.writeTo(out);
}

// Client (a separate process or thread): receive and use the Bloom filter.
// accept() blocks, so server and client cannot run one after the other in a single thread.
BloomFilter receivedFilter;
try (Socket socket = new Socket("localhost", 8080);
     InputStream in = socket.getInputStream()) {
    receivedFilter = BloomFilter.readFrom(in);
}

boolean contains = receivedFilter.mightContain("shared_data"); // true
```

**Distributed Aggregation Example:**

```java
import org.apache.spark.util.sketch.CountMinSketch;
import org.apache.spark.util.sketch.IncompatibleMergeException;

// Scenario: aggregate Count-Min sketches from multiple workers.
// Every worker uses the same eps, confidence, and seed so the sketches are mergeable.

// Worker 1
CountMinSketch worker1Sketch = CountMinSketch.create(0.01, 0.99, 42);
worker1Sketch.add("event_A", 100);
worker1Sketch.add("event_B", 50);

// Worker 2
CountMinSketch worker2Sketch = CountMinSketch.create(0.01, 0.99, 42);
worker2Sketch.add("event_A", 75);
worker2Sketch.add("event_C", 30);

// Serialize workers' sketches for network transfer
byte[] worker1Bytes = worker1Sketch.toByteArray();
byte[] worker2Bytes = worker2Sketch.toByteArray();

// Coordinator: deserialize and merge
CountMinSketch aggregated = CountMinSketch.readFrom(worker1Bytes);
CountMinSketch worker2Copy = CountMinSketch.readFrom(worker2Bytes);

try {
    aggregated.mergeInPlace(worker2Copy);
} catch (IncompatibleMergeException e) {
    // Thrown when the sketches are incompatible, e.g. created with different parameters or seeds
    throw new IllegalStateException("Workers used different sketch parameters", e);
}

// Now aggregated contains combined counts
long eventA = aggregated.estimateCount("event_A"); // >= 175 (100 + 75)
long eventB = aggregated.estimateCount("event_B"); // >= 50
long eventC = aggregated.estimateCount("event_C"); // >= 30
```
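
Merging is only meaningful when both sketches have the same depth, width, and hash functions, which is why both workers call `create` with identical arguments, seed included. The standard Count-Min merge then adds the count tables cell by cell (and adds the total counts). With identical hashing, the result is exactly the table a single sketch would have built from both input streams, so merged estimates stay upper bounds. The sketch below shows that operation on raw arrays; `CountMinMerge.mergeInto` is a hypothetical helper illustrating the technique, not the library's `mergeInPlace` source.

```java
import java.util.Arrays;

/** Hypothetical illustration of the standard Count-Min merge on raw count tables. */
final class CountMinMerge {
    private CountMinMerge() {}

    /** Adds every cell of other into target; both must share shape and per-row hash values. */
    static void mergeInto(long[][] target, long[] targetHashes, long[][] other, long[] otherHashes) {
        // Different hash functions would send the same item to different cells, so refuse to merge.
        if (target.length != other.length || !Arrays.equals(targetHashes, otherHashes)) {
            throw new IllegalArgumentException("sketches differ in depth or hash functions");
        }
        // Check every row's width before mutating anything, so a failed merge leaves target intact.
        for (int row = 0; row < target.length; row++) {
            if (target[row].length != other[row].length) {
                throw new IllegalArgumentException("sketches differ in width");
            }
        }
        for (int row = 0; row < target.length; row++) {
            for (int col = 0; col < target[row].length; col++) {
                target[row][col] += other[row][col];
            }
        }
    }
}
```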

### Java Serialization Support

Both data structures also support Java's built-in serialization, for integration with frameworks that use `ObjectOutputStream`/`ObjectInputStream`.

```java { .api }
// Private hooks invoked by ObjectOutputStream/ObjectInputStream during Java serialization;
// application code does not call them directly
private void writeObject(ObjectOutputStream out) throws IOException;
private void readObject(ObjectInputStream in) throws IOException;
```

**Usage Example:**

```java
import java.io.*;
import org.apache.spark.util.sketch.BloomFilter;

// Uses the default false-positive probability
BloomFilter filter = BloomFilter.create(1000);
filter.put("test");

// Java serialization
try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("filter.ser"))) {
    oos.writeObject(filter);
}

// Java deserialization; readObject also throws ClassNotFoundException,
// which the enclosing method must catch or declare
BloomFilter deserializedFilter;
try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream("filter.ser"))) {
    deserializedFilter = (BloomFilter) ois.readObject();
}
```
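
The private `writeObject`/`readObject` hooks let a class replace Java's default field-by-field encoding with its own format. The sketch below shows that delegation pattern on a hypothetical `CompactBits` class: its array is `transient`, and the hooks write and read it in a compact big-endian form through `DataOutputStream`/`DataInputStream`. It illustrates the mechanism only; whether the library's hooks delegate to its binary format in exactly this way is an assumption.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

/** Hypothetical bit set whose Java serialization delegates to a compact binary format. */
final class CompactBits implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by default serialization; written by writeTo instead
    private transient long[] words;

    CompactBits(int numWords) {
        this.words = new long[numWords];
    }

    void set(int bit) {
        words[bit >>> 6] |= 1L << (bit & 63);
    }

    boolean get(int bit) {
        return (words[bit >>> 6] & (1L << (bit & 63))) != 0;
    }

    /** Compact format: word count (32 bit) followed by the words (64 bit each), big-endian. */
    void writeTo(OutputStream out) throws IOException {
        DataOutputStream dos = new DataOutputStream(out);
        dos.writeInt(words.length);
        for (long word : words) {
            dos.writeLong(word);
        }
    }

    private void readFrom(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        long[] loaded = new long[dis.readInt()];
        for (int i = 0; i < loaded.length; i++) {
            loaded[i] = dis.readLong();
        }
        words = loaded;
    }

    // Java serialization hooks, called reflectively by ObjectOutputStream/ObjectInputStream.
    private void writeObject(ObjectOutputStream out) throws IOException {
        writeTo(out);
    }

    private void readObject(ObjectInputStream in) throws IOException {
        readFrom(in);
    }
}
```

Round-tripping an instance through `ObjectOutputStream` and `ObjectInputStream` then works like the `BloomFilter` example above, including the need to handle `ClassNotFoundException` from `readObject`.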
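
## Performance Characteristics

- **Bloom Filter**: Serialized size is 12 header bytes plus 8 bytes per 64-bit word of the bit array. It depends only on the bit-array size fixed at creation (from the expected item count and false-positive rate), not on how many items were inserted or how large they are, so it is typically much smaller than storing the items themselves.
- **Count-Min Sketch**: Serialized size is `20 + 8 × depth + 8 × depth × width` bytes: a 20-byte header, one 64-bit hash function per row, and the `depth × width` table of 64-bit counts.
- **Network Efficiency**: The binary format contains only fixed-width numbers, with no field names or text encoding, so it is more compact than a JSON or XML representation of the same data.
- **Version Compatibility**: Every payload begins with a format version number, which lets a reader recognize the format and reject versions it does not support instead of misreading the data.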
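
The V1 layout also makes Bloom filter size predictable before any filter is built: 12 header bytes plus 8 bytes per 64-bit word. How many bits the library allocates for a given capacity is not stated here; assuming it uses the textbook optimum `m = -n * ln(p) / (ln 2)^2` for `n` expected items at false-positive rate `p`, the hypothetical helper below estimates the serialized size.

```java
/** Hypothetical sizing helper for the Bloom filter V1 payload. */
final class BloomSizing {
    private BloomSizing() {}

    /** Textbook optimal bit count m = -n * ln(p) / (ln 2)^2, truncated to a whole number. */
    static long optimalBits(long expectedItems, double fpp) {
        return (long) (-expectedItems * Math.log(fpp) / (Math.log(2) * Math.log(2)));
    }

    /** 12-byte header (three 32-bit fields) plus 8 bytes per 64-bit word, rounding bits up to words. */
    static long estimateBytes(long expectedItems, double fpp) {
        long numWords = (optimalBits(expectedItems, fpp) + 63) / 64;
        return 12 + numWords * 8;
    }
}
```

For 1,000 items at 1% (the first example) this gives 9,585 bits in 150 words, or 1,212 bytes; for 10,000 items at 1% (the network example), 11,996 bytes. Under this assumption the size is the same whether the filter is empty or full, and it does not depend on item length.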
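
## Error Handling

- `IOException`: Thrown for I/O errors during read/write operations.
- `IOException` with a descriptive message: Thrown when the data carries an unsupported version number or is otherwise malformed. Truncated input typically surfaces as `java.io.EOFException`, a subclass of `IOException`.
- Stream management: `writeTo` and `readFrom` do not close the streams they are given; callers must close them (for example with try-with-resources) to avoid resource leaks.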