Binary serialization support for Bloom filters and Count-Min sketches, enabling persistent storage and distributed computing scenarios. The serialization format is version-aware and designed for cross-platform compatibility.
Methods for serializing and deserializing Bloom filters.
/**
* Writes the Bloom filter to an output stream in binary format
* Caller is responsible for closing the stream
* @param out output stream to write to
* @throws IOException if an I/O error occurs
*/
public abstract void writeTo(OutputStream out) throws IOException;
/**
* Reads a Bloom filter from an input stream
* Caller is responsible for closing the stream
* @param in input stream to read from
* @return deserialized BloomFilter instance
* @throws IOException if an I/O error occurs or format is invalid
*/
public static BloomFilter readFrom(InputStream in) throws IOException;
Usage Examples:
import java.io.*;
// Create and populate a Bloom filter
BloomFilter filter = BloomFilter.create(1000, 0.01);
filter.put("item1");
filter.put("item2");
filter.put(12345L);
// Serialize to file
try (FileOutputStream fos = new FileOutputStream("bloomfilter.dat")) {
    filter.writeTo(fos);
}
// Deserialize from file
BloomFilter loadedFilter;
try (FileInputStream fis = new FileInputStream("bloomfilter.dat")) {
    loadedFilter = BloomFilter.readFrom(fis);
}
// Verify the loaded filter works correctly
boolean test1 = loadedFilter.mightContain("item1"); // true
boolean test2 = loadedFilter.mightContain("missing"); // false, barring a rare false positive
Methods for serializing and deserializing Count-Min sketches with both stream and byte array support.
/**
* Writes the Count-Min sketch to an output stream in binary format
* Caller is responsible for closing the stream
* @param out output stream to write to
* @throws IOException if an I/O error occurs
*/
public abstract void writeTo(OutputStream out) throws IOException;
/**
* Serializes the Count-Min sketch to a byte array
* @return byte array containing serialized sketch data
* @throws IOException if serialization fails
*/
public abstract byte[] toByteArray() throws IOException;
/**
* Reads a Count-Min sketch from an input stream
* Caller is responsible for closing the stream
* @param in input stream to read from
* @return deserialized CountMinSketch instance
* @throws IOException if an I/O error occurs or format is invalid
*/
public static CountMinSketch readFrom(InputStream in) throws IOException;
/**
* Reads a Count-Min sketch from a byte array
* @param bytes byte array containing serialized sketch data
* @return deserialized CountMinSketch instance
* @throws IOException if deserialization fails
*/
public static CountMinSketch readFrom(byte[] bytes) throws IOException;
Usage Examples:
import java.io.*;
// Create and populate a Count-Min sketch
CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);
sketch.add("user123", 10);
sketch.add("user456", 5);
sketch.addLong(999L, 3);
// Serialize to file using stream
try (FileOutputStream fos = new FileOutputStream("sketch.dat")) {
    sketch.writeTo(fos);
}
// Serialize to byte array
byte[] sketchBytes = sketch.toByteArray();
// Deserialize from file
CountMinSketch loadedSketch;
try (FileInputStream fis = new FileInputStream("sketch.dat")) {
    loadedSketch = CountMinSketch.readFrom(fis);
}
// Deserialize from byte array
CountMinSketch sketchFromBytes = CountMinSketch.readFrom(sketchBytes);
// Verify loaded sketches work correctly
long count1 = loadedSketch.estimateCount("user123"); // >= 10
long count2 = sketchFromBytes.estimateCount("user456"); // >= 5
long total = loadedSketch.totalCount(); // 18
The serialization formats are version-aware and optimized for space efficiency.
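To make the layout concrete, here is a minimal sketch that writes and reads the fields of the first format listed below (version, number of hash functions, word count, then the words). It is an illustration only, not the library's implementation: `BloomLayoutSketch`, `write`, and `readWords` are hypothetical names, and the bit array is modeled as a plain `long[]`. It relies on the fact that `DataOutputStream` writes multi-byte values in big-endian order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical illustration of the documented layout; not the library's code.
class BloomLayoutSketch {
    static final int VERSION = 1;

    // Writes version, number of hash functions, word count, then each word.
    // DataOutputStream emits every int and long in big-endian byte order.
    static byte[] write(int numHashFunctions, long[] words) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(VERSION);
        out.writeInt(numHashFunctions);
        out.writeInt(words.length);
        for (long word : words) {
            out.writeLong(word);
        }
        out.flush();
        return buffer.toByteArray();
    }

    // Reads the words back, rejecting any version other than 1.
    static long[] readWords(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int version = in.readInt();
        if (version != VERSION) {
            throw new IOException("Unexpected Bloom filter version: " + version);
        }
        in.readInt(); // number of hash functions, not needed in this sketch
        int numWords = in.readInt();
        long[] words = new long[numWords];
        for (int i = 0; i < numWords; i++) {
            words[i] = in.readLong();
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        long[] words = {0x1L, 0xFFL};
        byte[] bytes = write(7, words);
        System.out.println(bytes.length);                           // 28 = 3 * 4 + 2 * 8
        System.out.println(Arrays.equals(words, readWords(bytes))); // true
    }
}
```

Because the word count is part of the header, a reader knows exactly how many bytes belong to the filter without any external framing.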
// Bloom filter format. All values written in big-endian order:
// - Version number, always 1 (32 bit)
// - Number of hash functions (32 bit)
// - Total number of words of underlying bit array (32 bit)
// - The words/longs (numWords * 64 bit)
// Count-Min sketch format. All values written in big-endian order:
// - Version number, always 1 (32 bit)
// - Total count of added items (64 bit)
// - Depth (32 bit)
// - Width (32 bit)
// - Hash functions (depth * 64 bit)
// - Count table:
//   - Row 0 (width * 64 bit)
//   - Row 1 (width * 64 bit)
//   - ...
//   - Row depth-1 (width * 64 bit)
Common patterns for using serialization in distributed environments.
Network Transfer Example:
import java.io.*;
import java.net.*;
// Server: Send a Bloom filter over network (runs in its own process)
BloomFilter filter = BloomFilter.create(10000, 0.01);
filter.put("shared_data");
// try-with-resources closes the stream, the connection, and the listening socket
try (ServerSocket serverSocket = new ServerSocket(8080);
     Socket clientSocket = serverSocket.accept();
     OutputStream out = clientSocket.getOutputStream()) {
    filter.writeTo(out);
}
// Client: Receive and use the Bloom filter (runs in a separate process)
BloomFilter receivedFilter;
try (Socket socket = new Socket("localhost", 8080);
     InputStream in = socket.getInputStream()) {
    receivedFilter = BloomFilter.readFrom(in);
}
boolean contains = receivedFilter.mightContain("shared_data"); // true
Distributed Aggregation Example:
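// Scenario: Aggregate Count-Min sketches from multiple workers
// All workers must pass identical arguments to create(...) (here 0.01, 0.99, 42)
// so that their sketches are compatible for merging
// Worker 1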
CountMinSketch worker1Sketch = CountMinSketch.create(0.01, 0.99, 42);
worker1Sketch.add("event_A", 100);
worker1Sketch.add("event_B", 50);
// Worker 2
CountMinSketch worker2Sketch = CountMinSketch.create(0.01, 0.99, 42);
worker2Sketch.add("event_A", 75);
worker2Sketch.add("event_C", 30);
// Serialize workers' sketches for network transfer
byte[] worker1Bytes = worker1Sketch.toByteArray();
byte[] worker2Bytes = worker2Sketch.toByteArray();
// Coordinator: Deserialize and merge
CountMinSketch aggregated = CountMinSketch.readFrom(worker1Bytes);
CountMinSketch worker2Copy = CountMinSketch.readFrom(worker2Bytes);
aggregated.mergeInPlace(worker2Copy);
// Now aggregated contains combined counts
long eventA = aggregated.estimateCount("event_A"); // >= 175 (100 + 75)
long eventB = aggregated.estimateCount("event_B"); // >= 50
long eventC = aggregated.estimateCount("event_C"); // >= 30
Both data structures also implement Java's native serialization for integration with frameworks that use ObjectOutputStream/ObjectInputStream.
// These methods are automatically called during Java serialization
private void writeObject(ObjectOutputStream out) throws IOException;
private void readObject(ObjectInputStream in) throws IOException;
Usage Example:
import java.io.*;
BloomFilter filter = BloomFilter.create(1000);
filter.put("test");
// Java serialization
try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("filter.ser"))) {
    oos.writeObject(filter);
}
// Java deserialization
BloomFilter deserializedFilter;
try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream("filter.ser"))) {
    deserializedFilter = (BloomFilter) ois.readObject(); // readObject() also declares ClassNotFoundException
}
Serialized Count-Min sketch size: depth × width × 8 bytes plus small header overhead
IOException: Thrown for I/O errors during read/write operations
IOException with specific message: Thrown for version incompatibility or corrupted data
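The size estimate and the version and corruption errors noted above can be illustrated with a small self-contained sketch. It assumes the Count-Min layout listed earlier (version, total count, depth, width, one 64-bit hash value per row, then the count table). `CountMinLayoutSketch` and its methods are hypothetical names used for illustration, not the library's implementation, and the error message text is likewise an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration of the documented layout; not the library's code.
class CountMinLayoutSketch {
    static final int VERSION = 1;

    // Header: version (4) + total count (8) + depth (4) + width (4) bytes,
    // then depth hash values and depth * width counters at 8 bytes each.
    static long serializedSize(int depth, int width) {
        return 4 + 8 + 4 + 4 + 8L * depth + 8L * depth * width;
    }

    // Writes the fields in the documented order; table must have at least one row.
    static byte[] write(int version, long totalCount, long[] hashValues, long[][] table)
            throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(version);
        out.writeLong(totalCount);
        out.writeInt(table.length);    // depth
        out.writeInt(table[0].length); // width
        for (long hashValue : hashValues) {
            out.writeLong(hashValue);
        }
        for (long[] row : table) {
            for (long count : row) {
                out.writeLong(count);
            }
        }
        out.flush();
        return buffer.toByteArray();
    }

    // Reads the header far enough to return the total count.
    // Truncated input surfaces as EOFException, a subclass of IOException.
    static long readTotalCount(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int version = in.readInt();
        if (version != VERSION) {
            throw new IOException("Unexpected Count-Min sketch version: " + version);
        }
        return in.readLong();
    }

    public static void main(String[] args) throws IOException {
        long[][] table = {{1, 2, 3}, {4, 5, 6}}; // depth 2, width 3
        long[] hashValues = {11L, 13L};
        byte[] bytes = write(VERSION, 18L, hashValues, table);
        System.out.println(bytes.length);          // 84
        System.out.println(serializedSize(2, 3));  // 84
        System.out.println(readTotalCount(bytes)); // 18
        try {
            readTotalCount(write(2, 18L, hashValues, table));
        } catch (IOException e) {
            System.out.println(e.getMessage()); // Unexpected Count-Min sketch version: 2
        }
    }
}
```

For this layout the count table dominates: with depth 2 and width 3 it accounts for 48 of the 84 bytes, and its share grows as the width increases.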