Serialization and I/O

Binary serialization support for Bloom filters and Count-Min sketches, enabling persistent storage and distributed computing scenarios. The serialization format is version-aware and designed for cross-platform compatibility.

Capabilities

Bloom Filter Serialization

Methods for serializing and deserializing Bloom filters.

/**
 * Writes the Bloom filter to an output stream in binary format
 * Caller is responsible for closing the stream
 * @param out output stream to write to
 * @throws IOException if an I/O error occurs
 */
public abstract void writeTo(OutputStream out) throws IOException;

/**
 * Reads a Bloom filter from an input stream
 * Caller is responsible for closing the stream
 * @param in input stream to read from
 * @return deserialized BloomFilter instance
 * @throws IOException if an I/O error occurs or format is invalid
 */
public static BloomFilter readFrom(InputStream in) throws IOException;

Usage Examples:

import java.io.*;

// Create and populate a Bloom filter
BloomFilter filter = BloomFilter.create(1000, 0.01);
filter.put("item1");
filter.put("item2");
filter.put(12345L);

// Serialize to file
try (FileOutputStream fos = new FileOutputStream("bloomfilter.dat")) {
    filter.writeTo(fos);
}

// Deserialize from file
BloomFilter loadedFilter;
try (FileInputStream fis = new FileInputStream("bloomfilter.dat")) {
    loadedFilter = BloomFilter.readFrom(fis);
}

// Verify the loaded filter works correctly
boolean test1 = loadedFilter.mightContain("item1");    // true (Bloom filters have no false negatives)
boolean test2 = loadedFilter.mightContain("missing");  // almost certainly false (about 1% false-positive rate)
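
The Bloom filter methods listed above are stream-based, so an in-memory round trip has to go through a ByteArrayOutputStream. The following is a minimal sketch using only the JDK; the StreamBytes class, the StreamWriter interface, and the toBytes helper are hypothetical names introduced for illustration and are not part of the library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class StreamBytes {
    /** Hypothetical adapter: anything that can write itself to a stream. */
    @FunctionalInterface
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Captures everything the writer emits into a byte array. */
    static byte[] toBytes(StreamWriter writer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writer.writeTo(buffer);
        return buffer.toByteArray();
    }
}
```

Assuming the filter from the example above, `StreamBytes.toBytes(filter::writeTo)` would yield the serialized bytes, and `BloomFilter.readFrom(new ByteArrayInputStream(bytes))` would restore the filter.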

Count-Min Sketch Serialization

Methods for serializing and deserializing Count-Min sketches with both stream and byte array support.

/**
 * Writes the Count-Min sketch to an output stream in binary format
 * Caller is responsible for closing the stream
 * @param out output stream to write to
 * @throws IOException if an I/O error occurs
 */
public abstract void writeTo(OutputStream out) throws IOException;

/**
 * Serializes the Count-Min sketch to a byte array
 * @return byte array containing serialized sketch data
 * @throws IOException if serialization fails
 */
public abstract byte[] toByteArray() throws IOException;

/**
 * Reads a Count-Min sketch from an input stream
 * Caller is responsible for closing the stream
 * @param in input stream to read from
 * @return deserialized CountMinSketch instance
 * @throws IOException if an I/O error occurs or format is invalid
 */
public static CountMinSketch readFrom(InputStream in) throws IOException;

/**
 * Reads a Count-Min sketch from a byte array
 * @param bytes byte array containing serialized sketch data
 * @return deserialized CountMinSketch instance  
 * @throws IOException if deserialization fails
 */
public static CountMinSketch readFrom(byte[] bytes) throws IOException;

Usage Examples:

import java.io.*;

// Create and populate a Count-Min sketch
CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);
sketch.add("user123", 10);
sketch.add("user456", 5);
sketch.addLong(999L, 3);

// Serialize to file using stream
try (FileOutputStream fos = new FileOutputStream("sketch.dat")) {
    sketch.writeTo(fos);
}

// Serialize to byte array
byte[] sketchBytes = sketch.toByteArray();

// Deserialize from file
CountMinSketch loadedSketch;
try (FileInputStream fis = new FileInputStream("sketch.dat")) {
    loadedSketch = CountMinSketch.readFrom(fis);
}

// Deserialize from byte array
CountMinSketch sketchFromBytes = CountMinSketch.readFrom(sketchBytes);

// Verify loaded sketches work correctly
long count1 = loadedSketch.estimateCount("user123");      // >= 10
long count2 = sketchFromBytes.estimateCount("user456");   // >= 5
long total = loadedSketch.totalCount();                   // 18

Binary Format Specifications

The serialization formats are version-aware and optimized for space efficiency.

Bloom Filter Binary Format (Version 1)

// All values written in big-endian order:
// - Version number, always 1 (32 bit)
// - Number of hash functions (32 bit)  
// - Total number of words of underlying bit array (32 bit)
// - The words/longs (numWords * 64 bit)
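
To make the layout above concrete, here is a minimal sketch that writes the same field sequence with a ByteBuffer. It is an illustration of the byte layout only, not the library's implementation; the BloomLayoutSketch class and its encode method are hypothetical, and the caller supplies the hash-function count and the bit-array words directly.

```java
import java.nio.ByteBuffer;

final class BloomLayoutSketch {
    /** Lays out version, hash-function count, word count, then the words. */
    static byte[] encode(int numHashFunctions, long[] words) {
        // ByteBuffer defaults to big-endian, matching the format description.
        ByteBuffer buf = ByteBuffer.allocate(3 * Integer.BYTES + words.length * Long.BYTES);
        buf.putInt(1);                 // version, always 1
        buf.putInt(numHashFunctions);  // number of hash functions
        buf.putInt(words.length);      // number of 64-bit words
        for (long word : words) {
            buf.putLong(word);         // the bit array, one word at a time
        }
        return buf.array();
    }
}
```

The header occupies 12 bytes, so a filter backed by two words would serialize to 28 bytes under this layout.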

Count-Min Sketch Binary Format (Version 1)

// All values written in big-endian order:
// - Version number, always 1 (32 bit)
// - Total count of added items (64 bit)
// - Depth (32 bit)
// - Width (32 bit)
// - Hash functions (depth * 64 bit)
// - Count table:
//   - Row 0 (width * 64 bit)
//   - Row 1 (width * 64 bit)
//   - ...
//   - Row depth-1 (width * 64 bit)
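
The fixed-width layout above means every field sits at a predictable offset. The sketch below reads the four header fields and computes where a given count-table cell would start. It assumes the layout exactly as listed; the CmsHeaderSketch class and its methods are hypothetical and not part of the library.

```java
import java.nio.ByteBuffer;

final class CmsHeaderSketch {
    final int version;
    final long totalCount;
    final int depth;
    final int width;

    private CmsHeaderSketch(int version, long totalCount, int depth, int width) {
        this.version = version;
        this.totalCount = totalCount;
        this.depth = depth;
        this.width = width;
    }

    /** Reads the four fixed header fields from the start of the payload. */
    static CmsHeaderSketch parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
        int version = buf.getInt();
        long totalCount = buf.getLong();
        int depth = buf.getInt();
        int width = buf.getInt();
        return new CmsHeaderSketch(version, totalCount, depth, width);
    }

    /** Byte offset of the count-table cell at (row, col). */
    long cellOffset(int row, int col) {
        long headerBytes = 4 + 8 + 4 + 4;        // version, total count, depth, width
        long hashBytes = (long) depth * 8;       // one 64-bit value per row
        return headerBytes + hashBytes + ((long) row * width + col) * 8;
    }
}
```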

Network and Distributed Computing Examples

Common patterns for using serialization in distributed environments.

Network Transfer Example:

import java.io.*;
import java.net.*;

// Server process: send a Bloom filter over the network
BloomFilter filter = BloomFilter.create(10000, 0.01);
filter.put("shared_data");

try (ServerSocket serverSocket = new ServerSocket(8080);
     Socket clientSocket = serverSocket.accept();   // blocks until a client connects
     OutputStream out = clientSocket.getOutputStream()) {
    filter.writeTo(out);
}

// Client process (runs separately): receive and use the Bloom filter
BloomFilter receivedFilter;
try (Socket socket = new Socket("localhost", 8080);
     InputStream in = socket.getInputStream()) {
    receivedFilter = BloomFilter.readFrom(in);
}

boolean contains = receivedFilter.mightContain("shared_data"); // true

Distributed Aggregation Example:

// Scenario: Aggregate Count-Min sketches from multiple workers

// Worker 1
CountMinSketch worker1Sketch = CountMinSketch.create(0.01, 0.99, 42);
worker1Sketch.add("event_A", 100);
worker1Sketch.add("event_B", 50);

// Worker 2  
CountMinSketch worker2Sketch = CountMinSketch.create(0.01, 0.99, 42);
worker2Sketch.add("event_A", 75);
worker2Sketch.add("event_C", 30);

// Serialize workers' sketches for network transfer
byte[] worker1Bytes = worker1Sketch.toByteArray();
byte[] worker2Bytes = worker2Sketch.toByteArray();

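// Coordinator: Deserialize and merge
CountMinSketch aggregated = CountMinSketch.readFrom(worker1Bytes);
CountMinSketch worker2Copy = CountMinSketch.readFrom(worker2Bytes);

// Sketches must share depth, width, and seed (both workers use 0.01, 0.99, 42);
// mergeInPlace throws IncompatibleMergeException otherwise
aggregated.mergeInPlace(worker2Copy);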

// Now aggregated contains combined counts
long eventA = aggregated.estimateCount("event_A"); // >= 175 (100 + 75)
long eventB = aggregated.estimateCount("event_B"); // >= 50
long eventC = aggregated.estimateCount("event_C"); // >= 30
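
Merging works because Count-Min tables built with identical dimensions and hash functions are additive: each cell of the merged table is the sum of the corresponding cells. The sketch below shows that idea over plain long[][] tables. It is a simplified illustration, not the library's implementation, and the TableMergeSketch class is a hypothetical name.

```java
final class TableMergeSketch {
    /** Adds other into target cell by cell; both must share dimensions. */
    static void mergeInPlace(long[][] target, long[][] other) {
        // Validate every row before mutating anything.
        if (target.length != other.length) {
            throw new IllegalArgumentException("Depth mismatch");
        }
        for (int row = 0; row < target.length; row++) {
            if (target[row].length != other[row].length) {
                throw new IllegalArgumentException("Width mismatch in row " + row);
            }
        }
        for (int row = 0; row < target.length; row++) {
            for (int col = 0; col < target[row].length; col++) {
                target[row][col] += other[row][col];
            }
        }
    }
}
```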

Java Serialization Support

Both data structures also implement Java's native serialization for integration with frameworks that use ObjectOutputStream/ObjectInputStream.

// These methods are automatically called during Java serialization
private void writeObject(ObjectOutputStream out) throws IOException;
private void readObject(ObjectInputStream in) throws IOException;

Usage Example:

import java.io.*;

BloomFilter filter = BloomFilter.create(1000);
filter.put("test");

// Java serialization
try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("filter.ser"))) {
    oos.writeObject(filter);
}

// Java deserialization (ObjectInputStream.readObject also declares ClassNotFoundException)
BloomFilter deserializedFilter;
try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream("filter.ser"))) {
    deserializedFilter = (BloomFilter) ois.readObject();
}
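
For readers unfamiliar with the writeObject/readObject hooks, the sketch below shows the general pattern on a hypothetical WordsHolder class: the private methods are picked up by ObjectOutputStream and ObjectInputStream and write a compact custom payload instead of the default field encoding. This illustrates the Java mechanism only and makes no claim about how the library implements its hooks.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class WordsHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: excluded from default serialization, handled manually below
    private transient long[] words;

    WordsHolder(long[] words) {
        this.words = words.clone();
    }

    long[] words() {
        return words.clone();
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();       // no non-transient fields, kept for spec conformance
        out.writeInt(1);                // version of this custom payload
        out.writeInt(words.length);
        for (long word : words) {
            out.writeLong(word);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int version = in.readInt();
        if (version != 1) {
            throw new IOException("Unexpected version: " + version);
        }
        words = new long[in.readInt()];
        for (int i = 0; i < words.length; i++) {
            words[i] = in.readLong();
        }
    }
}
```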

Performance Characteristics

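  • Bloom Filter: Serialized size is a 12-byte header plus 8 bytes per 64-bit word of the bit array, so it depends on the configured bit count rather than on the number or size of inserted items
  • Count-Min Sketch: Serialized size is depth × width × 8 bytes for the count table, plus depth × 8 bytes for the hash functions and a 20-byte header
  • Network Efficiency: The binary format stores raw fixed-width integers, which is more compact than text encodings such as JSON or XML
  • Version Compatibility: Each payload starts with a version number (currently always 1), which lets readers detect data written in an unsupported format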
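
The sizes can be estimated up front from the Version 1 layouts described earlier. The sketch below does that arithmetic; the SketchSizes class is hypothetical, and the Bloom filter calculation assumes the bit count is rounded up to whole 64-bit words.

```java
final class SketchSizes {
    /** 12-byte header (three 32-bit fields) plus 8 bytes per 64-bit word. */
    static long bloomFilterBytes(long numBits) {
        long numWords = (numBits + 63) / 64;   // assumption: round up to whole words
        return 12 + numWords * 8;
    }

    /** 20-byte header, depth hash values, then a depth x width table of longs. */
    static long countMinSketchBytes(int depth, int width) {
        long header = 4 + 8 + 4 + 4;           // version, total count, depth, width
        long hashes = (long) depth * 8;
        long table = (long) depth * width * 8;
        return header + hashes + table;
    }
}
```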

Error Handling

  • IOException: Thrown for I/O errors during read/write operations
  • IOException with specific message: Thrown for version incompatibility or corrupted data
  • Stream management: Callers are responsible for properly closing streams to avoid resource leaks
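
Both failure modes above surface as IOException, so one catch clause covers them. The sketch below illustrates this against the Bloom filter header layout: a wrong version number triggers an explicit IOException, while truncated input makes DataInputStream throw EOFException, which is a subclass of IOException. The VersionCheckSketch class and its exception message are hypothetical and do not reproduce the library's code or wording.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class VersionCheckSketch {
    /** Returns the word count after validating the version field. */
    static int readWordCount(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int version = in.readInt();   // EOFException if fewer than 4 bytes remain
            if (version != 1) {
                throw new IOException("Unexpected version number: " + version);
            }
            in.readInt();                 // number of hash functions, skipped here
            return in.readInt();          // number of 64-bit words
        }
    }

    /** Boolean probe for callers that only need to know whether the header parses. */
    static boolean isReadable(byte[] bytes) {
        try {
            readWordCount(bytes);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```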