# Serialization and I/O

Bloom filters and Count-Min sketches both support binary serialization for persistent storage and distributed computing. The serialization format is version-aware and designed for cross-platform compatibility.

## Capabilities

### Bloom Filter Serialization

Methods for serializing and deserializing Bloom filters.

```java { .api }
/**
 * Writes the Bloom filter to an output stream in binary format.
 * Caller is responsible for closing the stream.
 * @param out output stream to write to
 * @throws IOException if an I/O error occurs
 */
public abstract void writeTo(OutputStream out) throws IOException;

/**
 * Reads a Bloom filter from an input stream.
 * Caller is responsible for closing the stream.
 * @param in input stream to read from
 * @return deserialized BloomFilter instance
 * @throws IOException if an I/O error occurs or format is invalid
 */
public static BloomFilter readFrom(InputStream in) throws IOException;
```

**Usage Examples:**

```java
import java.io.*;

// Create and populate a Bloom filter
BloomFilter filter = BloomFilter.create(1000, 0.01);
filter.put("item1");
filter.put("item2");
filter.put(12345L);

// Serialize to file
try (FileOutputStream fos = new FileOutputStream("bloomfilter.dat")) {
    filter.writeTo(fos);
}

// Deserialize from file
BloomFilter loadedFilter;
try (FileInputStream fis = new FileInputStream("bloomfilter.dat")) {
    loadedFilter = BloomFilter.readFrom(fis);
}

// Verify the loaded filter works correctly
boolean test1 = loadedFilter.mightContain("item1");   // true
boolean test2 = loadedFilter.mightContain("missing"); // false, barring a false positive
```

### Count-Min Sketch Serialization

Methods for serializing and deserializing Count-Min sketches with both stream and byte array support.

```java { .api }
/**
 * Writes the Count-Min sketch to an output stream in binary format.
 * Caller is responsible for closing the stream.
 * @param out output stream to write to
 * @throws IOException if an I/O error occurs
 */
public abstract void writeTo(OutputStream out) throws IOException;

/**
 * Serializes the Count-Min sketch to a byte array.
 * @return byte array containing serialized sketch data
 * @throws IOException if serialization fails
 */
public abstract byte[] toByteArray() throws IOException;

/**
 * Reads a Count-Min sketch from an input stream.
 * Caller is responsible for closing the stream.
 * @param in input stream to read from
 * @return deserialized CountMinSketch instance
 * @throws IOException if an I/O error occurs or format is invalid
 */
public static CountMinSketch readFrom(InputStream in) throws IOException;

/**
 * Reads a Count-Min sketch from a byte array.
 * @param bytes byte array containing serialized sketch data
 * @return deserialized CountMinSketch instance
 * @throws IOException if deserialization fails
 */
public static CountMinSketch readFrom(byte[] bytes) throws IOException;
```

**Usage Examples:**

```java
import java.io.*;

// Create and populate a Count-Min sketch
CountMinSketch sketch = CountMinSketch.create(0.01, 0.99, 42);
sketch.add("user123", 10);
sketch.add("user456", 5);
sketch.addLong(999L, 3);

// Serialize to file using stream
try (FileOutputStream fos = new FileOutputStream("sketch.dat")) {
    sketch.writeTo(fos);
}

// Serialize to byte array
byte[] sketchBytes = sketch.toByteArray();

// Deserialize from file
CountMinSketch loadedSketch;
try (FileInputStream fis = new FileInputStream("sketch.dat")) {
    loadedSketch = CountMinSketch.readFrom(fis);
}

// Deserialize from byte array
CountMinSketch sketchFromBytes = CountMinSketch.readFrom(sketchBytes);

// Verify loaded sketches work correctly
long count1 = loadedSketch.estimateCount("user123");    // >= 10
long count2 = sketchFromBytes.estimateCount("user456"); // >= 5
long total = loadedSketch.totalCount();                 // 18
```

### Binary Format Specifications

The serialization formats are version-aware and optimized for space efficiency.

#### Bloom Filter Binary Format (Version 1)

```java { .api }
// All values written in big-endian order:
// - Version number, always 1 (32 bit)
// - Number of hash functions (32 bit)
// - Total number of words of underlying bit array (32 bit)
// - The words/longs (numWords * 64 bit)
```
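
The field order above maps directly onto `java.io.DataOutputStream` and `java.io.DataInputStream`, which always use big-endian byte order. The following is a minimal sketch under that assumption, not the library's implementation: the class `BloomFormatSketch` and its `encode` and `decodeWords` helpers are hypothetical names introduced only to illustrate the layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper for illustration; not part of the library.
final class BloomFormatSketch {

    // Writes the Version 1 fields in order: version, hash count, word count, words.
    static byte[] encode(int numHashFunctions, long[] words) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(1);                 // version, always 1
            out.writeInt(numHashFunctions);
            out.writeInt(words.length);
            for (long word : words) {
                out.writeLong(word);         // DataOutputStream writes big-endian
            }
        }
        return buffer.toByteArray();
    }

    // Reads the fields back in the same order and returns the bit-array words.
    static long[] decodeWords(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int version = in.readInt();
            if (version != 1) {
                throw new IOException("Unexpected Bloom filter version: " + version);
            }
            in.readInt();                    // number of hash functions (unused here)
            int numWords = in.readInt();
            if (numWords < 0) {
                throw new IOException("Negative word count: " + numWords);
            }
            long[] words = new long[numWords];
            for (int i = 0; i < numWords; i++) {
                words[i] = in.readLong();
            }
            return words;
        }
    }
}
```

Under this layout a payload occupies 12 bytes of header plus 8 bytes per word, so a filter backed by three words would serialize to 36 bytes.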

#### Count-Min Sketch Binary Format (Version 1)

```java { .api }
// All values written in big-endian order:
// - Version number, always 1 (32 bit)
// - Total count of added items (64 bit)
// - Depth (32 bit)
// - Width (32 bit)
// - Hash functions (depth * 64 bit)
// - Count table:
//   - Row 0 (width * 64 bit)
//   - Row 1 (width * 64 bit)
//   - ...
//   - Row depth-1 (width * 64 bit)
```
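
Because the fixed-size fields come first, a reader can inspect a payload's dimensions without decoding the count table. The sketch below assumes exactly the Version 1 layout listed above; `CountMinHeaderSketch` is a hypothetical class introduced for illustration and is not part of the library.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of the library.
final class CountMinHeaderSketch {
    final long totalCount;
    final int depth;
    final int width;

    private CountMinHeaderSketch(long totalCount, int depth, int width) {
        this.totalCount = totalCount;
        this.depth = depth;
        this.width = width;
    }

    // Reads only the leading fields; the hash functions and count table
    // that follow are left unread in the stream.
    static CountMinHeaderSketch read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in); // big-endian reads
        int version = data.readInt();
        if (version != 1) {
            throw new IOException("Unexpected Count-Min sketch version: " + version);
        }
        long totalCount = data.readLong();
        int depth = data.readInt();
        int width = data.readInt();
        return new CountMinHeaderSketch(totalCount, depth, width);
    }
}
```

Given a byte array produced by `toByteArray()`, wrapping it in a `ByteArrayInputStream` and passing it to `CountMinHeaderSketch.read` would be one way to check depth and width before attempting a merge.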

### Network and Distributed Computing Examples

Common patterns for using serialization in distributed environments.

**Network Transfer Example:**

```java
import java.io.*;
import java.net.*;

// Server (runs in its own process or thread): send a Bloom filter over the network
BloomFilter filter = BloomFilter.create(10000, 0.01);
filter.put("shared_data");

try (ServerSocket serverSocket = new ServerSocket(8080);
     Socket clientSocket = serverSocket.accept(); // blocks until a client connects
     OutputStream out = clientSocket.getOutputStream()) {
    filter.writeTo(out);
}

// Client (separate process or thread): receive and use the Bloom filter
BloomFilter receivedFilter;
try (Socket socket = new Socket("localhost", 8080);
     InputStream in = socket.getInputStream()) {
    receivedFilter = BloomFilter.readFrom(in);
}

boolean contains = receivedFilter.mightContain("shared_data"); // true
```

**Distributed Aggregation Example:**

```java
// Scenario: Aggregate Count-Min sketches from multiple workers.
// Every worker uses the same eps, confidence, and seed so the sketches can be merged.

// Worker 1
CountMinSketch worker1Sketch = CountMinSketch.create(0.01, 0.99, 42);
worker1Sketch.add("event_A", 100);
worker1Sketch.add("event_B", 50);

// Worker 2
CountMinSketch worker2Sketch = CountMinSketch.create(0.01, 0.99, 42);
worker2Sketch.add("event_A", 75);
worker2Sketch.add("event_C", 30);

// Serialize workers' sketches for network transfer
byte[] worker1Bytes = worker1Sketch.toByteArray();
byte[] worker2Bytes = worker2Sketch.toByteArray();

// Coordinator: Deserialize and merge
CountMinSketch aggregated = CountMinSketch.readFrom(worker1Bytes);
CountMinSketch worker2Copy = CountMinSketch.readFrom(worker2Bytes);

aggregated.mergeInPlace(worker2Copy);

// Now aggregated contains combined counts
long eventA = aggregated.estimateCount("event_A"); // >= 175 (100 + 75)
long eventB = aggregated.estimateCount("event_B"); // >= 50
long eventC = aggregated.estimateCount("event_C"); // >= 30
```
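
To see why both workers are created with identical parameters, it helps to picture the merge as cell-by-cell addition of two count tables, which is the usual way Count-Min sketches are combined. The sketch below illustrates that idea on plain `long[][]` tables. It is an assumption about the general technique, not the library's code, and `TableMergeSketch` is a hypothetical name.

```java
// Hypothetical helper for illustration; not part of the library.
final class TableMergeSketch {

    // Adds each cell of other into target. Both tables must have the same
    // depth (rows) and width (columns), otherwise cells would not correspond.
    static void mergeInPlace(long[][] target, long[][] other) {
        if (target.length != other.length) {
            throw new IllegalArgumentException("Depth mismatch");
        }
        for (int row = 0; row < target.length; row++) {
            if (target[row].length != other[row].length) {
                throw new IllegalArgumentException("Width mismatch in row " + row);
            }
            for (int col = 0; col < target[row].length; col++) {
                target[row][col] += other[row][col];
            }
        }
    }
}
```

Addition is only meaningful when a given item lands in the same column of each row in both tables, which is why matching dimensions alone are not enough: the hash functions, and therefore the seed, must match as well.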

### Java Serialization Support

Both data structures also implement Java's native serialization for integration with frameworks that use `ObjectOutputStream`/`ObjectInputStream`.

```java { .api }
// These methods are automatically called during Java serialization
private void writeObject(ObjectOutputStream out) throws IOException;
private void readObject(ObjectInputStream in) throws IOException;
```

**Usage Example:**

```java
import java.io.*;

BloomFilter filter = BloomFilter.create(1000);
filter.put("test");

// Java serialization
try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("filter.ser"))) {
    oos.writeObject(filter);
}

// Java deserialization (readObject can also throw ClassNotFoundException)
BloomFilter deserializedFilter;
try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream("filter.ser"))) {
    deserializedFilter = (BloomFilter) ois.readObject();
}
```

## Performance Characteristics

- **Bloom Filter**: Serialized size is proportional to the bit count (a 12-byte header plus 8 bytes per word), typically much smaller than storing the items themselves
- **Count-Min Sketch**: Serialized size is `depth × width × 8 bytes` for the count table, plus a 20-byte header and `depth × 8 bytes` for the hash functions
- **Network Efficiency**: The binary format is more compact than JSON/XML alternatives
- **Version Compatibility**: Compatibility across format versions is managed through the version number that starts every payload
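
The sizes can be worked out from the Version 1 layouts given under Binary Format Specifications. The following is a small sketch of that arithmetic, assuming those layouts are complete. `SketchSizeEstimate` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical helper for illustration; not part of the library.
final class SketchSizeEstimate {

    // Bloom filter: three 32-bit header fields, then numWords 64-bit words.
    static long bloomFilterBytes(int numWords) {
        return 3L * Integer.BYTES + (long) numWords * Long.BYTES;
    }

    // Count-Min sketch: version (32 bit), total count (64 bit), depth and
    // width (32 bit each), depth hash functions, then depth * width counters.
    static long countMinSketchBytes(int depth, int width) {
        long header = Integer.BYTES + Long.BYTES + 2L * Integer.BYTES;
        long hashFunctions = (long) depth * Long.BYTES;
        long table = (long) depth * width * Long.BYTES;
        return header + hashFunctions + table;
    }
}
```

For example, a sketch with depth 7 and width 200 comes to 20 + 56 + 11,200 = 11,276 bytes, and a Bloom filter backed by 150 words comes to 1,212 bytes.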

## Error Handling

- `IOException`: Thrown for I/O errors during read/write operations
- `IOException` with a specific message: Thrown for version incompatibility or corrupted data
- Stream management: Callers are responsible for closing streams to avoid resource leaks
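
One way to apply these rules in calling code is to centralize stream closing and `IOException` handling in a small wrapper. The sketch below is a hedged illustration using only the standard library. `SafeLoadSketch`, `StreamReader`, and `tryLoad` are hypothetical names, and returning an empty `Optional` on failure is a design choice made for this example rather than anything the library prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the library.
final class SafeLoadSketch {

    // Any reader that consumes a stream and may fail with IOException.
    @FunctionalInterface
    interface StreamReader<T> {
        T read(InputStream in) throws IOException;
    }

    // Runs the reader over the bytes, always closing the stream, and maps
    // an IOException (I/O error, bad version, corrupted data) to empty.
    static <T> Optional<T> tryLoad(byte[] bytes, StreamReader<T> reader) {
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            return Optional.of(reader.read(in));
        } catch (IOException e) {
            return Optional.empty();
        }
    }
}
```

Since `readFrom(InputStream)` has a matching shape, a call such as `SafeLoadSketch.tryLoad(bytes, BloomFilter::readFrom)` should fit this wrapper. Code that needs the failure reason would log or rethrow the exception instead of discarding it.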