# Serialization

Save automaton instances to disk and load them back. Automatons that store arbitrary objects (STORE_ANY) rely on caller-supplied serialization functions, while automatons that store integers (STORE_INTS, STORE_LENGTH) use an efficient built-in format.

## Capabilities

### Save Automaton

Save an automaton to disk for later reuse.

```python { .api }
def save(self, path, serializer=None):
    """
    Save content of automaton to a file on disk.

    Parameters:
    - path: File path to save to
    - serializer: Callable for converting Python objects to bytes
      (required for STORE_ANY, not used for STORE_INTS/STORE_LENGTH)

    Raises:
    - ValueError: If serializer required but not provided
    - IOError: If file cannot be written
    """
```

#### Usage Examples

```python
import ahocorasick
import pickle

# STORE_ANY - requires serializer
automaton = ahocorasick.Automaton(ahocorasick.STORE_ANY)
automaton.add_word('hello', {'type': 'greeting', 'lang': 'en'})
automaton.add_word('world', {'type': 'noun', 'meaning': 'earth'})
automaton.make_automaton()

# Save with pickle serializer
automaton.save('my_automaton.dat', pickle.dumps)

# STORE_INTS - no serializer needed
int_automaton = ahocorasick.Automaton(ahocorasick.STORE_INTS)
int_automaton.add_word('cat', 1)
int_automaton.add_word('dog', 2)
int_automaton.make_automaton()

# Save without serializer
int_automaton.save('int_automaton.dat')

# STORE_LENGTH - no serializer needed
length_automaton = ahocorasick.Automaton(ahocorasick.STORE_LENGTH)
length_automaton.add_word('apple')  # value = 5
length_automaton.add_word('orange')  # value = 6
length_automaton.make_automaton()

# Save without serializer
length_automaton.save('length_automaton.dat')
```

### Load Automaton

Load a previously saved automaton from disk.

```python { .api }
# Module-level function, called as ahocorasick.load(...)
def load(path, deserializer=None):
    """
    Load automaton previously stored on disk using save method.

    Parameters:
    - path: File path to load from
    - deserializer: Callable for converting bytes back to Python objects
      (required for STORE_ANY automatons, not used for others)

    Returns:
    Automaton: Loaded automaton instance ready for use

    Raises:
    - ValueError: If deserializer required but not provided
    - IOError: If file cannot be read
    - pickle.PickleError: If deserialization with a pickle-based deserializer fails
    """
```

#### Usage Examples

```python
import ahocorasick
import pickle

# Load STORE_ANY automaton - requires deserializer
loaded_automaton = ahocorasick.load('my_automaton.dat', pickle.loads)

# Verify it works
print(loaded_automaton.get('hello'))  # {'type': 'greeting', 'lang': 'en'}
text = "hello world"
matches = list(loaded_automaton.iter(text))
print(matches)

# Load STORE_INTS automaton - no deserializer needed
int_automaton = ahocorasick.load('int_automaton.dat')
print(int_automaton.get('cat'))  # 1
print(int_automaton.get('dog'))  # 2

# Load STORE_LENGTH automaton - no deserializer needed
length_automaton = ahocorasick.load('length_automaton.dat')
print(length_automaton.get('apple'))  # 5
print(length_automaton.get('orange'))  # 6
```

### Pickle Support

Automatons support Python's standard pickle module for serialization.

```python { .api }
def __reduce__(self):
    """
    Return pickle-able data for this automaton instance.

    Returns:
    tuple: Data needed to reconstruct the automaton

    Usage:
    This method enables standard pickle.dump() and pickle.load() operations.
    """
```

#### Usage Examples

```python
import ahocorasick
import pickle

# Create and populate automaton
automaton = ahocorasick.Automaton()
words = ['the', 'quick', 'brown', 'fox']
for i, word in enumerate(words):
    automaton.add_word(word, i)
automaton.make_automaton()

# Pickle to bytes
pickled_data = pickle.dumps(automaton)

# Unpickle from bytes
restored_automaton = pickle.loads(pickled_data)

# Verify functionality
print(restored_automaton.get('quick'))  # 1
matches = list(restored_automaton.iter('the quick brown fox'))
print(len(matches))  # 4

# Pickle to file
with open('automaton.pickle', 'wb') as f:
    pickle.dump(automaton, f)

# Unpickle from file
with open('automaton.pickle', 'rb') as f:
    file_automaton = pickle.load(f)

print(file_automaton.get('fox'))  # 3
```

## Serialization Methods Comparison

### Custom save/load vs Pickle

| Feature | save/load | pickle |
|---------|-----------|--------|
| **Performance** | Faster for large automatons | Slower, more overhead |
| **File Size** | Smaller files | Larger files |
| **Portability** | pyahocorasick specific | Standard Python |
| **Flexibility** | Custom serializers | Full object graph |
| **Memory Usage** | Lower during operation | Higher during operation |
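
The comparison above is qualitative, and the actual difference depends on the automaton's size and the values it stores. A minimal sketch for measuring it on your own data, using only the standard library, might look like the following. The helper `measure` and its `save_fn`/`load_fn` parameters are hypothetical names introduced here, not part of pyahocorasick, and a plain dict stands in for the automaton so the sketch runs anywhere.

```python
import os
import pickle
import tempfile
import time


def measure(save_fn, load_fn, path):
    """Time one save/load round trip and report the resulting file size.

    save_fn(path) must write the file and load_fn(path) must read it back;
    both are caller-supplied callables (hypothetical names for this sketch).
    """
    start = time.perf_counter()
    save_fn(path)
    save_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = load_fn(path)
    load_seconds = time.perf_counter() - start

    return {
        "save_seconds": save_seconds,
        "load_seconds": load_seconds,
        "file_bytes": os.path.getsize(path),
        "restored": restored,
    }


def pickle_save(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def pickle_load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in payload so the sketch runs without pyahocorasick installed.
payload = {"the": 0, "quick": 1, "brown": 2, "fox": 3}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payload.pickle")
    stats = measure(lambda p: pickle_save(payload, p), pickle_load, path)

print(stats["file_bytes"], stats["save_seconds"], stats["load_seconds"])
```

For an automaton, `save_fn` could be `lambda p: automaton.save(p, pickle.dumps)` and `load_fn` could be `lambda p: ahocorasick.load(p, pickle.loads)`. Running that pair and the pickle pair against the same automaton gives a like-for-like comparison.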

### Storage Type Considerations

| Storage Type | save/load Serializer | pickle Support | Notes |
|--------------|---------------------|----------------|-------|
| **STORE_INTS** | Not required | Yes | Most efficient |
| **STORE_LENGTH** | Not required | Yes | Very efficient |
| **STORE_ANY** | Required | Yes | Depends on object complexity |

## Advanced Serialization Patterns

### Custom Serialization for Complex Objects

```python
import ahocorasick
import json
import pickle

class CustomSerializer:
    """Custom serializer for complex objects."""

    @staticmethod
    def serialize(obj):
        """Convert object to bytes."""
        if isinstance(obj, dict):
            return json.dumps(obj).encode('utf-8')
        else:
            return pickle.dumps(obj)

    @staticmethod
    def deserialize(data):
        """Convert bytes back to object."""
        try:
            # Try JSON first
            return json.loads(data.decode('utf-8'))
        except (UnicodeDecodeError, json.JSONDecodeError):
            # Fall back to pickle
            return pickle.loads(data)

# Usage
automaton = ahocorasick.Automaton()
automaton.add_word('config', {'host': 'localhost', 'port': 8080})
automaton.add_word('data', [1, 2, 3, 4, 5])
automaton.make_automaton()

# Save with custom serializer
automaton.save('custom.dat', CustomSerializer.serialize)

# Load with custom deserializer
loaded = ahocorasick.load('custom.dat', CustomSerializer.deserialize)
print(loaded.get('config'))  # {'host': 'localhost', 'port': 8080}
```

### Conditional Serialization

```python
import json
import pickle

def conditional_serializer(obj):
    """Serialize only certain types of objects."""
    if isinstance(obj, (str, int, float, bool)):
        return pickle.dumps(obj)
    elif isinstance(obj, dict) and all(isinstance(k, str) for k in obj.keys()):
        return json.dumps(obj).encode('utf-8')
    else:
        raise ValueError(f"Cannot serialize object of type {type(obj)}")

def conditional_deserializer(data):
    """Deserialize with type detection."""
    try:
        # JSON payloads are valid UTF-8 text
        return json.loads(data.decode('utf-8'))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Anything else is assumed to be pickle data
        return pickle.loads(data)
```
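
The deserializer above decides between JSON and pickle by trial and error. An alternative is to record the format explicitly in the serialized bytes. The following is a minimal sketch under that assumption: the one-byte tags and the names `tagged_serializer` and `tagged_deserializer` are hypothetical, chosen for illustration, and not part of pyahocorasick.

```python
import json
import pickle

# Hypothetical one-byte format tags chosen for this sketch.
TAG_JSON = b"J"
TAG_PICKLE = b"P"


def tagged_serializer(obj):
    """Serialize obj to bytes, prefixed with a tag naming the format used."""
    if isinstance(obj, dict) and all(isinstance(k, str) for k in obj):
        try:
            return TAG_JSON + json.dumps(obj).encode("utf-8")
        except (TypeError, ValueError):
            # Values that JSON cannot represent fall through to pickle.
            pass
    return TAG_PICKLE + pickle.dumps(obj)


def tagged_deserializer(data):
    """Inverse of tagged_serializer: dispatch on the leading tag byte."""
    tag, body = data[:1], data[1:]
    if tag == TAG_JSON:
        return json.loads(body.decode("utf-8"))
    if tag == TAG_PICKLE:
        return pickle.loads(body)
    raise ValueError(f"Unknown format tag: {tag!r}")


# Round-trip check on a few representative values.
samples = [{"host": "localhost", "port": 8080}, [1, 2, 3], "text", 42]
for value in samples:
    assert tagged_deserializer(tagged_serializer(value)) == value
```

The pair would be passed in the same positions as the functions above, for example `automaton.save(path, tagged_serializer)` and `ahocorasick.load(path, tagged_deserializer)`. JSON round trips are lossy for some Python types (tuples come back as lists, and non-string keys in nested dicts come back as strings), so values where that matters should take the pickle path.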

### Compression Support

```python
import ahocorasick
import gzip
import pickle

def compressed_save(automaton, path):
    """Save automaton with compression."""
    with gzip.open(path, 'wb') as f:
        pickle.dump(automaton, f)

def compressed_load(path):
    """Load compressed automaton."""
    with gzip.open(path, 'rb') as f:
        return pickle.load(f)

# Usage
automaton = ahocorasick.Automaton()
# ... populate automaton ...
compressed_save(automaton, 'compressed_automaton.pkl.gz')
loaded = compressed_load('compressed_automaton.pkl.gz')
```

### Version-aware Serialization

```python
import ahocorasick
import pickle

class VersionedAutomaton:
    """Wrapper that adds version information."""

    VERSION = "1.0"

    def __init__(self, automaton):
        self.version = self.VERSION
        self.automaton = automaton

    def save(self, path):
        """Save with version info."""
        data = {
            'version': self.version,
            'automaton_data': pickle.dumps(self.automaton)
        }
        with open(path, 'wb') as f:
            pickle.dump(data, f)

    @classmethod
    def load(cls, path):
        """Load with version checking."""
        with open(path, 'rb') as f:
            data = pickle.load(f)

        if data['version'] != cls.VERSION:
            print(f"Warning: Version mismatch. Expected {cls.VERSION}, got {data['version']}")

        automaton = pickle.loads(data['automaton_data'])
        return cls(automaton)

# Usage
automaton = ahocorasick.Automaton()
# ... populate automaton ...
versioned = VersionedAutomaton(automaton)
versioned.save('versioned_automaton.dat')

loaded_versioned = VersionedAutomaton.load('versioned_automaton.dat')
```

## Error Handling

Common serialization errors and solutions:

### File Access Errors

```python
import ahocorasick
import os

def safe_save(automaton, path, serializer=None):
    """Save with error handling."""
    try:
        # Ensure the directory exists; dirname is '' for a bare file name
        directory = os.path.dirname(path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        # Pass the serializer only when one was given
        if serializer is None:
            automaton.save(path)
        else:
            automaton.save(path, serializer)
        return True
    except PermissionError:
        print(f"Permission denied: {path}")
        return False
    except IOError as e:
        print(f"IO error: {e}")
        return False

def safe_load(path, deserializer=None):
    """Load with error handling."""
    try:
        if not os.path.exists(path):
            print(f"File not found: {path}")
            return None
        # Pass the deserializer only when one was given
        if deserializer is None:
            return ahocorasick.load(path)
        return ahocorasick.load(path, deserializer)
    except IOError as e:
        print(f"IO error: {e}")
        return None
    except Exception as e:
        print(f"Deserialization error: {e}")
        return None
```
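
A failed or interrupted save can leave a truncated file at the target path. One common way to limit that risk is to write to a temporary file in the same directory and rename it into place only after the write succeeds. A minimal sketch, assuming the saving step is a callable that takes a path; the helper name `atomic_save` and its `write_fn` parameter are hypothetical, and a stand-in writer is used so the sketch runs without pyahocorasick:

```python
import os
import tempfile


def atomic_save(write_fn, path):
    """Write via a temporary file, then rename it over `path`.

    write_fn(tmp_path) is a caller-supplied callable that writes the file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)

    # Create the temp file in the target directory so the rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial file and leave any existing target untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def write_demo(tmp_path):
    # Stand-in writer for this sketch.
    with open(tmp_path, "wb") as f:
        f.write(b"demo")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out", "automaton.dat")
    atomic_save(write_demo, target)
    with open(target, "rb") as f:
        saved = f.read()

print(saved)  # b'demo'
```

With an automaton, `write_fn` could be `lambda p: automaton.save(p, pickle.dumps)`, so that readers of `path` see either the previous file or the complete new one.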

### Serializer Validation

```python
import pickle

def validate_serializer(serializer, deserializer, test_obj):
    """Validate that serializer/deserializer pair works."""
    try:
        serialized = serializer(test_obj)
        deserialized = deserializer(serialized)
        return deserialized == test_obj
    except Exception as e:
        print(f"Serializer validation failed: {e}")
        return False

# Usage
test_data = {'test': 'data', 'number': 42}
if validate_serializer(pickle.dumps, pickle.loads, test_data):
    print("Serializer pair is valid")
```

## Performance Considerations

### File Size Optimization

- **STORE_INTS**: Smallest file size, fastest save/load
- **STORE_LENGTH**: Very small file size, fast operations
- **STORE_ANY**: Size depends on serializer efficiency
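
For STORE_ANY, the bytes written for each stored value come from the serializer, so its per-value overhead affects the file size. A rough way to compare candidate serializers on representative values is sketched below using only the standard library. The helper `payload_sizes` and the `json_bytes` serializer are hypothetical names introduced for illustration.

```python
import json
import pickle


def payload_sizes(values, serializers):
    """Return total serialized bytes per serializer over the sample values."""
    return {
        name: sum(len(serialize(value)) for value in values)
        for name, serialize in serializers.items()
    }


def json_bytes(obj):
    # Compact separators avoid the default ", " and ": " padding.
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


# Sample values taken from the save example earlier in this document.
sample_values = [
    {"type": "greeting", "lang": "en"},
    {"type": "noun", "meaning": "earth"},
]

sizes = payload_sizes(
    sample_values,
    {"pickle": pickle.dumps, "json": json_bytes},
)
for name, total in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name}: {total} bytes")
```

The totals only cover the serialized values, not the automaton structure itself, so they indicate relative serializer cost rather than final file size.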

### Memory Usage

- Save operations require temporary memory for serialization
- Load operations create a new automaton instance
- Consider available memory when working with large automatons

### Best Practices

1. **Use appropriate storage type** for your data
2. **Test serialization round-trip** before deployment
3. **Handle errors gracefully** in production code
4. **Consider compression** for large automatons stored long-term
5. **Version your data format** for long-term compatibility