# Serialization

Save and load automaton instances to and from disk. Custom serialization functions support storing arbitrary Python objects, while integer values use an efficient built-in serialization.

## Capabilities

### Save Automaton

Save an automaton to disk for later reuse.

```python { .api }
def save(self, path, serializer=None):
    """
    Save content of automaton to a file on disk.

    Parameters:
    - path: File path to save to
    - serializer: Callable for converting Python objects to bytes
      (required for STORE_ANY; omit it for STORE_INTS/STORE_LENGTH)

    Raises:
    - ValueError: If serializer required but not provided
    - IOError: If file cannot be written
    """
```

#### Usage Examples

```python
import ahocorasick
import pickle

# STORE_ANY - requires serializer
automaton = ahocorasick.Automaton(ahocorasick.STORE_ANY)
automaton.add_word('hello', {'type': 'greeting', 'lang': 'en'})
automaton.add_word('world', {'type': 'noun', 'meaning': 'earth'})
automaton.make_automaton()

# Save with pickle serializer
automaton.save('my_automaton.dat', pickle.dumps)

# STORE_INTS - no serializer needed
int_automaton = ahocorasick.Automaton(ahocorasick.STORE_INTS)
int_automaton.add_word('cat', 1)
int_automaton.add_word('dog', 2)
int_automaton.make_automaton()

# Save without serializer
int_automaton.save('int_automaton.dat')

# STORE_LENGTH - no serializer needed
length_automaton = ahocorasick.Automaton(ahocorasick.STORE_LENGTH)
length_automaton.add_word('apple')   # value = 5 (length of the word)
length_automaton.add_word('orange')  # value = 6
length_automaton.make_automaton()

# Save without serializer
length_automaton.save('length_automaton.dat')
```

### Load Automaton

Load a previously saved automaton from disk.

```python { .api }
def load(path, deserializer=None):
    """
    Load an automaton previously stored on disk with the save method.
    This is a module-level function, called as ahocorasick.load(...).

    Parameters:
    - path: File path to load from
    - deserializer: Callable for converting bytes back to Python objects
      (required for STORE_ANY automatons, not used for others)

    Returns:
    Automaton: Loaded automaton instance, ready for use

    Raises:
    - ValueError: If deserializer required but not provided
    - IOError: If file cannot be read
    - Any exception raised by the deserializer (e.g. pickle.UnpicklingError
      when using pickle.loads)
    """
```

#### Usage Examples

```python
import ahocorasick
import pickle

# Load STORE_ANY automaton - requires deserializer
loaded_automaton = ahocorasick.load('my_automaton.dat', pickle.loads)

# Verify it works
print(loaded_automaton.get('hello'))  # {'type': 'greeting', 'lang': 'en'}
text = "hello world"
matches = list(loaded_automaton.iter(text))
print(matches)  # list of (end_index, value) tuples

# Load STORE_INTS automaton - no deserializer needed
int_automaton = ahocorasick.load('int_automaton.dat')
print(int_automaton.get('cat'))  # 1
print(int_automaton.get('dog'))  # 2

# Load STORE_LENGTH automaton - no deserializer needed
length_automaton = ahocorasick.load('length_automaton.dat')
print(length_automaton.get('apple'))   # 5
print(length_automaton.get('orange'))  # 6
```

### Pickle Support

Automatons support Python's standard pickle module for serialization.

```python { .api }
def __reduce__(self):
    """
    Return pickle-able data for this automaton instance.

    Returns:
    tuple: Data needed to reconstruct the automaton

    Usage:
    This method enables standard pickle.dump() and pickle.load() operations.
    """
```

#### Usage Examples

```python
import ahocorasick
import pickle

# Create and populate automaton
automaton = ahocorasick.Automaton()
words = ['the', 'quick', 'brown', 'fox']
for i, word in enumerate(words):
    automaton.add_word(word, i)
automaton.make_automaton()

# Pickle to bytes
pickled_data = pickle.dumps(automaton)

# Unpickle from bytes
restored_automaton = pickle.loads(pickled_data)

# Verify functionality
print(restored_automaton.get('quick'))  # 1
matches = list(restored_automaton.iter('the quick brown fox'))
print(len(matches))  # 4

# Pickle to file
with open('automaton.pickle', 'wb') as f:
    pickle.dump(automaton, f)

# Unpickle from file
with open('automaton.pickle', 'rb') as f:
    file_automaton = pickle.load(f)

print(file_automaton.get('fox'))  # 3
```
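
Under the hood, `__reduce__` is Python's standard pickle hook: it returns a callable plus the arguments needed to rebuild the object, and `pickle` calls that callable on load. The exact tuple pyahocorasick returns is an internal detail not covered here, so the sketch below uses a hypothetical `WordTable` class to illustrate the same protocol with the standard library only.

```python
import pickle


class WordTable:
    """Hypothetical stand-in for an object that stores word -> value pairs."""

    def __init__(self, items=None):
        self.items = dict(items or {})

    def __reduce__(self):
        # Return (callable, args): pickle stores both and, on load,
        # calls WordTable(list_of_pairs) to rebuild the object.
        return (WordTable, (list(self.items.items()),))


table = WordTable({'the': 0, 'quick': 1})
restored = pickle.loads(pickle.dumps(table))
print(restored.items)  # {'the': 0, 'quick': 1}
```

Returning plain `(key, value)` pairs keeps the pickled form independent of the object's internal layout, which is the usual reason to implement `__reduce__` on extension types.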

## Serialization Methods Comparison

### Custom save/load vs Pickle

| Feature | save/load | pickle |
|---------|-----------|--------|
| **Performance** | Faster for large automatons | Slower, more overhead |
| **File Size** | Smaller files | Larger files |
| **Portability** | pyahocorasick-specific file format | Standard pickle protocol (pyahocorasick is still needed to unpickle) |
| **Flexibility** | Custom serializers for stored values | Full object graph; can be embedded in larger pickled structures |
| **Memory Usage** | Lower peak memory while saving/loading | Higher peak memory while saving/loading |
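
The "Full object graph" entry refers to a general pickle property: an object referenced more than once is stored once and comes back shared. A serializer passed to `save` presumably sees one value at a time (an assumption based on its object-to-bytes signature), so sharing between stored values would not survive a save/load round trip. This stdlib-only sketch contrasts the two behaviours on a plain list.

```python
import pickle

shared = {'lang': 'en'}
values = [shared, shared]  # two stored values referring to one dict

# Whole-graph pickling: the shared reference is preserved
graph_copy = pickle.loads(pickle.dumps(values))
print(graph_copy[0] is graph_copy[1])  # True

# Per-value serialization: each value is serialized independently
per_value_copy = [pickle.loads(pickle.dumps(v)) for v in values]
print(per_value_copy[0] is per_value_copy[1])  # False
```
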

### Storage Type Considerations

| Storage Type | save/load Serializer | pickle Support | Notes |
|--------------|---------------------|----------------|-------|
| **STORE_INTS** | Not required | Yes | Most efficient |
| **STORE_LENGTH** | Not required | Yes | Very efficient |
| **STORE_ANY** | Required | Yes | Depends on object complexity |

## Advanced Serialization Patterns

### Custom Serialization for Complex Objects

```python
import ahocorasick
import json
import pickle

class CustomSerializer:
    """Custom serializer for complex objects."""

    @staticmethod
    def serialize(obj):
        """Convert object to bytes."""
        if isinstance(obj, dict):
            return json.dumps(obj).encode('utf-8')
        else:
            return pickle.dumps(obj)

    @staticmethod
    def deserialize(data):
        """Convert bytes back to object."""
        try:
            # Try JSON first. Pickle output (protocol 2+) starts with byte 0x80,
            # which is not valid UTF-8, so pickled data falls through below.
            return json.loads(data.decode('utf-8'))
        except (UnicodeDecodeError, json.JSONDecodeError):
            # Fall back to pickle
            return pickle.loads(data)

# Usage
automaton = ahocorasick.Automaton()
automaton.add_word('config', {'host': 'localhost', 'port': 8080})
automaton.add_word('data', [1, 2, 3, 4, 5])
automaton.make_automaton()

# Save with custom serializer
automaton.save('custom.dat', CustomSerializer.serialize)

# Load with custom deserializer
loaded = ahocorasick.load('custom.dat', CustomSerializer.deserialize)
print(loaded.get('config'))  # {'host': 'localhost', 'port': 8080}
```

### Conditional Serialization

```python
import json
import pickle

def conditional_serializer(obj):
    """Serialize only certain types of objects."""
    if isinstance(obj, (str, int, float, bool)):
        return pickle.dumps(obj)
    elif isinstance(obj, dict) and all(isinstance(k, str) for k in obj.keys()):
        return json.dumps(obj).encode('utf-8')
    else:
        raise ValueError(f"Cannot serialize object of type {type(obj)}")

def conditional_deserializer(data):
    """Deserialize with type detection."""
    try:
        # Dicts were written as UTF-8 JSON by conditional_serializer
        return json.loads(data.decode('utf-8'))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Anything else is assumed to be pickle data
        return pickle.loads(data)
```
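
Both deserializers above guess the format by attempting a JSON decode first. A more explicit alternative, sketched here as an assumption rather than a pyahocorasick feature, writes a one-byte tag in front of each value so the deserializer never has to guess. The tag values `b'J'` and `b'P'` and the function names are hypothetical choices for this example; the pair could be passed to `save` and `ahocorasick.load` like any other serializer/deserializer.

```python
import json
import pickle

JSON_TAG = b'J'    # hypothetical tag: payload is UTF-8 JSON
PICKLE_TAG = b'P'  # hypothetical tag: payload is a pickle


def tagged_serializer(obj):
    """Prefix the payload with a tag recording how it was encoded."""
    if isinstance(obj, dict) and all(isinstance(k, str) for k in obj):
        try:
            return JSON_TAG + json.dumps(obj).encode('utf-8')
        except (TypeError, ValueError):
            pass  # dict holds non-JSON values; fall back to pickle
    return PICKLE_TAG + pickle.dumps(obj)


def tagged_deserializer(data):
    """Dispatch on the first byte instead of guessing the format."""
    tag, payload = data[:1], data[1:]
    if tag == JSON_TAG:
        return json.loads(payload.decode('utf-8'))
    if tag == PICKLE_TAG:
        return pickle.loads(payload)
    raise ValueError(f"Unknown serialization tag: {tag!r}")


print(tagged_deserializer(tagged_serializer({'host': 'localhost'})))  # {'host': 'localhost'}
print(tagged_deserializer(tagged_serializer([1, 2, 3])))  # [1, 2, 3]
```

The tag costs one byte per value and makes an unknown or corrupted format fail loudly instead of silently taking the wrong branch.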
245
246
### Compression Support
247
248
```python
249
import gzip
250
import pickle
251
252
def compressed_save(automaton, path):
253
"""Save automaton with compression."""
254
with gzip.open(path, 'wb') as f:
255
pickle.dump(automaton, f)
256
257
def compressed_load(path):
258
"""Load compressed automaton."""
259
with gzip.open(path, 'rb') as f:
260
return pickle.load(f)
261
262
# Usage
263
automaton = ahocorasick.Automaton()
264
# ... populate automaton ...
265
compressed_save(automaton, 'compressed_automaton.pkl.gz')
266
loaded = compressed_load('compressed_automaton.pkl.gz')
267
```
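
The gzip approach above compresses a whole pickled automaton. If you prefer the `save`/`load` route for a STORE_ANY automaton, one option (an assumption, not a built-in feature) is to compress each value inside the serializer. The hypothetical `make_compressed_pair` below wraps any serializer/deserializer pair with `zlib`; it only pays off when individual values are large enough for compression to beat its fixed overhead.

```python
import pickle
import zlib


def make_compressed_pair(serializer=pickle.dumps, deserializer=pickle.loads, level=6):
    """Return a (serializer, deserializer) pair that zlib-compresses each value."""

    def compressed_serializer(obj):
        # Serialize first, then compress the resulting bytes
        return zlib.compress(serializer(obj), level)

    def compressed_deserializer(data):
        # Reverse order: decompress, then deserialize
        return deserializer(zlib.decompress(data))

    return compressed_serializer, compressed_deserializer


ser, deser = make_compressed_pair()
value = {'description': 'lorem ipsum ' * 200}
blob = ser(value)
print(len(blob) < len(pickle.dumps(value)))  # True for this repetitive value
print(deser(blob) == value)  # True
```

With an automaton, the pair would be used as `automaton.save(path, ser)` and `ahocorasick.load(path, deser)`.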
268
269
### Version-aware Serialization
270
271
```python
272
import ahocorasick
273
import pickle
274
275
class VersionedAutomaton:
276
"""Wrapper that adds version information."""
277
278
VERSION = "1.0"
279
280
def __init__(self, automaton):
281
self.version = self.VERSION
282
self.automaton = automaton
283
284
def save(self, path):
285
"""Save with version info."""
286
data = {
287
'version': self.version,
288
'automaton_data': pickle.dumps(self.automaton)
289
}
290
with open(path, 'wb') as f:
291
pickle.dump(data, f)
292
293
@classmethod
294
def load(cls, path):
295
"""Load with version checking."""
296
with open(path, 'rb') as f:
297
data = pickle.load(f)
298
299
if data['version'] != cls.VERSION:
300
print(f"Warning: Version mismatch. Expected {cls.VERSION}, got {data['version']}")
301
302
automaton = pickle.loads(data['automaton_data'])
303
return cls(automaton)
304
305
# Usage
306
automaton = ahocorasick.Automaton()
307
# ... populate automaton ...
308
versioned = VersionedAutomaton(automaton)
309
versioned.save('versioned_automaton.dat')
310
311
loaded_versioned = VersionedAutomaton.load('versioned_automaton.dat')
312
```
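
The wrapper above has to unpickle the whole outer dict before it can read the version. A lighter alternative, sketched here with hypothetical names (`MAGIC`, `write_versioned`, `read_versioned`), writes a fixed-size binary header with `struct` in front of the payload, so the version can be checked before any payload bytes are deserialized. The payload would be, for example, `pickle.dumps(automaton)`.

```python
import os
import struct
import tempfile

MAGIC = b'AHOC'                 # hypothetical 4-byte file signature
HEADER = struct.Struct('<4sH')  # signature + unsigned 16-bit format version
FORMAT_VERSION = 1


def write_versioned(path, payload, version=FORMAT_VERSION):
    """Write a fixed-size header followed by the raw payload bytes."""
    with open(path, 'wb') as f:
        f.write(HEADER.pack(MAGIC, version))
        f.write(payload)


def read_versioned(path, expected_version=FORMAT_VERSION):
    """Validate the header, then return the payload bytes."""
    with open(path, 'rb') as f:
        header = f.read(HEADER.size)
        if len(header) != HEADER.size:
            raise ValueError("File too short to contain a header")
        magic, version = HEADER.unpack(header)
        if magic != MAGIC:
            raise ValueError("Not a versioned automaton file")
        if version != expected_version:
            raise ValueError(f"Unsupported format version {version}, expected {expected_version}")
        return f.read()


# Round-trip demo in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'automaton.bin')
    write_versioned(path, b'payload bytes')
    restored_payload = read_versioned(path)

    # A file written with another version is rejected before deserializing
    write_versioned(path, b'payload bytes', version=2)
    try:
        read_versioned(path)
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True
```

Raising instead of printing a warning keeps an incompatible file from being handed to `pickle.loads` at all.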

## Error Handling

Common serialization errors and solutions:

### File Access Errors

```python
import ahocorasick
import os

def safe_save(automaton, path, serializer=None):
    """Save with error handling."""
    try:
        # Ensure the parent directory exists (dirname is '' for a bare filename)
        directory = os.path.dirname(path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        # Pass the serializer only when one is given (STORE_ANY)
        if serializer is None:
            automaton.save(path)
        else:
            automaton.save(path, serializer)
        return True
    except PermissionError:
        print(f"Permission denied: {path}")
        return False
    except IOError as e:
        print(f"IO error: {e}")
        return False

def safe_load(path, deserializer=None):
    """Load with error handling."""
    try:
        if not os.path.exists(path):
            print(f"File not found: {path}")
            return None
        # Pass the deserializer only when one is given (STORE_ANY)
        if deserializer is None:
            return ahocorasick.load(path)
        return ahocorasick.load(path, deserializer)
    except IOError as e:
        print(f"IO error: {e}")
        return None
    except Exception as e:
        print(f"Deserialization error: {e}")
        return None
```
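
If the process is interrupted while `save` is writing, the target file can be left truncated. A common mitigation, shown here as a hedged sketch with a hypothetical `atomic_save` helper, is to write to a temporary file in the same directory and then move it into place with `os.replace`, which the Python documentation describes as atomic on POSIX systems when both paths are on the same filesystem. `save_func` is any callable that writes a file to the path it is given, for example `lambda p: automaton.save(p, pickle.dumps)`.

```python
import os
import tempfile


def atomic_save(save_func, path):
    """Call save_func(tmp_path), then move the finished file onto path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    os.close(fd)  # save_func opens the path itself
    try:
        save_func(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temp file, then re-raise the original error
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def write_bytes(p):
    with open(p, 'wb') as f:
        f.write(b'automaton data')


def failing_write(p):
    # Simulates a crash after a partial write
    with open(p, 'wb') as f:
        f.write(b'partial')
    raise RuntimeError('simulated crash')


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'automaton.dat')
    atomic_save(write_bytes, target)
    try:
        atomic_save(failing_write, target)
    except RuntimeError:
        pass
    # The earlier good file is untouched and no temp file is left behind
    with open(target, 'rb') as f:
        saved = f.read()
    leftovers = sorted(os.listdir(tmp))
```
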

### Serializer Validation

```python
import pickle

def validate_serializer(serializer, deserializer, test_obj):
    """Validate that serializer/deserializer pair works."""
    try:
        serialized = serializer(test_obj)
        deserialized = deserializer(serialized)
        return deserialized == test_obj
    except Exception as e:
        print(f"Serializer validation failed: {e}")
        return False

# Usage
test_data = {'test': 'data', 'number': 42}
if validate_serializer(pickle.dumps, pickle.loads, test_data):
    print("Serializer pair is valid")
```
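
`validate_serializer` above checks a single object, which can miss type-specific losses. The hypothetical helper below runs a serializer pair over several sample values and returns the ones that do not come back with the same type and value. JSON, for instance, turns tuples into lists and integer dictionary keys into strings, so those samples fail for a JSON pair but pass for pickle.

```python
import json
import pickle


def find_roundtrip_failures(serializer, deserializer, samples):
    """Return the samples that fail to round-trip or raise an error."""
    failures = []
    for sample in samples:
        try:
            restored = deserializer(serializer(sample))
            # Require the same type and the same value after the round trip
            if type(restored) is not type(sample) or restored != sample:
                failures.append(sample)
        except Exception:
            failures.append(sample)
    return failures


def json_dumps_bytes(obj):
    return json.dumps(obj).encode('utf-8')


def json_loads_bytes(data):
    return json.loads(data.decode('utf-8'))


samples = [{'host': 'localhost', 'port': 8080}, (1, 2), {1: 'one'}, 'text']
json_failures = find_roundtrip_failures(json_dumps_bytes, json_loads_bytes, samples)
pickle_failures = find_roundtrip_failures(pickle.dumps, pickle.loads, samples)
print(json_failures)    # [(1, 2), {1: 'one'}]
print(pickle_failures)  # []
```
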

## Performance Considerations

### File Size Optimization

- **STORE_INTS**: Smallest file size, fastest save/load
- **STORE_LENGTH**: Very small file size, fast operations
- **STORE_ANY**: Size depends on serializer efficiency
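
Because a STORE_ANY file stores each value in whatever form the serializer produces, comparing serializers on representative values is a cheap first check. The sketch below measures one sample value under pickle (default protocol and protocol 0) and UTF-8 JSON; exact byte counts depend on the value and the Python version, so measure your own data rather than relying on any fixed ranking.

```python
import json
import pickle

value = {'type': 'greeting', 'lang': 'en'}

# Byte length of the same value under each candidate serializer
sizes = {
    'pickle (default protocol)': len(pickle.dumps(value)),
    'pickle (protocol 0)': len(pickle.dumps(value, protocol=0)),
    'json (utf-8)': len(json.dumps(value).encode('utf-8')),
}
for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name}: {size} bytes")
```
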

### Memory Usage

- Save operations require temporary memory for serialization
- Load operations create a new automaton instance
- Consider available memory when working with large automatons
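
To see how much temporary memory a serialization step needs on your data, the standard `tracemalloc` module can record the peak allocation around the call. The sketch below uses a hypothetical `peak_bytes` helper and measures `pickle.dumps` on a plain list standing in for automaton data; with a real automaton you would wrap `pickle.dumps(automaton)` or a `save` call the same way, keeping in mind that memory a C extension allocates outside Python's allocators is not counted.

```python
import pickle
import tracemalloc


def peak_bytes(func, *args):
    """Return (result, peak traced bytes) for a single call to func."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


data = [str(i) * 10 for i in range(10_000)]
blob, peak = peak_bytes(pickle.dumps, data)
print(f"pickled {len(blob)} bytes, peak traced allocation {peak} bytes")
```
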

### Best Practices

1. **Use the appropriate storage type** for your data
2. **Test the serialization round trip** before deployment
3. **Handle errors gracefully** in production code
4. **Consider compression** for large automatons stored long-term
5. **Version your data format** for long-term compatibility