0
# Core Trie Classes
1
2
Primary trie implementations for mapping keys to auto-generated unique IDs. The Trie class handles unicode strings while BinaryTrie works with binary data, both providing efficient key-to-ID mapping with comprehensive lookup and prefix search capabilities.
3
4
## Capabilities
5
6
### Unicode String Trie
7
8
Maps unicode string keys to auto-generated unique integer IDs with support for prefix operations, key restoration, iteration, and serialization.
9
10
```python { .api }
11
class Trie:
12
def __init__(self, arg=None, num_tries=DEFAULT_NUM_TRIES, binary=False,
13
cache_size=DEFAULT_CACHE, order=DEFAULT_ORDER, weights=None):
14
"""
15
Create a trie from unicode string keys.
16
17
Args:
18
arg (iterable, optional): Iterable of unicode string keys
19
num_tries (int): Number of tries to use (MIN_NUM_TRIES to MAX_NUM_TRIES)
20
binary (bool): Use binary tail storage instead of text
21
cache_size (int): Cache size constant for performance tuning
22
order (int): Node ordering (LABEL_ORDER or WEIGHT_ORDER)
23
weights (iterable, optional): Lookup frequency weights for optimization
24
"""
25
26
def key_id(self, key: str) -> int:
27
"""
28
Return unique ID for a unicode key.
29
30
Args:
31
key (str): Unicode key to look up
32
33
Returns:
34
int: Unique integer ID for the key
35
36
Raises:
37
KeyError: If key is not present in trie
38
"""
39
40
def restore_key(self, index: int) -> str:
41
"""
42
Return unicode key corresponding to given ID.
43
44
Args:
45
index (int): Key ID to look up
46
47
Returns:
48
str: Unicode key for the ID
49
50
Raises:
51
KeyError: If ID is not valid
52
"""
53
54
def get(self, key, default=None):
55
"""
56
Return ID for key or default if not found.
57
58
Args:
59
key: Key to look up
60
default: Value to return if key not found
61
62
Returns:
63
int or default: Key ID or default value
64
"""
65
66
def keys(self, prefix=None) -> list:
67
"""
68
Return list of keys with optional prefix filter.
69
70
Args:
71
prefix (str, optional): Prefix to filter keys
72
73
Returns:
74
list: List of unicode keys
75
"""
76
77
def iterkeys(self, prefix=None):
78
"""
79
Return iterator over keys with optional prefix filter.
80
81
Args:
82
prefix (str, optional): Prefix to filter keys
83
84
Yields:
85
str: Unicode keys
86
"""
87
88
def prefixes(self, key: str) -> list:
89
"""
90
Return list of all prefixes of given key that exist in trie.
91
92
Args:
93
key (str): Key to find prefixes for
94
95
Returns:
96
list: List of prefix strings
97
"""
98
99
def iter_prefixes(self, key: str):
100
"""
101
Return iterator over all prefixes of given key.
102
103
Args:
104
key (str): Key to find prefixes for
105
106
Yields:
107
str: Prefix strings
108
"""
109
110
def iter_prefixes_with_ids(self, key: str):
111
"""
112
Return iterator of (prefix, id) pairs for all prefixes.
113
114
Args:
115
key (str): Key to find prefixes for
116
117
Yields:
118
tuple: (prefix_string, prefix_id) pairs
119
"""
120
121
def items(self, prefix="") -> list:
122
"""
123
Return list of (key, id) pairs with optional prefix.
124
125
Args:
126
prefix (str): Prefix to filter items
127
128
Returns:
129
list: List of (key, id) tuples
130
"""
131
132
def iteritems(self, prefix=""):
133
"""
134
Return iterator over (key, id) pairs with optional prefix.
135
136
Args:
137
prefix (str): Prefix to filter items
138
139
Yields:
140
tuple: (key, id) pairs
141
"""
142
```
143
144
### Binary Data Trie
145
146
Maps bytes keys to auto-generated unique integer IDs with support for binary data and prefix operations.
147
148
```python { .api }
149
class BinaryTrie:
150
def __init__(self, arg=None, **options):
151
"""
152
Create a trie from bytes keys.
153
154
Args:
155
arg (iterable, optional): Iterable of bytes keys
156
**options: Same options as Trie class
157
"""
158
159
def key_id(self, key: bytes) -> int:
160
"""
161
Return unique ID for a bytes key.
162
163
Args:
164
key (bytes): Bytes key to look up
165
166
Returns:
167
int: Unique integer ID for the key
168
169
Raises:
170
KeyError: If key is not present in trie
171
"""
172
173
def restore_key(self, index: int) -> bytes:
174
"""
175
Return bytes key corresponding to given ID.
176
177
Args:
178
index (int): Key ID to look up
179
180
Returns:
181
bytes: Bytes key for the ID
182
183
Raises:
184
KeyError: If ID is not valid
185
"""
186
187
def get(self, key: bytes, default=None):
188
"""
189
Return ID for bytes key or default if not found.
190
191
Args:
192
key (bytes): Bytes key to look up
193
default: Value to return if key not found
194
195
Returns:
196
int or default: Key ID or default value
197
"""
198
199
def iter_prefixes(self, key: bytes):
200
"""
201
Return iterator over all prefixes of given bytes key.
202
203
Args:
204
key (bytes): Bytes key to find prefixes for
205
206
Yields:
207
bytes: Prefix bytes objects
208
"""
209
210
def prefixes(self, key: bytes) -> list:
211
"""
212
Return list of all prefixes of given bytes key.
213
214
Args:
215
key (bytes): Bytes key to find prefixes for
216
217
Returns:
218
list: List of prefix bytes objects
219
"""
220
221
def items(self, prefix=b"") -> list:
222
"""
223
Return list of (key, id) pairs with optional bytes prefix.
224
225
Args:
226
prefix (bytes): Bytes prefix to filter items
227
228
Returns:
229
list: List of (bytes_key, id) tuples
230
"""
231
232
def iteritems(self, prefix=b""):
233
"""
234
Return iterator over (key, id) pairs with optional bytes prefix.
235
236
Args:
237
prefix (bytes): Bytes prefix to filter items
238
239
Yields:
240
tuple: (bytes_key, id) pairs
241
"""
242
243
def keys(self, prefix=b"") -> list:
244
"""
245
Return list of bytes keys with optional prefix.
246
247
Args:
248
prefix (bytes): Bytes prefix to filter keys
249
250
Returns:
251
list: List of bytes keys
252
"""
253
254
def iterkeys(self, prefix=b""):
255
"""
256
Return iterator over bytes keys with optional prefix.
257
258
Args:
259
prefix (bytes): Bytes prefix to filter keys
260
261
Yields:
262
bytes: Bytes keys
263
"""
264
```
265
266
### Common Operations
267
268
Both Trie and BinaryTrie classes support these common operations:
269
270
```python { .api }
271
# Container operations
272
def __contains__(self, key) -> bool:
273
"""Check if key exists in trie."""
274
275
def __len__(self) -> int:
276
"""Return number of keys in trie."""
277
278
def __iter__(self):
279
"""Iterate over all keys."""
280
281
def __getitem__(self, key) -> int:
282
"""Get key ID using dictionary syntax."""
283
284
# Serialization operations
285
def save(self, path: str):
286
"""Save trie to file path."""
287
288
def load(self, path: str):
289
"""Load trie from file path."""
290
291
def tobytes(self) -> bytes:
292
"""Return raw trie content as bytes."""
293
294
def frombytes(self, data: bytes):
295
"""Load trie from raw bytes."""
296
297
def mmap(self, path: str):
298
"""Memory map trie file for efficient access."""
299
300
def map(self, buffer):
301
"""Load trie from buffer protocol object."""
302
```
303
304
## Usage Examples
305
306
### Basic Trie Operations
307
308
```python
309
import marisa_trie
310
311
# Create trie with string keys
312
words = ['apple', 'application', 'apply', 'banana', 'band', 'bandana']
313
trie = marisa_trie.Trie(words)
314
315
# Key lookup operations
316
print(trie.key_id('apple')) # Get unique ID
317
print(trie.restore_key(0)) # Get key from ID
318
print('app' in trie) # False - exact match only
319
print(len(trie)) # 6
320
321
# Dictionary-style access
322
apple_id = trie['apple']
323
print(f"Apple ID: {apple_id}")
324
```
325
326
### Prefix Operations
327
328
```python
329
# Find keys with prefix
330
app_words = trie.keys(prefix='app')
331
print(f"Words starting with 'app': {app_words}")
332
# Output: ['apple', 'application', 'apply']
333
334
# Find prefixes of a key
335
prefixes = trie.prefixes('application')
336
print(f"Prefixes of 'application': {prefixes}")
337
# Output: ['app', 'appli', 'application'] (only existing keys)
338
339
# Iterate with IDs
340
for word, word_id in trie.iteritems(prefix='ban'):
341
print(f"{word}: {word_id}")
342
```
343
344
### Performance Optimization
345
346
```python
347
# Use weights for frequently accessed keys
348
keys = ['common', 'rare', 'frequent']
349
weights = [100, 1, 50] # Higher weights for common lookups
350
351
optimized_trie = marisa_trie.Trie(
352
keys,
353
weights=weights,
354
cache_size=marisa_trie.LARGE_CACHE,
355
order=marisa_trie.WEIGHT_ORDER
356
)
357
```
358
359
### Binary Data Handling
360
361
```python
362
# Create trie with binary keys
363
binary_keys = [b'header\x00\x01', b'data\x02\x03', b'footer\xff']
364
binary_trie = marisa_trie.BinaryTrie(binary_keys)
365
366
# Binary operations
367
data_id = binary_trie.key_id(b'data\x02\x03')
368
restored = binary_trie.restore_key(data_id)
369
print(f"Restored binary key: {restored}")
370
371
# Prefix search with binary data
372
prefixes = binary_trie.prefixes(b'header\x00\x01\x02')
373
print(f"Binary prefixes: {prefixes}")
374
```
375
376
### Serialization and Persistence
377
378
```python
379
# Save trie to file
380
trie.save('dictionary.trie')
381
382
# Load trie from file
383
loaded_trie = marisa_trie.Trie()
384
loaded_trie.load('dictionary.trie')
385
386
# Memory-efficient loading via memory mapping
387
mapped_trie = marisa_trie.Trie()
388
mapped_trie.mmap('dictionary.trie')
389
390
# Serialize to bytes for network transfer
391
trie_bytes = trie.tobytes()
392
new_trie = marisa_trie.Trie()
393
new_trie.frombytes(trie_bytes)
394
```
395
396
## Additional Methods and Special Operations
397
398
### Deprecated Methods
399
400
These methods are still available but deprecated and will be removed in future versions:
401
402
```python { .api }
403
# Deprecated serialization methods (use save/load instead)
404
def read(self, f):
405
"""
406
Read trie from an open file descriptor.
407
408
Args:
409
f: Open file object with file descriptor
410
411
.. deprecated:: 0.7.3
412
Use load() instead. Will be removed in 0.8.0.
413
"""
414
415
def write(self, f):
416
"""
417
Write trie to an open file descriptor.
418
419
Args:
420
f: Open file object with file descriptor
421
422
.. deprecated:: 0.7.3
423
Use save() instead. Will be removed in 0.8.0.
424
"""
425
426
def has_keys_with_prefix(self, prefix="") -> bool:
427
"""
428
Check if any keys begin with given prefix.
429
430
Args:
431
prefix (str): Prefix to check
432
433
Returns:
434
bool: True if any keys start with prefix
435
436
.. deprecated:: 0.7.3
437
Use iterkeys() instead. Will be removed in 0.8.0.
438
"""
439
```
440
441
### Special Methods
442
443
Methods supporting Python's built-in operations and serialization protocols:
444
445
```python { .api }
446
# Comparison operations
447
def __richcmp__(self, other, int op) -> bool:
448
"""
449
Rich comparison supporting == and != operators.
450
451
Args:
452
other: Another Trie instance to compare
453
op: Comparison operation (2 for ==, 3 for !=)
454
455
Returns:
456
bool: True if tries are equal (for ==) or not equal (for !=)
457
458
Raises:
459
TypeError: For unsupported comparison operations (<, >, <=, >=)
460
"""
461
462
# Pickle serialization support
463
def __reduce__(self) -> tuple:
464
"""
465
Support for pickle serialization.
466
467
Returns:
468
tuple: (class, args, state) for pickle reconstruction
469
"""
470
471
def __setstate__(self, state: bytes):
472
"""
473
Support for pickle deserialization (alias for frombytes).
474
475
Args:
476
state (bytes): Serialized trie data from __reduce__
477
"""
478
```
479
480
### Architecture and Inheritance
481
482
The marisa-trie classes follow a hierarchical design:
483
484
```
485
_Trie (base class)
486
├── BinaryTrie (binary keys → IDs)
487
└── _UnicodeKeyedTrie (unicode key handling)
488
├── Trie (unicode keys → IDs)
489
└── BytesTrie (unicode keys → bytes payloads)
490
└── _UnpackTrie (value packing/unpacking)
491
└── RecordTrie (unicode keys → structured data)
492
```
493
494
**Base Class Methods**: The `_Trie` base class provides core functionality including:
495
- Container operations (`__contains__`, `__len__`, `__iter__`)
496
- Serialization (`save`, `load`, `tobytes`, `frombytes`, `mmap`, `map`)
497
- Key iteration (`keys`, `iterkeys`)
498
499
**Key-ID Methods**: Only `Trie` and `BinaryTrie` provide:
500
- `key_id()` and `restore_key()` for key-ID mapping
501
- Dictionary-style access with `__getitem__`
502
503
**Unicode-Specific Methods**: Only `Trie` provides:
504
- `iter_prefixes_with_ids()` for prefix-ID pairs
505
506
### Error Handling
507
508
All trie classes raise specific exceptions for error conditions:
509
510
```python
511
# KeyError for missing keys or invalid IDs
512
try:
513
key_id = trie.key_id('nonexistent')
514
except KeyError as e:
515
print(f"Key not found: {e}")
516
517
try:
518
key = trie.restore_key(999999)
519
except KeyError as e:
520
print(f"Invalid ID: {e}")
521
522
# ValueError for invalid configuration
523
try:
524
invalid_trie = marisa_trie.Trie(['test'], num_tries=999)
525
except ValueError as e:
526
print(f"Configuration error: {e}")
527
528
# File I/O exceptions during serialization
529
try:
530
trie.load('nonexistent.trie')
531
except FileNotFoundError as e:
532
print(f"File error: {e}")
533
```