0
# MARISA Trie
1
2
Static memory-efficient and fast Trie-like structures for Python based on MARISA-trie C++ library. String data in a MARISA-trie may take up to 50x-100x less memory than in a standard Python dict while maintaining comparable lookup speed. The library provides fast advanced methods like prefix search, making it ideal for applications requiring efficient string storage, autocomplete functionality, and text processing.
3
4
## Package Information
5
6
- **Package Name**: marisa-trie
7
- **Language**: Python
8
- **Installation**: `pip install marisa-trie`
9
- **Python Support**: 3.9+
10
11
## Core Imports
12
13
```python
14
import marisa_trie
15
```
16
17
Common imports for specific trie types:
18
19
```python
20
from marisa_trie import Trie, BinaryTrie, BytesTrie, RecordTrie
21
```
22
23
## Basic Usage
24
25
```python
26
import marisa_trie
27
28
# Create a Trie with unicode strings -> auto-generated IDs
29
trie = marisa_trie.Trie(['apple', 'application', 'apply', 'banana'])
30
31
# Check if keys exist
32
print('apple' in trie) # True
33
print('app' in trie) # False
34
35
# Get unique ID for a key
36
apple_id = trie.key_id('apple')
37
print(f"ID for 'apple': {apple_id}")
38
39
# Restore key from ID
40
restored_key = trie.restore_key(apple_id)
41
print(f"Key for ID {apple_id}: {restored_key}")
42
43
# Find all keys with prefix
44
prefix_keys = trie.keys(prefix='app')
45
print(f"Keys starting with 'app': {prefix_keys}")
46
47
# Find all prefixes of a key
48
prefixes = trie.prefixes('application')
49
print(f"Prefixes of 'application': {prefixes}")
50
51
# Save and load trie
52
trie.save('my_trie.dat')
53
new_trie = marisa_trie.Trie()
54
new_trie.load('my_trie.dat')
55
```
56
57
## Architecture
58
59
MARISA-trie implements static, memory-efficient Trie data structures optimized for read-heavy workloads:
60
61
- **Static Structure**: Once built, tries are immutable and optimized for space efficiency
62
- **Memory Efficiency**: Uses sophisticated compression techniques to reduce memory usage by 50x-100x compared to Python dicts
63
- **Performance**: Comparable lookup speeds to Python dicts despite memory optimization
64
- **C++ Backend**: Built on the high-performance MARISA-trie C++ library with Cython bindings
65
66
The library provides four specialized trie classes for different use cases:
67
- **Trie**: Unicode strings to auto-generated IDs
68
- **BinaryTrie**: Bytes keys to auto-generated IDs
69
- **BytesTrie**: Unicode keys to lists of bytes payloads
70
- **RecordTrie**: Unicode keys to lists of structured data tuples
71
72
## Capabilities
73
74
### Core Trie Classes
75
76
Primary trie implementations for mapping keys to auto-generated unique IDs, supporting both unicode strings and binary data with comprehensive lookup and prefix search capabilities.
77
78
```python { .api }
79
class Trie:
80
def __init__(self, arg=None, num_tries=DEFAULT_NUM_TRIES, binary=False,
81
cache_size=DEFAULT_CACHE, order=DEFAULT_ORDER, weights=None): ...
82
def key_id(self, key: str) -> int: ...
83
def restore_key(self, index: int) -> str: ...
84
def keys(self, prefix=None) -> list: ...
85
def prefixes(self, key: str) -> list: ...
86
87
class BinaryTrie:
88
def __init__(self, arg=None, **options): ...
89
def key_id(self, key: bytes) -> int: ...
90
def restore_key(self, index: int) -> bytes: ...
91
def prefixes(self, key: bytes) -> list: ...
92
```
93
94
[Core Trie Classes](./trie-classes.md)
95
96
### Specialized Payload Tries
97
98
Advanced trie implementations that map keys to lists of custom data payloads, supporting bytes objects and structured data with automatic serialization.
99
100
```python { .api }
101
class BytesTrie:
102
def __init__(self, arg=None, value_separator=b'\xff', **options): ...
103
def get(self, key, default=None) -> list: ...
104
def get_value(self, key: str) -> list: ...
105
def items(self, prefix="") -> list: ...
106
107
class RecordTrie:
108
def __init__(self, fmt: str, arg=None, **options): ...
109
def get(self, key, default=None) -> list: ...
110
def items(self, prefix="") -> list: ...
111
```
112
113
[Specialized Payload Tries](./bytes-record-tries.md)
114
115
### Configuration and Constants
116
117
Configuration constants for optimizing trie performance, memory usage, and behavior including cache sizes, node ordering, and storage options.
118
119
```python { .api }
120
# Cache size constants
121
DEFAULT_CACHE: int
122
HUGE_CACHE: int
123
LARGE_CACHE: int
124
NORMAL_CACHE: int
125
SMALL_CACHE: int
126
TINY_CACHE: int
127
128
# Node ordering constants
129
LABEL_ORDER: int
130
WEIGHT_ORDER: int
131
DEFAULT_ORDER: int
132
```
133
134
[Configuration and Constants](./configuration.md)