or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

bytes-record-tries.mdconfiguration.mdindex.mdtrie-classes.md

index.mddocs/

0

# MARISA Trie

1

2

Static memory-efficient and fast Trie-like structures for Python based on MARISA-trie C++ library. String data in a MARISA-trie may take up to 50x-100x less memory than in a standard Python dict while maintaining comparable lookup speed. The library provides fast advanced methods like prefix search, making it ideal for applications requiring efficient string storage, autocomplete functionality, and text processing.

3

4

## Package Information

5

6

- **Package Name**: marisa-trie

7

- **Language**: Python

8

- **Installation**: `pip install marisa-trie`

9

- **Python Support**: 3.9+

10

11

## Core Imports

12

13

```python

14

import marisa_trie

15

```

16

17

Common imports for specific trie types:

18

19

```python

20

from marisa_trie import Trie, BinaryTrie, BytesTrie, RecordTrie

21

```

22

23

## Basic Usage

24

25

```python

26

import marisa_trie

27

28

# Create a Trie with unicode strings -> auto-generated IDs

29

trie = marisa_trie.Trie(['apple', 'application', 'apply', 'banana'])

30

31

# Check if keys exist

32

print('apple' in trie) # True

33

print('app' in trie) # False

34

35

# Get unique ID for a key

36

apple_id = trie.key_id('apple')

37

print(f"ID for 'apple': {apple_id}")

38

39

# Restore key from ID

40

restored_key = trie.restore_key(apple_id)

41

print(f"Key for ID {apple_id}: {restored_key}")

42

43

# Find all keys with prefix

44

prefix_keys = trie.keys(prefix='app')

45

print(f"Keys starting with 'app': {prefix_keys}")

46

47

# Find all prefixes of a key

48

prefixes = trie.prefixes('application')

49

print(f"Prefixes of 'application': {prefixes}")

50

51

# Save and load trie

52

trie.save('my_trie.dat')

53

new_trie = marisa_trie.Trie()

54

new_trie.load('my_trie.dat')

55

```

56

57

## Architecture

58

59

MARISA-trie implements static, memory-efficient Trie data structures optimized for read-heavy workloads:

60

61

- **Static Structure**: Once built, tries are immutable and optimized for space efficiency

62

- **Memory Efficiency**: Uses sophisticated compression techniques to reduce memory usage by 50x-100x compared to Python dicts

63

- **Performance**: Comparable lookup speeds to Python dicts despite memory optimization

64

- **C++ Backend**: Built on the high-performance MARISA-trie C++ library with Cython bindings

65

66

The library provides four specialized trie classes for different use cases:

67

- **Trie**: Unicode strings to auto-generated IDs

68

- **BinaryTrie**: Bytes keys to auto-generated IDs

69

- **BytesTrie**: Unicode keys to lists of bytes payloads

70

- **RecordTrie**: Unicode keys to lists of structured data tuples

71

72

## Capabilities

73

74

### Core Trie Classes

75

76

Primary trie implementations for mapping keys to auto-generated unique IDs, supporting both unicode strings and binary data with comprehensive lookup and prefix search capabilities.

77

78

```python { .api }

79

class Trie:

80

def __init__(self, arg=None, num_tries=DEFAULT_NUM_TRIES, binary=False,

81

cache_size=DEFAULT_CACHE, order=DEFAULT_ORDER, weights=None): ...

82

def key_id(self, key: str) -> int: ...

83

def restore_key(self, index: int) -> str: ...

84

def keys(self, prefix=None) -> list: ...

85

def prefixes(self, key: str) -> list: ...

86

87

class BinaryTrie:

88

def __init__(self, arg=None, **options): ...

89

def key_id(self, key: bytes) -> int: ...

90

def restore_key(self, index: int) -> bytes: ...

91

def prefixes(self, key: bytes) -> list: ...

92

```

93

94

[Core Trie Classes](./trie-classes.md)

95

96

### Specialized Payload Tries

97

98

Advanced trie implementations that map keys to lists of custom data payloads, supporting bytes objects and structured data with automatic serialization.

99

100

```python { .api }

101

class BytesTrie:

102

def __init__(self, arg=None, value_separator=b'\xff', **options): ...

103

def get(self, key, default=None) -> list: ...

104

def get_value(self, key: str) -> list: ...

105

def items(self, prefix="") -> list: ...

106

107

class RecordTrie:

108

def __init__(self, fmt: str, arg=None, **options): ...

109

def get(self, key, default=None) -> list: ...

110

def items(self, prefix="") -> list: ...

111

```

112

113

[Specialized Payload Tries](./bytes-record-tries.md)

114

115

### Configuration and Constants

116

117

Configuration constants for optimizing trie performance, memory usage, and behavior including cache sizes, node ordering, and storage options.

118

119

```python { .api }

120

# Cache size constants

121

DEFAULT_CACHE: int

122

HUGE_CACHE: int

123

LARGE_CACHE: int

124

NORMAL_CACHE: int

125

SMALL_CACHE: int

126

TINY_CACHE: int

127

128

# Node ordering constants

129

LABEL_ORDER: int

130

WEIGHT_ORDER: int

131

DEFAULT_ORDER: int

132

```

133

134

[Configuration and Constants](./configuration.md)