or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

docs

bytes-record-tries.mdconfiguration.mdindex.mdtrie-classes.md
tile.json

tessl/pypi-marisa-trie

Static memory-efficient and fast Trie-like structures for Python based on MARISA-trie C++ library.

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
pypipkg:pypi/marisa-trie@1.3.x

To install, run

npx @tessl/cli install tessl/pypi-marisa-trie@1.3.0

index.mddocs/

MARISA Trie

Static memory-efficient and fast Trie-like structures for Python based on MARISA-trie C++ library. String data in a MARISA-trie may take up to 50x-100x less memory than in a standard Python dict while maintaining comparable lookup speed. The library provides fast advanced methods like prefix search, making it ideal for applications requiring efficient string storage, autocomplete functionality, and text processing.

Package Information

  • Package Name: marisa-trie
  • Language: Python
  • Installation: pip install marisa-trie
  • Python Support: 3.9+

Core Imports

import marisa_trie

Common imports for specific trie types:

from marisa_trie import Trie, BinaryTrie, BytesTrie, RecordTrie

Basic Usage

import marisa_trie

# Create a Trie with unicode strings -> auto-generated IDs
trie = marisa_trie.Trie(['apple', 'application', 'apply', 'banana'])

# Check if keys exist
print('apple' in trie)  # True
print('app' in trie)    # False

# Get unique ID for a key
apple_id = trie.key_id('apple')
print(f"ID for 'apple': {apple_id}")

# Restore key from ID
restored_key = trie.restore_key(apple_id)
print(f"Key for ID {apple_id}: {restored_key}")

# Find all keys with prefix
prefix_keys = trie.keys(prefix='app')
print(f"Keys starting with 'app': {prefix_keys}")

# Find all prefixes of a key
prefixes = trie.prefixes('application')
print(f"Prefixes of 'application': {prefixes}")

# Save and load trie
trie.save('my_trie.dat')
new_trie = marisa_trie.Trie()
new_trie.load('my_trie.dat')

Architecture

MARISA-trie implements static, memory-efficient Trie data structures optimized for read-heavy workloads:

  • Static Structure: Once built, tries are immutable and optimized for space efficiency
  • Memory Efficiency: Uses sophisticated compression techniques to reduce memory usage by 50x-100x compared to Python dicts
  • Performance: Comparable lookup speeds to Python dicts despite memory optimization
  • C++ Backend: Built on the high-performance MARISA-trie C++ library with Cython bindings

The library provides four specialized trie classes for different use cases:

  • Trie: Unicode strings to auto-generated IDs
  • BinaryTrie: Bytes keys to auto-generated IDs
  • BytesTrie: Unicode keys to lists of bytes payloads
  • RecordTrie: Unicode keys to lists of structured data tuples

Capabilities

Core Trie Classes

Primary trie implementations for mapping keys to auto-generated unique IDs, supporting both unicode strings and binary data with comprehensive lookup and prefix search capabilities.

class Trie:
    def __init__(self, arg=None, num_tries=DEFAULT_NUM_TRIES, binary=False, 
                 cache_size=DEFAULT_CACHE, order=DEFAULT_ORDER, weights=None): ...
    def key_id(self, key: str) -> int: ...
    def restore_key(self, index: int) -> str: ...
    def keys(self, prefix=None) -> list: ...
    def prefixes(self, key: str) -> list: ...

class BinaryTrie:
    def __init__(self, arg=None, **options): ...
    def key_id(self, key: bytes) -> int: ...
    def restore_key(self, index: int) -> bytes: ...
    def prefixes(self, key: bytes) -> list: ...

Core Trie Classes

Specialized Payload Tries

Advanced trie implementations that map keys to lists of custom data payloads, supporting bytes objects and structured data with automatic serialization.

class BytesTrie:
    def __init__(self, arg=None, value_separator=b'\xff', **options): ...
    def get(self, key, default=None) -> list: ...
    def get_value(self, key: str) -> list: ...
    def items(self, prefix="") -> list: ...

class RecordTrie:
    def __init__(self, fmt: str, arg=None, **options): ...
    def get(self, key, default=None) -> list: ...
    def items(self, prefix="") -> list: ...

Specialized Payload Tries

Configuration and Constants

Configuration constants for optimizing trie performance, memory usage, and behavior including cache sizes, node ordering, and storage options.

# Cache size constants
DEFAULT_CACHE: int
HUGE_CACHE: int
LARGE_CACHE: int
NORMAL_CACHE: int
SMALL_CACHE: int
TINY_CACHE: int

# Node ordering constants  
LABEL_ORDER: int
WEIGHT_ORDER: int
DEFAULT_ORDER: int

Configuration and Constants