or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

bytes-record-tries.mdconfiguration.mdindex.mdtrie-classes.md

trie-classes.mddocs/

0

# Core Trie Classes

1

2

Primary trie implementations for mapping keys to auto-generated unique IDs. The Trie class handles unicode strings while BinaryTrie works with binary data, both providing efficient key-to-ID mapping with comprehensive lookup and prefix search capabilities.

3

4

## Capabilities

5

6

### Unicode String Trie

7

8

Maps unicode string keys to auto-generated unique integer IDs with support for prefix operations, key restoration, iteration, and serialization.

9

10

```python { .api }

11

class Trie:

12

def __init__(self, arg=None, num_tries=DEFAULT_NUM_TRIES, binary=False,

13

cache_size=DEFAULT_CACHE, order=DEFAULT_ORDER, weights=None):

14

"""

15

Create a trie from unicode string keys.

16

17

Args:

18

arg (iterable, optional): Iterable of unicode string keys

19

num_tries (int): Number of tries to use (MIN_NUM_TRIES to MAX_NUM_TRIES)

20

binary (bool): Use binary tail storage instead of text

21

cache_size (int): Cache size constant for performance tuning

22

order (int): Node ordering (LABEL_ORDER or WEIGHT_ORDER)

23

weights (iterable, optional): Lookup frequency weights for optimization

24

"""

25

26

def key_id(self, key: str) -> int:

27

"""

28

Return unique ID for a unicode key.

29

30

Args:

31

key (str): Unicode key to look up

32

33

Returns:

34

int: Unique integer ID for the key

35

36

Raises:

37

KeyError: If key is not present in trie

38

"""

39

40

def restore_key(self, index: int) -> str:

41

"""

42

Return unicode key corresponding to given ID.

43

44

Args:

45

index (int): Key ID to look up

46

47

Returns:

48

str: Unicode key for the ID

49

50

Raises:

51

KeyError: If ID is not valid

52

"""

53

54

def get(self, key, default=None):

55

"""

56

Return ID for key or default if not found.

57

58

Args:

59

key: Key to look up

60

default: Value to return if key not found

61

62

Returns:

63

int or default: Key ID or default value

64

"""

65

66

def keys(self, prefix=None) -> list:

67

"""

68

Return list of keys with optional prefix filter.

69

70

Args:

71

prefix (str, optional): Prefix to filter keys

72

73

Returns:

74

list: List of unicode keys

75

"""

76

77

def iterkeys(self, prefix=None):

78

"""

79

Return iterator over keys with optional prefix filter.

80

81

Args:

82

prefix (str, optional): Prefix to filter keys

83

84

Yields:

85

str: Unicode keys

86

"""

87

88

def prefixes(self, key: str) -> list:

89

"""

90

Return list of all prefixes of given key that exist in trie.

91

92

Args:

93

key (str): Key to find prefixes for

94

95

Returns:

96

list: List of prefix strings

97

"""

98

99

def iter_prefixes(self, key: str):

100

"""

101

Return iterator over all prefixes of given key.

102

103

Args:

104

key (str): Key to find prefixes for

105

106

Yields:

107

str: Prefix strings

108

"""

109

110

def iter_prefixes_with_ids(self, key: str):

111

"""

112

Return iterator of (prefix, id) pairs for all prefixes.

113

114

Args:

115

key (str): Key to find prefixes for

116

117

Yields:

118

tuple: (prefix_string, prefix_id) pairs

119

"""

120

121

def items(self, prefix="") -> list:

122

"""

123

Return list of (key, id) pairs with optional prefix.

124

125

Args:

126

prefix (str): Prefix to filter items

127

128

Returns:

129

list: List of (key, id) tuples

130

"""

131

132

def iteritems(self, prefix=""):

133

"""

134

Return iterator over (key, id) pairs with optional prefix.

135

136

Args:

137

prefix (str): Prefix to filter items

138

139

Yields:

140

tuple: (key, id) pairs

141

"""

142

```

143

144

### Binary Data Trie

145

146

Maps bytes keys to auto-generated unique integer IDs with support for binary data and prefix operations.

147

148

```python { .api }

149

class BinaryTrie:

150

def __init__(self, arg=None, **options):

151

"""

152

Create a trie from bytes keys.

153

154

Args:

155

arg (iterable, optional): Iterable of bytes keys

156

**options: Same options as Trie class

157

"""

158

159

def key_id(self, key: bytes) -> int:

160

"""

161

Return unique ID for a bytes key.

162

163

Args:

164

key (bytes): Bytes key to look up

165

166

Returns:

167

int: Unique integer ID for the key

168

169

Raises:

170

KeyError: If key is not present in trie

171

"""

172

173

def restore_key(self, index: int) -> bytes:

174

"""

175

Return bytes key corresponding to given ID.

176

177

Args:

178

index (int): Key ID to look up

179

180

Returns:

181

bytes: Bytes key for the ID

182

183

Raises:

184

KeyError: If ID is not valid

185

"""

186

187

def get(self, key: bytes, default=None):

188

"""

189

Return ID for bytes key or default if not found.

190

191

Args:

192

key (bytes): Bytes key to look up

193

default: Value to return if key not found

194

195

Returns:

196

int or default: Key ID or default value

197

"""

198

199

def iter_prefixes(self, key: bytes):

200

"""

201

Return iterator over all prefixes of given bytes key.

202

203

Args:

204

key (bytes): Bytes key to find prefixes for

205

206

Yields:

207

bytes: Prefix bytes objects

208

"""

209

210

def prefixes(self, key: bytes) -> list:

211

"""

212

Return list of all prefixes of given bytes key.

213

214

Args:

215

key (bytes): Bytes key to find prefixes for

216

217

Returns:

218

list: List of prefix bytes objects

219

"""

220

221

def items(self, prefix=b"") -> list:

222

"""

223

Return list of (key, id) pairs with optional bytes prefix.

224

225

Args:

226

prefix (bytes): Bytes prefix to filter items

227

228

Returns:

229

list: List of (bytes_key, id) tuples

230

"""

231

232

def iteritems(self, prefix=b""):

233

"""

234

Return iterator over (key, id) pairs with optional bytes prefix.

235

236

Args:

237

prefix (bytes): Bytes prefix to filter items

238

239

Yields:

240

tuple: (bytes_key, id) pairs

241

"""

242

243

def keys(self, prefix=b"") -> list:

244

"""

245

Return list of bytes keys with optional prefix.

246

247

Args:

248

prefix (bytes): Bytes prefix to filter keys

249

250

Returns:

251

list: List of bytes keys

252

"""

253

254

def iterkeys(self, prefix=b""):

255

"""

256

Return iterator over bytes keys with optional prefix.

257

258

Args:

259

prefix (bytes): Bytes prefix to filter keys

260

261

Yields:

262

bytes: Bytes keys

263

"""

264

```

265

266

### Common Operations

267

268

Both Trie and BinaryTrie classes support these common operations:

269

270

```python { .api }

271

# Container operations

272

def __contains__(self, key) -> bool:

273

"""Check if key exists in trie."""

274

275

def __len__(self) -> int:

276

"""Return number of keys in trie."""

277

278

def __iter__(self):

279

"""Iterate over all keys."""

280

281

def __getitem__(self, key) -> int:

282

"""Get key ID using dictionary syntax."""

283

284

# Serialization operations

285

def save(self, path: str):

286

"""Save trie to file path."""

287

288

def load(self, path: str):

289

"""Load trie from file path."""

290

291

def tobytes(self) -> bytes:

292

"""Return raw trie content as bytes."""

293

294

def frombytes(self, data: bytes):

295

"""Load trie from raw bytes."""

296

297

def mmap(self, path: str):

298

"""Memory map trie file for efficient access."""

299

300

def map(self, buffer):

301

"""Load trie from buffer protocol object."""

302

```

303

304

## Usage Examples

305

306

### Basic Trie Operations

307

308

```python

309

import marisa_trie

310

311

# Create trie with string keys

312

words = ['apple', 'application', 'apply', 'banana', 'band', 'bandana']

313

trie = marisa_trie.Trie(words)

314

315

# Key lookup operations

316

print(trie.key_id('apple')) # Get unique ID

317

print(trie.restore_key(0)) # Get key from ID

318

print('app' in trie) # False - exact match only

319

print(len(trie)) # 6

320

321

# Dictionary-style access

322

apple_id = trie['apple']

323

print(f"Apple ID: {apple_id}")

324

```

325

326

### Prefix Operations

327

328

```python

329

# Find keys with prefix

330

app_words = trie.keys(prefix='app')

331

print(f"Words starting with 'app': {app_words}")

332

# Output: ['apple', 'application', 'apply']

333

334

# Find prefixes of a key

335

prefixes = trie.prefixes('application')

336

print(f"Prefixes of 'application': {prefixes}")

337

# Output: ['app', 'appli', 'application'] (only existing keys)

338

339

# Iterate with IDs

340

for word, word_id in trie.iteritems(prefix='ban'):

341

print(f"{word}: {word_id}")

342

```

343

344

### Performance Optimization

345

346

```python

347

# Use weights for frequently accessed keys

348

keys = ['common', 'rare', 'frequent']

349

weights = [100, 1, 50] # Higher weights for common lookups

350

351

optimized_trie = marisa_trie.Trie(

352

keys,

353

weights=weights,

354

cache_size=marisa_trie.LARGE_CACHE,

355

order=marisa_trie.WEIGHT_ORDER

356

)

357

```

358

359

### Binary Data Handling

360

361

```python

362

# Create trie with binary keys

363

binary_keys = [b'header\x00\x01', b'data\x02\x03', b'footer\xff']

364

binary_trie = marisa_trie.BinaryTrie(binary_keys)

365

366

# Binary operations

367

data_id = binary_trie.key_id(b'data\x02\x03')

368

restored = binary_trie.restore_key(data_id)

369

print(f"Restored binary key: {restored}")

370

371

# Prefix search with binary data

372

prefixes = binary_trie.prefixes(b'header\x00\x01\x02')

373

print(f"Binary prefixes: {prefixes}")

374

```

375

376

### Serialization and Persistence

377

378

```python

379

# Save trie to file

380

trie.save('dictionary.trie')

381

382

# Load trie from file

383

loaded_trie = marisa_trie.Trie()

384

loaded_trie.load('dictionary.trie')

385

386

# Memory-efficient loading via memory mapping

387

mapped_trie = marisa_trie.Trie()

388

mapped_trie.mmap('dictionary.trie')

389

390

# Serialize to bytes for network transfer

391

trie_bytes = trie.tobytes()

392

new_trie = marisa_trie.Trie()

393

new_trie.frombytes(trie_bytes)

394

```

395

396

## Additional Methods and Special Operations

397

398

### Deprecated Methods

399

400

These methods are still available but deprecated and will be removed in future versions:

401

402

```python { .api }

403

# Deprecated serialization methods (use save/load instead)

404

def read(self, f):

405

"""

406

Read trie from an open file descriptor.

407

408

Args:

409

f: Open file object with file descriptor

410

411

.. deprecated:: 0.7.3

412

Use load() instead. Will be removed in 0.8.0.

413

"""

414

415

def write(self, f):

416

"""

417

Write trie to an open file descriptor.

418

419

Args:

420

f: Open file object with file descriptor

421

422

.. deprecated:: 0.7.3

423

Use save() instead. Will be removed in 0.8.0.

424

"""

425

426

def has_keys_with_prefix(self, prefix="") -> bool:

427

"""

428

Check if any keys begin with given prefix.

429

430

Args:

431

prefix (str): Prefix to check

432

433

Returns:

434

bool: True if any keys start with prefix

435

436

.. deprecated:: 0.7.3

437

Use iterkeys() instead. Will be removed in 0.8.0.

438

"""

439

```

440

441

### Special Methods

442

443

Methods supporting Python's built-in operations and serialization protocols:

444

445

```python { .api }

446

# Comparison operations

447

def __richcmp__(self, other, int op) -> bool:

448

"""

449

Rich comparison supporting == and != operators.

450

451

Args:

452

other: Another Trie instance to compare

453

op: Comparison operation (2 for ==, 3 for !=)

454

455

Returns:

456

bool: True if tries are equal (for ==) or not equal (for !=)

457

458

Raises:

459

TypeError: For unsupported comparison operations (<, >, <=, >=)

460

"""

461

462

# Pickle serialization support

463

def __reduce__(self) -> tuple:

464

"""

465

Support for pickle serialization.

466

467

Returns:

468

tuple: (class, args, state) for pickle reconstruction

469

"""

470

471

def __setstate__(self, state: bytes):

472

"""

473

Support for pickle deserialization (alias for frombytes).

474

475

Args:

476

state (bytes): Serialized trie data from __reduce__

477

"""

478

```

479

480

### Architecture and Inheritance

481

482

The marisa-trie classes follow a hierarchical design:

483

484

```

485

_Trie (base class)

486

├── BinaryTrie (binary keys → IDs)

487

└── _UnicodeKeyedTrie (unicode key handling)

488

├── Trie (unicode keys → IDs)

489

└── BytesTrie (unicode keys → bytes payloads)

490

└── _UnpackTrie (value packing/unpacking)

491

└── RecordTrie (unicode keys → structured data)

492

```

493

494

**Base Class Methods**: The `_Trie` base class provides core functionality including:

495

- Container operations (`__contains__`, `__len__`, `__iter__`)

496

- Serialization (`save`, `load`, `tobytes`, `frombytes`, `mmap`, `map`)

497

- Key iteration (`keys`, `iterkeys`)

498

499

**Key-ID Methods**: Only `Trie` and `BinaryTrie` provide:

500

- `key_id()` and `restore_key()` for key-ID mapping

501

- Dictionary-style access with `__getitem__`

502

503

**Unicode-Specific Methods**: Only `Trie` provides:

504

- `iter_prefixes_with_ids()` for prefix-ID pairs

505

506

### Error Handling

507

508

All trie classes raise specific exceptions for error conditions:

509

510

```python

511

# KeyError for missing keys or invalid IDs

512

try:

513

key_id = trie.key_id('nonexistent')

514

except KeyError as e:

515

print(f"Key not found: {e}")

516

517

try:

518

key = trie.restore_key(999999)

519

except KeyError as e:

520

print(f"Invalid ID: {e}")

521

522

# ValueError for invalid configuration

523

try:

524

invalid_trie = marisa_trie.Trie(['test'], num_tries=999)

525

except ValueError as e:

526

print(f"Configuration error: {e}")

527

528

# File I/O exceptions during serialization

529

try:

530

trie.load('nonexistent.trie')

531

except FileNotFoundError as e:

532

print(f"File error: {e}")

533

```