or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

automaton-construction.mddictionary-interface.mdindex.mdpattern-search.mdserialization.md

automaton-construction.mddocs/

0

# Automaton Construction

1

2

Core functionality for creating and managing Aho-Corasick automata, including adding and removing patterns, configuring storage types, and converting tries to search-ready automatons.

3

4

## Capabilities

5

6

### Constructor

7

8

Create a new empty Automaton with configurable storage and key types.

9

10

```python { .api }

11

class Automaton:

12

def __init__(self, store=ahocorasick.STORE_ANY, key_type=ahocorasick.KEY_STRING):

13

"""

14

Create a new empty Automaton.

15

16

Parameters:

17

- store: Storage type for values (STORE_ANY, STORE_INTS, STORE_LENGTH)

18

- key_type: Type of keys (KEY_STRING, KEY_SEQUENCE)

19

"""

20

```

21

22

#### Usage Example

23

24

```python

25

import ahocorasick

26

27

# Default: store any objects with string keys

28

automaton = ahocorasick.Automaton()

29

30

# Store integers only

31

int_automaton = ahocorasick.Automaton(ahocorasick.STORE_INTS)

32

33

# Store key lengths automatically

34

length_automaton = ahocorasick.Automaton(ahocorasick.STORE_LENGTH)

35

36

# Use integer sequences as keys

37

seq_automaton = ahocorasick.Automaton(key_type=ahocorasick.KEY_SEQUENCE)

38

```

39

40

### Add Words

41

42

Add key strings to the trie with associated values.

43

44

```python { .api }

45

def add_word(self, key, value=None):

46

"""

47

Add a key string to the trie and associate with a value.

48

49

Parameters:

50

- key: String key to add (or integer sequence if KEY_SEQUENCE)

51

- value: Associated value (required for STORE_ANY, optional for STORE_INTS,

52

not allowed for STORE_LENGTH)

53

54

Returns:

55

bool: True if new word added, False if word already exists

56

57

Raises:

58

- ValueError: If value type doesn't match store type

59

- TypeError: If key type is incorrect

60

"""

61

```

62

63

#### Usage Examples

64

65

```python

66

# STORE_ANY (default) - value required

67

automaton = ahocorasick.Automaton()

68

automaton.add_word('hello', 'greeting')

69

automaton.add_word('world', {'type': 'noun', 'meaning': 'earth'})

70

71

# STORE_INTS - value optional (defaults to insertion order)

72

int_automaton = ahocorasick.Automaton(ahocorasick.STORE_INTS)

73

int_automaton.add_word('cat', 1)

74

int_automaton.add_word('dog') # defaults to 1 (second word added)

75

76

# STORE_LENGTH - value not allowed (automatically uses key length)

77

length_automaton = ahocorasick.Automaton(ahocorasick.STORE_LENGTH)

78

length_automaton.add_word('cat') # value will be 3

79

length_automaton.add_word('mouse') # value will be 5

80

81

# KEY_SEQUENCE - use integer sequences as keys

82

seq_automaton = ahocorasick.Automaton(key_type=ahocorasick.KEY_SEQUENCE)

83

seq_automaton.add_word([1, 2, 3], 'sequence1')

84

seq_automaton.add_word((4, 5, 6), 'sequence2')

85

```

86

87

### Remove Words

88

89

Remove words from the trie.

90

91

```python { .api }

92

def remove_word(self, key):

93

"""

94

Remove given word from the trie.

95

96

Parameters:

97

- key: Word to remove

98

99

Returns:

100

bool: True if word was found and removed, False otherwise

101

"""

102

103

def pop(self, key):

104

"""

105

Remove word from trie and return its associated value.

106

107

Parameters:

108

- key: Word to remove

109

110

Returns:

111

The value associated with key

112

113

Raises:

114

- KeyError: If key not found

115

"""

116

```

117

118

#### Usage Examples

119

120

```python

121

automaton = ahocorasick.Automaton()

122

automaton.add_word('hello', 'greeting')

123

automaton.add_word('world', 'noun')

124

125

# Remove word and check if it existed

126

was_removed = automaton.remove_word('hello') # True

127

was_removed = automaton.remove_word('missing') # False

128

129

# Remove word and get its value

130

value = automaton.pop('world') # 'noun'

131

132

# Pop raises KeyError if key not found

133

try:

134

value = automaton.pop('missing')

135

except KeyError:

136

print("Key not found")

137

```

138

139

### Clear Automaton

140

141

Remove all words from the automaton.

142

143

```python { .api }

144

def clear(self):

145

"""

146

Remove all keys from the trie.

147

This method invalidates all iterators.

148

"""

149

```

150

151

### Build Automaton

152

153

Convert the trie to an Aho-Corasick automaton for efficient searching.

154

155

```python { .api }

156

def make_automaton(self):

157

"""

158

Finalize and create the Aho-Corasick automaton based on the keys

159

already added to the trie. After successful creation, the

160

automaton.kind attribute is set to ahocorasick.AHOCORASICK.

161

"""

162

```

163

164

#### Usage Example

165

166

```python

167

automaton = ahocorasick.Automaton()

168

169

# Add words to the trie

170

words = ['he', 'she', 'his', 'hers']

171

for i, word in enumerate(words):

172

automaton.add_word(word, i)

173

174

print(automaton.kind) # ahocorasick.TRIE

175

176

# Convert to automaton for searching

177

automaton.make_automaton()

178

179

print(automaton.kind) # ahocorasick.AHOCORASICK

180

181

# Now ready for efficient pattern search

182

text = "she sells seashells"

183

matches = list(automaton.iter(text))

184

```

185

186

### Automaton Properties

187

188

Access automaton state and configuration information.

189

190

```python { .api }

191

@property

192

def kind(self):

193

"""

194

Current state of automaton.

195

196

Returns:

197

int: One of EMPTY, TRIE, or AHOCORASICK

198

"""

199

200

@property

201

def store(self):

202

"""

203

Storage type used by automaton.

204

205

Returns:

206

int: One of STORE_ANY, STORE_INTS, or STORE_LENGTH

207

"""

208

209

def __len__(self):

210

"""

211

Number of words in automaton.

212

213

Returns:

214

int: Count of distinct words added

215

"""

216

```

217

218

### Statistics

219

220

Get detailed information about the automaton's structure and memory usage.

221

222

```python { .api }

223

def get_stats(self):

224

"""

225

Return a dictionary containing Automaton statistics.

226

227

Returns:

228

dict: Statistics with keys:

229

- nodes_count: Total number of nodes

230

- words_count: Number of distinct words (same as len(automaton))

231

- longest_word: Length of the longest word

232

- links_count: Number of edges

233

- sizeof_node: Size of single node in bytes

234

- total_size: Total size of trie in bytes

235

"""

236

```

237

238

#### Usage Example

239

240

```python

241

automaton = ahocorasick.Automaton()

242

for word in ['cat', 'car', 'card', 'care', 'careful']:

243

automaton.add_word(word, len(word))

244

245

automaton.make_automaton()

246

247

stats = automaton.get_stats()

248

print(f"Nodes: {stats['nodes_count']}")

249

print(f"Words: {stats['words_count']}")

250

print(f"Longest word: {stats['longest_word']}")

251

print(f"Memory usage: {stats['total_size']} bytes")

252

```

253

254

### Memory Size

255

256

Get the approximate memory usage of the automaton instance.

257

258

```python { .api }

259

def __sizeof__(self):

260

"""

261

Return the approximate size in bytes occupied by the Automaton instance in memory.

262

263

Returns:

264

int: Size in bytes (excludes size of associated objects when using STORE_ANY)

265

"""

266

```

267

268

### Structure Dump

269

270

Get detailed internal structure information for debugging and inspection.

271

272

```python { .api }

273

def dump(self):

274

"""

275

Return a three-tuple of lists describing the Automaton as a graph of nodes, edges, failure links.

276

277

Returns:

278

tuple: (nodes, edges, failure_links) where:

279

- nodes: list of (node_id, end_of_word_marker) pairs

280

- edges: list of (node_id, label_char, child_node_id) triples

281

- failure_links: list of (source_node_id, target_node_id) pairs

282

283

Returns None if automaton is empty.

284

"""

285

```

286

287

#### Usage Example

288

289

```python

290

automaton = ahocorasick.Automaton()

291

automaton.add_word('cat', 1)

292

automaton.add_word('car', 2)

293

automaton.make_automaton()

294

295

# Get memory usage

296

memory_size = automaton.__sizeof__()

297

print(f"Automaton uses {memory_size} bytes")

298

299

# Get internal structure for debugging

300

structure = automaton.dump()

301

if structure:

302

nodes, edges, failure_links = structure

303

print(f"Nodes: {len(nodes)}")

304

print(f"Edges: {len(edges)}")

305

print(f"Failure links: {len(failure_links)}")

306

307

# Example: print first few nodes

308

for node_id, is_end in nodes[:3]:

309

print(f"Node {node_id}: end_marker={is_end}")

310

```

311

312

## Storage Type Details

313

314

### STORE_ANY (Default)

315

- Stores arbitrary Python objects as values

316

- Value parameter is required in `add_word()`

317

- Most flexible but uses more memory

318

- Suitable for complex applications with rich metadata

319

320

### STORE_INTS

321

- Stores 32-bit integers only

322

- Value parameter is optional (defaults to insertion order)

323

- Memory efficient for simple use cases

324

- Good for basic pattern indexing

325

326

### STORE_LENGTH

327

- Automatically stores the length of each key as its value

328

- Value parameter not allowed in `add_word()`

329

- Most memory efficient

330

- Ideal when you only need to know pattern lengths

331

332

## Key Type Details

333

334

### KEY_STRING (Default)

335

- Uses string keys for pattern matching

336

- Supports Unicode or bytes depending on build configuration

337

- Standard choice for text processing

338

339

### KEY_SEQUENCE

340

- Uses sequences of integers as keys

341

- Useful for numeric pattern matching

342

- Keys can be lists, tuples, or other integer sequences