# pyahocorasick

A fast and memory efficient library for exact or approximate multi-pattern string search using the Aho-Corasick algorithm. With pyahocorasick, you can find multiple key string occurrences at once in input text, making it ideal for applications requiring high-throughput pattern matching such as bioinformatics, log parsing, content filtering, and data mining.

## Package Information

- **Package Name**: pyahocorasick
- **Language**: Python (C extension)
- **Installation**: `pip install pyahocorasick`

## Core Imports

```python
import ahocorasick
```

## Basic Usage

```python
import ahocorasick

# Create an automaton
automaton = ahocorasick.Automaton()

# Add words to the trie
for idx, key in enumerate(['he', 'she', 'his', 'hers']):
    automaton.add_word(key, (idx, key))

# Convert to automaton for searching
automaton.make_automaton()

# Search for patterns in text
text = "she sells seashells by the seashore"
for end_index, (insert_order, original_string) in automaton.iter(text):
    start_index = end_index - len(original_string) + 1
    print(f"Found '{original_string}' at positions {start_index}-{end_index}")
```

## Architecture

pyahocorasick implements a two-stage pattern matching system:

- **Trie Stage**: Dictionary-like structure for storing patterns with associated values
- **Automaton Stage**: Aho-Corasick finite state machine for efficient multi-pattern search

The library supports flexible value storage (arbitrary objects, integers, or automatic length calculation) and can operate on both Unicode strings and byte sequences depending on build configuration.

## Capabilities

### Automaton Construction

Core functionality for creating and managing Aho-Corasick automata, including adding/removing patterns, configuring storage types, and converting tries to search-ready automata.

```python { .api }
class Automaton:
    def __init__(self, store=ahocorasick.STORE_ANY, key_type=ahocorasick.KEY_STRING): ...
    def add_word(self, key, value=None): ...
    def remove_word(self, key): ...
    def make_automaton(self): ...
```

[Automaton Construction](./automaton-construction.md)

### Pattern Search

Efficient multi-pattern search operations using the built automaton, supporting various search modes including standard iteration, longest-match iteration, and callback-based processing.

```python { .api }
def iter(self, string, start=0, end=None, ignore_white_space=False): ...
def iter_long(self, string, start=0, end=None): ...
def find_all(self, string, callback, start=0, end=None): ...
```

[Pattern Search](./pattern-search.md)

### Dictionary Interface

Dict-like operations for accessing stored patterns and values, including existence checking, value retrieval, and iteration over keys, values, and items with optional filtering.

```python { .api }
def get(self, key, default=None): ...
def exists(self, key): ...
def keys(self, prefix=None, wildcard=None, how=ahocorasick.MATCH_AT_LEAST_PREFIX): ...
def values(self): ...
def items(self): ...
```

[Dictionary Interface](./dictionary-interface.md)

### Serialization

Save and load automaton instances to/from disk with support for custom serialization functions for arbitrary object storage and efficient built-in serialization for integer storage.

```python { .api }
def save(self, path, serializer=None): ...
def load(path, deserializer=None): ...
```

[Serialization](./serialization.md)

## Constants

### Storage Types

```python { .api }
STORE_ANY  # Store arbitrary Python objects (default)
STORE_INTS  # Store integers only
STORE_LENGTH  # Store key lengths automatically
```

### Key Types

```python { .api }
KEY_STRING  # String keys (default)
KEY_SEQUENCE  # Integer sequence keys
```

### Automaton States

```python { .api }
EMPTY  # No words added
TRIE  # Trie built but not converted to automaton
AHOCORASICK  # Full automaton ready for searching
```

### Pattern Matching Types

```python { .api }
MATCH_EXACT_LENGTH  # Keys with exactly the pattern's length
MATCH_AT_LEAST_PREFIX  # Keys at least as long as the pattern (default)
MATCH_AT_MOST_PREFIX  # Keys at most as long as the pattern
```

### Build Configuration

```python { .api }
unicode  # Integer flag (0 or 1) indicating Unicode build support
```

## Error Handling

The library raises standard Python exceptions:

- **ValueError**: Invalid arguments, wrong store/key types, malformed data
- **TypeError**: Wrong argument types
- **KeyError**: Key not found (for example, `get` without a default)
- **AttributeError**: Calling search methods before building the automaton
- **IndexError**: Invalid range parameters