Fast and memory efficient library for exact or approximate multi-pattern string search using the Aho-Corasick algorithm
npx @tessl/cli install tessl/pypi-pyahocorasick@2.2.0

A fast and memory efficient library for exact or approximate multi-pattern string search using the Aho-Corasick algorithm. With pyahocorasick, you can find multiple key string occurrences at once in input text, making it ideal for applications requiring high-throughput pattern matching such as bioinformatics, log parsing, content filtering, and data mining.
pip install pyahocorasick

import ahocorasick

import ahocorasick
# Create an automaton
automaton = ahocorasick.Automaton()
# Add words to the trie
for idx, key in enumerate(['he', 'she', 'his', 'hers']):
    automaton.add_word(key, (idx, key))
# Convert to automaton for searching
automaton.make_automaton()
# Search for patterns in text
text = "she sells seashells by the seashore"
for end_index, (insert_order, original_string) in automaton.iter(text):
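    start_index = end_index - len(original_string) + 1
    print(f"Found '{original_string}' at positions {start_index}-{end_index}")

pyahocorasick implements a two-stage pattern matching system: words are first added to a trie, which is then converted into an Aho-Corasick automaton for searching.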
The library supports flexible value storage (arbitrary objects, integers, or automatic length calculation) and can operate on both Unicode strings and byte sequences depending on build configuration.
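To make the two stages concrete, here is a minimal pure-Python sketch of the textbook Aho-Corasick construction. It is not the library's actual implementation; the helper names `build_trie`, `add_failure_links`, and `search` and the dict-based node layout are hypothetical and chosen only for illustration.

```python
from collections import deque


def _new_node():
    # Hypothetical node layout: child edges, failure link, words ending here.
    return {"next": {}, "fail": None, "out": []}


def build_trie(words):
    """Stage 1: insert every word into a trie."""
    root = _new_node()
    for word in words:
        node = root
        for ch in word:
            node = node["next"].setdefault(ch, _new_node())
        node["out"].append(word)
    return root


def add_failure_links(root):
    """Stage 2: breadth-first pass that adds failure links to the trie."""
    queue = deque()
    for child in root["next"].values():
        child["fail"] = root
        queue.append(child)
    while queue:
        node = queue.popleft()
        for ch, child in node["next"].items():
            fallback = node["fail"]
            while fallback is not root and ch not in fallback["next"]:
                fallback = fallback["fail"]
            child["fail"] = fallback["next"].get(ch, root)
            # Inherit the words that end at the failure target (proper suffixes).
            child["out"] = child["out"] + child["fail"]["out"]
            queue.append(child)
    return root


def search(root, text):
    """Scan text once and return (start, end, word) for every match."""
    node = root
    matches = []
    for end, ch in enumerate(text):
        while node is not root and ch not in node["next"]:
            node = node["fail"]
        node = node["next"].get(ch, root)
        for word in node["out"]:
            matches.append((end - len(word) + 1, end, word))
    return matches


root = add_failure_links(build_trie(["he", "she", "his", "hers"]))
print(search(root, "ushers"))
```

The text is scanned in a single pass regardless of how many words were added, which is the property that makes the approach attractive for large pattern sets.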
Core functionality for creating and managing Aho-Corasick automata, including adding/removing patterns, configuring storage types, and converting tries to search-ready automatons.
class Automaton:
    def __init__(self, store=ahocorasick.STORE_ANY, key_type=ahocorasick.KEY_STRING): ...
    def add_word(self, key, value=None): ...
    def remove_word(self, key): ...
    def make_automaton(self): ...

Efficient multi-pattern search operations using the built automaton, supporting various search modes including standard iteration, longest-match iteration, and callback-based processing.
def iter(self, string, start=0, end=None, ignore_white_space=False): ...
def iter_long(self, string, start=0, end=None): ...
def find_all(self, string, callback, start=0, end=None): ...

Dict-like operations for accessing stored patterns and values, including existence checking, value retrieval, and iteration over keys, values, and items with optional filtering.
def get(self, key, default=None): ...
def exists(self, key): ...
def keys(self, prefix=None, wildcard=None, how=ahocorasick.MATCH_AT_LEAST_PREFIX): ...
def values(self): ...
def items(self): ...

Save and load automaton instances to/from disk with support for custom serialization functions for arbitrary object storage and efficient built-in serialization for integer storage.
def save(self, path, serializer=None): ...
def load(path, deserializer=None): ...

STORE_ANY # Store arbitrary Python objects (default)
STORE_INTS # Store integers only
STORE_LENGTH # Store key lengths automatically

KEY_STRING # String keys (default)
KEY_SEQUENCE # Integer sequence keys

EMPTY # No words added
TRIE # Trie built but not converted to automaton
AHOCORASICK # Full automaton ready for searching

MATCH_EXACT_LENGTH # Exact length matching for wildcard patterns
MATCH_AT_LEAST_PREFIX # At least prefix length matching (default)
MATCH_AT_MOST_PREFIX # At most prefix length matching

unicode # Integer flag (0 or 1) indicating Unicode build support

The library raises standard Python exceptions: