Fuzzy string matching library using Levenshtein Distance algorithms for approximate text comparison
npx @tessl/cli install tessl/pypi-thefuzz@0.22.0

TheFuzz is a Python fuzzy string matching library that uses Levenshtein Distance algorithms to calculate similarities between text sequences. It provides multiple scoring strategies for approximate string comparison, from simple ratio calculations to weighted algorithms that combine several matching techniques.
pip install thefuzz

from thefuzz import fuzz
from thefuzz import process
from thefuzz import utils

Individual function imports:
from thefuzz.fuzz import ratio, WRatio, token_sort_ratio
from thefuzz.process import extractOne, extract, dedupe
from thefuzz.utils import full_process

from thefuzz import fuzz
from thefuzz import process
# Basic string similarity scoring
ratio = fuzz.ratio("this is a test", "this is a test!")
print(ratio) # 97
# Token-based matching (handles word order differences)
token_ratio = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(token_ratio) # 100
# Weighted ratio (combines multiple algorithms)
weighted_ratio = fuzz.WRatio("this is a test", "this is a test!")
print(weighted_ratio) # 100 (WRatio preprocesses both strings by default, so the trailing "!" is ignored)
# Find best match from a list of choices
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
result = process.extractOne("new york jets", choices)
print(result) # ('New York Jets', 100)
# Get multiple best matches
results = process.extract("new york", choices, limit=2)
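print(results) # [('New York Jets', 90), ('New York Giants', 90)]

TheFuzz builds on the high-performance rapidfuzz library while maintaining backward compatibility with the original fuzzywuzzy interface. The library is organized into three main modules: fuzz (scoring functions that compare two strings), process (functions that search a collection of choices for the best matches), and utils (string preprocessing helpers).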
The library uses a consistent preprocessing pipeline that normalizes strings before fuzzy matching: it replaces non-alphanumeric characters with whitespace, converts the text to lowercase, trims leading and trailing whitespace, and optionally drops non-ASCII characters.
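The normalization steps described above can be approximated in plain Python. This is a minimal sketch for illustration, not the library's own code: the function name `normalize` is hypothetical, and the exact character classes the library treats as alphanumeric may differ.

```python
def normalize(text: str, force_ascii: bool = False) -> str:
    """Hypothetical stand-in for the preprocessing step (not thefuzz's code)."""
    if force_ascii:
        # Assumption of this sketch: drop every character outside 7-bit ASCII.
        text = text.encode("ascii", "ignore").decode("ascii")
    # Replace anything that is not a letter or digit with a single space.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text)
    # Lowercase and trim leading/trailing whitespace.
    return cleaned.lower().strip()


print(normalize("New York Mets!"))                    # new york mets
print(normalize("Caf\u00e9 Noir", force_ascii=True))  # caf noir
```

Because punctuation is replaced rather than deleted, runs of punctuation leave interior spaces behind; only the ends of the string are trimmed.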
Core fuzzy string matching functions that calculate similarity ratios between two strings using different algorithms including basic ratio, partial matching, and token-based comparisons.
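To make the idea of a similarity ratio concrete, here is a small pure-Python sketch of a basic ratio and a token-sort ratio. It assumes an insert/delete-only edit model computed from the longest common subsequence; the names `lcs_length`, `simple_ratio`, and `simple_token_sort_ratio` are hypothetical, and the library's internal algorithms may differ in detail.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity from 0 to 100: share of characters that belong to the common subsequence."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100  # edge-case choice of this sketch: two empty strings count as identical
    return round(100 * 2 * lcs_length(s1, s2) / total)


def simple_token_sort_ratio(s1: str, s2: str) -> int:
    """Sort whitespace-separated tokens first, so word order no longer matters."""
    sorted1 = " ".join(sorted(s1.lower().split()))
    sorted2 = " ".join(sorted(s2.lower().split()))
    return simple_ratio(sorted1, sorted2)


print(simple_ratio("this is a test", "this is a test!"))  # 97
print(simple_token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```

The first call shares 14 of 29 total characters on each side (2 * 14 / 29, about 0.966), which rounds to 97; the second compares two identical sorted token strings.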
def ratio(s1: str, s2: str) -> int: ...
def partial_ratio(s1: str, s2: str) -> int: ...
def token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def WRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def QRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...

Functions for finding the best matches in collections of strings, including single and multiple match extraction, duplicate removal, and configurable scoring with custom processors.
def extractOne(query: str, choices, processor=None, scorer=None, score_cutoff: int = 0): ...
def extract(query: str, choices, processor=None, scorer=None, limit: int = 5): ...
def extractBests(query: str, choices, processor=None, scorer=None, score_cutoff: int = 0, limit: int = 5): ...
def extractWithoutOrder(query: str, choices, processor=None, scorer=None, score_cutoff: int = 0): ...
def dedupe(contains_dupes: list, threshold: int = 70, scorer=None): ...

String Processing and Extraction
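The extraction functions all follow the same pattern: preprocess the query and each choice, score every pair, filter by a cutoff, and return the highest-scoring results. The sketch below illustrates that pattern with the standard library only. `extract_sketch`, `difflib_scorer`, and `lowercase` are hypothetical names, and `difflib.SequenceMatcher` is a stand-in scorer, so its scores will not match the library's.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Optional, Tuple


def difflib_scorer(a: str, b: str) -> int:
    """Stand-in scorer: difflib's similarity scaled to 0-100 (not a thefuzz scorer)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def lowercase(text: str) -> str:
    """Stand-in processor: lowercase and trim."""
    return text.lower().strip()


def extract_sketch(
    query: str,
    choices: Iterable[str],
    processor: Callable[[str], str] = lowercase,
    scorer: Callable[[str, str], int] = difflib_scorer,
    score_cutoff: int = 0,
    limit: Optional[int] = 5,
) -> List[Tuple[str, int]]:
    """Score every choice against the query and return the best ones first."""
    processed_query = processor(query)
    scored = [(choice, scorer(processed_query, processor(choice))) for choice in choices]
    # Keep only choices that reach the cutoff, then sort by score, highest first.
    kept = [pair for pair in scored if pair[1] >= score_cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept if limit is None else kept[:limit]


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract_sketch("new york jets", choices, limit=1))  # [('New York Jets', 100)]
print([name for name, _ in extract_sketch("new york", choices, limit=2)])
```

Passing a different `scorer` or `processor` changes the ranking without touching the search loop, which is the same separation the `processor` and `scorer` parameters provide in the signatures above.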
Utility functions for string preprocessing and normalization, including ASCII conversion and text cleaning that replaces non-alphanumeric characters with whitespace, lowercases the text, and trims surrounding whitespace.
def full_process(s: str, force_ascii: bool = False) -> str: ...
def ascii_only(s: str) -> str: ...

from typing import Callable, Union, Tuple, List, Dict, Any, Generator, TypeVar, Sequence
from collections.abc import Mapping
# Core type aliases from process module
ChoicesT = Union[Mapping[str, str], Sequence[str]]
T = TypeVar('T')
ProcessorT = Union[Callable[[str, bool], str], Callable[[Any], Any]]
ScorerT = Callable[[str, str, bool, bool], int]
# Additional type aliases for better understanding
Scorer = Callable[[str, str], int]
Processor = Callable[[str], str]
Choice = Union[str, Tuple[Any, ...], Dict[str, Any]]
Choices = Union[List[Choice], Dict[Any, Choice]]
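As an illustration of how these aliases fit together, the sketch below defines a custom scorer and processor that satisfy the two-argument `Scorer` and one-argument `Processor` shapes, plus a helper that accepts either a sequence or a mapping of choices. All function names are hypothetical, and returning `(value, score, key)` for mapping inputs is an assumption of this sketch rather than a statement about the library.

```python
from collections.abc import Mapping
from typing import Callable, Optional, Sequence, Union

Scorer = Callable[[str, str], int]
Processor = Callable[[str], str]
ChoicesT = Union[Mapping[str, str], Sequence[str]]


def prefix_scorer(a: str, b: str) -> int:
    """Hypothetical scorer: share of the longer string covered by the common prefix."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0
    common = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        common += 1
    return round(100 * common / longest)


def casefold_processor(text: str) -> str:
    """Hypothetical processor: case-insensitive form with surrounding whitespace removed."""
    return text.casefold().strip()


def best_match(query: str, choices: ChoicesT, processor: Processor, scorer: Scorer) -> Optional[tuple]:
    """Return the highest-scoring choice, or None when there are no choices."""
    # Mappings contribute (key, value) pairs; plain sequences have no key.
    if isinstance(choices, Mapping):
        items = list(choices.items())
    else:
        items = [(None, value) for value in choices]
    processed_query = processor(query)
    best = None
    for key, value in items:
        score = scorer(processed_query, processor(value))
        if best is None or score > best[1]:
            best = (value, score) if key is None else (value, score, key)
    return best


teams = {"nyj": "New York Jets", "dal": "Dallas Cowboys"}
print(best_match("new york", list(teams.values()), casefold_processor, prefix_scorer))  # ('New York Jets', 62)
print(best_match("new york", teams, casefold_processor, prefix_scorer))  # ('New York Jets', 62, 'nyj')
```

The score of 62 comes from an 8-character common prefix against the 13-character processed choice (100 * 8 / 13, about 61.5).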