# TheFuzz

TheFuzz is a Python fuzzy string matching library that uses Levenshtein distance to score the similarity of text sequences. It provides several scoring strategies for approximate string comparison, from a simple ratio to a weighted scorer that combines multiple matching techniques.

## Package Information

- **Package Name**: thefuzz
- **Language**: Python
- **Installation**: `pip install thefuzz`
- **Minimum Python Version**: 3.8

## Core Imports
```python
from thefuzz import fuzz
from thefuzz import process
from thefuzz import utils
```

Individual function imports:

```python
from thefuzz.fuzz import ratio, WRatio, token_sort_ratio
from thefuzz.process import extractOne, extract, dedupe
from thefuzz.utils import full_process
```

## Basic Usage

```python
from thefuzz import fuzz
from thefuzz import process

# Basic string similarity scoring (inputs are compared as given)
ratio = fuzz.ratio("this is a test", "this is a test!")
print(ratio)  # 97

# Token-based matching (handles word order differences)
token_ratio = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(token_ratio)  # 100

# Weighted ratio (combines multiple algorithms). Default preprocessing
# strips the trailing "!", so the two strings compare as identical.
weighted_ratio = fuzz.WRatio("this is a test", "this is a test!")
print(weighted_ratio)  # 100

# Find best match from a list of choices
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
result = process.extractOne("new york jets", choices)
print(result)  # ('New York Jets', 100)

# Get multiple best matches
results = process.extract("new york", choices, limit=2)
print(results)  # [('New York Jets', 90), ('New York Giants', 90)]
```

## Architecture

TheFuzz builds on the high-performance rapidfuzz library while maintaining backward compatibility with the original fuzzywuzzy interface. The library is organized into three main modules:

- **fuzz**: Core string similarity scoring functions using various algorithms
- **process**: Functions for finding best matches in collections of strings
- **utils**: String preprocessing and normalization utilities

Most scorers and the `process` functions share a preprocessing step (`utils.full_process`) that replaces non-alphanumeric characters with spaces, converts the text to lowercase, trims surrounding whitespace, and optionally forces ASCII before fuzzy matching. When called directly, `fuzz.ratio` and `fuzz.partial_ratio` skip this step and compare their inputs as given.
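
The normalization described above can be approximated with the standard library alone. The block below is a minimal sketch, not the library's implementation: `normalize` is a hypothetical name, and the exact character classes that rapidfuzz treats as alphanumeric may differ from Python's `str.isalnum`.

```python
def normalize(text: str, force_ascii: bool = False) -> str:
    """Approximate the preprocessing step (illustrative only)."""
    if force_ascii:
        # Drop every character outside the 7-bit ASCII range.
        text = "".join(ch for ch in text if ord(ch) < 128)
    # Replace each non-alphanumeric character with a space.
    text = "".join(ch if ch.isalnum() else " " for ch in text)
    # Lowercase and trim surrounding whitespace.
    return text.lower().strip()


print(normalize("this is a test!"))                    # this is a test
print(normalize("Caf\u00e9-Bar", force_ascii=True))    # caf bar
```

Because the trailing `!` is replaced and trimmed, two inputs that differ only in punctuation or case normalize to the same string, which is why preprocessing scorers can report a perfect match for them.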

## Capabilities

### String Similarity Scoring

Core fuzzy string matching functions that score the similarity of two strings as an integer from 0 to 100, using different algorithms: basic ratio, partial (substring) matching, and token-based comparison.

```python { .api }
def ratio(s1: str, s2: str) -> int: ...
def partial_ratio(s1: str, s2: str) -> int: ...
def token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def WRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def QRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
```
[String Similarity Scoring](./string-similarity.md)
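
To make the idea behind the basic and token-sort ratios concrete, here is a standard-library sketch. It assumes a score derived from an insertion/deletion-only edit distance (computed via the longest common subsequence); the library delegates its actual scoring to rapidfuzz, so details may differ. The names `lcs_length`, `simple_ratio`, and `simple_token_sort_ratio` are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def simple_ratio(a: str, b: str) -> int:
    """Similarity in 0..100 based on insert/delete edit distance."""
    total = len(a) + len(b)
    if total == 0:
        return 100  # assumption for this sketch: two empty strings match
    # Characters outside the common subsequence must be inserted or deleted.
    distance = total - 2 * lcs_length(a, b)
    return round(100 * (1 - distance / total))


def sort_tokens(s: str) -> str:
    """Lowercase, split on whitespace, and rejoin the tokens in sorted order."""
    return " ".join(sorted(s.lower().split()))


def simple_token_sort_ratio(a: str, b: str) -> int:
    """Compare two strings after sorting their tokens, ignoring word order."""
    return simple_ratio(sort_tokens(a), sort_tokens(b))


print(simple_ratio("this is a test", "this is a test!"))  # 97
print(simple_token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```

In the first call the strings have 14 and 15 characters and differ by one insertion, so the score is 100 × (1 − 1/29), which rounds to 97.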

### String Processing and Extraction

Functions for finding the best matches in collections of strings, including single and multiple match extraction, duplicate removal, and configurable scoring with custom processors.
```python { .api }
# Module-level defaults in thefuzz.process:
#   default_processor = utils.full_process
#   default_scorer = fuzz.WRatio
def extractOne(query: str, choices, processor=default_processor, scorer=default_scorer, score_cutoff: int = 0): ...
def extract(query: str, choices, processor=default_processor, scorer=default_scorer, limit: int = 5): ...
def extractBests(query: str, choices, processor=default_processor, scorer=default_scorer, score_cutoff: int = 0, limit: int = 5): ...
def extractWithoutOrder(query: str, choices, processor=default_processor, scorer=default_scorer, score_cutoff: int = 0): ...
def dedupe(contains_dupes: list, threshold: int = 70, scorer=fuzz.token_set_ratio): ...
```
[String Processing and Extraction](./string-processing.md)
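
The extraction functions share one pattern: preprocess the query and each choice, score every pair, drop scores below a cutoff, and return the best results. The sketch below illustrates that pattern with the standard library only. It is not TheFuzz's code: `extract_sketch`, `lowercase`, and `difflib_scorer` are hypothetical names, and `difflib.SequenceMatcher` stands in for the library's scorers, so the scores will not match TheFuzz's in general.

```python
import heapq
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Optional, Tuple


def difflib_scorer(a: str, b: str) -> int:
    """Stand-in scorer: difflib similarity scaled to 0..100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def lowercase(s: str) -> str:
    """Stand-in processor: lowercase and trim."""
    return s.lower().strip()


def extract_sketch(
    query: str,
    choices: Iterable[str],
    processor: Callable[[str], str] = lowercase,
    scorer: Callable[[str, str], int] = difflib_scorer,
    score_cutoff: int = 0,
    limit: Optional[int] = 5,
) -> List[Tuple[str, int]]:
    """Return (choice, score) pairs, best first, at or above score_cutoff."""
    processed_query = processor(query)
    scored = ((c, scorer(processed_query, processor(c))) for c in choices)
    kept = [pair for pair in scored if pair[1] >= score_cutoff]
    if limit is None:
        return sorted(kept, key=lambda pair: pair[1], reverse=True)
    return heapq.nlargest(limit, kept, key=lambda pair: pair[1])


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract_sketch("new york jets", choices, limit=1))  # [('New York Jets', 100)]
```

Returning the original choice rather than its processed form mirrors the `('New York Jets', 100)` result shown in Basic Usage, where the query was lowercase but the reported match keeps its original casing.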

### String Utilities

Utility functions for string preprocessing and normalization: ASCII conversion, plus text cleaning that replaces non-alphanumeric characters with spaces, lowercases the text, and trims surrounding whitespace.

```python { .api }
def full_process(s: str, force_ascii: bool = False) -> str: ...
def ascii_only(s: str) -> str: ...
```
[String Utilities](./string-utilities.md)

## Types

```python { .api }
from typing import Callable, Union, Tuple, List, Dict, Any, Generator, TypeVar, Sequence
from collections.abc import Mapping

# Core type aliases from process module
ChoicesT = Union[Mapping[str, str], Sequence[str]]
T = TypeVar('T')
ProcessorT = Union[Callable[[str, bool], str], Callable[[Any], Any]]
ScorerT = Callable[[str, str, bool, bool], int]

# Additional type aliases for better understanding
Scorer = Callable[[str, str], int]
Processor = Callable[[str], str]
Choice = Union[str, Tuple[Any, ...], Dict[str, Any]]
Choices = Union[List[Choice], Dict[Any, Choice]]
```
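
The simplified `Scorer` and `Processor` aliases describe a two-argument scoring callable and a one-argument preprocessing callable. As a rough illustration of functions with those shapes, the sketch below defines a hypothetical `prefix_scorer` and `casefold_processor` and feeds them to a small hypothetical `rank` helper. None of these names come from TheFuzz, and the scoring rule is invented purely for the example.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], int]
Processor = Callable[[str], str]


def casefold_processor(s: str) -> str:
    """Hypothetical processor: case-insensitive form with trimmed whitespace."""
    return s.casefold().strip()


def prefix_scorer(a: str, b: str) -> int:
    """Hypothetical scorer: share of the longer string covered by the common prefix."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    shared = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        shared += 1
    return round(100 * shared / longest)


def rank(query: str, choices: List[str], scorer: Scorer, processor: Processor) -> List[Tuple[str, int]]:
    """Score each choice against the query and sort best first."""
    processed_query = processor(query)
    scored = [(c, scorer(processed_query, processor(c))) for c in choices]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank("New York", ["Dallas Cowboys", "new york jets"], prefix_scorer, casefold_processor)
print(ranked)  # [('new york jets', 62), ('Dallas Cowboys', 0)]
```

Keeping the scorer and processor as separate callables lets either one be swapped independently, which is the same separation the `processor` and `scorer` parameters express in the extraction signatures.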