Tessl Tile for pypi/thefuzz@0.22.0

# TheFuzz

TheFuzz is a Python fuzzy string matching library that uses Levenshtein Distance algorithms to calculate similarities between text sequences. It provides multiple scoring strategies for approximate string comparison, from simple ratio calculations to advanced weighted algorithms that combine various matching techniques.

## Package Information

- **Package Name**: thefuzz
- **Language**: Python
- **Installation**: `pip install thefuzz`
- **Minimum Python Version**: 3.8

## Core Imports

```python
from thefuzz import fuzz
from thefuzz import process
from thefuzz import utils
```

Individual function imports:

```python
from thefuzz.fuzz import ratio, WRatio, token_sort_ratio
from thefuzz.process import extractOne, extract, dedupe
from thefuzz.utils import full_process
```

## Basic Usage

```python
from thefuzz import fuzz
from thefuzz import process

# Basic string similarity scoring
ratio = fuzz.ratio("this is a test", "this is a test!")
print(ratio)  # 97

# Token-based matching (handles word order differences)
token_ratio = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(token_ratio)  # 100

# Weighted ratio (combines multiple algorithms)
# WRatio preprocesses its inputs by default, so the trailing "!" is ignored
weighted_ratio = fuzz.WRatio("this is a test", "this is a test!")
print(weighted_ratio)  # 100

# Find best match from a list of choices
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
result = process.extractOne("new york jets", choices)
print(result)  # ('New York Jets', 100)

# Get multiple best matches
results = process.extract("new york", choices, limit=2)
print(results)  # [('New York Jets', 90), ('New York Giants', 90)]
```

## Architecture

TheFuzz builds on the high-performance rapidfuzz library while maintaining backward compatibility with the original fuzzywuzzy interface. The library is organized into three main modules:

- **fuzz**: Core string similarity scoring functions using various algorithms
- **process**: Functions for finding best matches in collections of strings
- **utils**: String preprocessing and normalization utilities

By default, the scorers that accept a `full_process` argument and the `process` functions share one preprocessing step, `utils.full_process`. It replaces non-alphanumeric characters with whitespace, converts the text to lowercase, trims surrounding whitespace, and optionally strips non-ASCII characters before the fuzzy matching algorithm runs. `ratio` and `partial_ratio` compare their inputs as given.
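The preprocessing step described above can be approximated in a few lines of plain Python. This is a minimal sketch, not the library's code: `normalize_sketch` is a hypothetical helper, and dropping every non-ASCII character is a simplifying assumption about what forcing ASCII does.

```python
def normalize_sketch(text: str, force_ascii: bool = False) -> str:
    """Approximate the normalization step; hypothetical helper, not thefuzz code."""
    if force_ascii:
        # Assumption for this sketch: simply drop any character outside ASCII.
        text = text.encode("ascii", errors="ignore").decode("ascii")
    # Replace every non-alphanumeric character with a space.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text)
    # Lowercase and trim surrounding whitespace.
    return cleaned.lower().strip()


print(normalize_sketch("  New York Jets!  "))          # new york jets
print(normalize_sketch("Café-Bar", force_ascii=True))  # caf bar
```

Because both inputs go through the same normalization, differences in case and punctuation do not lower the score of scorers that preprocess.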

## Capabilities

### String Similarity Scoring

Core fuzzy string matching functions that calculate similarity ratios between two strings using different algorithms including basic ratio, partial matching, and token-based comparisons.

```python { .api }
def ratio(s1: str, s2: str) -> int: ...
def partial_ratio(s1: str, s2: str) -> int: ...
def token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def WRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def QRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
```
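To make the scoring strategies concrete, here is a minimal pure-Python sketch of the idea behind a basic ratio and a token-sort ratio. It assumes the basic ratio is a normalized insertion/deletion similarity, computed here through the longest common subsequence. `lcs_length`, `ratio_sketch`, and `token_sort_sketch` are hypothetical names; the real functions are implemented in rapidfuzz and may differ in edge cases.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio_sketch(a: str, b: str) -> int:
    """Similarity from 0 to 100 based on the characters shared in order."""
    total = len(a) + len(b)
    if total == 0:
        return 100  # sketch-level choice: two empty strings count as identical
    return round(100 * 2 * lcs_length(a, b) / total)


def token_sort_sketch(a: str, b: str) -> int:
    """Lowercase, sort the whitespace-separated tokens, then compare."""
    def sort_tokens(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return ratio_sketch(sort_tokens(a), sort_tokens(b))


print(ratio_sketch("this is a test", "this is a test!"))  # 97
print(token_sort_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```

Sorting the tokens first is what makes the second comparison insensitive to word order, while the plain ratio penalizes the swapped words.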

[String Similarity Scoring](./string-similarity.md)

### String Processing and Extraction

Functions for finding the best matches in collections of strings, including single and multiple match extraction, duplicate removal, and configurable scoring with custom processors.

```python { .api }
# Module-level defaults: default_processor is utils.full_process, default_scorer is fuzz.WRatio
def extractOne(query: str, choices, processor=default_processor, scorer=default_scorer, score_cutoff: int = 0): ...
def extract(query: str, choices, processor=default_processor, scorer=default_scorer, limit: int = 5): ...
def extractBests(query: str, choices, processor=default_processor, scorer=default_scorer, score_cutoff: int = 0, limit: int = 5): ...
def extractWithoutOrder(query: str, choices, processor=default_processor, scorer=default_scorer, score_cutoff: int = 0): ...
def dedupe(contains_dupes: list, threshold: int = 70, scorer=fuzz.token_set_ratio): ...
```

[String Processing and Extraction](./string-processing.md)

### String Utilities

Utility functions for string preprocessing and normalization, including ASCII conversion and text cleaning that replaces non-alphanumeric characters with whitespace, lowercases the text, and trims surrounding whitespace.

```python { .api }
def full_process(s: str, force_ascii: bool = False) -> str: ...
def ascii_only(s: str) -> str: ...
```

[String Utilities](./string-utilities.md)

## Types

```python { .api }
from typing import Callable, Union, Tuple, List, Dict, Any, Generator, TypeVar, Sequence
from collections.abc import Mapping

# Core type aliases from process module
ChoicesT = Union[Mapping[str, str], Sequence[str]]
T = TypeVar('T')
ProcessorT = Union[Callable[[str, bool], str], Callable[[Any], Any]]
ScorerT = Callable[[str, str, bool, bool], int]

# Additional type aliases for better understanding
Scorer = Callable[[str, str], int]
Processor = Callable[[str], str]
Choice = Union[str, Tuple[Any, ...], Dict[str, Any]]
Choices = Union[List[Choice], Dict[Any, Choice]]
```
