Tessl Tile for pypi/fuzzywuzzy@0.18.0

# String Collection Processing

The `fuzzywuzzy.process` functions work on collections of strings. They score each choice against a query with a fuzzy matcher, then search, rank, or deduplicate the collection. Choices can be a list or, for the extraction functions, a dictionary.

## Default Settings

```python { .api }
default_scorer = fuzz.WRatio             # Default scoring function
default_processor = utils.full_process   # Default string preprocessing function
```
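
The normalization that `utils.full_process` performs can be approximated with the standard library. It replaces non-word characters (anything other than letters, digits, and underscore) with spaces, lowercases the text, and strips surrounding whitespace. `approx_full_process` is a hypothetical helper written for this page, not fuzzywuzzy's own function. The real function also accepts a `force_ascii` flag, which this sketch leaves out.

```python
import re

# Hypothetical approximation of fuzzywuzzy's utils.full_process (not the library code).
_NON_WORD = re.compile(r"\W")  # anything except letters, digits, and underscore


def approx_full_process(s: str) -> str:
    # 1. Turn punctuation and symbols into spaces.
    # 2. Lowercase.
    # 3. Trim leading/trailing whitespace; inner runs of spaces are kept.
    return _NON_WORD.sub(" ", s).lower().strip()


print(repr(approx_full_process("  F. Baggins!  ")))  # 'f  baggins'
print(repr(approx_full_process("New-York_Jets")))    # 'new york_jets'
```

The query and each choice pass through the same normalization, so under the default settings a difference in case alone does not affect the score.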

## Capabilities

### Single Best Match Extraction

Return the single highest-scoring choice in a collection if its score is at least `score_cutoff`. Otherwise, return `None`.

```python { .api }
def extractOne(query: str, choices, processor=default_processor, scorer=default_scorer, score_cutoff: int = 0):
    """
    Find the single best match scoring at least score_cutoff in a list or dict of choices.

    Parameters:
        query: String to match against
        choices: List or dict of choices to search through
        processor: Function applied to the query and to each choice before scoring;
            None disables this step
        scorer: Function f(query, choice) -> int that scores a match (default: fuzz.WRatio)
        score_cutoff: Minimum score a match must reach (default: 0)

    Returns:
        tuple: (match, score) if choices is a list
        tuple: (match, score, key) if choices is a dictionary
        None: if no choice reaches score_cutoff
    """
```

**Usage Example:**
```python
from fuzzywuzzy import process

choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
result = process.extractOne("new york jets", choices)
print(result)  # ("New York Jets", 100)

# With a score cutoff. "New York Jets" and "New York Giants" tie at 90,
# and extractOne returns the first tied choice in iteration order.
result = process.extractOne("new york", choices, score_cutoff=80)
print(result)  # ("New York Jets", 90)

# With a dictionary, the key of the match is returned as a third element
choices_dict = {"team1": "Atlanta Falcons", "team2": "New York Jets"}
result = process.extractOne("jets", choices_dict)
print(result)  # ("New York Jets", 90, "team2")
```
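
The selection logic behind `extractOne` can be sketched in plain Python. The sketch assumes the behavior described in this section: a choice must score at least `score_cutoff`, dictionary choices also return their key, and `None` is returned when nothing qualifies. Python's `max()` keeps the first of several equally scored items, and the sketch relies on that to break ties. `toy_score`, `scored`, and `best_match` are hypothetical names. `toy_score` uses `difflib.SequenceMatcher` as a stand-in for `fuzz.WRatio`, so its numbers will not match fuzzywuzzy's.

```python
from collections.abc import Mapping
from difflib import SequenceMatcher
from typing import Callable, Iterator, Optional

Scorer = Callable[[str, str], int]


def toy_score(query: str, choice: str) -> int:
    # Stand-in scorer (NOT fuzz.WRatio): difflib similarity scaled to 0-100.
    return round(100 * SequenceMatcher(None, query.lower(), choice.lower()).ratio())


def scored(query: str, choices, scorer: Scorer, score_cutoff: int) -> Iterator[tuple]:
    # Dict choices yield (choice, score, key); other iterables yield (choice, score).
    if isinstance(choices, Mapping):
        for key, choice in choices.items():
            score = scorer(query, choice)
            if score >= score_cutoff:  # inclusive cutoff
                yield (choice, score, key)
    else:
        for choice in choices:
            score = scorer(query, choice)
            if score >= score_cutoff:
                yield (choice, score)


def best_match(query: str, choices, scorer: Scorer = toy_score,
               score_cutoff: int = 0) -> Optional[tuple]:
    # max() returns the first maximal item, so earlier choices win ties;
    # default=None covers "nothing reached the cutoff".
    return max(scored(query, choices, scorer, score_cutoff),
               key=lambda item: item[1], default=None)


teams = ["Atlanta Falcons", "New York Jets", "New York Giants"]
print(best_match("new york jets", teams))         # ('New York Jets', 100)
print(best_match("xyz", teams, score_cutoff=50))  # None
print(best_match("jets", {"t1": "Atlanta Falcons", "t2": "New York Jets"}))
# ('New York Jets', 47, 't2')
```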

### Multiple Match Extraction

Return up to `limit` of the best-scoring choices, sorted from highest to lowest score. `extract` takes no `score_cutoff`, unlike `extractOne` and `extractBests`.

```python { .api }
def extract(query: str, choices, processor=default_processor, scorer=default_scorer, limit: int = 5):
    """
    Select the best matches in a list or dictionary of choices.

    Parameters:
        query: String to match against
        choices: List or dict of choices to search through
        processor: Function applied to the query and to each choice before scoring
        scorer: Function f(query, choice) -> int that scores a match (default: fuzz.WRatio)
        limit: Maximum number of results to return; None returns all (default: 5)

    Returns:
        list: (match, score) tuples, sorted by score from highest to lowest
        list: (match, score, key) tuples if choices is a dictionary
    """
```

**Usage Example:**
```python
from fuzzywuzzy import process

choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
results = process.extract("new york", choices, limit=2)
print(results)  # [("New York Jets", 90), ("New York Giants", 90)]

# limit=None returns every choice, sorted by score
all_results = process.extract("new", choices, limit=None)
print(all_results)  # All four choices, highest score first
```
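
After every choice has a score, `extract` only has to rank the results. The sketch below shows that ranking step. It assumes a top-`limit` selection with `heapq.nlargest` and a full sort when `limit` is `None`. The helper `top_matches` and the scores it receives are hypothetical. Both `heapq.nlargest` and `sorted` keep equally scored items in their original order, which is consistent with tied choices coming back in input order.

```python
import heapq
from operator import itemgetter
from typing import Iterable, List, Optional, Tuple

Scored = Tuple[str, int]


def top_matches(pairs: Iterable[Scored], limit: Optional[int] = 5) -> List[Scored]:
    # Hypothetical ranking helper: pairs are (choice, score) tuples.
    if limit is None:
        # Full sort, highest score first; sorted() stays stable with reverse=True.
        return sorted(pairs, key=itemgetter(1), reverse=True)
    # Only the `limit` best; nlargest(n, it, key) equals sorted(..., reverse=True)[:n].
    return heapq.nlargest(limit, pairs, key=itemgetter(1))


# Made-up (choice, score) pairs standing in for scorer output.
pairs = [("Atlanta Falcons", 30), ("New York Jets", 90),
         ("New York Giants", 90), ("Dallas Cowboys", 20)]
print(top_matches(pairs, limit=2))
# [('New York Jets', 90), ('New York Giants', 90)]
print(top_matches(pairs, limit=None))
# [('New York Jets', 90), ('New York Giants', 90), ('Atlanta Falcons', 30), ('Dallas Cowboys', 20)]
```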

### Threshold-Based Multiple Extraction

Return up to `limit` matches whose score is at least `score_cutoff`, sorted from highest to lowest score.

```python { .api }
def extractBests(query: str, choices, processor=default_processor, scorer=default_scorer, score_cutoff: int = 0, limit: int = 5):
    """
    Get a list of the best matches that score at least score_cutoff.

    Parameters:
        query: String to match against
        choices: List or dict of choices to search through
        processor: Function applied to the query and to each choice before scoring
        scorer: Function f(query, choice) -> int that scores a match (default: fuzz.WRatio)
        score_cutoff: Minimum score a match must reach (default: 0)
        limit: Maximum number of results to return; None returns all (default: 5)

    Returns:
        list: (match, score) tuples scoring at least score_cutoff, highest first
        list: (match, score, key) tuples if choices is a dictionary
    """
```

**Usage Example:**
```python
from fuzzywuzzy import process

choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
results = process.extractBests("new", choices, score_cutoff=50, limit=3)
print(results)  # At most 3 matches, each scoring 50 or higher
```
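
The order of operations behind `extractBests` can be sketched as a filter followed by a ranking. The sketch assumes the cutoff is inclusive and is applied before `limit`. `best_above` is a hypothetical helper, and the scores are made up. A high cutoff can leave fewer results than `limit`, or none at all.

```python
import heapq
from operator import itemgetter
from typing import Iterable, List, Optional, Tuple

Scored = Tuple[str, int]


def best_above(pairs: Iterable[Scored], score_cutoff: int = 0,
               limit: Optional[int] = 5) -> List[Scored]:
    # Step 1: keep only pairs at or above the cutoff.
    passing = (pair for pair in pairs if pair[1] >= score_cutoff)
    # Step 2: rank the survivors, highest score first; None keeps them all.
    if limit is None:
        return sorted(passing, key=itemgetter(1), reverse=True)
    return heapq.nlargest(limit, passing, key=itemgetter(1))


# Made-up (choice, score) pairs standing in for scorer output.
pairs = [("Atlanta Falcons", 45), ("New York Jets", 60),
         ("New York Giants", 50), ("Dallas Cowboys", 20)]
print(best_above(pairs, score_cutoff=50, limit=3))
# [('New York Jets', 60), ('New York Giants', 50)]  (two results, fewer than limit)
```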

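### Unordered Extraction Generator

This generator yields every choice that scores at least `score_cutoff`. Matches come out in the order the choices are iterated, with no sorting. Because they are produced lazily, you can stream through a large collection or stop early without building a full result list. `extractOne`, `extract`, and `extractBests` are built on top of this generator.
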
```python { .api }
def extractWithoutOrder(query: str, choices, processor=default_processor, scorer=default_scorer, score_cutoff: int = 0):
    """
    Generator yielding every match that scores at least score_cutoff, in the
    order the choices are iterated (no sorting).

    Parameters:
        query: String to match against
        choices: List or dict of choices to search through
        processor: Function applied to the query and to each choice before scoring
        scorer: Function f(query, choice) -> int that scores a match (default: fuzz.WRatio)
        score_cutoff: Minimum score a match must reach (default: 0)

    Yields:
        tuple: (match, score) for list choices
        tuple: (match, score, key) for dictionary choices
    """
```

**Usage Example:**
```python
from fuzzywuzzy import process

choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
for match in process.extractWithoutOrder("new", choices, score_cutoff=60):
    print(match)  # Yielded in the order of choices, not sorted by score
```
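
Because `extractWithoutOrder` is a generator, nothing is scored until you iterate, and you can stop as soon as you have what you need. The sketch below demonstrates this with a hypothetical `lazy_matches` generator, modeled on the behavior described in this section, and a toy prefix scorer that records each call. Neither function is part of fuzzywuzzy. Taking only the first match with `next()` means the remaining choices are never scored.

```python
from typing import Callable, Iterable, Iterator, Tuple


def lazy_matches(query: str, choices: Iterable[str],
                 scorer: Callable[[str, str], int],
                 score_cutoff: int = 0) -> Iterator[Tuple[str, int]]:
    # Hypothetical generator: score one choice at a time and yield it at once
    # if it passes, without collecting or sorting anything.
    for choice in choices:
        score = scorer(query, choice)
        if score >= score_cutoff:
            yield (choice, score)


calls = []  # records which choices were actually scored


def prefix_scorer(query: str, choice: str) -> int:
    # Toy scorer: 100 if the choice starts with the query (case-insensitive), else 0.
    calls.append(choice)
    return 100 if choice.lower().startswith(query.lower()) else 0


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
first = next(lazy_matches("new", choices, prefix_scorer, score_cutoff=60), None)
print(first)  # ('New York Jets', 100)
print(calls)  # ['Atlanta Falcons', 'New York Jets']
```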

### Fuzzy Deduplication

Remove duplicates from a list using fuzzy matching to identify similar items.

```python { .api }
def dedupe(contains_dupes: list, threshold: int = 70, scorer=fuzz.token_set_ratio):
    """
    Remove duplicates from a list using fuzzy matching.

    Uses fuzzy matching to identify duplicates that score above the threshold.
    For each group of duplicates, returns the longest item (most information),
    breaking ties alphabetically.

    Parameters:
        contains_dupes: List of strings that may contain duplicates
        threshold: Items must score above this value to count as duplicates (default: 70)
        scorer: Function to score similarity (default: fuzz.token_set_ratio)

    Returns:
        The longest representative from each group of duplicates, or the original
        list unchanged if no duplicates were found. The deduplicated result may be a
        dict_keys view rather than a list; wrap it in list() if you need a list.
    """
```

**Usage Example:**
```python
from fuzzywuzzy import process

duplicates = [
    'Frodo Baggin',
    'Frodo Baggins',
    'F. Baggins',
    'Samwise G.',
    'Gandalf',
    'Bilbo Baggins'
]

# list() ensures a list even if dedupe returns a dict_keys view
deduped = list(process.dedupe(duplicates))
print(deduped)  # e.g. ['Frodo Baggins', 'Samwise G.', 'Bilbo Baggins', 'Gandalf'] (order may vary)

# A lower threshold treats more pairs as duplicates, so it usually returns fewer items
deduped_aggressive = list(process.dedupe(duplicates, threshold=50))
print(deduped_aggressive)
```
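
The representative-picking rule documented for `dedupe` can be sketched as follows. The longest item wins, ties are broken alphabetically, and only scores above `threshold` count. `fuzzy_dedupe` is a hypothetical re-implementation for illustration, not fuzzywuzzy's code. It takes the scorer as a parameter, and the example passes a deliberately crude toy scorer, `same_surname`, instead of `fuzz.token_set_ratio`, so its groupings differ from the library's. The sketch preserves first-seen order with `dict.fromkeys`, and it returns the input list unchanged when nothing was merged.

```python
from typing import Callable, List


def fuzzy_dedupe(items: List[str], scorer: Callable[[str, str], int],
                 threshold: int = 70) -> List[str]:
    # Hypothetical re-implementation of the rule, not fuzzywuzzy's code.
    representatives = []
    for item in items:
        # Items scoring strictly above the threshold against `item`;
        # fall back to the item itself if even it does not qualify.
        group = [other for other in items if scorer(item, other) > threshold] or [item]
        # Sort alphabetically, then stably by length (longest first), so
        # equal-length candidates stay in alphabetical order.
        group.sort()
        group.sort(key=len, reverse=True)
        representatives.append(group[0])
    # Drop repeated representatives, keeping first-seen order.
    unique = list(dict.fromkeys(representatives))
    # Nothing merged: hand back the input list itself.
    return items if len(unique) == len(items) else unique


def same_surname(a: str, b: str) -> int:
    # Deliberately crude toy scorer: 100 if both strings end in the same word.
    wa, wb = a.lower().split(), b.lower().split()
    return 100 if wa and wb and wa[-1] == wb[-1] else 0


names = ["Frodo Baggins", "F. Baggins", "Bilbo Baggins", "Samwise Gamgee", "Sam Gamgee", "Gandalf"]
print(fuzzy_dedupe(names, same_surname))
# ['Bilbo Baggins', 'Samwise Gamgee', 'Gandalf']
```

"Frodo Baggins" is folded into "Bilbo Baggins" here because the toy scorer treats every Baggins as one person. Both names are 13 characters long, so the alphabetical tie-break decides which one is kept.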

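## Custom Processors and Scorers

The extraction functions (`extractOne`, `extract`, `extractBests`, and `extractWithoutOrder`) accept a custom `processor` and `scorer`. A processor takes one string and returns the string to compare. It is applied to the query and to each choice. A scorer takes the processed query and a processed choice and returns an integer score. Passing `processor=None` skips the processor step. However, scorers such as `fuzz.WRatio` still normalize their inputs internally, so use a raw scorer like `fuzz.ratio` when case should matter.

**Usage Example:**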
```python
from fuzzywuzzy import process, fuzz

# Custom processor that only looks at the first word
def first_word_processor(s):
    words = s.split()
    return words[0] if words else ""  # also safe for empty or whitespace-only strings

# Compare first names only, scored with fuzz.partial_ratio
choices = ["John Smith", "Jane Smith", "Bob Johnson"]
result = process.extractOne(
    "John",
    choices,
    processor=first_word_processor,
    scorer=fuzz.partial_ratio
)
print(result)  # ("John Smith", 100)

# No processing: fuzz.ratio is case-sensitive, so the mismatched case lowers the score
result = process.extractOne(
    "JOHN SMITH",
    choices,
    processor=None,  # No preprocessing
    scorer=fuzz.ratio
)
print(result)  # Lower score due to case mismatch
```
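
A scorer is a function of two strings that returns an integer. Keeping it on the same 0 to 100 scale as the `fuzz` functions keeps `score_cutoff` and `threshold` values meaningful. The hypothetical `token_jaccard` below is not part of fuzzywuzzy. It scores the overlap between the two word sets, ignoring word order, and unlike the `fuzz` ratios it gives no partial credit for near-miss spellings. You could pass it as `scorer=token_jaccard` to the extraction functions or to `dedupe`.

```python
def token_jaccard(query: str, choice: str) -> int:
    # Hypothetical custom scorer: overlap of lowercase word sets, scaled to 0-100.
    q_words = set(query.lower().split())
    c_words = set(choice.lower().split())
    union = q_words | c_words
    if not union:
        return 0  # both strings empty: no evidence of a match
    return round(100 * len(q_words & c_words) / len(union))


print(token_jaccard("John Smith", "Jane Smith"))  # 33
print(token_jaccard("Smith John", "john smith"))  # 100
```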
