# Fuzzy String Algorithms

Core fuzzy string comparison functions that implement various algorithms for measuring string similarity. All functions return integer scores from 0-100, where 100 indicates identical strings and 0 indicates no similarity.

## Capabilities

### Basic Ratio Matching

Standard string similarity using sequence matching algorithms.

```python { .api }
def ratio(s1: str, s2: str) -> int:
    """
    Calculate similarity ratio between two strings.

    Returns:
        int: Similarity score 0-100
    """
```

**Usage Example:**
```python
from fuzzywuzzy import fuzz

score = fuzz.ratio("this is a test", "this is a test!")
print(score)  # 97

# Swapped word order lowers the plain ratio
score = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 91
```

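To make the idea of a sequence-matching ratio concrete, the sketch below computes a comparable score with Python's standard `difflib.SequenceMatcher`. The helper name `sketch_ratio` is hypothetical and not part of fuzzywuzzy, and the library may use a different backend (for example python-Levenshtein when it is installed), so its scores can differ slightly from this approximation.

```python
from difflib import SequenceMatcher


def sketch_ratio(s1: str, s2: str) -> int:
    """Hypothetical helper: 2 * matched chars / total chars, scaled to 0-100."""
    matcher = SequenceMatcher(None, s1, s2)
    return int(round(100 * matcher.ratio()))


# 14 matched characters out of 14 + 15 total: 2 * 14 / 29 = 0.9655...
print(sketch_ratio("this is a test", "this is a test!"))  # 97
```
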
### Partial Ratio Matching

Finds the similarity of the most similar substring, useful when one string is contained within another.

```python { .api }
def partial_ratio(s1: str, s2: str) -> int:
    """
    Return the ratio of the most similar substring as a number between 0 and 100.

    Returns:
        int: Partial similarity score 0-100
    """
```

**Usage Example:**
```python
from fuzzywuzzy import fuzz

score = fuzz.partial_ratio("this is a test", "this is a test!")
print(score)  # 100

score = fuzz.partial_ratio("fuzzy wuzzy", "wuzzy")
print(score)  # 100
```

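One way to picture "most similar substring" is a brute-force sliding window: compare the shorter string against every same-length slice of the longer one and keep the best score. The sketch below does exactly that with `difflib`. `sketch_partial_ratio` is a hypothetical name, and the library is assumed to choose its candidate windows more cleverly, so treat this as an illustration rather than its actual implementation.

```python
from difflib import SequenceMatcher


def sketch_partial_ratio(s1: str, s2: str) -> int:
    """Hypothetical helper: best score of the shorter string against
    every same-length window of the longer string (brute force)."""
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        return 0  # nothing to align
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


print(sketch_partial_ratio("fuzzy wuzzy", "wuzzy"))  # 100
```
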
### Token Sort Ratio

Compares strings after sorting tokens alphabetically, handling word order variations.

```python { .api }
def token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Return similarity between 0 and 100 after sorting tokens.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Force ASCII conversion (default True)
        full_process: Apply full string processing (default True)

    Returns:
        int: Token sort similarity score 0-100
    """
```

**Usage Example:**
```python
from fuzzywuzzy import fuzz

score = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 100

score = fuzz.token_sort_ratio("new york mets", "mets new york")
print(score)  # 100
```

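The token-sort step can be sketched in a few lines: split each string on whitespace, sort the tokens, rejoin them, and then score the two normalized strings. The version below uses `difflib` and a hypothetical helper name, `sketch_token_sort_ratio`; it lowercases the input but skips the library's fuller preprocessing, which is a simplifying assumption.

```python
from difflib import SequenceMatcher


def sketch_token_sort_ratio(s1: str, s2: str) -> int:
    """Hypothetical helper: score the strings after sorting their tokens."""

    def sort_tokens(s: str) -> str:
        # Lowercase, split on whitespace, sort, and rejoin with single spaces.
        return " ".join(sorted(s.lower().split()))

    a, b = sort_tokens(s1), sort_tokens(s2)
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


# Both inputs normalize to "mets new york", so they compare as identical.
print(sketch_token_sort_ratio("new york mets", "mets new york"))  # 100
```
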
### Partial Token Sort Ratio

Combines partial ratio with token sorting for maximum flexibility in word order and substring matching.

```python { .api }
def partial_token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Return partial ratio of sorted tokens between 0 and 100.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Force ASCII conversion (default True)
        full_process: Apply full string processing (default True)

    Returns:
        int: Partial token sort similarity score 0-100
    """
```

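As an illustration of how the two ideas combine, the sketch below sorts the tokens of both strings first and then slides the shorter result across the longer one. All names are hypothetical, the sliding window is a brute-force stand-in for the library's substring search, and preprocessing is reduced to lowercasing.

```python
from difflib import SequenceMatcher


def _sort_tokens(s: str) -> str:
    """Lowercase, split on whitespace, sort, and rejoin."""
    return " ".join(sorted(s.lower().split()))


def _best_window(shorter: str, longer: str) -> float:
    """Best ratio of `shorter` against each same-length slice of `longer`."""
    if not shorter:
        return 0.0
    return max(
        SequenceMatcher(None, shorter, longer[i:i + len(shorter)]).ratio()
        for i in range(len(longer) - len(shorter) + 1)
    )


def sketch_partial_token_sort_ratio(s1: str, s2: str) -> int:
    """Hypothetical helper: token sort followed by a partial comparison."""
    shorter, longer = sorted((_sort_tokens(s1), _sort_tokens(s2)), key=len)
    return int(round(100 * _best_window(shorter, longer)))


# "baseball club mets new york" contains "mets new york" after sorting.
print(sketch_partial_token_sort_ratio("new york mets", "new york mets baseball club"))  # 100
```
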
### Token Set Ratio

Uses set theory to handle token intersections and differences, ideal for strings with repeated words.

```python { .api }
def token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Return similarity using token set comparison between 0 and 100.

    Compares the intersection and differences of token sets to handle
    repeated words and partial matches effectively.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Force ASCII conversion (default True)
        full_process: Apply full string processing (default True)

    Returns:
        int: Token set similarity score 0-100
    """
```

**Usage Example:**
```python
from fuzzywuzzy import fuzz

score = fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
print(score)  # 100

score = fuzz.token_set_ratio("new york yankees", "yankees new york")
print(score)  # 100
```

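The intersection-and-difference comparison can be sketched as follows: build one string from the shared tokens and two more by appending each side's leftover tokens, then take the best pairwise score. This is a simplified `difflib`-based approximation with hypothetical helper names, not the library's code.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def sketch_token_set_ratio(s1: str, s2: str) -> int:
    """Hypothetical helper: compare shared tokens and per-string leftovers."""
    tokens1, tokens2 = set(s1.lower().split()), set(s2.lower().split())
    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    combined1 = f"{common} {only1}".strip()
    combined2 = f"{common} {only2}".strip()
    return max(
        _ratio(common, combined1),
        _ratio(common, combined2),
        _ratio(combined1, combined2),
    )


# The repeated "fuzzy" collapses in the set, so both sides become identical.
print(sketch_token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
```
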
### Partial Token Set Ratio

Combines partial ratio with token set comparison for maximum robustness.

```python { .api }
def partial_token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Return partial ratio using token set comparison between 0 and 100.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Force ASCII conversion (default True)
        full_process: Apply full string processing (default True)

    Returns:
        int: Partial token set similarity score 0-100
    """
```

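A rough sketch of combining the two techniques is shown below: the token-set strings are built as before, but each pair is scored with a brute-force partial comparison. One consequence visible in this sketch is that any shared token yields 100, because the shared-token string is always a prefix of the combined strings; whether the library behaves identically is worth checking against your installed version. All names here are hypothetical.

```python
from difflib import SequenceMatcher


def _partial(a: str, b: str) -> int:
    """Best ratio of the shorter string against windows of the longer one."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0
    best = max(
        SequenceMatcher(None, shorter, longer[i:i + len(shorter)]).ratio()
        for i in range(len(longer) - len(shorter) + 1)
    )
    return int(round(100 * best))


def sketch_partial_token_set_ratio(s1: str, s2: str) -> int:
    """Hypothetical helper: token-set strings scored with a partial ratio."""
    tokens1, tokens2 = set(s1.lower().split()), set(s2.lower().split())
    common = " ".join(sorted(tokens1 & tokens2))
    combined1 = f"{common} {' '.join(sorted(tokens1 - tokens2))}".strip()
    combined2 = f"{common} {' '.join(sorted(tokens2 - tokens1))}".strip()
    return max(
        _partial(common, combined1),
        _partial(common, combined2),
        _partial(combined1, combined2),
    )


# A single shared token ("mets") is enough to reach 100 in this sketch.
print(sketch_partial_token_set_ratio("new york mets", "mets fan club"))  # 100
```
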
### Quick Ratio Functions

Fast ratio calculation with optional preprocessing.

```python { .api }
def QRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Quick ratio comparison between two strings.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Allow only ASCII characters (default True)
        full_process: Apply full string processing; pass False when the
            inputs were already processed, to avoid double processing
            (default True)

    Returns:
        int: Quick similarity score 0-100
    """

def UQRatio(s1: str, s2: str, full_process: bool = True) -> int:
    """
    Unicode quick ratio - QRatio with force_ascii=False.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        full_process: Process inputs (default True)

    Returns:
        int: Unicode quick similarity score 0-100
    """
```

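To show what the `force_ascii` and `full_process` flags plausibly do, the sketch below applies a simplified preprocessing step (optionally drop non-ASCII characters, replace non-word characters with spaces, lowercase, trim) before scoring. The function names and the exact cleaning rules are assumptions made for illustration; the library's own processing may differ in detail.

```python
import re
from difflib import SequenceMatcher


def sketch_full_process(s: str, force_ascii: bool = False) -> str:
    """Hypothetical approximation of the library's string preprocessing."""
    if force_ascii:
        s = "".join(ch for ch in s if ord(ch) < 128)  # drop non-ASCII chars
    s = re.sub(r"\W", " ", s)  # non-word characters become spaces
    return s.lower().strip()


def sketch_qratio(s1: str, s2: str, force_ascii: bool = True,
                  full_process: bool = True) -> int:
    """Hypothetical helper: preprocess (optionally), then plain ratio."""
    if full_process:
        s1 = sketch_full_process(s1, force_ascii)
        s2 = sketch_full_process(s2, force_ascii)
    if not s1 or not s2:
        return 0  # nothing left to compare after cleaning
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


print(sketch_full_process("Café!", force_ascii=True))   # caf
print(sketch_full_process("Café!", force_ascii=False))  # café
print(sketch_qratio("this is a test", "THIS IS A TEST!!"))  # 100
```
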
### Weighted Ratio Functions

Combines several of the algorithms above and returns the best scaled score.

```python { .api }
def WRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Return weighted similarity between 0 and 100 using multiple algorithms.

    Automatically selects the best combination of ratio algorithms based on
    string length differences and applies appropriate scaling factors.

    Algorithm selection:
    - Uses partial algorithms when one string is at least 1.5x longer than the other
    - Applies 0.9 scaling for partial results, 0.6 when one string is more than 8x longer
    - Uses token-based algorithms with 0.95 scaling
    - Returns the highest score from all applicable algorithms

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Allow only ASCII characters (default True)
        full_process: Process inputs (default True)

    Returns:
        int: Weighted similarity score 0-100
    """

def UWRatio(s1: str, s2: str, full_process: bool = True) -> int:
    """
    Unicode weighted ratio - WRatio with force_ascii=False.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        full_process: Process inputs (default True)

    Returns:
        int: Unicode weighted similarity score 0-100
    """
```

**Usage Example:**
```python
from fuzzywuzzy import fuzz

# WRatio automatically selects the best algorithm
score = fuzz.WRatio("new york yankees", "yankees")
print(score)  # Uses partial algorithms due to length difference

score = fuzz.WRatio("new york mets", "new york yankees")
print(score)  # Uses token algorithms for similar-length strings
```
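
The selection rules listed in the `WRatio` docstring can be sketched as a small decision function that looks only at the two string lengths. `sketch_wratio_plan` is a hypothetical helper, not a fuzzywuzzy function; the length-ratio cut-off of 8 for the 0.6 scale and the treatment of a ratio of exactly 1.5 as "partial" are assumptions of this sketch.

```python
from typing import Tuple


def sketch_wratio_plan(s1: str, s2: str) -> Tuple[bool, float]:
    """Hypothetical helper: decide whether partial algorithms apply and
    which scaling factor their results would receive."""
    if not s1 or not s2:
        raise ValueError("both strings must be non-empty")
    len_ratio = max(len(s1), len(s2)) / min(len(s1), len(s2))
    use_partial = len_ratio >= 1.5                     # assumed boundary
    partial_scale = 0.6 if len_ratio > 8 else 0.9      # assumed cut-off of 8
    return use_partial, partial_scale


print(sketch_wratio_plan("new york yankees", "yankees"))        # (True, 0.9)
print(sketch_wratio_plan("new york mets", "new york yankees"))  # (False, 0.9)
```

In this sketch the first pair has a length ratio of 16/7, so partial algorithms apply; the second pair has a ratio of 16/13, so only the plain and token-based scores would be considered.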