# Fuzzy String Matching

The `fuzz` module provides high-level string similarity functions that return percentage-style scores from 0 to 100 for different matching scenarios. These functions form the core of RapidFuzz's fuzzy matching capabilities.

## Capabilities

### Basic Ratio

Calculates the normalized Indel similarity between two strings, an edit-distance measure that counts only insertions and deletions.

```python { .api }
def ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Parameters:**
- `s1`: First string to compare
- `s2`: Second string to compare
- `processor`: Optional preprocessing function applied to both strings before comparison (e.g., `utils.default_process`)
- `score_cutoff`: Minimum score threshold (0-100); the function returns 0 if the score falls below it

**Returns:** Similarity score from 0 to 100 (100 = identical)

**Usage Example:**
```python
from rapidfuzz import fuzz

score = fuzz.ratio("this is a test", "this is a test!")
print(score)  # 96.55

score = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 90.91
```
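
For intuition, a ratio-style score can be reproduced in a few lines of pure Python. The sketch below assumes the score is the normalized Indel similarity, `2 * LCS / (len(s1) + len(s2)) * 100`, where LCS is the length of the longest common subsequence. `lcs_length` and `indel_ratio` are illustrative helpers, not RapidFuzz functions, and the library's own implementation is far more optimized.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1: str, s2: str) -> float:
    """Illustrative ratio-style score in the range 0-100."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100.0  # convention for this sketch: two empty strings are identical
    return 200.0 * lcs_length(s1, s2) / total


print(round(indel_ratio("this is a test", "this is a test!"), 2))  # 96.55
```

Here the common subsequence has 14 characters and the combined length is 29, so the score is `2 * 14 / 29 * 100`.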

### Partial Ratio

Finds the best-matching substring within the longer string; useful when one string is contained within another.

```python { .api }
def partial_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Returns:** Best substring similarity score from 0 to 100

**Usage Example:**
```python
from rapidfuzz import fuzz

# Perfect match when the shorter string is contained in the longer one
score = fuzz.partial_ratio("this is a test", "this is a test!")
print(score)  # 100.0

score = fuzz.partial_ratio("needle", "haystack with needle in it")
print(score)  # 100.0
```
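
The substring search can be pictured as a sliding window. The simplified sketch below moves a window the length of the shorter string across the longer one and keeps the best ratio-style score. It is an approximation under stated assumptions: the helper names are hypothetical, the scoring reuses an Indel-style ratio, and RapidFuzz's own algorithm handles the string ends differently and is heavily optimized, so results may differ on some inputs.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if ch1 == ch2 else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1: str, s2: str) -> float:
    """Illustrative ratio-style score in the range 0-100."""
    total = len(s1) + len(s2)
    return 100.0 if total == 0 else 200.0 * lcs_length(s1, s2) / total


def simple_partial_ratio(s1: str, s2: str) -> float:
    """Best score of the shorter string against equal-length windows of the longer."""
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        # Edge-case convention chosen for this sketch only
        return 100.0 if not longer else 0.0
    window = len(shorter)
    best = 0.0
    for start in range(len(longer) - window + 1):
        best = max(best, indel_ratio(shorter, longer[start:start + window]))
        if best == 100.0:
            break  # cannot improve on a perfect match
    return best


print(simple_partial_ratio("needle", "haystack with needle in it"))  # 100.0
```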

### Partial Ratio with Alignment

Same as `partial_ratio`, but also returns alignment information showing where the match occurred.

```python { .api }
def partial_ratio_alignment(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> ScoreAlignment | None
```

**Returns:** `ScoreAlignment` object with the score and position information, or `None` if the score is below `score_cutoff`

### Token Sort Ratio

Sorts the tokens (words) in both strings before comparing them; useful when the same words appear in a different order.

```python { .api }
def token_sort_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Usage Example:**
```python
from rapidfuzz import fuzz

# Different word order
score = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 90.91

score = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 100.0
```
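
The normalization step can be sketched in plain Python. `sort_tokens` is a hypothetical helper that mirrors the idea (split on whitespace, sort, rejoin); once both strings are normalized this way, an ordinary ratio is taken between the results.

```python
def sort_tokens(s: str) -> str:
    """Split on whitespace, sort the tokens, and rejoin them with single spaces."""
    return " ".join(sorted(s.split()))


a = sort_tokens("fuzzy wuzzy was a bear")
b = sort_tokens("wuzzy fuzzy was a bear")

print(a)       # a bear fuzzy was wuzzy
print(a == b)  # True
```

Because both inputs normalize to the same string, the comparison scores 100.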

### Token Set Ratio

Compares the strings as sets of tokens, which makes it robust to duplicated words and to cases where one string's words are a subset of the other's.

```python { .api }
def token_set_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Usage Example:**
```python
from rapidfuzz import fuzz

# Handles duplicates and subsets well
score = fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
print(score)  # 100.0

score = fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear")
print(score)  # 100.0 (subset)
```
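
A rough sketch of the set decomposition behind this behavior, assuming the usual token-set approach: each string is reduced to its unique tokens, which are then split into the shared tokens and the tokens unique to each side. `token_set_parts` is a hypothetical helper for illustration only.

```python
def token_set_parts(s1: str, s2: str) -> tuple[str, str, str]:
    """Return (shared tokens, tokens only in s1, tokens only in s2) as sorted strings."""
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    return common, only1, only2


common, only1, only2 = token_set_parts(
    "fuzzy was a bear but not a dog", "fuzzy was a bear"
)
print(common)       # a bear fuzzy was
print(only1)        # but dog not
print(repr(only2))  # ''
```

When the strings share tokens and one side contributes nothing extra, as with `only2` here, that string's tokens are a subset of the other's.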

### Token Ratio

Combines `token_sort_ratio` and `token_set_ratio`, returning the higher of the two scores.

```python { .api }
def token_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

### Partial Token Functions

Partial versions of the token-based ratios. Tokens are handled as in the corresponding token function, and the results are compared with `partial_ratio` to find the best-matching substring.

```python { .api }
def partial_token_sort_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

def partial_token_set_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

def partial_token_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

### Weighted Ratio (WRatio)

Computes a weighted combination of several ratio algorithms, choosing among them based on the relative lengths of the two strings. This is the recommended general-purpose function.

```python { .api }
def WRatio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Usage Example:**
```python
from rapidfuzz import fuzz, utils

score = fuzz.WRatio("this is a test", "this is a new test!!!")
print(score)  # 85.5

# With preprocessing to handle case and punctuation
score = fuzz.WRatio("this is a test", "this is a new test!!!",
                    processor=utils.default_process)
print(score)  # 95.0

score = fuzz.WRatio("this is a word", "THIS IS A WORD",
                    processor=utils.default_process)
print(score)  # 100.0
```

### Quick Ratio (QRatio)

Calculates a quick ratio between two strings using the same comparison as `ratio`. In current RapidFuzz releases it behaves like `ratio`, except that it returns 0 when both strings are empty.

```python { .api }
def QRatio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Usage Example:**
```python
from rapidfuzz import fuzz, utils

score = fuzz.QRatio("this is a test", "this is a new test!!!")
print(score)  # 80.0

score = fuzz.QRatio("this is a test", "this is a new test!!!",
                    processor=utils.default_process)
print(score)  # 87.5
```

## Usage Patterns

### Choosing the Right Function

- **`WRatio`**: Good general-purpose default; combines several algorithms
- **`ratio`**: Basic similarity when string length and word order matter
- **`partial_ratio`**: When looking for substrings, or when one string is contained in another
- **`token_sort_ratio`**: When word order doesn't matter
- **`token_set_ratio`**: When handling duplicated words or subset relationships
- **`QRatio`**: When you want `ratio`-style scoring that treats empty input as a non-match

### String Preprocessing

All `fuzz` functions support the `processor` parameter for string normalization:

```python
from rapidfuzz import fuzz, utils

# Without preprocessing - case sensitive
score = fuzz.ratio("Hello World", "HELLO WORLD")
print(score)  # Lower score due to case differences

# With preprocessing - case insensitive, removes punctuation
score = fuzz.ratio("Hello World!", "HELLO WORLD",
                   processor=utils.default_process)
print(score)  # 100.0
```
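
To make the effect of the processor concrete, here is a rough pure-Python approximation of that kind of normalization. `approx_default_process` is a hypothetical stand-in written for this example and is not guaranteed to match `utils.default_process` on every input (for instance, with unusual Unicode characters).

```python
def approx_default_process(s: str) -> str:
    """Hypothetical approximation of a default-style processor, for illustration."""
    # Replace every character that is not a letter or digit with a space
    cleaned = "".join(ch if ch.isalnum() else " " for ch in s)
    # Trim surrounding whitespace and lowercase the result
    return cleaned.strip().lower()


print(approx_default_process("Hello World!"))  # hello world
print(approx_default_process("HELLO WORLD"))   # hello world
```

Both inputs normalize to the same string, which is why the preprocessed comparison scores 100.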

### Performance Optimization

Pass `score_cutoff` so the computation can stop early once a pair cannot reach the threshold. Any score below the cutoff is returned as 0:

```python
from rapidfuzz import fuzz

# Only return scores >= 80, otherwise return 0
score = fuzz.ratio("test", "different", score_cutoff=80)
print(score)  # 0.0 (below threshold)

score = fuzz.ratio("test", "tests", score_cutoff=80)
print(score)  # 88.89 (above threshold)
```
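
One way to see why a cutoff can save work: for a ratio-style score of the form `2 * LCS / (len(s1) + len(s2)) * 100`, the common subsequence can never be longer than the shorter string, so the string lengths alone give an upper bound on the score. The sketch below illustrates that idea with hypothetical helpers; it is not a description of RapidFuzz's internal implementation.

```python
def max_possible_ratio(s1: str, s2: str) -> float:
    """Upper bound on an Indel-style ratio, derived from the lengths alone."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100.0
    return 200.0 * min(len(s1), len(s2)) / total


def can_skip(s1: str, s2: str, score_cutoff: float) -> bool:
    """True when the pair cannot reach score_cutoff, so full scoring is unnecessary."""
    return max_possible_ratio(s1, s2) < score_cutoff


print(round(max_possible_ratio("test", "different"), 2))  # 61.54
print(can_skip("test", "different", 80))                  # True
print(can_skip("test", "tests", 80))                      # False
```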