Tessl Tile for pypi/rapidfuzz@3.14.0

# Fuzzy String Matching

The `fuzz` module provides high-level string similarity functions that return percentage-style scores from 0 to 100 for different matching scenarios. These functions form the core of RapidFuzz's fuzzy matching capabilities.

## Capabilities

### Basic Ratio

Calculates the normalized similarity between two strings, based on the Indel distance (an edit distance that allows only insertions and deletions).

```python { .api }
def ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Parameters:**
- `s1`: First string to compare
- `s2`: Second string to compare
- `processor`: Optional preprocessing function (e.g., `utils.default_process`)
- `score_cutoff`: Minimum score threshold (0-100); the function returns 0 if the score falls below it

**Returns:** Similarity score from 0-100 (100 = identical)

**Usage Example:**
```python
from rapidfuzz import fuzz

score = fuzz.ratio("this is a test", "this is a test!")
print(score)  # 96.55

score = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 90.91
```
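
To make the score concrete, here is a minimal pure-Python sketch of an Indel-based similarity on a 0-100 scale. The helper names `lcs_length` and `indel_ratio` are hypothetical and not part of RapidFuzz; the library's own implementation is optimized native code, so treat this only as an illustration of the arithmetic.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale (hypothetical helper)."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # assumption: two empty strings are treated as identical here
    # Every character outside the common subsequence costs one insertion or deletion.
    distance = total - 2 * lcs_length(a, b)
    return 100.0 * (1 - distance / total)


print(round(indel_ratio("this is a test", "this is a test!"), 2))  # 96.55
```

In the first usage example the strings have 14 and 15 characters and differ by a single insertion, so the score is 100 * (1 - 1/29), which rounds to 96.55.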

### Partial Ratio

Finds the best matching substring within the longer string, useful when one string is contained within another.

```python { .api }
def partial_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Returns:** Best substring similarity score from 0-100

**Usage Example:**
```python
from rapidfuzz import fuzz

# Perfect match when shorter string is contained in longer
score = fuzz.partial_ratio("this is a test", "this is a test!")
print(score)  # 100.0

score = fuzz.partial_ratio("needle", "haystack with needle in it")
print(score)  # 100.0
```
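
The idea of substring matching can be sketched with a simple sliding window: compare the shorter string against every equal-length window of the longer one and keep the best score. This is a simplified illustration under that assumption, not RapidFuzz's actual algorithm, and the helper names `indel_ratio` and `sliding_partial_ratio` are hypothetical.

```python
def indel_ratio(a: str, b: str) -> float:
    """0-100 Indel similarity computed from the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    total = len(a) + len(b)
    return 100.0 if total == 0 else 100.0 * 2 * prev[-1] / total


def sliding_partial_ratio(s1: str, s2: str) -> float:
    """Best score of the shorter string against equal-length windows of the longer one."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    width = len(shorter)
    return max(
        indel_ratio(shorter, longer[start:start + width])
        for start in range(len(longer) - width + 1)
    )


print(sliding_partial_ratio("needle", "haystack with needle in it"))  # 100.0
```

Because one window of the longer string is exactly `"needle"`, the best window scores 100.0 even though the full strings differ greatly in length.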

### Partial Ratio with Alignment

Same as `partial_ratio`, but also returns alignment information showing where the match occurred.

```python { .api }
def partial_ratio_alignment(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> ScoreAlignment | None
```

**Returns:** `ScoreAlignment` object with the score and position information, or `None` if the score is below the cutoff

### Token Sort Ratio

Sorts the tokens (words) in both strings before comparing, useful for strings with different word orders.

```python { .api }
def token_sort_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Usage Example:**
```python
from rapidfuzz import fuzz

# Different word order
score = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 90.91

score = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 100.0
```
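
The token-sorting step itself is easy to picture. The sketch below, using a hypothetical helper `sort_tokens`, shows how two strings with the same words in a different order normalize to the same text before they are compared; it assumes whitespace tokenization and does not reproduce the library's internals.

```python
def sort_tokens(text: str) -> str:
    """Split on whitespace, sort the tokens, and rejoin with single spaces."""
    return " ".join(sorted(text.split()))


a = sort_tokens("fuzzy wuzzy was a bear")
b = sort_tokens("wuzzy fuzzy was a bear")
print(a)       # a bear fuzzy was wuzzy
print(a == b)  # True
```

Once both strings reduce to the same sorted form, a plain ratio on the sorted text yields 100.0.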

### Token Set Ratio

Compares strings using set-based operations on tokens, which makes it well suited to handling duplicated words and subset relationships.

```python { .api }
def token_set_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Usage Example:**
```python
from rapidfuzz import fuzz

# Handles duplicates and subsets well
score = fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
print(score)  # 100.0

score = fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear")
print(score)  # 100.0 (subset)
```
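
One common way to describe token-set comparison is to split the two token sets into the shared tokens and the tokens unique to each side. The sketch below shows only that decomposition, with a hypothetical helper `token_set_parts`; how RapidFuzz scores the parts internally is not shown and may differ.

```python
def token_set_parts(s1: str, s2: str) -> tuple[str, str, str]:
    """Return (shared tokens, tokens only in s1, tokens only in s2) as sorted strings."""
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    return common, only1, only2


common, only1, only2 = token_set_parts(
    "fuzzy was a bear but not a dog", "fuzzy was a bear"
)
print(common)       # a bear fuzzy was
print(only1)        # but dog not
print(repr(only2))  # ''
```

Here the second string contributes no unique tokens, which is the subset case that scores 100.0 in the usage example.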

### Token Ratio

Combines `token_sort_ratio` and `token_set_ratio`, choosing the higher score.

```python { .api }
def token_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

### Partial Token Functions

Partial versions of the token-based ratios, which score the best matching substring rather than the full strings.

```python { .api }
def partial_token_sort_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

def partial_token_set_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

def partial_token_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

### Weighted Ratio (WRatio)

Weighted combination of several ratio algorithms that picks the best-scoring approach, taking the relative lengths of the two strings into account. This is the recommended general-purpose function.

```python { .api }
def WRatio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Usage Example:**
```python
from rapidfuzz import fuzz, utils

score = fuzz.WRatio("this is a test", "this is a new test!!!")
print(score)  # 85.5

# With preprocessing to handle case and punctuation
score = fuzz.WRatio("this is a test", "this is a new test!!!",
                    processor=utils.default_process)
print(score)  # 95.0

score = fuzz.WRatio("this is a word", "THIS IS A WORD",
                    processor=utils.default_process)
print(score)  # 100.0
```

### Quick Ratio (QRatio)

Quick ratio built on `ratio`. It is not an approximation: it returns the same scores as `ratio`, except that comparing two empty strings yields 0.

```python { .api }
def QRatio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Usage Example:**
```python
from rapidfuzz import fuzz, utils

score = fuzz.QRatio("this is a test", "this is a new test!!!")
print(score)  # 80.0

score = fuzz.QRatio("this is a test", "this is a new test!!!",
                    processor=utils.default_process)
print(score)  # 87.5
```

## Usage Patterns

### Choosing the Right Function

- **`WRatio`**: Best general-purpose choice; combines multiple algorithms with weighting
- **`ratio`**: Basic similarity when string length and order matter
- **`partial_ratio`**: When looking for substrings or one string contained in another
- **`token_sort_ratio`**: When word order doesn't matter
- **`token_set_ratio`**: When handling duplicates or subset relationships
- **`QRatio`**: Same scoring as `ratio`, but returns 0 when both strings are empty

### String Preprocessing

All fuzz functions support the `processor` parameter for string normalization:

```python
from rapidfuzz import fuzz, utils

# Without preprocessing - case sensitive
score = fuzz.ratio("Hello World", "HELLO WORLD")
print(score)  # Lower score due to case differences

# With preprocessing - case insensitive, removes punctuation
score = fuzz.ratio("Hello World!", "HELLO WORLD",
                   processor=utils.default_process)
print(score)  # 100.0
```
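
Since `processor` is typed as a callable from string to string, a custom normalizer can stand in for `utils.default_process`. The function `simple_process` below is a hypothetical, ASCII-only sketch of that kind of normalization (non-alphanumeric characters become spaces, then the text is trimmed and lowercased); it is an approximation and not the library's implementation.

```python
import re


def simple_process(text: str) -> str:
    """Hypothetical processor: non-alphanumerics to spaces, then trim and lowercase."""
    return re.sub(r"[^0-9A-Za-z]", " ", text).strip().lower()


print(simple_process("Hello World!"))          # hello world
print(simple_process("  THIS is a TEST!!! "))  # this is a test
```

A function like this could then be passed as `processor=simple_process` to any of the fuzz functions.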

### Performance Optimization

Use `score_cutoff` to improve performance through early termination:

```python
from rapidfuzz import fuzz

# Only return scores >= 80, otherwise return 0
score = fuzz.ratio("test", "different", score_cutoff=80)
print(score)  # 0.0 (below threshold)

score = fuzz.ratio("test", "tests", score_cutoff=80)
print(score)  # 88.89 (above threshold)
```
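
One way a cutoff can enable early exit is a length-based upper bound: for an Indel-style similarity, the score can never exceed the value reached when every character of the shorter string is matched. The sketch below illustrates that idea with hypothetical helpers `max_possible_ratio` and `can_skip`; it is an assumption about how such pruning could work, not a description of RapidFuzz's internals.

```python
def max_possible_ratio(len1: int, len2: int) -> float:
    """Upper bound on a 0-100 Indel similarity, using only the string lengths."""
    total = len1 + len2
    if total == 0:
        return 100.0
    # At best, every character of the shorter string is matched in the longer one.
    return 100.0 * 2 * min(len1, len2) / total


def can_skip(s1: str, s2: str, score_cutoff: float) -> bool:
    """True when the pair cannot reach score_cutoff, so a full comparison is unnecessary."""
    return max_possible_ratio(len(s1), len(s2)) < score_cutoff


print(round(max_possible_ratio(len("test"), len("different")), 2))  # 61.54
print(can_skip("test", "different", score_cutoff=80))  # True
print(can_skip("test", "tests", score_cutoff=80))      # False
```

With a cutoff of 80, the pair `"test"` and `"different"` can be rejected from the lengths alone, while `"test"` and `"tests"` still needs the full comparison.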
