# String Similarity Scoring

The core fuzzy string matching functions calculate similarity ratios between two strings using several algorithms. Every function returns an integer score from 0 (no match) to 100 (perfect match).

## Capabilities

### Basic Ratio Scoring

A direct comparison of two strings based on Levenshtein distance, with no preprocessing applied.

```python { .api }
def ratio(s1: str, s2: str) -> int:
    """
    Calculate similarity ratio between two strings using Levenshtein distance.

    Args:
        s1: First string to compare
        s2: Second string to compare

    Returns:
        int: Similarity score from 0-100
    """
```
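
To make the 0-100 scale concrete, the following is a minimal sketch of how a similarity fraction can be turned into an integer score. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for the library's own distance computation, and `sketch_ratio` is a hypothetical name used only for illustration.

```python
from difflib import SequenceMatcher


def sketch_ratio(s1: str, s2: str) -> int:
    """Hypothetical helper: scale a 0.0-1.0 similarity to an integer 0-100."""
    # SequenceMatcher.ratio() returns 2*M / T, where M is the number of
    # matched characters and T is the combined length of both strings.
    similarity = SequenceMatcher(None, s1, s2).ratio()
    return int(round(100 * similarity))


print(sketch_ratio("hello world", "hello world!"))  # 96 (2*11/23 rounded)
print(sketch_ratio("abc", "xyz"))                   # 0 (no shared characters)
```

Because the score depends on the combined length of both strings, a single extra character costs more in a short pair than in a long one.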

### Partial Ratio Scoring

Scores the best-matching substring alignment between two strings. This is useful when one string is contained within the other, or when only part of a string is expected to match.

```python { .api }
def partial_ratio(s1: str, s2: str) -> int:
    """
    Calculate similarity ratio of the most similar substring.

    Args:
        s1: First string to compare
        s2: Second string to compare

    Returns:
        int: Similarity score from 0-100 based on best substring match
    """
```
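
The idea of a "most similar substring" can be sketched as a sliding window: the shorter string is compared against every equally long slice of the longer one, and the best score wins. This is a naive illustration under that assumption, not the library's implementation; `sketch_partial_ratio` is a hypothetical name and `difflib.SequenceMatcher` stands in for the underlying ratio.

```python
from difflib import SequenceMatcher


def sketch_partial_ratio(s1: str, s2: str) -> int:
    """Hypothetical helper: best ratio of the shorter string against any
    same-length window of the longer string."""
    # Sort by length so the argument order does not matter.
    shorter, longer = sorted((s1, s2), key=len)
    window = len(shorter)
    best = 0.0
    # Slide a window of len(shorter) across the longer string.
    for start in range(len(longer) - window + 1):
        candidate = longer[start:start + window]
        best = max(best, SequenceMatcher(None, shorter, candidate).ratio())
    return int(round(100 * best))


print(sketch_partial_ratio("this is a test", "is a"))  # 100 (exact substring)
print(sketch_partial_ratio("abc", "xyz"))              # 0
```

A production implementation would typically avoid checking every window, but the exhaustive version shows why an exact substring always scores 100.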

### Token-Based Scoring

These functions split each string into tokens (words) before comparing, which makes the score tolerant of differences in word order and of repeated or extra words.

```python { .api }
def token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Calculate similarity after sorting tokens alphabetically.

    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100
    """

def partial_token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Calculate partial similarity after sorting tokens alphabetically.

    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100 based on best partial match
    """

def token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Calculate similarity using token set comparison.

    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100
    """

def partial_token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Calculate partial similarity using token set comparison.

    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100 based on best partial match
    """
```
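
The two token strategies can be sketched with the standard library alone. The version below assumes a simplified preprocessing step (lowercasing and splitting on whitespace only) and uses `difflib.SequenceMatcher` for the underlying ratio; the `sketch_*` and `_ratio` names are hypothetical and the code is an illustration of the idea rather than the library's implementation.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    """Hypothetical helper: integer 0-100 similarity of two strings."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def sketch_token_sort_ratio(s1: str, s2: str) -> int:
    # Sort the words so that word order no longer affects the comparison.
    sorted1 = " ".join(sorted(s1.lower().split()))
    sorted2 = " ".join(sorted(s2.lower().split()))
    return _ratio(sorted1, sorted2)


def sketch_token_set_ratio(s1: str, s2: str) -> int:
    # Using sets drops duplicate words as well as word order.
    tokens1, tokens2 = set(s1.lower().split()), set(s2.lower().split())
    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    combined1 = f"{common} {only1}".strip()
    combined2 = f"{common} {only2}".strip()
    # Compare the shared words against each full token string and the two
    # full token strings against each other, then keep the best score.
    return max(
        _ratio(common, combined1),
        _ratio(common, combined2),
        _ratio(combined1, combined2),
    )


print(sketch_token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
print(sketch_token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))         # 100
```

Sorting handles reordered words, while the set-based variant additionally ignores repeated words, which is why the second pair still scores 100.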
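
### Advanced Combination Algorithms

These functions build on the scorers above. `QRatio` and `UQRatio` run a quick ratio after preprocessing, while `WRatio` and `UWRatio` combine several scorers and weight the results according to the relative lengths of the two strings. The `U` variants are Unicode-aware and do not take a `force_ascii` argument.
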
```python { .api }
def QRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Quick ratio comparison optimized for speed.

    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100
    """

def UQRatio(s1: str, s2: str, full_process: bool = True) -> int:
    """
    Unicode-aware quick ratio comparison.

    Args:
        s1: First string to compare
        s2: Second string to compare
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100
    """

def WRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Weighted ratio using multiple algorithms for best accuracy.

    Combines ratio, partial_ratio, token_sort_ratio, and token_set_ratio
    with intelligent weighting based on string length ratios.

    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100
    """

def UWRatio(s1: str, s2: str, full_process: bool = True) -> int:
    """
    Unicode-aware weighted ratio using multiple algorithms.

    Args:
        s1: First string to compare
        s2: Second string to compare
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100
    """
```
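
The general shape of a weighted combination can be sketched as "run several scorers, scale the less direct ones down, keep the best". The example below is a deliberately simplified illustration: it combines only a plain ratio and a token-sorted ratio, the `TOKEN_WEIGHT` value of 0.95 is an assumption chosen for the example, and the partial scorers and length-based weighting described in the `WRatio` docstring are left out. All names are hypothetical.

```python
from difflib import SequenceMatcher

# Illustrative assumption: token-based scores count slightly less than a
# direct match. This value is not taken from the documentation above.
TOKEN_WEIGHT = 0.95


def _ratio(a: str, b: str) -> int:
    """Hypothetical helper: integer 0-100 similarity of two strings."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def _token_sort(s: str) -> str:
    """Lowercase, split on whitespace, and rejoin the words in sorted order."""
    return " ".join(sorted(s.lower().split()))


def sketch_weighted_ratio(s1: str, s2: str) -> int:
    # Direct comparison of the lowercased strings.
    base = _ratio(s1.lower(), s2.lower())
    # Order-insensitive comparison, scaled down by the assumed weight.
    token_sorted = _ratio(_token_sort(s1), _token_sort(s2)) * TOKEN_WEIGHT
    return int(round(max(base, token_sorted)))


score = sketch_weighted_ratio(
    "New York Mets vs Atlanta Braves",
    "Atlanta Braves vs New York Mets",
)
print(score)  # 95: the token-sorted match (100) scaled by 0.95 beats the direct ratio
```

Scaling the indirect scorers keeps an exact character-for-character match ranked above a match that only succeeds after reordering.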

## Usage Examples

### Basic Comparison

```python
from thefuzz import fuzz

# Simple string comparison
score = fuzz.ratio("hello world", "hello world!")
print(score)  # 96

# Partial matching - useful for substring matching
score = fuzz.partial_ratio("this is a test", "is a")
print(score)  # 100
```

### Token-Based Matching

```python
from thefuzz import fuzz

# Handle word order differences
score = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 100

# Token set matching - handles duplicates and order
score = fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
print(score)  # 100
```

### Advanced Algorithms

```python
from thefuzz import fuzz

# WRatio combines several scorers and weights their results
score = fuzz.WRatio("New York Mets vs Atlanta Braves", "Atlanta Braves vs New York Mets")
print(score)  # High score despite different word order

# Unicode-aware variant: accented characters are kept rather than converted to ASCII
score = fuzz.UWRatio("Café", "cafe")
```