# Fuzzy String Algorithms

Core fuzzy string comparison functions that implement various algorithms for measuring string similarity. All functions return integer scores from 0 to 100, where 100 indicates identical strings and 0 indicates no similarity.

## Capabilities

### Basic Ratio Matching

Standard string similarity using sequence matching algorithms.

```python { .api }
def ratio(s1: str, s2: str) -> int:
    """
    Calculate similarity ratio between two strings.

    Returns:
        int: Similarity score 0-100
    """
```

**Usage Example:**

```python
from fuzzywuzzy import fuzz

score = fuzz.ratio("this is a test", "this is a test!")
print(score)  # 97

score = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 91
```
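
To make the score concrete, here is a minimal sketch of how a 0-100 ratio can be derived with the standard library's `difflib.SequenceMatcher`. The name `ratio_sketch` is hypothetical and not part of fuzzywuzzy, and the library may use a different backend (for example python-Levenshtein) when one is installed, so treat this as an approximation of the idea rather than the library's implementation.

```python
from difflib import SequenceMatcher


def ratio_sketch(s1: str, s2: str) -> int:
    """Scale difflib's 0.0-1.0 similarity to an integer in 0-100 (illustrative only)."""
    # SequenceMatcher.ratio() returns 2 * M / T, where M is the number of
    # matched characters and T is the combined length of both strings.
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


# 14 matched characters out of 14 + 15 total: 2 * 14 / 29 = 0.9655...
print(ratio_sketch("this is a test", "this is a test!"))  # 97
```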

### Partial Ratio Matching

Finds the similarity of the most similar substring, useful when one string is contained within another.

```python { .api }
def partial_ratio(s1: str, s2: str) -> int:
    """
    Return the ratio of the most similar substring as a number between 0 and 100.

    Returns:
        int: Partial similarity score 0-100
    """
```

**Usage Example:**

```python
from fuzzywuzzy import fuzz

score = fuzz.partial_ratio("this is a test", "this is a test!")
print(score)  # 100

score = fuzz.partial_ratio("fuzzy wuzzy", "wuzzy")
print(score)  # 100
```
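
The "most similar substring" idea can be sketched as an exhaustive sliding window: compare the shorter string against every same-length slice of the longer one and keep the best score. This is an assumption-laden illustration using `difflib`; `partial_ratio_sketch` is a hypothetical name, and the library may choose candidate windows differently rather than trying them all.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best score of the shorter string against any same-length window of the longer one."""
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        return 0  # assumption: an empty input scores 0
    window = len(shorter)
    best = 0.0
    for start in range(len(longer) - window + 1):
        candidate = longer[start:start + window]
        best = max(best, SequenceMatcher(None, shorter, candidate).ratio())
    return int(round(100 * best))


print(partial_ratio_sketch("fuzzy wuzzy", "wuzzy"))  # 100
```

The brute-force loop is quadratic in the worst case, which is acceptable for short strings but is one reason a real implementation might narrow the candidate windows first.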

### Token Sort Ratio

Compares strings after sorting tokens alphabetically, handling word order variations.

```python { .api }
def token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Return similarity between 0 and 100 after sorting tokens.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Force ASCII conversion (default True)
        full_process: Apply full string processing (default True)

    Returns:
        int: Token sort similarity score 0-100
    """
```

**Usage Example:**

```python
from fuzzywuzzy import fuzz

score = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 100

score = fuzz.token_sort_ratio("new york mets", "mets new york")
print(score)  # 100
```
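
The token-sorting step can be illustrated with a few lines of standard-library Python. This sketch only lowercases and splits on whitespace; the library's full processing does more (such as handling punctuation), and `token_sort_ratio_sketch` is a hypothetical name used here for illustration.

```python
from difflib import SequenceMatcher


def token_sort_ratio_sketch(s1: str, s2: str) -> int:
    """Compare two strings after putting their tokens in alphabetical order."""

    def normalize(text: str) -> str:
        # Lowercase, split on whitespace, sort the tokens, and rejoin them.
        return " ".join(sorted(text.lower().split()))

    a, b = normalize(s1), normalize(s2)
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


# Both inputs normalize to "mets new york", so word order no longer matters.
print(token_sort_ratio_sketch("new york mets", "mets new york"))  # 100
```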

### Partial Token Sort Ratio

Combines partial ratio with token sorting, tolerating both word order differences and substring matches.

```python { .api }
def partial_token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Return partial ratio of sorted tokens between 0 and 100.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Force ASCII conversion (default True)
        full_process: Apply full string processing (default True)

    Returns:
        int: Partial token sort similarity score 0-100
    """
```
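
As a rough illustration of how the two ideas compose, the sketch below sorts the tokens of each string and then scores the shorter result against every same-length window of the longer one. It relies only on `difflib`; the function name is hypothetical, the preprocessing is simplified to lowercasing and whitespace splitting, and the library's own windowing strategy may differ.

```python
from difflib import SequenceMatcher


def partial_token_sort_ratio_sketch(s1: str, s2: str) -> int:
    """Sort tokens first, then take the best same-length window match."""
    a = " ".join(sorted(s1.lower().split()))
    b = " ".join(sorted(s2.lower().split()))
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0  # assumption: an empty input scores 0
    size = len(shorter)
    best = max(
        SequenceMatcher(None, shorter, longer[i:i + size]).ratio()
        for i in range(len(longer) - size + 1)
    )
    return int(round(100 * best))


# Sorted forms are "new yankees york" and "yankees york"; the second is a
# substring of the first, so the best window matches exactly.
print(partial_token_sort_ratio_sketch("new york yankees", "yankees york"))  # 100
```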

### Token Set Ratio

Compares the intersection and differences of the two token sets, which suits strings with repeated or extra words.

```python { .api }
def token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Return similarity using token set comparison between 0 and 100.

    Compares the intersection and differences of token sets to handle
    repeated words and partial matches effectively.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Force ASCII conversion (default True)
        full_process: Apply full string processing (default True)

    Returns:
        int: Token set similarity score 0-100
    """
```

**Usage Example:**

```python
from fuzzywuzzy import fuzz

score = fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
print(score)  # 100

score = fuzz.token_set_ratio("new york yankees", "yankees new york")
print(score)  # 100
```
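
The intersection-and-difference comparison described above can be sketched as follows: build one string from the shared tokens, two more from the shared tokens plus each side's leftovers, and return the best pairwise score. This is a simplified illustration with `difflib`; the helper names are hypothetical and the preprocessing is reduced to lowercasing and whitespace splitting.

```python
from difflib import SequenceMatcher


def _score(a: str, b: str) -> int:
    """Integer 0-100 similarity of two strings via difflib."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_set_ratio_sketch(s1: str, s2: str) -> int:
    """Best pairwise score among shared tokens and shared-plus-leftover tokens."""
    tokens1, tokens2 = set(s1.lower().split()), set(s2.lower().split())
    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    combined1 = f"{common} {only1}".strip()
    combined2 = f"{common} {only2}".strip()
    return max(
        _score(common, combined1),
        _score(common, combined2),
        _score(combined1, combined2),
    )


# Converting to sets drops the repeated "fuzzy", so both sides are identical.
print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
```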

### Partial Token Set Ratio

Combines partial ratio with token set comparison for maximum robustness.

```python { .api }
def partial_token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Return partial ratio using token set comparison between 0 and 100.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Force ASCII conversion (default True)
        full_process: Apply full string processing (default True)

    Returns:
        int: Partial token set similarity score 0-100
    """
```

### Quick Ratio Functions

Fast ratio calculation with optional preprocessing.

```python { .api }
def QRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Quick ratio comparison between two strings.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Allow only ASCII characters (default True)
        full_process: Process inputs; pass False when they have already
            been processed, to avoid double processing (default True)

    Returns:
        int: Quick similarity score 0-100
    """

def UQRatio(s1: str, s2: str, full_process: bool = True) -> int:
    """
    Unicode quick ratio - QRatio with force_ascii=False.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        full_process: Process inputs (default True)

    Returns:
        int: Unicode quick similarity score 0-100
    """
```
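
To show what preprocessing and `force_ascii` change in practice, here is a hedged sketch. `full_process_sketch` and `qratio_sketch` are hypothetical names; the exact normalization rules below (drop non-ASCII characters, replace non-word characters with spaces, lowercase, strip) are an assumption about what "full processing" involves, not a copy of the library's code.

```python
import re
from difflib import SequenceMatcher


def full_process_sketch(text: str, force_ascii: bool = True) -> str:
    """Assumed normalization: optional ASCII filter, punctuation to spaces, lowercase, strip."""
    if force_ascii:
        # Drop any character that cannot be encoded as ASCII.
        text = text.encode("ascii", "ignore").decode("ascii")
    # Replace everything that is not a letter, digit, or underscore with a space.
    text = re.sub(r"\W", " ", text)
    return text.lower().strip()


def qratio_sketch(s1: str, s2: str, force_ascii: bool = True) -> int:
    """Plain ratio applied to the preprocessed inputs."""
    p1 = full_process_sketch(s1, force_ascii)
    p2 = full_process_sketch(s2, force_ascii)
    if not p1 or not p2:
        return 0  # assumption: inputs that are empty after processing score 0
    return int(round(100 * SequenceMatcher(None, p1, p2).ratio()))


print(qratio_sketch("HELLO WORLD!", "hello world"))  # 100
# With force_ascii=True the accented character is dropped ("caf" vs "cafe");
# with force_ascii=False it is kept and simply fails to match "e".
print(qratio_sketch("caf\u00e9", "cafe"))                     # 86
print(qratio_sketch("caf\u00e9", "cafe", force_ascii=False))  # 75
```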

### Weighted Ratio Functions

Combines several of the algorithms above, scales each score, and returns the highest.

```python { .api }
def WRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Return weighted similarity between 0 and 100 using multiple algorithms.

    Automatically selects the best combination of ratio algorithms based on
    string length differences and applies appropriate scaling factors.

    Algorithm selection:
    - Uses partial algorithms when one string is at least 1.5x as long as the other
    - Applies 0.9 scaling for partial results, 0.6 when one string is more than 8x as long
    - Uses token-based algorithms with 0.95 scaling
    - Returns the highest score from all applicable algorithms

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Allow only ASCII characters (default True)
        full_process: Process inputs (default True)

    Returns:
        int: Weighted similarity score 0-100
    """

def UWRatio(s1: str, s2: str, full_process: bool = True) -> int:
    """
    Unicode weighted ratio - WRatio with force_ascii=False.

    Parameters:
        s1: First string to compare
        s2: Second string to compare
        full_process: Process inputs (default True)

    Returns:
        int: Unicode weighted similarity score 0-100
    """
```

**Usage Example:**

```python
from fuzzywuzzy import fuzz

# WRatio automatically selects the best algorithm
score = fuzz.WRatio("new york yankees", "yankees")
print(score)  # Uses partial algorithms due to length difference

score = fuzz.WRatio("new york mets", "new york yankees")
print(score)  # Uses token algorithms for similar-length strings
```
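
The selection rules in the docstring can be sketched as a small scoring function. This is a simplified illustration, not the library's code: all names are hypothetical, it uses `difflib` throughout, it leaves out the token-set variants for brevity, and the exact length thresholds (1.5 and 8) and their boundary handling should be read as assumptions.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> float:
    """Similarity on a 0-100 float scale."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def _partial(a: str, b: str) -> float:
    """Best score of the shorter string against same-length windows of the longer."""
    shorter, longer = sorted((a, b), key=len)
    n = len(shorter)
    return max(_ratio(shorter, longer[i:i + n]) for i in range(len(longer) - n + 1))


def _sort_tokens(s: str) -> str:
    return " ".join(sorted(s.split()))


def wratio_sketch(s1: str, s2: str) -> int:
    """Scale each candidate score and return the highest (token-set variants omitted)."""
    p1, p2 = s1.lower().strip(), s2.lower().strip()
    if not p1 or not p2:
        return 0
    length_ratio = max(len(p1), len(p2)) / min(len(p1), len(p2))
    token_scale = 0.95
    # Assumed threshold: a very large length difference lowers the partial weight.
    partial_scale = 0.6 if length_ratio > 8 else 0.9
    scores = [_ratio(p1, p2)]
    if length_ratio >= 1.5:
        # Lengths differ a lot: use partial comparisons, scaled down.
        scores.append(_partial(p1, p2) * partial_scale)
        scores.append(_partial(_sort_tokens(p1), _sort_tokens(p2)) * token_scale * partial_scale)
    else:
        # Similar lengths: use the token-sorted comparison, scaled down slightly.
        scores.append(_ratio(_sort_tokens(p1), _sort_tokens(p2)) * token_scale)
    return int(round(max(scores)))


print(wratio_sketch("new york yankees", "yankees"))    # 90 (exact substring, scaled by 0.9)
print(wratio_sketch("new york mets", "mets new york"))  # 95 (token-sort match, scaled by 0.95)
```

Scaling the partial and token-based scores below 1.0 means a plain, unprocessed match of 100 always outranks a match that needed reordering or truncation to reach 100.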