
# String Similarity Scoring

Core fuzzy string matching functions in `thefuzz.fuzz` that score how similar two strings are, using several different algorithms. Every function returns an integer from 0 (no similarity) to 100 (a perfect match under that function's comparison rules).

## Capabilities

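### Basic Ratio Scoring

Simple string similarity based on a Levenshtein-style edit distance. In the current RapidFuzz-backed implementation this is the normalized Indel distance, which counts only insertions and deletions. The two strings are compared exactly as given, with no preprocessing, so differences in case and punctuation lower the score.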

```python { .api }
def ratio(s1: str, s2: str) -> int:
    """
    Calculate similarity ratio between two strings using Levenshtein distance.

    Args:
        s1: First string to compare
        s2: Second string to compare

    Returns:
        int: Similarity score from 0-100
    """
```
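
A minimal pure-Python sketch of the score `ratio` reports follows. It assumes the RapidFuzz-backed definition: normalized Indel similarity, which equals 2 * LCS / (len1 + len2), scaled to 0-100 and rounded. `lcs_length` and `ratio_sketch` are illustrative names and are not part of thefuzz.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio_sketch(s1: str, s2: str) -> int:
    """Assumed equivalent of fuzz.ratio: normalized Indel similarity on a 0-100 scale."""
    total = len(s1) + len(s2)
    if total == 0:
        return 0  # arbitrary choice for two empty strings in this sketch
    # Indel distance = total - 2 * LCS, so 1 - indel / total == 2 * LCS / total
    return int(round(100 * 2 * lcs_length(s1, s2) / total))


print(ratio_sketch("hello world", "hello world!"))  # 96
```

Only insertions and deletions count here, so replacing one character costs as much as deleting it and inserting another.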

### Partial Ratio Scoring

Scores the most similar substring. The shorter string is aligned against substrings of the longer one, and the best `ratio` is returned. This is useful when one string is contained within the other or when only part of a string is expected to match.

```python { .api }
def partial_ratio(s1: str, s2: str) -> int:
    """
    Calculate similarity ratio of the most similar substring.

    Args:
        s1: First string to compare
        s2: Second string to compare

    Returns:
        int: Similarity score from 0-100 based on best substring match
    """
```
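
The sliding-window idea behind `partial_ratio` can be sketched as follows. This is a simplification: only windows of the longer string that have exactly the shorter string's length are tried. RapidFuzz may also consider alignments where the shorter string overhangs either end, so its scores can differ. All function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio_sketch(s1: str, s2: str) -> int:
    """2 * LCS / (len1 + len2), scaled to 0-100 and rounded."""
    total = len(s1) + len(s2)
    return int(round(100 * 2 * lcs_length(s1, s2) / total)) if total else 0


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best ratio between the shorter string and equal-length windows of the longer one."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0
    window_len = len(shorter)
    best = 0
    for start in range(len(longer) - window_len + 1):
        window = longer[start:start + window_len]
        best = max(best, ratio_sketch(shorter, window))
        if best == 100:
            break  # an exact substring match cannot be beaten
    return best


print(partial_ratio_sketch("this is a test", "is a"))  # 100
```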

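### Token-Based Scoring

These functions split strings into tokens (words) before comparing them, so differences in word order or repeated words have less effect on the score. `token_sort_ratio` sorts the tokens alphabetically and rejoins them before comparing. `token_set_ratio` compares the sorted shared tokens with each string's shared-plus-remaining tokens, which also ignores duplicates. The `partial_` variants use `partial_ratio`-style substring matching for the comparison. With `full_process=True` (the default), strings are lowercased, non-alphanumeric characters are replaced with spaces, and surrounding whitespace is trimmed first.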

```python { .api }
def token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Calculate similarity after sorting tokens alphabetically.

    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100
    """


def partial_token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Calculate partial similarity after sorting tokens alphabetically.

    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100 based on best partial match
    """


def token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Calculate similarity using token set comparison.

    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100
    """


def partial_token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Calculate partial similarity using token set comparison.

    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100 based on best partial match
    """
```
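
The following sketch shows the token-sort and token-set strategies in plain Python. It assumes preprocessing that replaces non-alphanumerics with spaces, lowercases, and trims. It also assumes the token-set score is the best of three comparisons: shared tokens vs. shared-plus-remaining tokens for each string, and the two combined strings against each other. `process`, `token_sort_sketch` and `token_set_sketch` are hypothetical names, not thefuzz functions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio_sketch(s1: str, s2: str) -> int:
    """2 * LCS / (len1 + len2), scaled to 0-100 and rounded."""
    total = len(s1) + len(s2)
    return int(round(100 * 2 * lcs_length(s1, s2) / total)) if total else 0


def process(s: str) -> str:
    """Rough stand-in for full_process: non-alphanumerics -> spaces, lowercase, trim."""
    return "".join(ch if ch.isalnum() else " " for ch in s).lower().strip()


def token_sort_sketch(s1: str, s2: str) -> int:
    """Sort tokens alphabetically, rejoin with single spaces, then compare."""
    sorted1 = " ".join(sorted(process(s1).split()))
    sorted2 = " ".join(sorted(process(s2).split()))
    return ratio_sketch(sorted1, sorted2)


def token_set_sketch(s1: str, s2: str) -> int:
    """Compare shared tokens against shared-plus-remaining tokens of each string."""
    tokens1, tokens2 = set(process(s1).split()), set(process(s2).split())
    shared = " ".join(sorted(tokens1 & tokens2))
    combined1 = f"{shared} {' '.join(sorted(tokens1 - tokens2))}".strip()
    combined2 = f"{shared} {' '.join(sorted(tokens2 - tokens1))}".strip()
    return max(
        ratio_sketch(shared, combined1),
        ratio_sketch(shared, combined2),
        ratio_sketch(combined1, combined2),
    )


print(ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))       # 91
print(token_sort_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
print(token_set_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))         # 100
```

Because duplicates vanish when tokens go into a set, the repeated "fuzzy" in the last call has no effect on the score.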

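### Advanced Combination Algorithms

`QRatio` applies full preprocessing and then computes `ratio`, returning 0 if either string is empty after processing. `WRatio` combines several of the scorers above. It chooses between whole-string and partial comparisons based on how different the string lengths are, and it scales down the token and partial scores before taking the best one. The `U`-prefixed variants (`UQRatio`, `UWRatio`) behave the same but use `force_ascii=False`, so non-ASCII characters are kept.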

```python { .api }
def QRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Quick ratio: ratio() applied after optional full preprocessing.

    Returns 0 if either string is empty after processing.

    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100
    """


def UQRatio(s1: str, s2: str, full_process: bool = True) -> int:
    """
    Unicode-aware quick ratio comparison (QRatio with force_ascii=False).

    Args:
        s1: First string to compare
        s2: Second string to compare
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100
    """


def WRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Weighted ratio combining several scorers.

    Takes the highest of ratio, token_sort_ratio and token_set_ratio when the
    strings have similar lengths, or of ratio and the partial_* scorers when
    one string is at least 1.5 times longer than the other. Token and partial
    scores are scaled down before the maximum is taken.

    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100
    """


def UWRatio(s1: str, s2: str, full_process: bool = True) -> int:
    """
    Unicode-aware weighted ratio (WRatio with force_ascii=False).

    Args:
        s1: First string to compare
        s2: Second string to compare
        full_process: Apply full string preprocessing

    Returns:
        int: Similarity score from 0-100
    """
```
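
To make the `WRatio` weighting concrete, here is a sketch of the combination step only. It is modeled on the RapidFuzz implementation that current thefuzz releases wrap. Treat the 1.5 and 8 length-ratio thresholds and the 0.95, 0.9 and 0.6 scale factors as assumptions to check against your installed version. The component scores are passed in precomputed. `combine_wratio_scores` is a hypothetical helper, not part of thefuzz.

```python
UNBASE_SCALE = 0.95  # assumed down-weighting applied to token-based scores


def combine_wratio_scores(
    len1: int,
    len2: int,
    base: float,
    token_sort: float,
    token_set: float,
    partial: float,
    partial_token_sort: float,
    partial_token_set: float,
) -> int:
    """Combine precomputed 0-100 component scores using the assumed WRatio rules."""
    if len1 == 0 or len2 == 0:
        return 0  # empty (processed) strings score 0
    len_ratio = max(len1, len2) / min(len1, len2)
    if len_ratio < 1.5:
        # Similar lengths: whole-string scorers only; token scores scaled by 0.95.
        best = max(base, token_sort * UNBASE_SCALE, token_set * UNBASE_SCALE)
    else:
        # Very different lengths: partial scorers, scaled harder for extreme ratios.
        partial_scale = 0.9 if len_ratio < 8 else 0.6
        best = max(
            base,
            partial * partial_scale,
            partial_token_sort * UNBASE_SCALE * partial_scale,
            partial_token_set * UNBASE_SCALE * partial_scale,
        )
    return int(round(best))


# Equal lengths, perfect token scores, weak plain ratio -> 95 under this scheme.
print(combine_wratio_scores(31, 31, 50, 100, 100, 0, 0, 0))
```

Consider the reordered team names in the usage examples: both strings have the same length and both token scores are 100. Under this scheme the result would therefore be 95 unless the plain `ratio` were higher.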

## Usage Examples

### Basic Comparison

```python
from thefuzz import fuzz

# Simple string comparison (no preprocessing: case and punctuation count)
score = fuzz.ratio("hello world", "hello world!")
print(score)  # 96

# Partial matching - useful for substring matching
score = fuzz.partial_ratio("this is a test", "is a")
print(score)  # 100
```

### Token-Based Matching

```python
from thefuzz import fuzz

# Handle word order differences
score = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 100

# Token set matching - handles duplicates and order
score = fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
print(score)  # 100
```

### Advanced Algorithms

```python
from thefuzz import fuzz

# WRatio combines several scorers, weighted by the relative string lengths
score = fuzz.WRatio("New York Mets vs Atlanta Braves", "Atlanta Braves vs New York Mets")
print(score)  # High score despite different word order

# Unicode-aware variant: keeps non-ASCII characters such as "é" instead of
# stripping them during preprocessing; it does not treat "é" and "e" as equal
score = fuzz.UWRatio("Café", "cafe")
```
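
The `U`-prefixed scorers differ from their counterparts only in not forcing ASCII during preprocessing. The sketch below shows that step. It assumes, as in thefuzz's `utils` module, that `force_ascii` deletes code points 128-255 rather than transliterating them. It further assumes that preprocessing then replaces non-alphanumerics with spaces, lowercases, and trims. `full_process_sketch` and `qratio_sketch` are hypothetical names.

```python
# Assumption: force_ascii deletes code points 128-255 (it does not transliterate).
_STRIP_128_TO_255 = {code_point: None for code_point in range(128, 256)}


def full_process_sketch(s: str, force_ascii: bool = False) -> str:
    """Hypothetical stand-in for thefuzz's preprocessing step."""
    if force_ascii:
        s = s.translate(_STRIP_128_TO_255)  # "Café" -> "Caf"
    # Replace non-alphanumerics with spaces, lowercase, trim surrounding whitespace.
    return "".join(ch if ch.isalnum() else " " for ch in s).lower().strip()


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def qratio_sketch(s1: str, s2: str, force_ascii: bool = True) -> int:
    """ratio on preprocessed strings; 0 if either is empty after processing."""
    p1 = full_process_sketch(s1, force_ascii)
    p2 = full_process_sketch(s2, force_ascii)
    if not p1 or not p2:
        return 0
    return int(round(100 * 2 * lcs_length(p1, p2) / (len(p1) + len(p2))))


print(qratio_sketch("Café", "cafe"))                     # 86: "caf" vs "cafe"
print(qratio_sketch("Café", "cafe", force_ascii=False))  # 75: "café" vs "cafe"
```

Under these assumptions, neither variant treats "é" and "e" as equal. Forcing ASCII simply deletes the "é", so the Unicode-aware score is not necessarily the higher one.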