# Fuzzy String Matching

High-level string similarity functions that provide intuitive percentage-based similarity scores (0-100) for different matching scenarios. These functions form the core of RapidFuzz's fuzzy matching capabilities.

## Capabilities

### Basic Ratio

Calculates the normalized similarity between two strings using the Indel distance, an edit distance that allows only insertions and deletions.

```python { .api }
def ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Parameters:**

- `s1`: First string to compare
- `s2`: Second string to compare
- `processor`: Optional preprocessing function applied to both strings before comparison (e.g., `utils.default_process`)
- `score_cutoff`: Minimum score threshold (0-100); a score below it is returned as 0

**Returns:** Similarity score from 0-100 (100 = identical)

**Usage Example:**

```python
from rapidfuzz import fuzz

score = fuzz.ratio("this is a test", "this is a test!")
print(score)  # 96.55

score = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 90.91
```
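
The two scores above can be reproduced by hand. The following is a minimal pure-Python sketch, assuming `ratio` is the normalized Indel similarity, that is, `100 * 2 * LCS / (len(s1) + len(s2))` where `LCS` is the length of the longest common subsequence. The helper names `lcs_length` and `indel_ratio` are hypothetical and are not part of RapidFuzz, whose implementation is far more optimized.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale (hypothetical helper)."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # assumption: two empty strings count as identical
    return 100.0 * 2 * lcs_length(a, b) / total


print(round(indel_ratio("this is a test", "this is a test!"), 2))  # 96.55
print(round(indel_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"), 2))  # 90.91
```

In the first pair the strings have 14 and 15 characters and share a 14-character subsequence, so the score is `100 * 28 / 29`.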

### Partial Ratio

Finds the best matching substring within the longer string, useful when one string is contained within another.

```python { .api }
def partial_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Returns:** Best substring similarity score from 0-100

**Usage Example:**

```python
from rapidfuzz import fuzz

# Perfect match when shorter string is contained in longer
score = fuzz.partial_ratio("this is a test", "this is a test!")
print(score)  # 100.0

score = fuzz.partial_ratio("needle", "haystack with needle in it")
print(score)  # 100.0
```

### Partial Ratio with Alignment

Same as `partial_ratio`, but also returns alignment information showing where the match occurred.

```python { .api }
def partial_ratio_alignment(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> ScoreAlignment | None
```

**Returns:** ScoreAlignment object with score and position information, or None if below cutoff
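
To illustrate what "score and position information" means, here is a simplified pure-Python sketch that slides a window over the longer string and records where the best match starts and ends. It is an illustration under stated assumptions, not RapidFuzz's algorithm: it only tries windows of exactly the shorter string's length, and `WindowMatch`, `lcs_length`, and `sliding_window_match` are hypothetical names.

```python
from typing import NamedTuple


class WindowMatch(NamedTuple):
    """Hypothetical stand-in for a score-plus-position result."""
    score: float
    dest_start: int
    dest_end: int


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sliding_window_match(needle: str, haystack: str) -> WindowMatch:
    """Score `needle` against every same-length window of `haystack`."""
    if not needle or len(needle) > len(haystack):
        raise ValueError("needle must be non-empty and no longer than haystack")
    width = len(needle)
    best = WindowMatch(0.0, 0, width)
    for start in range(len(haystack) - width + 1):
        window = haystack[start:start + width]
        # Both strings have length `width`, so the total length is 2 * width.
        score = 100.0 * 2 * lcs_length(needle, window) / (2 * width)
        if score > best.score:  # strict comparison keeps the first best window
            best = WindowMatch(score, start, start + width)
    return best


match = sliding_window_match("needle", "haystack with needle in it")
print(match)  # WindowMatch(score=100.0, dest_start=14, dest_end=20)
```

The reported positions are a half-open slice, so `"haystack with needle in it"[14:20]` is `"needle"`.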

### Token Sort Ratio

Sorts the tokens (words) in both strings before comparing, useful for strings with different word orders.

```python { .api }
def token_sort_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Usage Example:**

```python
from rapidfuzz import fuzz

# Different word order
score = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 90.91

score = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 100.0
```
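
The reason the second score reaches 100 can be seen by applying the sorting step by hand. This is a small sketch of that step only, assuming tokens are split on whitespace and rejoined with single spaces; `sort_tokens` is a hypothetical helper, not a RapidFuzz function.

```python
def sort_tokens(text: str) -> str:
    """Split on whitespace, sort the tokens, and rejoin with single spaces."""
    return " ".join(sorted(text.split()))


a = sort_tokens("fuzzy wuzzy was a bear")
b = sort_tokens("wuzzy fuzzy was a bear")

print(a)       # a bear fuzzy was wuzzy
print(a == b)  # True
```

Once both strings are in the same canonical order, an ordinary comparison sees them as identical.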

### Token Set Ratio

Compares the strings as sets of tokens, which makes the score robust to duplicated words and to one string's tokens being a subset of the other's.

```python { .api }
def token_set_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Usage Example:**

```python
from rapidfuzz import fuzz

# Handles duplicates and subsets well
score = fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
print(score)  # 100.0

score = fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear")
print(score)  # 100.0 (subset)
```
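
The subset case can be made concrete by splitting the two token sets into a shared part and two remainders. The sketch below shows only that decomposition, not the scoring that follows it, and assumes whitespace tokenization; `token_sets` is a hypothetical helper.

```python
def token_sets(s1: str, s2: str) -> tuple[list[str], list[str], list[str]]:
    """Return (shared tokens, tokens only in s1, tokens only in s2), each sorted."""
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    shared = sorted(tokens1 & tokens2)
    only1 = sorted(tokens1 - tokens2)
    only2 = sorted(tokens2 - tokens1)
    return shared, only1, only2


shared, only1, only2 = token_sets("fuzzy was a bear but not a dog", "fuzzy was a bear")
print(shared)  # ['a', 'bear', 'fuzzy', 'was']
print(only1)   # ['but', 'dog', 'not']
print(only2)   # []
```

An empty remainder on one side, together with a non-empty shared part, is the subset relationship that the second example above scores as 100. Converting to sets also discards the repeated `fuzzy` in the first example.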

### Token Ratio

Returns the higher of the `token_sort_ratio` and `token_set_ratio` scores.

```python { .api }
def token_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

### Partial Token Functions

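Partial versions of the token-based ratios. They apply the same token handling but score with `partial_ratio`, so the best-matching substring is compared rather than the whole string.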

```python { .api }
def partial_token_sort_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

def partial_token_set_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

def partial_token_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

### Weighted Ratio (WRatio)

Combines several of the ratio algorithms above, weighting them according to characteristics of the input strings such as their relative lengths. This is the recommended general-purpose function.

```python { .api }
def WRatio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Usage Example:**

```python
from rapidfuzz import fuzz, utils

score = fuzz.WRatio("this is a test", "this is a new test!!!")
print(score)  # 85.5

# With preprocessing to handle case and punctuation
score = fuzz.WRatio("this is a test", "this is a new test!!!",
                    processor=utils.default_process)
print(score)  # 95.0

score = fuzz.WRatio("this is a word", "THIS IS A WORD",
                    processor=utils.default_process)
print(score)  # 100.0
```

### Quick Ratio (QRatio)

Quick ratio that computes the same score as `ratio`, except that it returns 0 when either input is empty.

```python { .api }
def QRatio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
```

**Usage Example:**

```python
from rapidfuzz import fuzz, utils

score = fuzz.QRatio("this is a test", "this is a new test!!!")
print(score)  # 80.0

score = fuzz.QRatio("this is a test", "this is a new test!!!",
                    processor=utils.default_process)
print(score)  # 87.5
```

## Usage Patterns

### Choosing the Right Function

- **`WRatio`**: Best general-purpose choice; combines multiple algorithms with weighting
- **`ratio`**: Basic similarity when string length and order matter
- **`partial_ratio`**: When looking for substrings or one string contained in another
- **`token_sort_ratio`**: When word order doesn't matter
- **`token_set_ratio`**: When handling duplicates or subset relationships
- **`QRatio`**: When you want `ratio` scoring but need empty inputs to score 0

### String Preprocessing

All fuzz functions support the `processor` parameter for string normalization:

```python
from rapidfuzz import fuzz, utils

# Without preprocessing - case sensitive
score = fuzz.ratio("Hello World", "HELLO WORLD")
print(score)  # Lower score due to case differences

# With preprocessing - case insensitive, removes punctuation
score = fuzz.ratio("Hello World!", "HELLO WORLD",
                   processor=utils.default_process)
print(score)  # 100.0
```
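
A `processor` is just a callable that maps a string to a normalized string. The sketch below approximates the behavior described in the comments above (lowercasing and removing punctuation) in pure Python. It is an assumption-laden stand-in: `approximate_default_process` is a hypothetical name, and the exact characters that `utils.default_process` treats as punctuation may differ.

```python
def approximate_default_process(text: str) -> str:
    """Replace non-alphanumeric characters with spaces, trim, and lowercase."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text)
    return cleaned.strip().lower()


left = approximate_default_process("Hello World!")
right = approximate_default_process("HELLO WORLD")

print(repr(left))     # 'hello world'
print(left == right)  # True
```

Because both inputs normalize to the same string, any similarity function applied afterwards sees an exact match.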

### Performance Optimization

Pass `score_cutoff` so that a comparison can stop early once the score can no longer reach the threshold:

```python
from rapidfuzz import fuzz

# Only return scores >= 80, otherwise return 0
score = fuzz.ratio("test", "different", score_cutoff=80)
print(score)  # 0.0 (below threshold)

# "test" vs "tests": lengths 4 and 5, one insertion, so 100 * 8 / 9
score = fuzz.ratio("test", "tests", score_cutoff=80)
print(score)  # 88.89 (above threshold)
```
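
One way a cutoff can save work is a length-based upper bound: the longest common subsequence can never exceed the shorter string, so the best possible Indel-based score is known before any comparison runs. The following sketch shows that idea only. It is not a description of how RapidFuzz implements early termination, and `max_possible_ratio` and `can_reach_cutoff` are hypothetical helpers.

```python
def max_possible_ratio(len1: int, len2: int) -> float:
    """Upper bound on the 0-100 Indel similarity for two given lengths."""
    total = len1 + len2
    if total == 0:
        return 100.0  # assumption: two empty strings count as identical
    return 100.0 * 2 * min(len1, len2) / total


def can_reach_cutoff(s1: str, s2: str, score_cutoff: float) -> bool:
    """True if the pair could still score at least `score_cutoff`."""
    return max_possible_ratio(len(s1), len(s2)) >= score_cutoff


print(round(max_possible_ratio(4, 9), 2))         # 61.54
print(can_reach_cutoff("test", "different", 80))  # False
```

For `"test"` and `"different"` the bound is `100 * 8 / 13`, which is already below 80, so the full comparison could be skipped.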