tessl/pypi-rapidfuzz

rapid fuzzy string matching

- Workspace: tessl
- Visibility: Public
- Describes: pypipkg:pypi/rapidfuzz@3.14.x

To install, run `npx @tessl/cli install tessl/pypi-rapidfuzz@3.14.0`

# RapidFuzz

RapidFuzz is a high-performance Python library for fuzzy string matching. It computes string similarity with metrics such as Levenshtein distance, Hamming distance, and Jaro-Winkler similarity. Its core is implemented as C++ extensions for speed, and it adds batch-processing functions for matching a query against many candidate strings.

## Package Information

- **Package Name**: rapidfuzz
- **Language**: Python
- **Installation**: `pip install rapidfuzz`
- **Requires**: Python 3.10 or later

## Core Imports

```python
import rapidfuzz
```

Common patterns for specific functionality:

```python
from rapidfuzz import fuzz, process, distance, utils
```

Import specific functions:

```python
from rapidfuzz.fuzz import ratio, partial_ratio, partial_ratio_alignment, token_ratio, WRatio, QRatio
from rapidfuzz.process import extractOne, extract, extract_iter, cdist, cpdist
from rapidfuzz.distance import Levenshtein, Hamming, Jaro, JaroWinkler, DamerauLevenshtein
from rapidfuzz.distance import OSA, Indel, LCSseq, Prefix, Postfix
from rapidfuzz.utils import default_process
```

## Basic Usage

```python
from rapidfuzz import fuzz, process, utils

# Basic string similarity
score = fuzz.ratio("this is a test", "this is a test!")
print(f"Similarity: {score:.2f}")  # 96.55

# Partial matching (substring matching)
score = fuzz.partial_ratio("this is a test", "this is a test!")
print(f"Partial similarity: {score}")  # 100.0

# Find best match from a list; each result is (match, score, index)
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
match = process.extractOne("new york jets", choices)
print(f"Best match: {match}")  # ('New York Jets', 76.92..., 1)

# Find multiple matches
matches = process.extract("new york jets", choices, limit=2)
print(f"Top matches: {matches}")
# [('New York Jets', 76.92..., 1), ('New York Giants', 64.28..., 2)]

# With string preprocessing (lowercasing makes the match exact)
match = process.extractOne("new york jets", choices, processor=utils.default_process)
print(f"Preprocessed match: {match}")  # ('New York Jets', 100.0, 1)
```

## Architecture

RapidFuzz is organized into four main modules, each serving distinct purposes:

- **fuzz**: High-level similarity functions (ratio, partial_ratio, token_sort_ratio, WRatio, QRatio)
- **process**: Batch processing functions for comparing against lists of choices (extract, extractOne, cdist)
- **distance**: Low-level distance metrics and edit operations (Levenshtein, Hamming, Jaro, etc.)
- **utils**: String preprocessing utilities (default_process)

The library automatically selects optimized C++ implementations (AVX2, SSE2) when available, falling back to Python implementations for compatibility.

## Core Functions

### C++ Extension Support

```python { .api }
def get_include() -> str
```

Returns the directory containing RapidFuzz header files for building C++ extensions that use RapidFuzz functionality.

**Usage Example:**

```python
import rapidfuzz

include_dir = rapidfuzz.get_include()
print(f"Header files located at: {include_dir}")

# Use in setup.py for C++ extensions
from setuptools import Extension

ext = Extension(
    'my_extension',
    sources=['my_extension.cpp'],
    include_dirs=[rapidfuzz.get_include()]
)
```

## Capabilities

### Fuzzy String Matching

High-level string similarity functions including basic ratios, partial matching, token-based comparisons, and weighted algorithms optimized for different use cases.

```python { .api }
def ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def partial_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def partial_ratio_alignment(s1, s2, *, processor=None, score_cutoff=0) -> ScoreAlignment | None: ...
def token_sort_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def token_set_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def token_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def partial_token_sort_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def partial_token_set_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def partial_token_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def WRatio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def QRatio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
```

[Fuzzy String Matching](./fuzzy-matching.md)
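
For intuition about what these scores mean: `ratio` is documented upstream as a normalized Indel similarity scaled to 0-100, which can be derived from the longest common subsequence (LCS) of the two strings. The sketch below is a pure-Python illustration, not RapidFuzz's implementation. The helper names `lcs_length`, `indel_ratio`, and `sorted_token_ratio` are hypothetical, the token variant assumes plain whitespace tokenization, and returning 100 for two empty strings is a choice made by this sketch.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence, computed row by row."""
    previous = [0] * (len(s2) + 1)
    for ch1 in s1:
        current = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def indel_ratio(s1: str, s2: str) -> float:
    """Normalized Indel similarity on a 0-100 scale: 2 * LCS / (len1 + len2)."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100.0  # assumption of this sketch: two empty strings match fully
    return 100.0 * 2 * lcs_length(s1, s2) / total


def sorted_token_ratio(s1: str, s2: str) -> float:
    """Idea behind token-sort scoring: sort whitespace tokens, then compare."""
    return indel_ratio(" ".join(sorted(s1.split())), " ".join(sorted(s2.split())))


print(round(indel_ratio("this is a test", "this is a test!"), 2))  # 96.55
print(sorted_token_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100.0
```

The first result matches the `fuzz.ratio` score shown in Basic Usage: the strings share 14 characters in order and have a combined length of 29, so the score is 2 * 14 / 29 * 100.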

### Batch Processing

Efficient functions for comparing a query string against lists or collections of candidate strings, with support for finding single best matches, top-N matches, and distance matrices.

```python { .api }
def extractOne(query, choices, *, scorer=WRatio, processor=None, score_cutoff=None) -> tuple | None: ...
def extract(query, choices, *, scorer=WRatio, processor=None, limit=5, score_cutoff=None) -> list: ...
def extract_iter(query, choices, *, scorer=WRatio, processor=None, score_cutoff=None) -> Generator: ...
def cdist(queries, choices, *, scorer=ratio, processor=None, workers=1) -> numpy.ndarray: ...
def cpdist(queries, choices, *, scorer=ratio, processor=None, workers=1) -> numpy.ndarray: ...
```

[Batch Processing](./batch-processing.md)
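
To make the `(match, score, index)` result shape and the roles of `processor`, `limit`, and `score_cutoff` concrete, here is a minimal pure-Python sketch of a top-N extraction loop. It is not how RapidFuzz implements `extract`. The names `stand_in_scorer` and `extract_sketch` are hypothetical, the scorer is `difflib.SequenceMatcher` used as a stand-in (so scores will differ from `WRatio`), and only list-like `choices` are handled.

```python
from difflib import SequenceMatcher


def stand_in_scorer(s1: str, s2: str) -> float:
    """Stand-in similarity on a 0-100 scale (not a RapidFuzz scorer)."""
    return 100.0 * SequenceMatcher(None, s1, s2).ratio()


def extract_sketch(query, choices, *, scorer=stand_in_scorer,
                   processor=None, limit=5, score_cutoff=None):
    """Score every choice and return the best (choice, score, index) tuples."""
    processed_query = processor(query) if processor else query
    results = []
    for index, choice in enumerate(choices):
        processed_choice = processor(choice) if processor else choice
        score = scorer(processed_query, processed_choice)
        # Keep only results that reach the cutoff, when one is given
        if score_cutoff is None or score >= score_cutoff:
            results.append((choice, score, index))
    # Highest score first; the stable sort keeps input order among ties
    results.sort(key=lambda item: item[1], reverse=True)
    return results if limit is None else results[:limit]


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
top = extract_sketch("new york jets", choices, processor=str.lower, limit=2)
print(top[0])  # ('New York Jets', 100.0, 1)
```

The original choice, not the processed one, is returned in each tuple, which mirrors the preprocessed `extractOne` result shown in Basic Usage.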

### Distance Metrics

Low-level distance algorithms providing raw distance calculations, similarity scores, normalized metrics, and edit operation sequences for advanced string analysis.

```python { .api }
# Levenshtein is a submodule of rapidfuzz.distance. It is sketched here as a
# class-style namespace; its functions are called as Levenshtein.distance(...).
class Levenshtein:
    @staticmethod
    def distance(s1, s2, *, score_cutoff=None) -> int: ...
    @staticmethod
    def similarity(s1, s2, *, score_cutoff=None) -> int: ...
    @staticmethod
    def normalized_distance(s1, s2, *, score_cutoff=None) -> float: ...
    @staticmethod
    def normalized_similarity(s1, s2, *, score_cutoff=None) -> float: ...
```

[Distance Metrics](./distance-metrics.md)
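
The four functions listed for each metric are closely related. The sketch below shows one way to read that relationship for Levenshtein with uniform edit costs, using a textbook dynamic-programming distance. It is an illustration under that assumption, not the library's code. `levenshtein_distance` and `levenshtein_metrics` are hypothetical names, and the normalization by the longer string's length is assumed to correspond to the default weights.

```python
def levenshtein_distance(s1: str, s2: str) -> int:
    """Textbook Levenshtein distance with unit costs, computed row by row."""
    previous = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        current = [i]
        for j, ch2 in enumerate(s2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_metrics(s1: str, s2: str) -> dict:
    """Derive similarity and normalized variants from the raw distance."""
    maximum = max(len(s1), len(s2))
    distance = levenshtein_distance(s1, s2)
    normalized_distance = distance / maximum if maximum else 0.0
    return {
        "distance": distance,
        "similarity": maximum - distance,
        "normalized_distance": normalized_distance,
        "normalized_similarity": 1.0 - normalized_distance,
    }


metrics = levenshtein_metrics("kitten", "sitting")
print(metrics["distance"], metrics["similarity"])  # 3 4
```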

### String Preprocessing

Utilities for normalizing and preprocessing strings before comparison, including case normalization, whitespace handling, and non-alphanumeric character removal.

```python { .api }
def default_process(sentence: str) -> str: ...
```

[String Preprocessing](./string-preprocessing.md)
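
As a rough picture of the normalization described above, the following sketch lowercases the input, turns non-alphanumeric characters into spaces, and trims the ends. It is only an approximation for illustration. `approx_default_process` is a hypothetical name, and details such as Unicode handling or how runs of whitespace are treated may differ from `default_process`.

```python
def approx_default_process(sentence: str) -> str:
    """Approximate preprocessing: non-alphanumerics to spaces, trim, lowercase."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in sentence)
    return cleaned.strip().lower()


print(approx_default_process("  New York Jets!  "))  # new york jets
```

Preprocessing like this is why the Basic Usage example reaches a score of 100.0 once a processor is passed: after lowercasing, the query and the choice are identical.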

## Types

```python { .api }
from typing import Sequence, Hashable, Callable, Iterable, Mapping, Any
from collections.abc import Generator
import numpy

# Core types for string inputs
StringType = Sequence[Hashable]  # Accepts strings, lists, tuples of hashable items

# Edit operation types
class Editop:
    def __init__(self, tag: str, src_pos: int, dest_pos: int) -> None: ...
    tag: str       # 'replace', 'delete', 'insert'
    src_pos: int   # Position in source string
    dest_pos: int  # Position in destination string

class Editops:
    # List-like container of Editop objects
    def __init__(self, editops: list | None = None, src_len: int = 0, dest_len: int = 0) -> None: ...
    def __len__(self) -> int: ...
    def __getitem__(self, index: int) -> Editop: ...
    def as_opcodes(self) -> Opcodes: ...
    def as_matching_blocks(self) -> list[MatchingBlock]: ...
    def as_list(self) -> list[tuple[str, int, int]]: ...
    def copy(self) -> Editops: ...
    def inverse(self) -> Editops: ...
    def remove_subsequence(self, subsequence: Editops) -> Editops: ...
    def apply(self, source_string: str | bytes, destination_string: str | bytes) -> str: ...
    @classmethod
    def from_opcodes(cls, opcodes: Opcodes) -> Editops: ...
    src_len: int
    dest_len: int

class Opcode:
    def __init__(self, tag: str, src_start: int, src_end: int, dest_start: int, dest_end: int) -> None: ...
    tag: str         # 'replace', 'delete', 'insert', 'equal'
    src_start: int   # Start position in source string
    src_end: int     # End position in source string
    dest_start: int  # Start position in destination string
    dest_end: int    # End position in destination string

class Opcodes:
    # List-like container of Opcode objects
    def __init__(self, opcodes: list | None = None, src_len: int = 0, dest_len: int = 0) -> None: ...
    def __len__(self) -> int: ...
    def __getitem__(self, index: int) -> Opcode: ...
    def as_editops(self) -> Editops: ...
    def as_matching_blocks(self) -> list[MatchingBlock]: ...
    def as_list(self) -> list[tuple[str, int, int, int, int]]: ...
    def copy(self) -> Opcodes: ...
    def inverse(self) -> Opcodes: ...
    def apply(self, source_string: str | bytes, destination_string: str | bytes) -> str: ...
    @classmethod
    def from_editops(cls, editops: Editops) -> Opcodes: ...
    src_len: int
    dest_len: int

class MatchingBlock:
    def __init__(self, a: int, b: int, size: int) -> None: ...
    a: int     # Start position in first string
    b: int     # Start position in second string
    size: int  # Length of the matching block

class ScoreAlignment:
    def __init__(self, score: float, src_start: int, src_end: int, dest_start: int, dest_end: int) -> None: ...
    score: float     # Similarity/distance score
    src_start: int   # Start position in source
    src_end: int     # End position in source
    dest_start: int  # Start position in destination
    dest_end: int    # End position in destination

# Process function return types
ExtractResult = tuple[str, float, int]         # (match, score, index)
ExtractResultMapping = tuple[str, float, Any]  # (match, score, key)
```
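
To show how the `(tag, src_pos, dest_pos)` tuples produced by `Editops.as_list()` can be read, the sketch below replays a hand-written list of edit operations to turn a source string into a destination string. It is a pure-Python illustration of the idea behind `Editops.apply`, not its implementation. `apply_editops` is a hypothetical helper, and the operations are assumed to be sorted by position.

```python
def apply_editops(editops, source: str, destination: str) -> str:
    """Replay (tag, src_pos, dest_pos) operations on source, reading new chars from destination."""
    result = []
    src_pos = 0
    for tag, op_src, op_dest in editops:
        # Copy the unchanged characters that precede this operation
        result.append(source[src_pos:op_src])
        src_pos = op_src
        if tag == "replace":
            result.append(destination[op_dest])
            src_pos += 1  # the replaced source character is consumed
        elif tag == "insert":
            result.append(destination[op_dest])
        elif tag == "delete":
            src_pos += 1  # skip the deleted source character
        else:
            raise ValueError(f"unknown edit operation: {tag!r}")
    # Copy whatever remains after the last operation
    result.append(source[src_pos:])
    return "".join(result)


# Hand-written operations for "kitten" -> "sitting" (illustrative input)
ops = [("replace", 0, 0), ("replace", 4, 4), ("insert", 6, 6)]
print(apply_editops(ops, "kitten", "sitting"))  # sitting
```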