Tessl Tile for pypi/rapidfuzz@3.14.0

tessl/pypi-rapidfuzz

rapid fuzzy string matching

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/rapidfuzz@3.14.x

To install, run

npx @tessl/cli install tessl/pypi-rapidfuzz@3.14.0

# RapidFuzz

RapidFuzz is a high-performance Python library for fuzzy string matching. It computes string similarity with metrics such as Levenshtein distance, Hamming distance, and Jaro-Winkler similarity. The core is implemented as C++ extensions for speed, and the library provides a comprehensive set of matching functions along with efficient batch processing.

## Package Information

- **Package Name**: rapidfuzz
- **Language**: Python
- **Installation**: `pip install rapidfuzz`
- **Requires**: Python 3.10 or later

## Core Imports

```python
import rapidfuzz
```

Common patterns for specific functionality:

```python
from rapidfuzz import fuzz, process, distance, utils
```

Import specific functions:

```python
from rapidfuzz.fuzz import ratio, partial_ratio, partial_ratio_alignment, token_ratio, WRatio, QRatio
from rapidfuzz.process import extractOne, extract, extract_iter, cdist, cpdist
from rapidfuzz.distance import Levenshtein, Hamming, Jaro, JaroWinkler, DamerauLevenshtein
from rapidfuzz.distance import OSA, Indel, LCSseq, Prefix, Postfix
from rapidfuzz.utils import default_process
```

## Basic Usage

```python
from rapidfuzz import fuzz, process, utils

# Basic string similarity
score = fuzz.ratio("this is a test", "this is a test!")
print(f"Similarity: {score}")  # 96.55...

# Partial matching (substring matching)
score = fuzz.partial_ratio("this is a test", "this is a test!")
print(f"Partial similarity: {score}")  # 100.0

# Find best match from a list (no processor, so the comparison is case-sensitive)
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
match = process.extractOne("new york jets", choices)
print(f"Best match: {match}")  # ('New York Jets', 76.92..., 1)

# Find multiple matches
matches = process.extract("new york", choices, limit=2)
print(f"Top matches: {matches}")
# A list of up to two (choice, score, index) tuples, best match first

# With string preprocessing (lowercasing makes the match exact)
match = process.extractOne("new york jets", choices, processor=utils.default_process)
print(f"Preprocessed match: {match}")  # ('New York Jets', 100.0, 1)
```

## Architecture

RapidFuzz is organized into four main modules, each serving distinct purposes:

- **fuzz**: High-level similarity functions (ratio, partial_ratio, token_sort_ratio, WRatio, QRatio)
- **process**: Batch processing functions for comparing against lists of choices (extract, extractOne, cdist)
- **distance**: Low-level distance metrics and edit operations (Levenshtein, Hamming, Jaro, etc.)
- **utils**: String preprocessing utilities (default_process)

The library automatically selects optimized C++ implementations (AVX2, SSE2) when available, falling back to Python implementations for compatibility.

## Core Functions

### C++ Extension Support

```python { .api }
def get_include() -> str
```

Returns the directory containing RapidFuzz header files for building C++ extensions that use RapidFuzz functionality.

**Usage Example:**
```python
import rapidfuzz

include_dir = rapidfuzz.get_include()
print(f"Header files located at: {include_dir}")

# Use in setup.py for C++ extensions
from setuptools import Extension
ext = Extension(
    'my_extension',
    sources=['my_extension.cpp'],
    include_dirs=[rapidfuzz.get_include()]
)
```

## Capabilities

### Fuzzy String Matching

High-level string similarity functions including basic ratios, partial matching, token-based comparisons, and weighted algorithms optimized for different use cases.

```python { .api }
def ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def partial_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def partial_ratio_alignment(s1, s2, *, processor=None, score_cutoff=0) -> ScoreAlignment | None: ...
def token_sort_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def token_set_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def token_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def partial_token_sort_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def partial_token_set_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def partial_token_ratio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def WRatio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
def QRatio(s1, s2, *, processor=None, score_cutoff=0) -> float: ...
```

[Fuzzy String Matching](./fuzzy-matching.md)

### Batch Processing

Efficient functions for comparing a query string against lists or collections of candidate strings, with support for finding single best matches, top-N matches, and distance matrices.

```python { .api }
def extractOne(query, choices, *, scorer=WRatio, processor=None, score_cutoff=None) -> tuple | None: ...
def extract(query, choices, *, scorer=WRatio, processor=None, limit=5, score_cutoff=None) -> list: ...
def extract_iter(query, choices, *, scorer=WRatio, processor=None, score_cutoff=None) -> Generator: ...
def cdist(queries, choices, *, scorer=ratio, processor=None, workers=1) -> numpy.ndarray: ...
def cpdist(queries, choices, *, scorer=ratio, processor=None, workers=1) -> numpy.ndarray: ...
```

[Batch Processing](./batch-processing.md)

135
### Distance Metrics
136

137
Low-level distance algorithms providing raw distance calculations, similarity scores, normalized metrics, and edit operation sequences for advanced string analysis.
138

139
```python { .api }
140
class Levenshtein:
141
    @staticmethod
142
    def distance(s1, s2, *, score_cutoff=None) -> int: ...
143
    @staticmethod
144
    def similarity(s1, s2, *, score_cutoff=None) -> int: ...
145
    @staticmethod
146
    def normalized_distance(s1, s2, *, score_cutoff=None) -> float: ...
147
    @staticmethod
148
    def normalized_similarity(s1, s2, *, score_cutoff=None) -> float: ...
149
```
150

151
[Distance Metrics](./distance-metrics.md)
152

153
### String Preprocessing
154

155
Utilities for normalizing and preprocessing strings before comparison, including case normalization, whitespace handling, and non-alphanumeric character removal.
156

157
```python { .api }
158
def default_process(sentence: str) -> str: ...
159
```
160

161
[String Preprocessing](./string-preprocessing.md)
162

163
## Types
164

165
```python { .api }
166
from typing import Sequence, Hashable, Callable, Iterable, Mapping, Any
167
from collections.abc import Generator
168
import numpy
169

170
# Core types for string inputs
171
StringType = Sequence[Hashable]  # Accepts strings, lists, tuples of hashable items
172

173
# Edit operation types  
174
class Editop:
175
    def __init__(self, tag: str, src_pos: int, dest_pos: int) -> None: ...
176
    tag: str        # 'replace', 'delete', 'insert'  
177
    src_pos: int    # Position in source string
178
    dest_pos: int   # Position in destination string
179

180
class Editops:
181
    # List-like container of Editop objects
182
    def __init__(self, editops: list | None = None, src_len: int = 0, dest_len: int = 0) -> None: ...
183
    def __len__(self) -> int: ...
184
    def __getitem__(self, index: int) -> Editop: ...
185
    def as_opcodes(self) -> Opcodes: ...
186
    def as_matching_blocks(self) -> list[MatchingBlock]: ...
187
    def as_list(self) -> list[tuple[str, int, int]]: ...
188
    def copy(self) -> Editops: ...
189
    def inverse(self) -> Editops: ...
190
    def remove_subsequence(self, subsequence: Editops) -> Editops: ...
191
    def apply(self, source_string: str | bytes, destination_string: str | bytes) -> str: ...
192
    @classmethod
193
    def from_opcodes(cls, opcodes: Opcodes) -> Editops: ...
194
    src_len: int
195
    dest_len: int
196

197
class Opcode:
198
    def __init__(self, tag: str, a1: int, a2: int, b1: int, b2: int) -> None: ...
199
    tag: str     # 'replace', 'delete', 'insert', 'equal'
200
    a1: int      # Start position in first string
201
    a2: int      # End position in first string  
202
    b1: int      # Start position in second string
203
    b2: int      # End position in second string
204

205
class Opcodes:
206
    # List-like container of Opcode objects
207
    def __init__(self, opcodes: list | None = None, src_len: int = 0, dest_len: int = 0) -> None: ...
208
    def __len__(self) -> int: ...
209
    def __getitem__(self, index: int) -> Opcode: ...
210
    def as_editops(self) -> Editops: ...
211
    def as_matching_blocks(self) -> list[MatchingBlock]: ...
212
    def as_list(self) -> list[tuple[str, int, int, int, int]]: ...
213
    def copy(self) -> Opcodes: ...
214
    def inverse(self) -> Opcodes: ...
215
    def apply(self, source_string: str | bytes, destination_string: str | bytes) -> str: ...
216
    @classmethod
217
    def from_editops(cls, editops: Editops) -> Opcodes: ...
218
    src_len: int
219
    dest_len: int
220

221
class MatchingBlock:
222
    def __init__(self, a: int, b: int, size: int) -> None: ...
223
    a: int          # Start position in first string
224
    b: int          # Start position in second string
225
    size: int       # Length of the matching block
226

227
class ScoreAlignment:
228
    def __init__(self, score: float, src_start: int, src_end: int, dest_start: int, dest_end: int) -> None: ...
229
    score: float         # Similarity/distance score
230
    src_start: int       # Start position in source
231
    src_end: int         # End position in source  
232
    dest_start: int      # Start position in destination
233
    dest_end: int        # End position in destination
234

235
# Process function return types
236
ExtractResult = tuple[str, float, int]          # (match, score, index)
237
ExtractResultMapping = tuple[str, float, Any]   # (match, score, key)
238
```