# TheFuzz

TheFuzz is a Python fuzzy string matching library that uses Levenshtein Distance algorithms to calculate similarities between text sequences. It provides multiple scoring strategies for approximate string comparison, from simple ratio calculations to advanced weighted algorithms that combine various matching techniques.

## Package Information

- **Package Name**: thefuzz
- **Language**: Python
- **Installation**: `pip install thefuzz`
- **Minimum Python Version**: 3.8+

## Core Imports

```python
from thefuzz import fuzz
from thefuzz import process
from thefuzz import utils
```

Individual function imports:

```python
from thefuzz.fuzz import ratio, WRatio, token_sort_ratio
from thefuzz.process import extractOne, extract, dedupe
from thefuzz.utils import full_process
```

## Basic Usage

```python
from thefuzz import fuzz
from thefuzz import process

# Basic string similarity scoring (no preprocessing is applied)
ratio = fuzz.ratio("this is a test", "this is a test!")
print(ratio)  # 97

# Token-based matching (handles word order differences)
token_ratio = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(token_ratio)  # 100

# Weighted ratio (combines multiple algorithms)
weighted_ratio = fuzz.WRatio("this is a test", "this is a test!")
print(weighted_ratio)  # 100 (the default preprocessing strips the "!")

# Find best match from a list of choices
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
result = process.extractOne("new york jets", choices)
print(result)  # ('New York Jets', 100)

# Get multiple best matches
results = process.extract("new york", choices, limit=2)
print(results)  # [('New York Jets', 90), ('New York Giants', 90)]
```

## Architecture

TheFuzz builds on the high-performance rapidfuzz library while maintaining backward compatibility with the original fuzzywuzzy interface. The library is organized into three main modules:

- **fuzz**: Core string similarity scoring functions using various algorithms
- **process**: Functions for finding best matches in collections of strings
- **utils**: String preprocessing and normalization utilities

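Before fuzzy matching, the token-based and weighted scorers and, by default, the `process` functions run strings through a shared preprocessing step: non-alphanumeric characters are replaced with whitespace, the result is trimmed and lowercased, and non-ASCII characters are optionally dropped (`force_ascii`). `ratio` and `partial_ratio` take no preprocessing options and compare their inputs as given.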
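
The normalization described above can be approximated with the standard library alone. The sketch below is an illustration, not the library's code: `normalize` is a hypothetical helper, the `\W` pattern is an assumed stand-in for the library's notion of non-alphanumeric characters, collapsing runs into one space is a simplification, and dropping every code point above 127 is an assumed reading of "forcing ASCII".

```python
import re

# Hypothetical stand-in for the preprocessing step; not thefuzz's own code.
# Assumption: \W (Unicode-aware in Python 3) approximates "non-alphanumeric".
_NON_ALNUM = re.compile(r"\W+")


def normalize(s: str, force_ascii: bool = False) -> str:
    """Approximate the documented pipeline: clean, trim, lowercase."""
    if force_ascii:
        # Assumption: forcing ASCII means dropping non-ASCII characters.
        s = "".join(ch for ch in s if ord(ch) < 128)
    # Replace each run of non-alphanumeric characters with a single space.
    s = _NON_ALNUM.sub(" ", s)
    return s.strip().lower()


print(normalize("  Hello, World!  "))           # hello world
print(normalize("Café-Bar"))                    # café bar
print(normalize("Café-Bar", force_ascii=True))  # caf bar
```

For real use, prefer `utils.full_process`, which is the function the library itself exposes for this step.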

## Capabilities

### String Similarity Scoring

Core fuzzy string matching functions that calculate similarity ratios between two strings using different algorithms including basic ratio, partial matching, and token-based comparisons.

```python { .api }
def ratio(s1: str, s2: str) -> int: ...
def partial_ratio(s1: str, s2: str) -> int: ...
def token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def WRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def QRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
```

[String Similarity Scoring](./string-similarity.md)
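
To give a feel for how a basic ratio and a token-sort ratio relate, here is a small pure-Python sketch. It is not how the library is implemented: `lcs_length`, `indel_ratio`, and `token_sort` are hypothetical names, and the score is assumed to be twice the longest-common-subsequence length divided by the combined string length, scaled to 0-100 (an insertion/deletion variant of edit distance).

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> int:
    """Assumed similarity: 100 * 2 * LCS / (len(a) + len(b)), rounded."""
    total = len(a) + len(b)
    if total == 0:
        return 100  # sketch-only choice: two empty strings count as identical
    return round(100 * 2 * lcs_length(a, b) / total)


def token_sort(s: str) -> str:
    """Split on whitespace, sort the tokens, and rejoin them."""
    return " ".join(sorted(s.split()))


print(indel_ratio("this is a test", "this is a test!"))  # 97
print(indel_ratio(token_sort("fuzzy wuzzy was a bear"),
                  token_sort("wuzzy fuzzy was a bear")))  # 100
```

Sorting the tokens first is what makes the second comparison insensitive to word order: both inputs become the same string before they are scored.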

### String Processing and Extraction

Functions for finding the best matches in collections of strings, including single and multiple match extraction, duplicate removal, and configurable scoring with custom processors.

```python { .api }
def extractOne(query: str, choices, processor=None, scorer=None, score_cutoff: int = 0): ...
def extract(query: str, choices, processor=None, scorer=None, limit: int = 5): ...
def extractBests(query: str, choices, processor=None, scorer=None, score_cutoff: int = 0, limit: int = 5): ...
def extractWithoutOrder(query: str, choices, processor=None, scorer=None, score_cutoff: int = 0): ...
def dedupe(contains_dupes: list, threshold: int = 70, scorer=None): ...
```

[String Processing and Extraction](./string-processing.md)
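
The extraction functions share a simple pattern: preprocess the query and each choice, score every pair, then keep the best results. The sketch below illustrates that pattern with the standard library only. `extract_sketch` and `extract_one_sketch` are hypothetical helpers, `str.lower` is a stand-in processor, and `difflib.SequenceMatcher` is a stand-in scorer, so the scores will generally differ from the library's.

```python
import heapq
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Sequence, Tuple


def difflib_scorer(a: str, b: str) -> int:
    """Stand-in scorer (assumption): difflib similarity scaled to 0-100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def extract_sketch(
    query: str,
    choices: Sequence[str],
    processor: Callable[[str], str] = str.lower,
    scorer: Callable[[str, str], int] = difflib_scorer,
    limit: int = 5,
) -> List[Tuple[str, int]]:
    """Score every choice against the query and return the top `limit`."""
    processed_query = processor(query)
    scored = [(choice, scorer(processed_query, processor(choice))) for choice in choices]
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


def extract_one_sketch(
    query: str, choices: Sequence[str], score_cutoff: int = 0
) -> Optional[Tuple[str, int]]:
    """Return the single best match, or None if it scores below the cutoff."""
    best = extract_sketch(query, choices, limit=1)
    if best and best[0][1] >= score_cutoff:
        return best[0]
    return None


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract_one_sketch("new york jets", choices))  # ('New York Jets', 100)
print(extract_sketch("new york", choices, limit=2))
```

Each result keeps the original, unprocessed choice alongside its score, which mirrors the `(choice, score)` tuples shown in Basic Usage.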

### String Utilities

Utility functions for string preprocessing and normalization, including ASCII conversion and comprehensive text cleaning that removes non-alphanumeric characters and normalizes whitespace.

```python { .api }
def full_process(s: str, force_ascii: bool = False) -> str: ...
def ascii_only(s: str) -> str: ...
```

[String Utilities](./string-utilities.md)

## Types

```python { .api }
from typing import Callable, Union, Tuple, List, Dict, Any, Generator, TypeVar, Sequence
from collections.abc import Mapping

# Core type aliases from process module
ChoicesT = Union[Mapping[str, str], Sequence[str]]
T = TypeVar('T')
ProcessorT = Union[Callable[[str, bool], str], Callable[[Any], Any]]
ScorerT = Callable[[str, str, bool, bool], int]

# Additional type aliases for better understanding
Scorer = Callable[[str, str], int]
Processor = Callable[[str], str]
Choice = Union[str, Tuple[Any, ...], Dict[str, Any]]
Choices = Union[List[Choice], Dict[Any, Choice]]
```
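
As a rough illustration of how the simplified `Scorer` and `Processor` aliases and the two shapes of `ChoicesT` might fit together, here is a standalone sketch. Everything in it is hypothetical (`iter_choices`, `prefix_scorer`, `score_all`); it only shows callables that conform to the alias shapes and does not call the library.

```python
from collections import abc
from typing import Callable, Iterator, List, Mapping, Sequence, Tuple, Union

# Aliases repeated here so the sketch is self-contained.
Scorer = Callable[[str, str], int]
Processor = Callable[[str], str]
ChoicesT = Union[Mapping[str, str], Sequence[str]]


def iter_choices(choices: ChoicesT) -> Iterator[Tuple[object, str]]:
    """Yield (key, value) pairs; for a sequence the key is the index."""
    if isinstance(choices, abc.Mapping):
        yield from choices.items()
    else:
        yield from enumerate(choices)


def prefix_scorer(a: str, b: str) -> int:
    """Toy scorer: share of the longer string covered by the common prefix."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return round(100 * shared / longest)


def score_all(
    query: str, choices: ChoicesT, scorer: Scorer, processor: Processor
) -> List[Tuple[object, str, int]]:
    """Return (key, original value, score) for every choice."""
    processed_query = processor(query)
    return [
        (key, value, scorer(processed_query, processor(value)))
        for key, value in iter_choices(choices)
    ]


teams = {"nyj": "New York Jets", "nyg": "New York Giants"}
print(score_all("new york jets", teams, prefix_scorer, str.lower))
print(score_all("new york jets", list(teams.values()), prefix_scorer, str.lower))
```

Any function with the shape `(str, str) -> int` can play the scorer role here, and any `(str) -> str` function can play the processor role, which is the point the aliases are meant to convey.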