tessl/pypi-thefuzz

Fuzzy string matching library using Levenshtein Distance algorithms for approximate text comparison

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/thefuzz@0.22.x

To install, run

npx @tessl/cli install tessl/pypi-thefuzz@0.22.0

TheFuzz

TheFuzz is a Python fuzzy string matching library that uses Levenshtein Distance algorithms to calculate similarities between text sequences. It provides multiple scoring strategies for approximate string comparison, from simple ratio calculations to advanced weighted algorithms that combine various matching techniques.
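
To make the underlying metric concrete, here is a minimal pure-Python sketch of Levenshtein distance, the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. It is an illustration only and not TheFuzz's implementation; the function name `levenshtein` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (each edit costs 1)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string in the outer loop

    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means more similar strings; scoring functions typically rescale such a distance to a 0-100 similarity value.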

Package Information

  • Package Name: thefuzz
  • Language: Python
  • Installation: pip install thefuzz
  • Minimum Python Version: 3.8

Core Imports

from thefuzz import fuzz
from thefuzz import process
from thefuzz import utils

Individual function imports:

from thefuzz.fuzz import ratio, WRatio, token_sort_ratio
from thefuzz.process import extractOne, extract, dedupe
from thefuzz.utils import full_process

Basic Usage

from thefuzz import fuzz
from thefuzz import process

# Basic string similarity scoring
ratio = fuzz.ratio("this is a test", "this is a test!")
print(ratio)  # 97

# Token-based matching (handles word order differences)
token_ratio = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(token_ratio)  # 100

# Weighted ratio (combines multiple algorithms; preprocesses both strings by default)
weighted_ratio = fuzz.WRatio("this is a test", "this is a test!")
print(weighted_ratio)  # 100 (the trailing "!" is removed by preprocessing)

# Find best match from a list of choices
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
result = process.extractOne("new york jets", choices)
print(result)  # ('New York Jets', 100)

# Get multiple best matches
results = process.extract("new york", choices, limit=2)
print(results)  # [('New York Jets', 90), ('New York Giants', 90)]

Architecture

TheFuzz builds on the high-performance rapidfuzz library while maintaining backward compatibility with the original fuzzywuzzy interface. The library is organized into three main modules:

  • fuzz: Core string similarity scoring functions using various algorithms
  • process: Functions for finding best matches in collections of strings
  • utils: String preprocessing and normalization utilities

Scorers that accept the full_process and force_ascii parameters normalize both strings before matching: non-alphanumeric characters are removed, text is converted to lowercase, and non-ASCII characters are optionally dropped. ratio and partial_ratio take no such parameters and compare the strings as given.
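
The normalization steps described above can be sketched with the standard library. This is an approximation for illustration, not the library's `full_process`; the name `normalize` is hypothetical, and collapsing each run of non-alphanumeric characters into a single space is an assumption of this sketch.

```python
import re


def normalize(s: str, force_ascii: bool = False) -> str:
    """Approximate a fuzzy-matching preprocessing step (illustrative only)."""
    if force_ascii:
        # Drop any character that cannot be encoded as ASCII
        s = s.encode("ascii", "ignore").decode("ascii")
    # Replace each run of non-alphanumeric characters (including "_") with one space
    s = re.sub(r"[\W_]+", " ", s)
    return s.lower().strip()


print(normalize("  Hello, World!! "))                 # hello world
print(normalize("new_york-JETS"))                     # new york jets
print(normalize("Café Münster", force_ascii=True))    # caf mnster
```

Normalizing both inputs the same way is what lets "New York Jets" and "new york jets" score as identical.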

Capabilities

String Similarity Scoring

Core fuzzy string matching functions that calculate similarity ratios between two strings using different algorithms including basic ratio, partial matching, and token-based comparisons.

def ratio(s1: str, s2: str) -> int: ...
def partial_ratio(s1: str, s2: str) -> int: ...
def token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def WRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...
def QRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int: ...

Full reference: string-similarity.md
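
To illustrate how a plain ratio differs from a token-sorted one, the following sketch uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer. It is not how TheFuzz computes its scores (the library delegates to rapidfuzz), and the helper names `simple_ratio` and `token_sort` are hypothetical.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity of two strings on a 0-100 scale (stand-in scorer)."""
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


def token_sort(s1: str, s2: str) -> int:
    """Compare strings after lowercasing and sorting their whitespace-separated tokens."""
    sorted1 = " ".join(sorted(s1.lower().split()))
    sorted2 = " ".join(sorted(s2.lower().split()))
    return simple_ratio(sorted1, sorted2)


print(simple_ratio("this is a test", "this is a test!"))               # 97
print(token_sort("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```

Sorting tokens before comparing makes the score insensitive to word order, which is why the second call reaches 100 even though the raw strings differ.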

String Processing and Extraction

Functions for finding the best matches in collections of strings, including single and multiple match extraction, duplicate removal, and configurable scoring with custom processors.

# default_processor is utils.full_process; default_scorer is fuzz.WRatio
def extractOne(query: str, choices, processor=default_processor, scorer=default_scorer, score_cutoff: int = 0): ...
def extract(query: str, choices, processor=default_processor, scorer=default_scorer, limit: int = 5): ...
def extractBests(query: str, choices, processor=default_processor, scorer=default_scorer, score_cutoff: int = 0, limit: int = 5): ...
def extractWithoutOrder(query: str, choices, processor=default_processor, scorer=default_scorer, score_cutoff: int = 0): ...
def dedupe(contains_dupes: list, threshold: int = 70, scorer=fuzz.token_set_ratio): ...

Full reference: string-processing.md
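
The general shape of extraction (process every choice, score it against the query, filter by a cutoff, sort, and truncate) can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: `rank_choices`, `best_choice`, and `simple_scorer` are hypothetical names, and the scorer is a `difflib` stand-in, so its scores differ from those of `fuzz.WRatio`.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Sequence, Tuple


def simple_scorer(s1: str, s2: str) -> int:
    """Stand-in scorer returning a 0-100 similarity."""
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


def rank_choices(
    query: str,
    choices: Sequence[str],
    processor: Callable[[str], str] = str.lower,
    scorer: Callable[[str, str], int] = simple_scorer,
    score_cutoff: int = 0,
    limit: Optional[int] = 5,
) -> List[Tuple[str, int]]:
    """Return (choice, score) pairs, best first, keeping scores >= score_cutoff."""
    processed_query = processor(query)
    scored = [(choice, scorer(processed_query, processor(choice))) for choice in choices]
    kept = [pair for pair in scored if pair[1] >= score_cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return kept if limit is None else kept[:limit]


def best_choice(query: str, choices: Sequence[str], score_cutoff: int = 0):
    """Return the single best (choice, score) pair, or None if nothing passes the cutoff."""
    ranked = rank_choices(query, choices, score_cutoff=score_cutoff, limit=1)
    return ranked[0] if ranked else None


teams = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(best_choice("new york jets", teams))                         # ('New York Jets', 100)
print([name for name, _ in rank_choices("new york", teams, limit=2)])
# ['New York Jets', 'New York Giants']
```

Raising `score_cutoff` is the usual way to reject weak matches instead of always accepting the top-ranked candidate.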

String Utilities

Utility functions for string preprocessing and normalization, including ASCII conversion and comprehensive text cleaning that removes non-alphanumeric characters and normalizes whitespace.

def full_process(s: str, force_ascii: bool = False) -> str: ...
def ascii_only(s: str) -> str: ...

Full reference: string-utilities.md

Types

from typing import Callable, Union, Tuple, List, Dict, Any, Generator, TypeVar, Sequence
from collections.abc import Mapping

# Core type aliases from process module
ChoicesT = Union[Mapping[str, str], Sequence[str]]
T = TypeVar('T')
ProcessorT = Union[Callable[[str, bool], str], Callable[[Any], Any]]
ScorerT = Callable[[str, str, bool, bool], int]

# Additional type aliases for better understanding
Scorer = Callable[[str, str], int]
Processor = Callable[[str], str]
Choice = Union[str, Tuple[Any, ...], Dict[str, Any]]
Choices = Union[List[Choice], Dict[Any, Choice]]
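
As a rough illustration of how callables matching the simplified `Scorer` and `Processor` aliases might be written and applied to either a sequence or a mapping of choices, consider the sketch below. All function names are hypothetical, and the `(text, score, key)` result shape is an assumption of this sketch that should be checked against the library's documentation before relying on it.

```python
from collections.abc import Mapping
from typing import Any, Callable, Iterator, List, Sequence, Tuple, Union

Scorer = Callable[[str, str], int]
Processor = Callable[[str], str]
ChoicesArg = Union[Mapping, Sequence[str]]  # hypothetical alias for this sketch


def exact_scorer(s1: str, s2: str) -> int:
    """A trivial Scorer: 100 for identical strings, otherwise 0."""
    return 100 if s1 == s2 else 0


def casefold_processor(s: str) -> str:
    """A trivial Processor: case-insensitive form with surrounding whitespace removed."""
    return s.casefold().strip()


def iter_choices(choices: ChoicesArg) -> Iterator[Tuple[Any, str]]:
    """Yield (key, text) pairs; sequences use the element index as the key."""
    if isinstance(choices, Mapping):
        yield from choices.items()
    else:
        yield from enumerate(choices)


def score_all(
    query: str, choices: ChoicesArg, processor: Processor, scorer: Scorer
) -> List[Tuple[str, int, Any]]:
    """Score every choice and return (text, score, key) triples in input order."""
    processed_query = processor(query)
    return [
        (text, scorer(processed_query, processor(text)), key)
        for key, text in iter_choices(choices)
    ]


teams = {"nyj": "New York Jets", "dal": "Dallas Cowboys"}
print(score_all(" new york jets ", teams, casefold_processor, exact_scorer))
# [('New York Jets', 100, 'nyj'), ('Dallas Cowboys', 0, 'dal')]
```

Keeping the key alongside each result lets callers map a matched string back to the record it came from.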