tessl/pypi-rapidfuzz

rapid fuzzy string matching


Fuzzy String Matching

High-level string similarity functions that provide intuitive percentage-based similarity scores (0-100) for different matching scenarios. These functions form the core of RapidFuzz's fuzzy matching capabilities.

Capabilities

Basic Ratio

Calculates the normalized Indel similarity between two strings: an edit-distance variant that allows only insertions and deletions, scaled to 0-100.

def ratio(
    s1: Sequence[Hashable], 
    s2: Sequence[Hashable], 
    *, 
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

Parameters:

  • s1: First string to compare
  • s2: Second string to compare
  • processor: Optional preprocessing function (e.g., utils.default_process)
  • score_cutoff: Minimum score threshold (0-100), returns 0 if below

Returns: Similarity score from 0-100 (100 = identical)

Usage Example:

from rapidfuzz import fuzz

score = fuzz.ratio("this is a test", "this is a test!")
print(score)  # 96.55

score = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear") 
print(score)  # 90.91
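
The scores above can be reproduced by hand. The following is a minimal pure-Python sketch, assuming the score equals 100 * 2 * LCS / (len(s1) + len(s2)), where LCS is the length of the longest common subsequence. The helper names lcs_length and indel_ratio are hypothetical and not part of RapidFuzz, whose own implementation is far more optimized.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale (illustrative only)."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # assumption: two empty strings count as identical
    return 100.0 * 2 * lcs_length(a, b) / total


print(round(indel_ratio("this is a test", "this is a test!"), 2))  # 96.55
```

Here the longest common subsequence has 14 characters and the combined length is 29, so the score is 2800 / 29, which rounds to 96.55.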

Partial Ratio

Finds the best matching substring within the longer string, useful when one string is contained within another.

def partial_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable], 
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

Returns: Best substring similarity score from 0-100

Usage Example:

from rapidfuzz import fuzz

# Perfect match when shorter string is contained in longer
score = fuzz.partial_ratio("this is a test", "this is a test!")
print(score)  # 100.0

score = fuzz.partial_ratio("needle", "haystack with needle in it")
print(score)  # 100.0
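
To make the idea of a "best matching substring" concrete, here is a simplified sketch that slides a window the length of the shorter string across the longer one and keeps the best score. The names partial_ratio_sketch, indel_ratio and lcs_length are hypothetical. RapidFuzz's actual algorithm is more elaborate, so this sketch is not guaranteed to match the library on every input, and it makes no attempt to mirror the library's handling of empty strings.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Insertion/deletion based similarity on a 0-100 scale."""
    total = len(a) + len(b)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(a, b) / total


def partial_ratio_sketch(s1: str, s2: str) -> float:
    """Best score of the shorter string against equal-length windows of the longer one."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    window = len(shorter)
    return max(
        indel_ratio(shorter, longer[i:i + window])
        for i in range(len(longer) - window + 1)
    )


print(partial_ratio_sketch("needle", "haystack with needle in it"))  # 100.0
```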

Partial Ratio with Alignment

Same as partial_ratio but also returns alignment information showing where the match occurred.

def partial_ratio_alignment(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None, 
    score_cutoff: float | None = 0
) -> ScoreAlignment | None

Returns: ScoreAlignment object with score and position information, or None if below cutoff

Token Sort Ratio

Sorts the tokens (words) in both strings before comparing, useful for strings with different word orders.

def token_sort_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0  
) -> float

Usage Example:

from rapidfuzz import fuzz

# Different word order
score = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 90.91

score = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear") 
print(score)  # 100.0
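
The sorting step itself is easy to picture. This is a minimal sketch, assuming tokens are split on whitespace and rejoined with single spaces. The helper sort_tokens is hypothetical and not a RapidFuzz function.

```python
def sort_tokens(text: str) -> str:
    """Split on whitespace, sort the tokens, and rejoin with single spaces."""
    return " ".join(sorted(text.split()))


left = sort_tokens("fuzzy wuzzy was a bear")
right = sort_tokens("wuzzy fuzzy was a bear")

print(left)           # a bear fuzzy was wuzzy
print(left == right)  # True
```

Because both inputs normalize to the same string, comparing the sorted forms yields a perfect score, which is why token_sort_ratio returns 100.0 above.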

Token Set Ratio

Compares strings using set-based operations on tokens, excellent for handling duplicates and subsets.

def token_set_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

Usage Example:

from rapidfuzz import fuzz

# Handles duplicates and subsets well
score = fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
print(score)  # 100.0

score = fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear")
print(score)  # 100.0 (subset)
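
The set operations behind these results can be sketched as follows, assuming whitespace tokenization. The helper token_sets is hypothetical, and how RapidFuzz turns the three groups into a final score is not shown here.

```python
def token_sets(s1: str, s2: str) -> tuple[list[str], list[str], list[str]]:
    """Return (shared tokens, tokens only in s1, tokens only in s2), each sorted."""
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    return (
        sorted(tokens1 & tokens2),
        sorted(tokens1 - tokens2),
        sorted(tokens2 - tokens1),
    )


common, only_first, only_second = token_sets(
    "fuzzy was a bear but not a dog", "fuzzy was a bear"
)
print(common)       # ['a', 'bear', 'fuzzy', 'was']
print(only_first)   # ['but', 'dog', 'not']
print(only_second)  # []
```

Converting to sets removes duplicate tokens, and an empty "only in" group means one string's tokens are a subset of the other's, which corresponds to the two 100.0 results above.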

Token Ratio

Combines token_sort_ratio and token_set_ratio, choosing the higher score.

def token_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

Partial Token Functions

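Partial versions of the token-based ratios: tokens are processed as in the non-partial variants, then the strings are compared with partial_ratio to find the best matching substring.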

def partial_token_sort_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable], 
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

def partial_token_set_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

def partial_token_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

Weighted Ratio (WRatio)

Computes several of the ratios above (plain, partial, and token-based), weights them according to the relative lengths of the two strings, and returns the highest weighted score. This is the recommended general-purpose function.

def WRatio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

Usage Example:

from rapidfuzz import fuzz, utils

score = fuzz.WRatio("this is a test", "this is a new test!!!")
print(score)  # 85.5

# With preprocessing to handle case and punctuation
score = fuzz.WRatio("this is a test", "this is a new test!!!", 
                   processor=utils.default_process)
print(score)  # 95.0

score = fuzz.WRatio("this is a word", "THIS IS A WORD", 
                   processor=utils.default_process)
print(score)  # 100.0

Quick Ratio (QRatio)

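Computes the same score as ratio. Despite the name, it is not an approximation: in RapidFuzz 3.x the only difference is that QRatio returns 0 when comparing two empty strings, where ratio returns 100.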

def QRatio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float

Usage Example:

from rapidfuzz import fuzz, utils

score = fuzz.QRatio("this is a test", "this is a new test!!!")
print(score)  # 80.0

score = fuzz.QRatio("this is a test", "this is a new test!!!", 
                   processor=utils.default_process)
print(score)  # 87.5

Usage Patterns

Choosing the Right Function

  • WRatio: Best general-purpose choice, returns the best weighted score across several ratio algorithms
  • ratio: Basic similarity when string length and order matter
  • partial_ratio: When looking for substrings or one string contained in another
  • token_sort_ratio: When word order doesn't matter
  • token_set_ratio: When handling duplicates or subset relationships
  • QRatio: When you want ratio scoring but empty strings should score 0 instead of 100

String Preprocessing

All fuzz functions support the processor parameter for string normalization:

from rapidfuzz import fuzz, utils

# Without preprocessing - case sensitive
score = fuzz.ratio("Hello World", "HELLO WORLD")
print(score)  # Lower score due to case differences

# With preprocessing - case insensitive, removes punctuation
score = fuzz.ratio("Hello World!", "HELLO WORLD", 
                  processor=utils.default_process)
print(score)  # 100.0
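
To see why the preprocessed strings compare as identical, here is a rough pure-Python approximation of the normalization described above: lowercase, replace non-alphanumeric characters with spaces, and trim. The function normalize is hypothetical and may differ from utils.default_process in edge cases such as unusual Unicode characters.

```python
def normalize(text: str) -> str:
    """Approximate normalization: non-alphanumerics to spaces, trim, lowercase."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text)
    return cleaned.strip().lower()


print(normalize("Hello World!"))  # hello world
print(normalize("HELLO WORLD"))   # hello world
```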

Performance Optimization

Use score_cutoff to improve performance by early termination:

from rapidfuzz import fuzz

# Only return scores >= 80, otherwise return 0
score = fuzz.ratio("test", "different", score_cutoff=80)
print(score)  # 0.0 (below threshold)

score = fuzz.ratio("test", "tests", score_cutoff=80)
print(score)  # 88.89 (above threshold)

Install with Tessl CLI

npx tessl i tessl/pypi-rapidfuzz
