rapid fuzzy string matching
High-level string similarity functions that provide intuitive percentage-based similarity scores (0-100) for different matching scenarios. These functions form the core of RapidFuzz's fuzzy matching capabilities.
Calculates the normalized similarity between two strings, based on the Indel distance (an edit distance that counts only insertions and deletions).
def ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
Parameters:
s1: First string to compare
s2: Second string to compare
processor: Optional preprocessing function (e.g., utils.default_process)
score_cutoff: Minimum score threshold (0-100), returns 0 if below
Returns: Similarity score from 0-100 (100 = identical)
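To make the scoring concrete, the following is a minimal pure-Python sketch of a normalized Indel similarity, computed as twice the longest common subsequence length divided by the combined length. The helper name `indel_ratio` is hypothetical and this is not RapidFuzz's implementation; it only illustrates the arithmetic behind the scores.

```python
def indel_ratio(s1, s2):
    """Normalized Indel similarity scaled to 0-100 (illustrative sketch only)."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100.0  # assumption: two empty sequences count as identical

    # Longest common subsequence length via dynamic programming, one row at a time.
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    lcs = prev[-1]

    # Indel distance is total - 2 * lcs, so the similarity is 2 * lcs / total.
    return 200.0 * lcs / total


print(round(indel_ratio("this is a test", "this is a test!"), 2))  # 96.55
```

Here the common subsequence has 14 characters and the combined length is 29, giving 2 * 14 / 29, or about 96.55.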
Usage Example:
from rapidfuzz import fuzz
score = fuzz.ratio("this is a test", "this is a test!")
print(score) # 96.55
score = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score) # 90.91
Finds the best matching substring within the longer string, useful when one string is contained within another.
def partial_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
Returns: Best substring similarity score from 0-100
Usage Example:
from rapidfuzz import fuzz
# Perfect match when shorter string is contained in longer
score = fuzz.partial_ratio("this is a test", "this is a test!")
print(score) # 100.0
score = fuzz.partial_ratio("needle", "haystack with needle in it")
print(score) # 100.0
Same as partial_ratio but also returns alignment information showing where the match occurred.
def partial_ratio_alignment(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> ScoreAlignment | None
Returns: ScoreAlignment object with score and position information, or None if below cutoff
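To illustrate what a score plus position information can look like, here is a simplified sliding-window sketch in pure Python. It is an assumption-laden stand-in, not the library's algorithm: the helpers `indel_ratio` and `best_window` are hypothetical, only windows of the shorter string's length are tried, and the result is a plain tuple rather than a ScoreAlignment object.

```python
def indel_ratio(a, b):
    """2 * LCS / (len(a) + len(b)), scaled to 0-100 (illustrative helper)."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[-1]))
        prev = curr
    return 200.0 * prev[-1] / total


def best_window(needle, haystack):
    """Return (score, start, end) for the best equal-length window of haystack."""
    if len(needle) > len(haystack):
        raise ValueError("this sketch assumes needle is the shorter string")
    width = len(needle)
    best = (0.0, 0, 0)
    for start in range(len(haystack) - width + 1):
        score = indel_ratio(needle, haystack[start:start + width])
        if score > best[0]:  # keep the first window with the highest score
            best = (score, start, start + width)
    return best


score, start, end = best_window("needle", "haystack with needle in it")
print(score, start, end)  # 100.0 14 20
```

The returned offsets index into the longer string, so `"haystack with needle in it"[14:20]` recovers the matched text `"needle"`.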
Sorts the tokens (words) in both strings before comparing, useful for strings with different word orders.
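The sorting step can be sketched in plain Python as follows. The helper `sort_tokens` is hypothetical and assumes tokens are separated by whitespace; it shows only the normalization, after which an ordinary ratio comparison would be applied to the two results.

```python
def sort_tokens(text):
    """Split on whitespace, sort the words, and rejoin them with single spaces."""
    return " ".join(sorted(text.split()))


a = sort_tokens("fuzzy wuzzy was a bear")
b = sort_tokens("wuzzy fuzzy was a bear")
print(a)       # a bear fuzzy was wuzzy
print(a == b)  # True
```

Because both inputs normalize to the same string, comparing them after sorting gives a perfect score even though the original word order differs.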
def token_sort_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
Usage Example:
from rapidfuzz import fuzz
# Different word order
score = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score) # 90.91
score = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score) # 100.0
Compares strings using set-based operations on tokens, excellent for handling duplicates and subsets.
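As a rough sketch of the set-based idea, the snippet below splits two strings into their shared tokens and the tokens unique to each side. The helper `split_token_sets` is hypothetical and assumes whitespace-separated tokens; it shows only the decomposition, not the scoring that the library performs on the resulting pieces.

```python
def split_token_sets(s1, s2):
    """Return (shared, only_in_s1, only_in_s2) as sorted, space-joined strings."""
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    shared = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    return shared, only1, only2


shared, only1, only2 = split_token_sets(
    "fuzzy was a bear but not a dog", "fuzzy was a bear"
)
print(repr(shared))  # 'a bear fuzzy was'
print(repr(only1))   # 'but dog not'
print(repr(only2))   # ''
```

Converting to sets discards duplicate words, and an empty difference on one side means that string's tokens are a subset of the other's, which is the situation the subset example in this section describes.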
def token_set_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
Usage Example:
from rapidfuzz import fuzz
# Handles duplicates and subsets well
score = fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
print(score) # 100.0
score = fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear")
print(score) # 100.0 (subset)
Combines token_sort_ratio and token_set_ratio, choosing the higher score.
def token_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
Partial versions of token-based ratios that find the best matching subsequence.
def partial_token_sort_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
def partial_token_set_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
def partial_token_ratio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
Intelligent combination of multiple ratio algorithms that automatically selects the best approach based on string characteristics. This is the recommended general-purpose function.
def WRatio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
Usage Example:
from rapidfuzz import fuzz, utils
score = fuzz.WRatio("this is a test", "this is a new test!!!")
print(score) # 85.5
# With preprocessing to handle case and punctuation
score = fuzz.WRatio("this is a test", "this is a new test!!!",
                    processor=utils.default_process)
print(score) # 95.0
score = fuzz.WRatio("this is a word", "THIS IS A WORD",
                    processor=utils.default_process)
print(score) # 100.0
Quick ratio between two strings. In recent RapidFuzz releases it scores like ratio, except that comparing two empty strings returns 0.
def QRatio(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Callable[[str], str] | None = None,
    score_cutoff: float | None = 0
) -> float
Usage Example:
from rapidfuzz import fuzz, utils
score = fuzz.QRatio("this is a test", "this is a new test!!!")
print(score) # 80.0
score = fuzz.QRatio("this is a test", "this is a new test!!!",
                    processor=utils.default_process)
print(score) # 87.5
WRatio: Best general-purpose choice, intelligently combines multiple algorithms
ratio: Basic similarity when string length and order matter
partial_ratio: When looking for substrings or one string contained in another
token_sort_ratio: When word order doesn't matter
token_set_ratio: When handling duplicates or subset relationships
QRatio: When a plain ratio-style score is enough and two empty strings should score 0
All fuzz functions support the processor parameter for string normalization:
from rapidfuzz import fuzz, utils
# Without preprocessing - case sensitive
score = fuzz.ratio("Hello World", "HELLO WORLD")
print(score) # Lower score due to case differences
# With preprocessing - case insensitive, removes punctuation
score = fuzz.ratio("Hello World!", "HELLO WORLD",
                   processor=utils.default_process)
print(score) # 100.0
Use score_cutoff to improve performance by early termination:
from rapidfuzz import fuzz
# Only return scores >= 80, otherwise return 0
score = fuzz.ratio("test", "different", score_cutoff=80)
print(score) # 0 (below threshold)
score = fuzz.ratio("test", "tests", score_cutoff=80)
print(score) # 88.89 (above threshold)
Install with Tessl CLI
npx tessl i tessl/pypi-rapidfuzz