tessl/pypi-thefuzz

Fuzzy string matching library using Levenshtein Distance algorithms for approximate text comparison

String Similarity Scoring

Name: tessl/pypi-thefuzz
Author: tessl

Core fuzzy string matching functions that calculate similarity ratios between strings using various algorithms. All functions return integer scores from 0 (no match) to 100 (perfect match).

Capabilities

Basic Ratio Scoring

Simple string similarity using Levenshtein distance, providing a straightforward comparison between two strings without any preprocessing.

def ratio(s1: str, s2: str) -> int:
    """
    Calculate similarity ratio between two strings using Levenshtein distance.
    
    Args:
        s1: First string to compare
        s2: Second string to compare
        
    Returns:
        int: Similarity score from 0-100
    """
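
As a rough illustration of what this score measures, the sketch below computes an indel-style similarity, 2 × LCS / (len(s1) + len(s2)), scaled to 0-100. It assumes the rapidfuzz-backed behaviour of recent thefuzz releases. `lcs_length` and `sketch_ratio` are hypothetical helper names, not part of the library, and the empty-input handling is an arbitrary choice for the sketch.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sketch_ratio(s1: str, s2: str) -> int:
    """Indel-style similarity on a 0-100 scale (illustrative, not the library code)."""
    total = len(s1) + len(s2)
    if total == 0:
        return 0  # arbitrary choice for this sketch
    return int(round(100 * 2 * lcs_length(s1, s2) / total))


print(sketch_ratio("hello world", "hello world!"))
```

Here the two strings share an 11-character common subsequence out of 23 characters in total, so the score is 2 × 11 / 23 ≈ 95.7, which rounds to 96.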

Partial Ratio Scoring

Scores the shorter string against the best-matching substring of the longer one. Useful when one string is contained within another or only part of the text is expected to match.

def partial_ratio(s1: str, s2: str) -> int:
    """
    Calculate similarity ratio of the most similar substring.
    
    Args:
        s1: First string to compare  
        s2: Second string to compare
        
    Returns:
        int: Similarity score from 0-100 based on best substring match
    """
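
The substring idea can be sketched as a sliding window: compare the shorter string with every same-length slice of the longer one and keep the best score. This is a simplified illustration, not the library's implementation. It uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer, and `stand_in_ratio` and `sketch_partial_ratio` are hypothetical names.

```python
from difflib import SequenceMatcher


def stand_in_ratio(a: str, b: str) -> float:
    """Stand-in 0-100 scorer from the standard library (not thefuzz's scorer)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def sketch_partial_ratio(s1: str, s2: str) -> int:
    """Best score of the shorter string against same-length windows of the longer one."""
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        return 0  # arbitrary choice for this sketch
    width = len(shorter)
    best = max(
        stand_in_ratio(shorter, longer[start:start + width])
        for start in range(len(longer) - width + 1)
    )
    return int(round(best))


print(sketch_partial_ratio("this is a test", "is a"))
```

Because "is a" appears verbatim inside "this is a test", one window matches exactly and the sketch reports 100.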

Token-Based Scoring

Scoring functions that split strings into tokens (words) before comparing them. The token sort variants ignore word order; the token set variants also ignore repeated words.

def token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Calculate similarity after sorting tokens alphabetically.
    
    Args:
        s1: First string to compare
        s2: Second string to compare  
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing
        
    Returns:
        int: Similarity score from 0-100
    """

def partial_token_sort_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Calculate partial similarity after sorting tokens alphabetically.
    
    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing  
        full_process: Apply full string preprocessing
        
    Returns:
        int: Similarity score from 0-100 based on best partial match
    """

def token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Calculate similarity using token set comparison.
    
    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing  
        
    Returns:
        int: Similarity score from 0-100
    """

def partial_token_set_ratio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Calculate partial similarity using token set comparison.
    
    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing
        
    Returns:
        int: Similarity score from 0-100 based on best partial match
    """
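
The two token strategies can be sketched as follows. Token sort rebuilds each string from its alphabetically sorted tokens; token set compares the shared tokens against each side's shared-plus-unique tokens and keeps the best score. This is an illustrative approximation under the assumption that preprocessing lowercases and treats non-alphanumerics as separators. It uses `difflib.SequenceMatcher` as a stand-in scorer, and all function names are hypothetical.

```python
from difflib import SequenceMatcher


def stand_in_ratio(a: str, b: str) -> int:
    """Stand-in 0-100 scorer from the standard library (not thefuzz's scorer)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def tokens(s: str) -> list[str]:
    """Lowercase, treat non-alphanumerics as separators, split on whitespace."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in s)
    return cleaned.split()


def sketch_token_sort_ratio(s1: str, s2: str) -> int:
    """Compare the strings after sorting their tokens alphabetically."""
    return stand_in_ratio(" ".join(sorted(tokens(s1))), " ".join(sorted(tokens(s2))))


def sketch_token_set_ratio(s1: str, s2: str) -> int:
    """Compare shared tokens against each side's shared-plus-unique tokens."""
    t1, t2 = set(tokens(s1)), set(tokens(s2))
    common = " ".join(sorted(t1 & t2))
    combined1 = (common + " " + " ".join(sorted(t1 - t2))).strip()
    combined2 = (common + " " + " ".join(sorted(t2 - t1))).strip()
    return max(
        stand_in_ratio(common, combined1),
        stand_in_ratio(common, combined2),
        stand_in_ratio(combined1, combined2),
    )


print(sketch_token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))
print(sketch_token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))
```

Both calls print 100 in this sketch: the first pair differs only in word order, and the second pair differs only by a repeated word, which the set-based comparison discards.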

Advanced Combination Algorithms

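Convenience scorers that preprocess their input before comparing. QRatio and UQRatio apply a single ratio to the processed strings; WRatio and UWRatio combine several of the scorers above and weight the results by the relative lengths of the strings.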

def QRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Quick ratio: preprocess both strings, then apply ratio.
    
    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing
        
    Returns:
        int: Similarity score from 0-100
    """

def UQRatio(s1: str, s2: str, full_process: bool = True) -> int:
    """
    Quick ratio that keeps non-ASCII characters (QRatio without ASCII conversion).
    
    Args:
        s1: First string to compare
        s2: Second string to compare  
        full_process: Apply full string preprocessing
        
    Returns:
        int: Similarity score from 0-100
    """

def WRatio(s1: str, s2: str, force_ascii: bool = True, full_process: bool = True) -> int:
    """
    Weighted ratio that runs several scorers and returns the highest weighted score.
    
    Combines ratio, partial_ratio, and the token sort and token set scorers,
    with weights that depend on the ratio of the two string lengths.
    
    Args:
        s1: First string to compare
        s2: Second string to compare
        force_ascii: Convert to ASCII before processing
        full_process: Apply full string preprocessing
        
    Returns:
        int: Similarity score from 0-100
    """

def UWRatio(s1: str, s2: str, full_process: bool = True) -> int:
    """
    Weighted ratio that keeps non-ASCII characters (WRatio without ASCII conversion).
    
    Args:
        s1: First string to compare
        s2: Second string to compare
        full_process: Apply full string preprocessing
        
    Returns:
        int: Similarity score from 0-100
    """
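
To make the weighting concrete, the sketch below combines already-computed component scores the way the original fuzzywuzzy WRatio did. The constants (0.95, 0.90, 0.60) and the length-ratio thresholds (1.5 and 8) are taken from that historical implementation and are an assumption here; they should be checked against the installed version. `sketch_wratio_combine` is a hypothetical helper, not a library function.

```python
def sketch_wratio_combine(
    len1: int,
    len2: int,
    base: float,
    token_sort: float,
    token_set: float,
    partial: float = 0.0,
    partial_token_sort: float = 0.0,
    partial_token_set: float = 0.0,
) -> int:
    """Pick the best weighted score from component scores on a 0-100 scale."""
    if min(len1, len2) == 0:
        return 0
    unbase_scale = 0.95   # assumed penalty for token-based scorers
    partial_scale = 0.90  # assumed penalty for partial scorers
    len_ratio = max(len1, len2) / min(len1, len2)
    if len_ratio > 8:
        partial_scale = 0.60  # very different lengths: trust partial matches less
    if len_ratio < 1.5:
        # Similar lengths: the partial scorers are not consulted.
        candidates = [base, token_sort * unbase_scale, token_set * unbase_scale]
    else:
        candidates = [
            base,
            partial * partial_scale,
            partial_token_sort * unbase_scale * partial_scale,
            partial_token_set * unbase_scale * partial_scale,
        ]
    return int(round(max(candidates)))


# Two equally long strings whose tokens match perfectly once reordered.
print(sketch_wratio_combine(31, 31, base=45, token_sort=100, token_set=100))
```

Under these assumptions a perfect token-based match between strings of similar length is capped at 95, which keeps an exact character-level match (100) ranked above a reordered one.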

Usage Examples

Basic Comparison

from thefuzz import fuzz

# Simple string comparison
score = fuzz.ratio("hello world", "hello world!")
print(score)  # 96

# Partial matching - useful for substring matching
score = fuzz.partial_ratio("this is a test", "is a")  
print(score)  # 100

Token-Based Matching

from thefuzz import fuzz

# Handle word order differences
score = fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(score)  # 100

# Token set matching - handles duplicates and order
score = fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
print(score)  # 100

Advanced Algorithms

from thefuzz import fuzz

# WRatio runs several scorers and returns the highest weighted score
score = fuzz.WRatio("New York Mets vs Atlanta Braves", "Atlanta Braves vs New York Mets")
print(score)  # High score despite different word order

# UWRatio keeps non-ASCII characters instead of stripping them
score = fuzz.UWRatio("Café", "cafe")  # "é" is compared against "e"
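
The effect of `force_ascii` and `full_process` can be approximated with a small preprocessing sketch: optionally drop non-ASCII characters, replace anything that is not a letter or digit with a space, lowercase, and trim. This is an approximation of the documented behaviour, not the library's code, and `sketch_full_process` is a hypothetical name.

```python
def sketch_full_process(s: str, force_ascii: bool = False) -> str:
    """Approximate the preprocessing applied when full_process is enabled."""
    if force_ascii:
        # Simplification: drop every character outside the ASCII range.
        s = "".join(ch for ch in s if ord(ch) < 128)
    # Replace anything that is not a letter or digit with a space.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in s)
    return cleaned.lower().strip()


print(sketch_full_process("Café", force_ascii=True))
print(sketch_full_process("Café", force_ascii=False))
```

In this sketch "Café" becomes "caf" when `force_ascii` is on and "café" when it is off, which is why the U-prefixed scorers can return different scores from WRatio and QRatio on accented input.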
