tessl/pypi-python-levenshtein

Python compatibility wrapper for computing string edit distances and similarities using fast Levenshtein algorithms.

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/python-levenshtein@0.27.x

To install, run

npx @tessl/cli install tessl/pypi-python-levenshtein@0.27.0


python-Levenshtein

A Python compatibility wrapper providing fast computation of string edit distances and similarities through the Levenshtein C extension library. This package enables legacy code to continue working with the old python-Levenshtein package name while accessing the modern Levenshtein package functionality.

Package Information

  • Package Name: python-Levenshtein
  • Language: Python
  • Installation: pip install python-Levenshtein
  • Type: Compatibility wrapper package

Core Imports

import Levenshtein

Import specific functions:

from Levenshtein import distance, ratio, editops

Basic Usage

import Levenshtein

# Calculate edit distance between strings
distance = Levenshtein.distance("hello", "world")
print(distance)  # 4

# Calculate similarity ratio (0.0 to 1.0)
similarity = Levenshtein.ratio("hello", "help")
print(similarity)  # 0.6666666666666666

# Get edit operations needed to transform one string to another
ops = Levenshtein.editops("kitten", "sitting")
print(ops)  # [('replace', 0, 0), ('replace', 4, 4), ('insert', 6, 6)]

Architecture

The python-Levenshtein package serves as a compatibility wrapper that maintains backward compatibility for legacy code while providing access to modern string similarity functionality:

  • Wrapper Design: This package contains no implementation code of its own; it simply declares a dependency on Levenshtein==0.27.1
  • Compatibility Layer: Enables legacy code using the old python-Levenshtein package name to continue working unchanged
  • Underlying Implementation: All functionality is provided by the high-performance C extension in the Levenshtein package
  • Import Pattern: Functions are accessed through import Levenshtein regardless of whether the package was installed as python-Levenshtein or Levenshtein

This design ensures seamless migration paths while leveraging the optimized C implementation for maximum performance in string similarity computations.
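
A minimal sanity check of this wrapper design (assuming the package is installed; the pinned version shown in the comment is the one named above):

import importlib.metadata
import Levenshtein

# Regardless of which distribution name was installed, the importable module
# is the Levenshtein C extension itself.
print(Levenshtein.distance("abc", "abd"))  # 1

# The underlying Levenshtein distribution can be inspected via the standard library.
print(importlib.metadata.version("Levenshtein"))  # e.g. 0.27.1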

Capabilities

Core Distance Functions

Basic string distance calculations using various algorithms.

def distance(s1: str, s2: str) -> int:
    """
    Compute the Levenshtein edit distance between two strings.
    
    Args:
        s1 (str): First string
        s2 (str): Second string
    
    Returns:
        int: Edit distance (number of operations needed to transform s1 to s2)
    """

def ratio(s1: str, s2: str) -> float:
    """
    Calculate string similarity ratio.
    
    Args:
        s1 (str): First string
        s2 (str): Second string
    
    Returns:
        float: Similarity ratio between 0.0 and 1.0
    """

def hamming(s1: str, s2: str) -> int:
    """
    Compute Hamming distance between strings of equal length.
    
    Args:
        s1 (str): First string
        s2 (str): Second string (must be same length as s1)
    
    Returns:
        int: Hamming distance
    
    Raises:
        ValueError: If strings have different lengths
    """

Advanced Similarity Metrics

Specialized string similarity algorithms for different use cases.

def jaro(s1: str, s2: str) -> float:
    """
    Compute Jaro string similarity metric.
    
    Args:
        s1 (str): First string
        s2 (str): Second string
    
    Returns:
        float: Jaro similarity score between 0.0 and 1.0
    """

def jaro_winkler(s1: str, s2: str) -> float:
    """
    Compute Jaro-Winkler string similarity metric.
    
    Args:
        s1 (str): First string
        s2 (str): Second string
    
    Returns:
        float: Jaro-Winkler similarity score between 0.0 and 1.0
    """

String Median and Averaging

Functions for finding representative strings from collections.

def median(strings: list[str]) -> str:
    """
    Find approximate median string from a sequence of strings.
    
    Args:
        strings (list[str]): List of strings to find median for
    
    Returns:
        str: Approximate median string
    """

def median_improve(string: str, strings: list[str]) -> str:
    """
    Improve an approximate median string by perturbations.
    
    Args:
        string (str): Initial median candidate to improve
        strings (list[str]): List of strings the median is computed for
    
    Returns:
        str: Improved median string
    """

def quickmedian(strings: list[str]) -> str:
    """
    Fast median string computation.
    
    Args:
        strings (list[str]): List of strings to find median for
    
    Returns:
        str: Quick median string
    """

def setmedian(strings: list[str]) -> str:
    """
    Find the set median of a list of strings.
    
    Args:
        strings (list[str]): List of strings treated as a set
    
    Returns:
        str: The input string that minimizes the sum of distances to all others
    """

Sequence and Set Similarity

Functions for comparing collections of strings.

def seqratio(seq1: list[str], seq2: list[str]) -> float:
    """
    Similarity ratio for string sequences.
    
    Args:
        seq1 (list[str]): First sequence of strings
        seq2 (list[str]): Second sequence of strings
    
    Returns:
        float: Sequence similarity ratio between 0.0 and 1.0
    """

def setratio(set1: list[str], set2: list[str]) -> float:
    """
    Similarity ratio for string sets.
    
    Args:
        set1 (list[str]): First set of strings
        set2 (list[str]): Second set of strings
    
    Returns:
        float: Set similarity ratio between 0.0 and 1.0
    """

Edit Operations

Functions for analyzing and manipulating the edit operations between strings.

def editops(s1: str, s2: str) -> list[tuple[str, int, int]]:
    """
    Return list of edit operations to transform one string to another.
    
    Args:
        s1 (str): Source string
        s2 (str): Target string
    
    Returns:
        list[tuple[str, int, int]]: List of (operation, pos1, pos2) tuples
        where operation is 'replace', 'insert', or 'delete'
    """

def opcodes(s1: str, s2: str) -> list[tuple[str, int, int, int, int]]:
    """
    Return list of opcodes (similar to difflib.SequenceMatcher.get_opcodes()).
    
    Args:
        s1 (str): Source string
        s2 (str): Target string
    
    Returns:
        list[tuple[str, int, int, int, int]]: List of (tag, i1, i2, j1, j2) tuples
    """

def inverse(ops: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """
    Invert edit operations.
    
    Args:
        ops (list[tuple[str, int, int]]): Edit operations to invert
    
    Returns:
        list[tuple[str, int, int]]: Inverted edit operations
    """

def apply_edit(ops: list[tuple[str, int, int]], s1: str, s2: str) -> str:
    """
    Apply edit operations to transform string.
    
    Args:
        ops (list[tuple[str, int, int]]): Edit operations to apply
        s1 (str): Source string
        s2 (str): Target string
    
    Returns:
        str: Transformed string
    """

def matching_blocks(ops: list[tuple[str, int, int]], s1: str, s2: str) -> list[tuple[int, int, int]]:
    """
    Find matching blocks from a list of edit operations.
    
    Args:
        ops (list[tuple[str, int, int]]): Edit operations from editops() (or opcodes())
        s1 (str): Source string (or its length)
        s2 (str): Destination string (or its length)
    
    Returns:
        list[tuple[int, int, int]]: List of (pos1, pos2, length) tuples,
        terminated by a zero-length sentinel block
    """

def subtract_edit(ops: list[tuple[str, int, int]], subsequence: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """
    Subtract a subsequence of edit operations from a complete edit sequence.
    
    Args:
        ops (list[tuple[str, int, int]]): Complete list of edit operations
        subsequence (list[tuple[str, int, int]]): Subsequence of ops to subtract
    
    Returns:
        list[tuple[str, int, int]]: Remaining edit operations
    """

Usage Examples

Finding Similar Strings

import Levenshtein

# Find the most similar string from a list
target = "hello"
candidates = ["hallo", "hullo", "help", "world"]

best_match = min(candidates, key=lambda x: Levenshtein.distance(target, x))
print(f"Best match: {best_match}")  # "hallo"

# Get similarity scores
similarities = [(candidate, Levenshtein.ratio(target, candidate)) 
                for candidate in candidates]
similarities.sort(key=lambda x: x[1], reverse=True)
print("Similarity scores:", similarities)

String Transformation Analysis

import Levenshtein

# Analyze how to transform one string to another
source = "kitten"
target = "sitting"

# Get edit operations
ops = Levenshtein.editops(source, target)
print("Edit operations:", ops)

# Get detailed opcodes
codes = Levenshtein.opcodes(source, target)
print("Opcodes:", codes)

# Find matching blocks (matching_blocks consumes the edit operations from editops)
blocks = Levenshtein.matching_blocks(ops, source, target)
print("Matching blocks:", blocks)

Working with String Collections

import Levenshtein

# Find median string from a collection
words = ["cat", "bat", "rat", "hat", "mat"]
median_word = Levenshtein.median(words)
print(f"Median word: {median_word}")

# Compare sequences of strings
seq1 = ["hello", "world"]
seq2 = ["hello", "world", "python"]
similarity = Levenshtein.seqratio(seq1, seq2)
print(f"Sequence similarity: {similarity}")

Key Features

  • High Performance: C extension implementation for fast computation
  • String Type Support: Works with both str and bytes inputs
  • Type Consistency: All strings in a single call must be of the same type
  • Compatibility: Drop-in replacement for legacy python-Levenshtein usage
  • Comprehensive API: Complete set of string similarity and edit distance functions

Limitations

  • Cannot mix string types in function calls (all must be str or all must be bytes)
  • Designed specifically for strings, not arbitrary sequences
  • Requires strings to be of equal length for Hamming distance calculation