Python extension for computing string edit distances and similarities.
```bash
npx @tessl/cli install tessl/pypi-levenshtein@0.27.0
```

A high-performance Python C extension for computing various string edit distances and similarities. The library provides fast computation of Levenshtein (edit) distance, Hamming distance, Jaro and Jaro-Winkler similarities, along with detailed edit operations, string averaging, and sequence similarity analysis.

```bash
pip install levenshtein
```

```python
import Levenshtein
from Levenshtein import distance, ratio, editops, opcodes, median
```

Common usage patterns:

```python
import Levenshtein

# Calculate edit distance between strings
dist = Levenshtein.distance("kitten", "sitting")
print(f"Edit distance: {dist}") # Edit distance: 3
# Calculate similarity ratio (0.0 to 1.0)
similarity = Levenshtein.ratio("kitten", "sitting")
print(f"Similarity: {similarity:.2f}") # Similarity: 0.62
# Get edit operations to transform one string to another
ops = Levenshtein.editops("kitten", "sitting")
print(ops) # [('replace', 0, 0), ('replace', 4, 4), ('insert', 6, 6)]
# Find approximate median of multiple strings
strings = ["Levenshtein", "Levenhstein", "Levenshtien", "Levenstein"]
med = Levenshtein.median(strings)
print(f"Median: {med}") # Median: LevenshteinThe Levenshtein library is built on the rapidfuzz library for core distance algorithms, providing:
### String Distance and Similarity

Core functions for computing various string distance metrics and similarity scores, including Levenshtein distance, normalized similarity ratios, Hamming distance, and Jaro/Jaro-Winkler similarities.

```python
def distance(s1, s2, *, weights=(1, 1, 1), processor=None, score_cutoff=None, score_hint=None):
    """Calculate Levenshtein distance with custom operation weights."""

def ratio(s1, s2, *, processor=None, score_cutoff=None):
    """Calculate normalized indel similarity ratio [0, 1]."""

def hamming(s1, s2, *, pad=True, processor=None, score_cutoff=None):
    """Calculate Hamming distance (substitutions only)."""

def jaro(s1, s2, *, processor=None, score_cutoff=None):
    """Calculate Jaro similarity."""

def jaro_winkler(s1, s2, *, prefix_weight=0.1, processor=None, score_cutoff=None):
    """Calculate Jaro-Winkler similarity with prefix weighting."""
```
Functions for analyzing and manipulating edit operation sequences that transform one string into another, including conversion between different operation formats and applying transformations.

```python
def editops(*args):
    """Find sequence of edit operations (triples) transforming one string to another."""

def opcodes(*args):
    """Find sequence of edit operations (5-tuples) like SequenceMatcher."""

def matching_blocks(edit_operations, source_string, destination_string):
    """Find identical blocks in two strings from edit operations."""

def apply_edit(edit_operations, source_string, destination_string):
    """Apply sequence of edit operations to transform a string."""
```
Functions for computing approximate median strings, improving strings toward a target set, and calculating sequence and set similarity ratios for multiple strings.

```python
def median(strings, weights=None):
    """Find approximate median string from a list of strings."""

def quickmedian(strings, weights=None):
    """Fast approximate median string calculation."""

def median_improve(string, strings, weights=None):
    """Improve a string towards median of given strings."""

def seqratio(strings1, strings2):
    """Calculate similarity ratio between two string sequences."""

def setratio(strings1, strings2):
    """Calculate similarity ratio between two string sets."""
```
```python
from typing import Any, Callable, List, Tuple, Union

# Type aliases for function parameters
Sequence = Union[str, bytes, List[Any]]
Processor = Callable[[Sequence], Sequence]
EditOperation = Tuple[str, int, int]      # (operation, source_pos, dest_pos)
Opcode = Tuple[str, int, int, int, int]   # (operation, start1, end1, start2, end2)
MatchingBlock = Tuple[int, int, int]      # (source_pos, dest_pos, length)
```
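A small sketch of what these aliases mean in practice; it assumes, as `Sequence` suggests, that bytes and lists of hashable items are accepted alongside str, and that a `Processor` callable is applied to both inputs before comparison:

```python
import Levenshtein

# Sequence inputs: str, bytes, or lists of hashable items.
print(Levenshtein.distance(b"kitten", b"sitting"))
print(Levenshtein.distance(list("kitten"), list("sitting")))

# A Processor maps an input sequence to a processed sequence,
# e.g. str.lower for case-insensitive comparison.
print(Levenshtein.ratio("KITTEN", "sitting", processor=str.lower))
```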