
tessl/pypi-levenshtein

Python extension for computing string edit distances and similarities.

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/levenshtein@0.27.x

To install, run

npx @tessl/cli install tessl/pypi-levenshtein@0.27.0


Levenshtein

A high-performance Python C extension for computing string edit distances and similarities. The library provides fast computation of Levenshtein (edit) distance, Hamming distance, and Jaro and Jaro-Winkler similarity, along with detailed edit operations, string averaging, and sequence similarity analysis.

Package Information

  • Package Name: Levenshtein
  • Language: Python
  • Installation: pip install levenshtein
  • Requirements: Python 3.9 or later

Core Imports

import Levenshtein

Individual functions can also be imported directly:

from Levenshtein import distance, ratio, editops, opcodes, median

Basic Usage

import Levenshtein

# Calculate edit distance between strings
dist = Levenshtein.distance("kitten", "sitting")
print(f"Edit distance: {dist}")  # Edit distance: 3

# Calculate similarity ratio (0.0 to 1.0)
similarity = Levenshtein.ratio("kitten", "sitting") 
print(f"Similarity: {similarity:.2f}")  # Similarity: 0.62

# Get edit operations to transform one string to another
ops = Levenshtein.editops("kitten", "sitting")
print(ops)  # [('replace', 0, 0), ('replace', 4, 4), ('insert', 6, 6)]

# Find approximate median of multiple strings
strings = ["Levenshtein", "Levenhstein", "Levenshtien", "Levenstein"] 
med = Levenshtein.median(strings)
print(f"Median: {med}")  # Median: Levenshtein

Architecture

The Levenshtein library is built on the rapidfuzz library for core distance algorithms, providing:

  • High Performance: C extension implementation for fast computation
  • Multiple Metrics: Support for various string distance and similarity measures
  • Edit Analysis: Detailed edit operation sequences and transformations
  • String Averaging: Median string calculation and string improvement algorithms
  • Compatibility: opcodes and matching_blocks return results in the same format as difflib.SequenceMatcher, which eases drop-in replacement

Capabilities

String Distance and Similarity

Core functions for computing various string distance metrics and similarity scores, including Levenshtein distance, normalized similarity ratios, Hamming distance, and Jaro/Jaro-Winkler similarities.

def distance(s1, s2, *, weights=(1, 1, 1), processor=None, score_cutoff=None, score_hint=None):
    """Calculate Levenshtein distance with custom operation weights."""
    
def ratio(s1, s2, *, processor=None, score_cutoff=None):
    """Calculate normalized indel similarity ratio [0, 1]."""
    
def hamming(s1, s2, *, pad=True, processor=None, score_cutoff=None):
    """Calculate Hamming distance (substitutions only)."""
    
def jaro(s1, s2, *, processor=None, score_cutoff=None):
    """Calculate Jaro similarity."""
    
def jaro_winkler(s1, s2, *, prefix_weight=0.1, processor=None, score_cutoff=None):
    """Calculate Jaro-Winkler similarity with prefix weighting."""


Edit Operations

Functions for analyzing and manipulating edit operation sequences that transform one string into another, including conversion between different operation formats and applying transformations.

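def editops(*args):
    """Find sequence of edit operations (triples) transforming one string to another.

    Accepts (source_string, destination_string), or
    (opcodes, source_length, destination_length) to convert opcodes into editops."""
    
def opcodes(*args):
    """Find sequence of edit operations (5-tuples) like SequenceMatcher.

    Accepts the same two call forms, converting editops into opcodes in the second."""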
    
def matching_blocks(edit_operations, source_string, destination_string):
    """Find identical blocks in two strings from edit operations."""
    
def apply_edit(edit_operations, source_string, destination_string):
    """Apply sequence of edit operations to transform a string."""


String Averaging and Median

Functions for computing approximate median strings, improving strings toward a target set, and calculating sequence and set similarity ratios for multiple strings.

def median(strings, weights=None):
    """Find approximate median string from a list of strings."""
    
def quickmedian(strings, weights=None):
    """Fast approximate median string calculation."""
    
def median_improve(string, strings, weights=None):
    """Improve a string towards median of given strings."""
    
def seqratio(strings1, strings2):
    """Calculate similarity ratio between two string sequences."""
    
def setratio(strings1, strings2):
    """Calculate similarity ratio between two string sets."""


Types

# Type aliases for function parameters
from typing import Any, Callable, List, Tuple, Union

Sequence = Union[str, bytes, List[Any]]
Processor = Callable[[Sequence], Sequence]
EditOperation = Tuple[str, int, int]  # (operation, source_pos, dest_pos)
Opcode = Tuple[str, int, int, int, int]  # (operation, start1, end1, start2, end2)
MatchingBlock = Tuple[int, int, int]  # (source_pos, dest_pos, length)
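Since `opcodes` is described as returning 5-tuples "like SequenceMatcher", the standard library's `difflib` is a convenient way to see the `Opcode` and `MatchingBlock` shapes without installing anything. The sketch below uses only `difflib`. The specific opcodes it picks may differ from those the Levenshtein library would return, because `difflib` uses a different matching heuristic; only the tuple layout is the point here.

```python
from difflib import SequenceMatcher

a, b = "kitten", "sitting"
matcher = SequenceMatcher(None, a, b, autojunk=False)

# Opcode shape: (operation, start1, end1, start2, end2)
opcodes = matcher.get_opcodes()
for tag, i1, i2, j1, j2 in opcodes:
    print(f"{tag:8} a[{i1}:{i2}]={a[i1:i2]!r:8} b[{j1}:{j2}]={b[j1:j2]!r}")

# MatchingBlock shape: (source_pos, dest_pos, length);
# difflib appends a zero-length sentinel block at the end.
blocks = [tuple(m) for m in matcher.get_matching_blocks()]
print(blocks[-1])

# Opcodes cover both strings completely, so b can be rebuilt from them:
# 'equal' spans come from a, everything else comes from b.
rebuilt = "".join(a[i1:i2] if tag == "equal" else b[j1:j2]
                  for tag, i1, i2, j1, j2 in opcodes)
print(rebuilt == b)
```

The rebuild step shows why the 5-tuple form is useful: each opcode describes a slice of both strings, so consecutive opcodes tile the source and destination with no gaps.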