Python extension for computing string edit distances and similarities.
npx @tessl/cli install tessl/pypi-levenshtein@0.27.00
# Levenshtein

A high-performance Python C extension for computing various string edit distances and similarities. The library provides fast computation of Levenshtein (edit) distance, Hamming distance, Jaro and Jaro-Winkler similarities, along with detailed edit operations, string averaging, and sequence similarity analysis.

## Package Information

- **Package Name**: Levenshtein
- **Language**: Python
- **Installation**: `pip install levenshtein`
- **Requirements**: Python 3.9 or later
## Core Imports

```python
import Levenshtein
```

Frequently used functions can also be imported directly:

```python
from Levenshtein import distance, ratio, editops, opcodes, median
```
## Basic Usage

```python
import Levenshtein

# Calculate edit distance between strings
dist = Levenshtein.distance("kitten", "sitting")
print(f"Edit distance: {dist}")  # Edit distance: 3

# Calculate similarity ratio (0.0 to 1.0)
similarity = Levenshtein.ratio("kitten", "sitting")
print(f"Similarity: {similarity:.2f}")  # Similarity: 0.62

# Get edit operations to transform one string to another
ops = Levenshtein.editops("kitten", "sitting")
print(ops)  # [('replace', 0, 0), ('replace', 4, 4), ('insert', 6, 6)]

# Find approximate median of multiple strings
strings = ["Levenshtein", "Levenhstein", "Levenshtien", "Levenstein"]
med = Levenshtein.median(strings)
print(f"Median: {med}")  # Median: Levenshtein
```
## Architecture

The Levenshtein library is built on the rapidfuzz library for core distance algorithms, providing:

- **High Performance**: C extension implementation for fast computation
- **Multiple Metrics**: Support for various string distance and similarity measures
- **Edit Analysis**: Detailed edit operation sequences and transformations
- **String Averaging**: Median string calculation and string improvement algorithms
- **Compatibility**: SequenceMatcher-like interface for drop-in replacement scenarios

## Capabilities
### String Distance and Similarity

Core functions for computing various string distance metrics and similarity scores, including Levenshtein distance, normalized similarity ratios, Hamming distance, and Jaro/Jaro-Winkler similarities.

```python { .api }
def distance(s1, s2, *, weights=(1, 1, 1), processor=None, score_cutoff=None, score_hint=None):
    """Calculate Levenshtein distance with custom operation weights."""

def ratio(s1, s2, *, processor=None, score_cutoff=None):
    """Calculate normalized indel similarity ratio [0, 1]."""

def hamming(s1, s2, *, pad=True, processor=None, score_cutoff=None):
    """Calculate Hamming distance (substitutions only)."""

def jaro(s1, s2, *, processor=None, score_cutoff=None):
    """Calculate Jaro similarity."""

def jaro_winkler(s1, s2, *, prefix_weight=0.1, processor=None, score_cutoff=None):
    """Calculate Jaro-Winkler similarity with prefix weighting."""
```
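To make the `weights` and `ratio` semantics concrete, here is a pure-Python reference sketch; it is not the library's C implementation. It assumes `weights` is ordered `(insertion, deletion, substitution)` and that `ratio` equals `1 - indel_distance / (len(s1) + len(s2))`, where the indel distance allows only insertions and deletions (modelled here with a substitution cost of 2). The helper names `levenshtein_sketch` and `indel_ratio_sketch` are hypothetical.

```python
def levenshtein_sketch(s1, s2, weights=(1, 1, 1)):
    """Weighted edit distance via the classic two-row dynamic program.

    Assumption: weights = (insertion, deletion, substitution) costs.
    """
    ins_cost, del_cost, sub_cost = weights
    # previous[j] = cost of turning the processed prefix of s1 into s2[:j]
    previous = [j * ins_cost for j in range(len(s2) + 1)]
    for i, ch1 in enumerate(s1, start=1):
        current = [i * del_cost]
        for j, ch2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + del_cost,                             # delete ch1
                current[j - 1] + ins_cost,                          # insert ch2
                previous[j - 1] + (0 if ch1 == ch2 else sub_cost),  # match or substitute
            ))
        previous = current
    return previous[-1]


def indel_ratio_sketch(s1, s2):
    """Normalized indel similarity in [0, 1]; 1.0 means identical."""
    total = len(s1) + len(s2)
    if total == 0:
        return 1.0
    # With substitution cost 2, a substitution is never cheaper than delete + insert.
    indel = levenshtein_sketch(s1, s2, weights=(1, 1, 2))
    return 1.0 - indel / total


print(levenshtein_sketch("kitten", "sitting"))            # 3
print(round(indel_ratio_sketch("kitten", "sitting"), 2))  # 0.62
```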
[String Distance and Similarity](./string-distance.md)

### Edit Operations

Functions for analyzing and manipulating edit operation sequences that transform one string into another, including conversion between different operation formats and applying transformations.

```python { .api }
def editops(*args):
    """Find sequence of edit operations (triples) transforming one string to another."""

def opcodes(*args):
    """Find sequence of edit operations (5-tuples) like SequenceMatcher."""

def matching_blocks(edit_operations, source_string, destination_string):
    """Find identical blocks in two strings from edit operations."""

def apply_edit(edit_operations, source_string, destination_string):
    """Apply sequence of edit operations to transform a string."""
```
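As a sketch of how the two operation formats are read, the pure-Python helpers below replay `editops`-style triples and `opcodes`-style 5-tuples against a source string. The triples are the ones shown under Basic Usage; the 5-tuples come from the standard library's `difflib.SequenceMatcher`, on the assumption that `opcodes` uses the same layout (the actual sequences it returns may differ, since the underlying algorithms are different). `replay_editops` and `replay_opcodes` are hypothetical names, not part of the package.

```python
from difflib import SequenceMatcher


def replay_editops(ops, source, destination):
    """Apply (tag, source_pos, dest_pos) triples, assumed sorted by position."""
    out = []
    src_pos = 0
    for tag, i, j in ops:
        out.append(source[src_pos:i])   # copy the untouched stretch before this op
        src_pos = i
        if tag == "replace":
            out.append(destination[j])
            src_pos += 1                # the replaced source character is consumed
        elif tag == "insert":
            out.append(destination[j])  # nothing is consumed from the source
        elif tag == "delete":
            src_pos += 1                # skip the deleted source character
        else:
            raise ValueError(f"unknown operation: {tag!r}")
    out.append(source[src_pos:])        # copy whatever remains after the last op
    return "".join(out)


def replay_opcodes(codes, source, destination):
    """Apply (tag, start1, end1, start2, end2) 5-tuples covering both strings."""
    out = []
    for tag, i1, i2, j1, j2 in codes:
        # "equal" keeps the source slice; "replace" and "insert" take the
        # destination slice, which is empty for "delete".
        out.append(source[i1:i2] if tag == "equal" else destination[j1:j2])
    return "".join(out)


ops = [("replace", 0, 0), ("replace", 4, 4), ("insert", 6, 6)]
print(replay_editops(ops, "kitten", "sitting"))    # sitting

codes = SequenceMatcher(None, "kitten", "sitting").get_opcodes()
print(replay_opcodes(codes, "kitten", "sitting"))  # sitting
```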
[Edit Operations](./edit-operations.md)

### String Averaging and Median

Functions for computing approximate median strings, improving strings toward a target set, and calculating sequence and set similarity ratios for multiple strings.

```python { .api }
def median(strings, weights=None):
    """Find approximate median string from a list of strings."""

def quickmedian(strings, weights=None):
    """Fast approximate median string calculation."""

def median_improve(string, strings, weights=None):
    """Improve a string towards median of given strings."""

def seqratio(strings1, strings2):
    """Calculate similarity ratio between two string sequences."""

def setratio(strings1, strings2):
    """Calculate similarity ratio between two string sets."""
```
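A median string is one that minimizes the total edit distance to all inputs. The sketch below uses a deliberately simpler stand-in, the set median, which only considers the input strings themselves as candidates; it is not the algorithm behind `median` or `quickmedian`, which may return a string that is not among the inputs. `edit_distance` and `set_median_sketch` are hypothetical helpers, and treating `weights` as per-string multipliers is an assumption.

```python
def edit_distance(s1, s2):
    """Plain Levenshtein distance (unit costs) via a two-row dynamic program."""
    previous = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        current = [i]
        for j, ch2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,                 # deletion
                current[j - 1] + 1,              # insertion
                previous[j - 1] + (ch1 != ch2),  # match or substitution
            ))
        previous = current
    return previous[-1]


def set_median_sketch(strings, weights=None):
    """Return the input string with the smallest weighted total distance."""
    if not strings:
        raise ValueError("need at least one string")
    if weights is None:
        weights = [1] * len(strings)

    def total_cost(candidate):
        return sum(w * edit_distance(candidate, s) for s, w in zip(strings, weights))

    # min() keeps the first candidate on ties, so the result is deterministic.
    return min(strings, key=total_cost)


print(set_median_sketch(["cat", "cut", "cot", "cat"]))  # cat
```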
[String Averaging and Median](./string-averaging.md)

## Types

```python { .api }
from typing import Any, Callable, List, Tuple, Union

# Type aliases for function parameters
Sequence = Union[str, bytes, List[Any]]
Processor = Callable[[Sequence], Sequence]
EditOperation = Tuple[str, int, int]  # (operation, source_pos, dest_pos)
Opcode = Tuple[str, int, int, int, int]  # (operation, start1, end1, start2, end2)
MatchingBlock = Tuple[int, int, int]  # (source_pos, dest_pos, length)
```
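To show the `Sequence` and `Processor` aliases in context, here is a small pure-Python sketch of a processor being applied before a Hamming comparison. It is not the library's implementation: `hamming_sketch` and `lowercase` are hypothetical, and the `pad` behaviour (count the length difference as mismatches when `pad=True`, raise `ValueError` otherwise) is an assumption about how `hamming` treats inputs of unequal length.

```python
from typing import Any, Callable, List, Optional, Union

Sequence = Union[str, bytes, List[Any]]
Processor = Callable[[Sequence], Sequence]


def lowercase(value: Sequence) -> Sequence:
    """Example processor: lowercase strings, pass other sequences through."""
    return value.lower() if isinstance(value, str) else value


def hamming_sketch(s1: Sequence, s2: Sequence, *, pad: bool = True,
                   processor: Optional[Processor] = None) -> int:
    """Count positions that differ, after optional preprocessing."""
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    if not pad and len(s1) != len(s2):
        raise ValueError("sequences must have equal length when pad=False")
    mismatches = sum(a != b for a, b in zip(s1, s2))
    # With padding, every position beyond the shorter sequence is a mismatch.
    return mismatches + abs(len(s1) - len(s2))


print(hamming_sketch("karolin", "kathrin"))               # 3
print(hamming_sketch("ABC", "abd", processor=lowercase))  # 1
print(hamming_sketch("abc", "abcde"))                     # 2
```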