# python-Levenshtein

A Python compatibility wrapper providing fast computation of string edit distances and similarities through the Levenshtein C extension library. This package enables legacy code to continue working with the old `python-Levenshtein` package name while accessing the modern `Levenshtein` package functionality.

## Package Information

- **Package Name**: python-Levenshtein
- **Language**: Python
- **Installation**: `pip install python-Levenshtein`
- **Type**: Compatibility wrapper package

## Core Imports

```python
import Levenshtein
```

Import specific functions:

```python
from Levenshtein import distance, ratio, editops
```

## Basic Usage

```python
import Levenshtein

# Calculate edit distance between strings
distance = Levenshtein.distance("hello", "world")
print(distance)  # 4

# Calculate similarity ratio (0.0 to 1.0)
similarity = Levenshtein.ratio("hello", "help")
print(similarity)  # approximately 0.667

# Get edit operations needed to transform one string to another
ops = Levenshtein.editops("kitten", "sitting")
print(ops)  # [('replace', 0, 0), ('replace', 4, 4), ('insert', 6, 6)]
```

## Architecture

The `python-Levenshtein` package is a compatibility wrapper: it keeps legacy code working while giving it access to the modern string similarity functionality:

- **Wrapper Design**: This package contains no implementation code itself; it only declares a dependency on `Levenshtein==0.27.1`
- **Compatibility Layer**: Legacy code that depends on the old `python-Levenshtein` package name continues to work unchanged
- **Underlying Implementation**: All functionality is provided by the high-performance C extension in the `Levenshtein` package
- **Import Pattern**: Functions are accessed through `import Levenshtein`, whether the install came from `python-Levenshtein` or `Levenshtein`

This design lets existing projects migrate without code changes while still using the optimized C implementation for string similarity computations.

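Because the distribution name (`python-Levenshtein`) differs from the import name (`Levenshtein`), it can be useful to check which distributions are actually installed. The following is a minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of either package.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Both names are distribution names; the importable module is `Levenshtein`.
for name in ("python-Levenshtein", "Levenshtein"):
    print(name, "->", installed_version(name))
```

If the wrapper is installed, both lines would be expected to report a version, since the wrapper pulls in `Levenshtein` as a dependency.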
## Capabilities

### Core Distance Functions

Basic string distance calculations using various algorithms.

```python { .api }
def distance(s1: str, s2: str) -> int:
    """
    Compute the Levenshtein edit distance between two strings.

    Args:
        s1 (str): First string
        s2 (str): Second string

    Returns:
        int: Edit distance (number of operations needed to transform s1 to s2)
    """

def ratio(s1: str, s2: str) -> float:
    """
    Calculate string similarity ratio.

    Args:
        s1 (str): First string
        s2 (str): Second string

    Returns:
        float: Similarity ratio between 0.0 and 1.0
    """

def hamming(s1: str, s2: str) -> int:
    """
    Compute Hamming distance between strings of equal length.

    Args:
        s1 (str): First string
        s2 (str): Second string (must be same length as s1)

    Returns:
        int: Hamming distance

    Raises:
        ValueError: If strings have different lengths
    """
```

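To make the meaning of these three values concrete, here is a pure-Python reference sketch. It is not the library's C implementation, and the function names are illustrative. It assumes that `ratio` is an insertion/deletion-based similarity, where a substitution counts as one delete plus one insert, normalized by the combined length of both strings.

```python
def levenshtein_distance(s1, s2):
    """Edit distance with unit costs, computed row by row."""
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,                # delete c1
                current[j - 1] + 1,             # insert c2
                previous[j - 1] + (c1 != c2),   # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def indel_ratio(s1, s2):
    """Similarity in [0, 1] based on the longest common subsequence (LCS)."""
    total = len(s1) + len(s2)
    if total == 0:
        return 1.0
    previous = [0] * (len(s2) + 1)
    for c1 in s1:
        current = [0]
        for j, c2 in enumerate(s2, start=1):
            if c1 == c2:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    # indel distance = total - 2 * LCS, so similarity = 2 * LCS / total
    return 2 * previous[-1] / total


def hamming_distance(s1, s2):
    """Number of differing positions; both inputs must have equal length."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(s1, s2))


print(levenshtein_distance("kitten", "sitting"))  # 3
print(round(indel_ratio("hello", "help"), 3))     # 0.667
print(hamming_distance("karolin", "kathrin"))     # 3
```

For `"hello"` and `"help"` the longest common subsequence is `"hel"`, so the sketch yields 2 * 3 / 9, about 0.667.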
### Advanced Similarity Metrics

Specialized string similarity algorithms for different use cases.

```python { .api }
def jaro(s1: str, s2: str) -> float:
    """
    Compute Jaro string similarity metric.

    Args:
        s1 (str): First string
        s2 (str): Second string

    Returns:
        float: Jaro similarity score between 0.0 and 1.0
    """

def jaro_winkler(s1: str, s2: str) -> float:
    """
    Compute Jaro-Winkler string similarity metric.

    Args:
        s1 (str): First string
        s2 (str): Second string

    Returns:
        float: Jaro-Winkler similarity score between 0.0 and 1.0
    """
```

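For readers unfamiliar with these metrics, the following pure-Python sketch follows the textbook definitions of Jaro and Jaro-Winkler similarity. It is an illustration only, not the library's code: the function names, the `prefix_weight` default of 0.1, and the 4-character prefix cap are assumptions taken from the common formulation, and some implementations apply the prefix bonus only above a similarity threshold, which this sketch omits.

```python
def jaro_similarity(s1, s2):
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that line up out of order count as half-transpositions.
    seq1 = [s1[i] for i in range(len1) if matched1[i]]
    seq2 = [s2[j] for j in range(len2) if matched2[j]]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(s1, s2, prefix_weight=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    base = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1.0 - base)


print(round(jaro_similarity("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler_similarity("MARTHA", "MARHTA"), 4))  # 0.9611
```

Jaro-Winkler rewards strings that agree at the start, which tends to suit short identifiers such as personal names.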
### String Median and Averaging

Functions for finding representative strings from collections.

```python { .api }
def median(strings: list[str]) -> str:
    """
    Find approximate median string from a sequence of strings.

    Args:
        strings (list[str]): List of strings to find median for

    Returns:
        str: Approximate median string
    """

def median_improve(median: str, strings: list[str]) -> str:
    """
    Improve an approximate median string.

    Args:
        median (str): Initial median string to improve
        strings (list[str]): List of strings

    Returns:
        str: Improved median string
    """

def quickmedian(strings: list[str]) -> str:
    """
    Fast median string computation.

    Args:
        strings (list[str]): List of strings to find median for

    Returns:
        str: Quick median string
    """

def setmedian(strings: list[str]) -> str:
    """
    Compute median for set of strings.

    Args:
        strings (list[str]): List of strings treated as a set

    Returns:
        str: Set median string
    """
```

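The idea behind a set median can be sketched in a few lines: among the given strings, pick the one whose total edit distance to all the others is smallest. The pure-Python version below is an illustrative assumption about the concept, not the library's algorithm; `edit_distance` and `set_median` are hypothetical names, and ties are resolved in favor of the earliest candidate.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def set_median(strings):
    """Return the member of `strings` with the smallest summed distance."""
    if not strings:
        raise ValueError("need at least one string")
    return min(strings,
               key=lambda cand: sum(edit_distance(cand, s) for s in strings))


print(set_median(["abc", "abd", "abc", "xyz"]))  # abc
```

Unlike a generalized median, which may be a string that appears nowhere in the input, this result is always one of the input strings.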
### Sequence and Set Similarity

Functions for comparing collections of strings.

```python { .api }
def seqratio(seq1: list[str], seq2: list[str]) -> float:
    """
    Similarity ratio for string sequences.

    Args:
        seq1 (list[str]): First sequence of strings
        seq2 (list[str]): Second sequence of strings

    Returns:
        float: Sequence similarity ratio between 0.0 and 1.0
    """

def setratio(set1: list[str], set2: list[str]) -> float:
    """
    Similarity ratio for string sets.

    Args:
        set1 (list[str]): First set of strings
        set2 (list[str]): Second set of strings

    Returns:
        float: Set similarity ratio between 0.0 and 1.0
    """
```

### Edit Operations

Functions for analyzing and manipulating the edit operations between strings.

```python { .api }
def editops(s1: str, s2: str) -> list[tuple[str, int, int]]:
    """
    Return list of edit operations to transform one string to another.

    Args:
        s1 (str): Source string
        s2 (str): Target string

    Returns:
        list[tuple[str, int, int]]: List of (operation, pos1, pos2) tuples
        where operation is 'delete', 'insert', or 'replace'
    """

def opcodes(s1: str, s2: str) -> list[tuple[str, int, int, int, int]]:
    """
    Return list of opcodes (similar to difflib.SequenceMatcher.get_opcodes()).

    Args:
        s1 (str): Source string
        s2 (str): Target string

    Returns:
        list[tuple[str, int, int, int, int]]: List of (tag, i1, i2, j1, j2) tuples
    """

def inverse(ops: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """
    Invert edit operations.

    Args:
        ops (list[tuple[str, int, int]]): Edit operations to invert

    Returns:
        list[tuple[str, int, int]]: Inverted edit operations
    """

def apply_edit(ops: list[tuple[str, int, int]], s1: str, s2: str) -> str:
    """
    Apply edit operations to transform string.

    Args:
        ops (list[tuple[str, int, int]]): Edit operations to apply
        s1 (str): Source string
        s2 (str): Target string

    Returns:
        str: Transformed string
    """

def matching_blocks(ops: list[tuple[str, int, int]], s1: str, s2: str) -> list[tuple[int, int, int]]:
    """
    Return matching blocks between strings, derived from their edit operations.

    Args:
        ops (list[tuple[str, int, int]]): Edit operations between s1 and s2
        s1 (str): First string
        s2 (str): Second string

    Returns:
        list[tuple[int, int, int]]: List of (pos1, pos2, length) tuples
    """

def subtract_edit(ops1: list[tuple[str, int, int]], ops2: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """
    Subtract edit operations.

    Args:
        ops1 (list[tuple[str, int, int]]): First set of edit operations
        ops2 (list[tuple[str, int, int]]): Second set of edit operations

    Returns:
        list[tuple[str, int, int]]: Resulting edit operations
    """
```

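To show how a list of `(operation, pos1, pos2)` tuples describes a transformation, here is a pure-Python sketch that replays such a list by hand. It assumes the operation names `'replace'`, `'insert'`, and `'delete'`, with `pos1` indexing the source and `pos2` indexing the target; `replay_editops` is a hypothetical helper and the operation lists are hand-written for illustration, not captured library output.

```python
def replay_editops(ops, source, target):
    """Rebuild the target by applying position-sorted edit operations."""
    out = []
    i = 0  # next unread position in source
    for tag, src_pos, dst_pos in ops:
        out.append(source[i:src_pos])  # copy the untouched stretch
        i = src_pos
        if tag == "replace":
            out.append(target[dst_pos])  # take the new character
            i += 1                       # and skip the old one
        elif tag == "insert":
            out.append(target[dst_pos])  # add a character, consume nothing
        elif tag == "delete":
            i += 1                       # drop one source character
        else:
            raise ValueError(f"unknown operation: {tag!r}")
    out.append(source[i:])  # copy whatever remains
    return "".join(out)


ops = [("replace", 0, 0), ("replace", 4, 4), ("insert", 6, 6)]
print(replay_editops(ops, "kitten", "sitting"))  # sitting
```

Three operations reproduce the target here, which matches the edit distance of 3 between the two words.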
## Usage Examples

### Finding Similar Strings

```python
import Levenshtein

# Find the most similar string from a list
target = "hello"
candidates = ["hallo", "hullo", "help", "world"]

best_match = min(candidates, key=lambda x: Levenshtein.distance(target, x))
print(f"Best match: {best_match}")  # "hallo"

# Get similarity scores
similarities = [(candidate, Levenshtein.ratio(target, candidate))
                for candidate in candidates]
similarities.sort(key=lambda x: x[1], reverse=True)
print("Similarity scores:", similarities)
```

### String Transformation Analysis

```python
import Levenshtein

# Analyze how to transform one string to another
source = "kitten"
target = "sitting"

# Get edit operations
ops = Levenshtein.editops(source, target)
print("Edit operations:", ops)

# Get detailed opcodes
codes = Levenshtein.opcodes(source, target)
print("Opcodes:", codes)

# Find matching blocks (derived from the edit operations computed above)
blocks = Levenshtein.matching_blocks(ops, source, target)
print("Matching blocks:", blocks)
```

### Working with String Collections

```python
import Levenshtein

# Find median string from a collection
words = ["cat", "bat", "rat", "hat", "mat"]
median_word = Levenshtein.median(words)
print(f"Median word: {median_word}")

# Compare sequences of strings
seq1 = ["hello", "world"]
seq2 = ["hello", "world", "python"]
similarity = Levenshtein.seqratio(seq1, seq2)
print(f"Sequence similarity: {similarity}")
```

## Key Features

- **High Performance**: C extension implementation for fast computation
- **String Type Support**: Works with both `str` (Unicode) and `bytes` inputs
- **Type Consistency**: All strings in a single operation must be of the same type
- **Compatibility**: Drop-in replacement for legacy `python-Levenshtein` usage
- **Comprehensive API**: Complete set of string similarity and edit distance functions

## Limitations

- Cannot mix string types in function calls (all must be str or all must be bytes)
- Designed specifically for strings, not arbitrary sequences
- Requires strings to be of equal length for Hamming distance calculation
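
One way to work around the first limitation is to normalize inputs to a single type before calling any metric. The sketch below is a hypothetical helper layer, not part of the package: `to_text` and `compare_as_text` are illustrative names, UTF-8 is an assumed default encoding, and the metric is passed in as a callable so the example runs without the C extension installed.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as str, decoding bytes-like input with `encoding`."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


def compare_as_text(a, b, metric, encoding="utf-8"):
    """Convert both arguments to str, then call metric(a, b)."""
    return metric(to_text(a, encoding), to_text(b, encoding))


def count_mismatches(x, y):
    """Stand-in metric for the demo: differing positions in aligned strings."""
    return sum(p != q for p, q in zip(x, y))


print(compare_as_text(b"hello", "hallo", count_mismatches))  # 1
```

With the package installed, a function such as `Levenshtein.distance` could be passed as `metric` in place of the stand-in.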