tessl/pypi-python-levenshtein

Python compatibility wrapper for computing string edit distances and similarities using fast Levenshtein algorithms.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/python-levenshtein@0.27.x

To install, run

npx @tessl/cli install tessl/pypi-python-levenshtein@0.27.0

# python-Levenshtein

A Python compatibility wrapper providing fast computation of string edit distances and similarities through the Levenshtein C extension library. This package enables legacy code to continue working with the old `python-Levenshtein` package name while accessing the modern `Levenshtein` package functionality.

## Package Information

- **Package Name**: python-Levenshtein
- **Language**: Python
- **Installation**: `pip install python-Levenshtein`
- **Type**: Compatibility wrapper package

## Core Imports

```python
import Levenshtein
```

Import specific functions:

```python
from Levenshtein import distance, ratio, editops
```

## Basic Usage

```python
import Levenshtein

# Calculate edit distance between strings
distance = Levenshtein.distance("hello", "world")
print(distance)  # 4

# Calculate similarity ratio (0.0 to 1.0)
similarity = Levenshtein.ratio("hello", "help")
print(similarity)  # approximately 0.667

# Get edit operations needed to transform one string to another
ops = Levenshtein.editops("kitten", "sitting")
print(ops)  # [('replace', 0, 0), ('replace', 4, 4), ('insert', 6, 6)]
```

## Architecture

The `python-Levenshtein` package is a compatibility wrapper: it keeps legacy code working while giving it access to the current string similarity functionality:

- **Wrapper Design**: This package contains no implementation code itself; it simply declares a dependency on `Levenshtein==0.27.1`
- **Compatibility Layer**: Enables legacy code using the old `python-Levenshtein` package name to continue working unchanged
- **Underlying Implementation**: All functionality is provided by the high-performance C extension in the `Levenshtein` package
- **Import Pattern**: Functions are accessed through `import Levenshtein` regardless of whether installed via `python-Levenshtein` or `Levenshtein`

This design lets existing projects keep their dependency declarations unchanged while still running on the optimized C implementation.
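Because both distribution names lead to the same importable module, it can be useful to check which of them is actually installed. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper introduced here for illustration, not part of either package.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent.

    Hypothetical helper for illustration; not part of Levenshtein.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Installing the wrapper should pull in the implementation package as well,
# so both names may be reported as installed.
for name in ("python-Levenshtein", "Levenshtein"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If only `Levenshtein` is reported, the project already depends on the implementation package directly and the wrapper is not needed.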

## Capabilities

### Core Distance Functions

Basic string distance calculations using various algorithms.

```python { .api }
def distance(s1: str, s2: str) -> int:
    """
    Compute the Levenshtein edit distance between two strings.

    Args:
        s1 (str): First string
        s2 (str): Second string

    Returns:
        int: Edit distance (number of operations needed to transform s1 to s2)
    """

def ratio(s1: str, s2: str) -> float:
    """
    Calculate string similarity ratio.

    Args:
        s1 (str): First string
        s2 (str): Second string

    Returns:
        float: Similarity ratio between 0.0 and 1.0
    """

def hamming(s1: str, s2: str) -> int:
    """
    Compute Hamming distance between strings of equal length.

    Args:
        s1 (str): First string
        s2 (str): Second string (must be same length as s1)

    Returns:
        int: Hamming distance

    Raises:
        ValueError: If strings have different lengths
    """
```
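To make the three measures concrete, here is a pure-Python reference sketch of what they compute. It is not the library's implementation, which is a C extension, and all function names are chosen here for illustration. It assumes that `ratio` is a normalized insert/delete similarity, that is, twice the longest common subsequence length divided by the combined length; that is my understanding of recent releases and should be checked against the library documentation.

```python
def levenshtein_distance(s1, s2):
    """Edit distance with unit-cost insert, delete and replace."""
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,               # delete c1
                current[j - 1] + 1,            # insert c2
                previous[j - 1] + (c1 != c2),  # replace, or match at no cost
            ))
        previous = current
    return previous[-1]


def indel_ratio(s1, s2):
    """Assumed ratio definition: 2 * LCS length / combined length."""
    total = len(s1) + len(s2)
    if total == 0:
        return 1.0
    # Longest common subsequence length, computed row by row.
    previous = [0] * (len(s2) + 1)
    for c1 in s1:
        current = [0]
        for j, c2 in enumerate(s2, start=1):
            if c1 == c2:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return 2 * previous[-1] / total


def hamming_distance(s1, s2):
    """Count differing positions; this sketch requires equal lengths."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


print(levenshtein_distance("hello", "world"))     # 4
print(levenshtein_distance("kitten", "sitting"))  # 3
print(hamming_distance("karolin", "kathrin"))     # 3
```

The row-by-row tables keep memory proportional to the length of the second string; the running time is still proportional to the product of the two lengths, which is why a compiled implementation matters for large inputs.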

### Advanced Similarity Metrics

Specialized string similarity algorithms for different use cases.

```python { .api }
def jaro(s1: str, s2: str) -> float:
    """
    Compute Jaro string similarity metric.

    Args:
        s1 (str): First string
        s2 (str): Second string

    Returns:
        float: Jaro similarity score between 0.0 and 1.0
    """

def jaro_winkler(s1: str, s2: str) -> float:
    """
    Compute Jaro-Winkler string similarity metric.

    Args:
        s1 (str): First string
        s2 (str): Second string

    Returns:
        float: Jaro-Winkler similarity score between 0.0 and 1.0
    """
```
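For readers unfamiliar with these metrics, the sketch below follows the textbook definitions of Jaro and Jaro-Winkler in plain Python. It is illustrative only: the function names and the `prefix_weight` default are assumptions made here, and the library may differ in details such as applying the prefix boost only above a similarity threshold.

```python
def jaro_similarity(s1, s2):
    """Textbook Jaro similarity between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c1 in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c1:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(s1, s2, prefix_weight=0.1):
    """Jaro similarity boosted by a shared prefix of up to four characters."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return jaro + prefix * prefix_weight * (1 - jaro)


print(round(jaro_similarity("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler_similarity("MARTHA", "MARHTA"), 4))  # 0.9611
```

Jaro-Winkler rewards strings that agree at the start, which tends to suit names and identifiers where typos are more common toward the end.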

### String Median and Averaging

Functions for finding representative strings from collections.

```python { .api }
def median(strings: list[str]) -> str:
    """
    Find approximate median string from a sequence of strings.

    Args:
        strings (list[str]): List of strings to find median for

    Returns:
        str: Approximate median string
    """

def median_improve(strings: list[str], median: str) -> str:
    """
    Improve median string computation.

    Args:
        strings (list[str]): List of strings
        median (str): Initial median string to improve

    Returns:
        str: Improved median string
    """

def quickmedian(strings: list[str]) -> str:
    """
    Fast median string computation.

    Args:
        strings (list[str]): List of strings to find median for

    Returns:
        str: Quick median string
    """

def setmedian(strings: list[str]) -> str:
    """
    Compute median for set of strings.

    Args:
        strings (list[str]): List of strings treated as a set

    Returns:
        str: Set median string
    """
```
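The idea of a set median can be sketched in plain Python as follows: choose the member of the collection whose total edit distance to all the members is smallest. This is an assumption about the concept rather than a description of how `setmedian` is implemented; the library may weight strings or break ties differently, and both function names below are hypothetical.

```python
def edit_distance(s1, s2):
    """Unit-cost Levenshtein distance, kept short for the example."""
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (c1 != c2)))
        previous = current
    return previous[-1]


def set_median(strings):
    """Return the member with the smallest summed distance to all members."""
    if not strings:
        raise ValueError("need at least one string")
    # min() keeps the earliest candidate when several share the best total.
    return min(strings,
               key=lambda s: sum(edit_distance(s, t) for t in strings))


print(set_median(["spam", "spam!", "span", "spa"]))  # spam
```

Unlike a generalized median, which may be a string that does not occur in the input, this result is always one of the given strings.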

### Sequence and Set Similarity

Functions for comparing collections of strings.

```python { .api }
def seqratio(seq1: list[str], seq2: list[str]) -> float:
    """
    Similarity ratio for string sequences.

    Args:
        seq1 (list[str]): First sequence of strings
        seq2 (list[str]): Second sequence of strings

    Returns:
        float: Sequence similarity ratio between 0.0 and 1.0
    """

def setratio(set1: list[str], set2: list[str]) -> float:
    """
    Similarity ratio for string sets.

    Args:
        set1 (list[str]): First set of strings
        set2 (list[str]): Second set of strings

    Returns:
        float: Set similarity ratio between 0.0 and 1.0
    """
```

### Edit Operations

Functions for analyzing and manipulating the edit operations between strings.

```python { .api }
def editops(s1: str, s2: str) -> list[tuple[str, int, int]]:
    """
    Return list of edit operations to transform one string to another.

    Args:
        s1 (str): Source string
        s2 (str): Target string

    Returns:
        list[tuple[str, int, int]]: List of (operation, pos1, pos2) tuples
            where operation is 'delete', 'insert', or 'replace'
    """

def opcodes(s1: str, s2: str) -> list[tuple[str, int, int, int, int]]:
    """
    Return list of opcodes (similar to difflib.SequenceMatcher.get_opcodes()).

    Args:
        s1 (str): Source string
        s2 (str): Target string

    Returns:
        list[tuple[str, int, int, int, int]]: List of (tag, i1, i2, j1, j2) tuples
    """

def inverse(ops: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """
    Invert edit operations.

    Args:
        ops (list[tuple[str, int, int]]): Edit operations to invert

    Returns:
        list[tuple[str, int, int]]: Inverted edit operations
    """

def apply_edit(ops: list[tuple[str, int, int]], s1: str, s2: str) -> str:
    """
    Apply edit operations to transform string.

    Args:
        ops (list[tuple[str, int, int]]): Edit operations to apply
        s1 (str): Source string
        s2 (str): Target string

    Returns:
        str: Transformed string
    """

def matching_blocks(s1: str, s2: str) -> list[tuple[int, int, int]]:
    """
    Return matching blocks between strings.

    Args:
        s1 (str): First string
        s2 (str): Second string

    Returns:
        list[tuple[int, int, int]]: List of (pos1, pos2, length) tuples
    """

def subtract_edit(ops1: list[tuple[str, int, int]], ops2: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """
    Subtract edit operations.

    Args:
        ops1 (list[tuple[str, int, int]]): First set of edit operations
        ops2 (list[tuple[str, int, int]]): Second set of edit operations

    Returns:
        list[tuple[str, int, int]]: Resulting edit operations
    """
```
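To show how a list of `(operation, pos1, pos2)` tuples describes a transformation, here is a small pure-Python sketch that replays such a list against the source string. It assumes the operation tags are `'replace'`, `'insert'` and `'delete'`, that `pos1` indexes the source and `pos2` the target, and that the operations are ordered by source position. `apply_ops` is a hypothetical stand-in written for this example, not the library's `apply_edit`.

```python
def apply_ops(ops, source, target):
    """Replay (tag, source_pos, target_pos) operations on the source string."""
    result = []
    src_pos = 0
    for tag, spos, dpos in ops:
        # Copy the unchanged run of characters before this operation.
        result.extend(source[src_pos:spos])
        src_pos = spos
        if tag == "replace":
            result.append(target[dpos])  # emit target char, skip source char
            src_pos += 1
        elif tag == "insert":
            result.append(target[dpos])  # emit target char, consume nothing
        elif tag == "delete":
            src_pos += 1                 # skip source char, emit nothing
        else:
            raise ValueError(f"unknown operation: {tag!r}")
    # Copy whatever remains after the last operation.
    result.extend(source[src_pos:])
    return "".join(result)


ops = [("replace", 0, 0), ("replace", 4, 4), ("insert", 6, 6)]
print(apply_ops(ops, "kitten", "sitting"))  # sitting
```

The number of operations in such a list equals the edit distance when the list is minimal, so three operations here correspond to a distance of 3 between "kitten" and "sitting".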

## Usage Examples

### Finding Similar Strings

```python
import Levenshtein

# Find the most similar string from a list
target = "hello"
candidates = ["hallo", "hullo", "help", "world"]

best_match = min(candidates, key=lambda x: Levenshtein.distance(target, x))
print(f"Best match: {best_match}")  # "hallo" (ties go to the earliest candidate)

# Get similarity scores
similarities = [(candidate, Levenshtein.ratio(target, candidate))
                for candidate in candidates]
similarities.sort(key=lambda x: x[1], reverse=True)
print("Similarity scores:", similarities)
```

### String Transformation Analysis

```python
import Levenshtein

# Analyze how to transform one string to another
source = "kitten"
target = "sitting"

# Get edit operations
ops = Levenshtein.editops(source, target)
print("Edit operations:", ops)

# Get detailed opcodes
codes = Levenshtein.opcodes(source, target)
print("Opcodes:", codes)

# Find matching blocks
blocks = Levenshtein.matching_blocks(source, target)
print("Matching blocks:", blocks)
```

### Working with String Collections

```python
import Levenshtein

# Find median string from a collection
words = ["cat", "bat", "rat", "hat", "mat"]
median_word = Levenshtein.median(words)
print(f"Median word: {median_word}")

# Compare sequences of strings
seq1 = ["hello", "world"]
seq2 = ["hello", "world", "python"]
similarity = Levenshtein.seqratio(seq1, seq2)
print(f"Sequence similarity: {similarity}")
```

## Key Features

- **High Performance**: C extension implementation for fast computation
- **String Type Support**: Works with both regular strings and Unicode strings
- **Type Consistency**: All strings in a single operation must be of the same type
- **Compatibility**: Drop-in replacement for legacy python-Levenshtein usage
- **Comprehensive API**: Complete set of string similarity and edit distance functions

## Limitations

- Cannot mix string types in function calls (all must be str or all must be bytes)
- Designed specifically for strings, not arbitrary sequences
- Requires strings to be of equal length for Hamming distance calculation
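One way to respect the first limitation is to validate inputs before calling into the library. The helper below is a hypothetical sketch, not part of the package; it only checks that every argument is `str` or every argument is `bytes`.

```python
def ensure_same_string_type(*values):
    """Raise TypeError unless all values are str or all are bytes.

    Hypothetical guard for illustration; not provided by Levenshtein.
    """
    kinds = {type(value) for value in values}
    if not kinds <= {str, bytes}:
        raise TypeError("expected only str or bytes values")
    if len(kinds) > 1:
        raise TypeError("cannot mix str and bytes in one call")
    return values


a, b = ensure_same_string_type("hello", "help")   # passes through unchanged
raw = ensure_same_string_type(b"hello", b"help")  # bytes with bytes is fine
```

Calling it with `"hello"` and `b"help"` raises `TypeError`, which surfaces the mismatch at the call site rather than inside the comparison.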