or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

edit-distance.mdfactory-utilities.mdindex.mdngram-shingle.mdphonetic-record-linkage.mdsequence-based.md

edit-distance.mddocs/

0

# Edit Distance Algorithms

1

2

Classic edit distance algorithms that measure the minimum number of character operations (insertions, deletions, substitutions, and sometimes transpositions) needed to transform one string into another. These algorithms are fundamental to many text processing and comparison tasks.

3

4

## Capabilities

5

6

### Levenshtein Distance

7

8

The classic Levenshtein distance algorithm computes the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another. This is a metric distance that satisfies the triangle inequality.

9

10

```python { .api }

11

class Levenshtein(MetricStringDistance):

12

def distance(self, s0: str, s1: str) -> float:

13

"""

14

Calculate Levenshtein distance between two strings.

15

16

Args:

17

s0: First string

18

s1: Second string

19

20

Returns:

21

float: Number of edit operations (minimum 0, no maximum limit)

22

23

Raises:

24

TypeError: If either string is None

25

"""

26

```

27

28

**Usage Example:**

29

30

```python

31

from similarity.levenshtein import Levenshtein

32

33

lev = Levenshtein()

34

distance = lev.distance('kitten', 'sitting') # Returns: 3.0

35

distance = lev.distance('hello', 'hello') # Returns: 0.0

36

```

37

38

### Normalized Levenshtein

39

40

A normalized version of Levenshtein distance that returns values in the range [0.0, 1.0] by dividing the edit distance by the length of the longest string. Also provides similarity as 1 - normalized distance.

41

42

```python { .api }

43

class NormalizedLevenshtein(NormalizedStringDistance, NormalizedStringSimilarity):

44

def __init__(self): ...

45

46

def distance(self, s0: str, s1: str) -> float:

47

"""

48

Calculate normalized Levenshtein distance (0.0-1.0).

49

50

Args:

51

s0: First string

52

s1: Second string

53

54

Returns:

55

float: Normalized distance where 0.0 = identical, 1.0 = completely different

56

57

Raises:

58

TypeError: If either string is None

59

"""

60

61

def similarity(self, s0: str, s1: str) -> float:

62

"""

63

Calculate normalized Levenshtein similarity (0.0-1.0).

64

65

Args:

66

s0: First string

67

s1: Second string

68

69

Returns:

70

float: Normalized similarity where 1.0 = identical, 0.0 = completely different

71

72

Raises:

73

TypeError: If either string is None

74

"""

75

```

76

77

**Usage Example:**

78

79

```python

80

from similarity.normalized_levenshtein import NormalizedLevenshtein

81

82

norm_lev = NormalizedLevenshtein()

83

distance = norm_lev.distance('kitten', 'sitting') # Returns: ~0.428

84

similarity = norm_lev.similarity('kitten', 'sitting') # Returns: ~0.571

85

```

86

87

### Weighted Levenshtein

88

89

An implementation of Levenshtein distance that allows custom weights for different character operations. Commonly used for OCR applications where certain character substitutions are more likely, or keyboard typing auto-correction where adjacent keys have lower substitution costs.

90

91

```python { .api }

92

class WeightedLevenshtein(StringDistance):

93

def __init__(self, character_substitution: CharacterSubstitutionInterface,

94

character_ins_del: CharacterInsDelInterface = None):

95

"""

96

Initialize with custom cost interfaces.

97

98

Args:

99

character_substitution: Interface defining substitution costs

100

character_ins_del: Optional interface for insertion/deletion costs (default: 1.0)

101

"""

102

103

def distance(self, s0: str, s1: str) -> float:

104

"""

105

Calculate weighted Levenshtein distance with custom operation costs.

106

107

Args:

108

s0: First string

109

s1: Second string

110

111

Returns:

112

float: Weighted edit distance

113

114

Raises:

115

TypeError: If either string is None

116

"""

117

118

class CharacterSubstitutionInterface:

119

def cost(self, c0: str, c1: str) -> float:

120

"""

121

Return the cost of substituting character c0 with c1.

122

123

Args:

124

c0: Original character

125

c1: Replacement character

126

127

Returns:

128

float: Cost of substitution (typically 0.0-1.0 range)

129

"""

130

131

class CharacterInsDelInterface:

132

def deletion_cost(self, c: str) -> float:

133

"""

134

Return the cost of deleting character c.

135

136

Args:

137

c: Character to delete

138

139

Returns:

140

float: Cost of deletion

141

"""

142

143

def insertion_cost(self, c: str) -> float:

144

"""

145

Return the cost of inserting character c.

146

147

Args:

148

c: Character to insert

149

150

Returns:

151

float: Cost of insertion

152

"""

153

```

154

155

**Usage Example:**

156

157

```python

158

from similarity.weighted_levenshtein import WeightedLevenshtein, CharacterSubstitutionInterface

159

160

class OCRSubstitution(CharacterSubstitutionInterface):

161

def cost(self, c0, c1):

162

# Lower cost for visually similar characters

163

if (c0, c1) in [('o', '0'), ('l', '1'), ('S', '5')]:

164

return 0.3

165

return 1.0

166

167

weighted_lev = WeightedLevenshtein(OCRSubstitution())

168

distance = weighted_lev.distance('S0me1ext', 'Some1ext') # Lower cost due to custom weights

169

```

170

171

### Optimal String Alignment

172

173

A variant of Damerau-Levenshtein that computes edit distance with the restriction that no substring can be edited more than once. Supports insertions, deletions, substitutions, and adjacent character transpositions.

174

175

```python { .api }

176

class OptimalStringAlignment(StringDistance):

177

def distance(self, s0: str, s1: str) -> float:

178

"""

179

Calculate Optimal String Alignment distance.

180

181

Args:

182

s0: First string

183

s1: Second string

184

185

Returns:

186

float: OSA distance (minimum 0, no maximum limit)

187

188

Raises:

189

TypeError: If either string is None

190

"""

191

```

192

193

**Usage Example:**

194

195

```python

196

from similarity.optimal_string_alignment import OptimalStringAlignment

197

198

osa = OptimalStringAlignment()

199

distance = osa.distance('CA', 'ABC') # Returns: 3.0

200

distance = osa.distance('hello', 'ehllo') # Returns: 1.0 (transposition)

201

```