tessl/pypi-jellyfish

Approximate and phonetic matching of strings

- Workspace: tessl
- Visibility: Public
- Describes: `pkg:pypi/jellyfish@1.2.x`

To install, run `npx @tessl/cli install tessl/pypi-jellyfish@1.2.0`.

# Jellyfish

Jellyfish is a high-performance Python library for approximate and phonetic string matching. It implements common string distance and similarity metrics as well as phonetic encoding algorithms. The core is written in Rust for speed and is exposed through a plain Python function API.

## Package Information

- **Package Name**: jellyfish
- **Package Type**: pypi
- **Language**: Python with Rust implementation
- **Installation**: `pip install jellyfish`

## Core Imports

```python
import jellyfish
```

Individual function imports:

```python
from jellyfish import levenshtein_distance, jaro_similarity, soundex, metaphone
```

## Basic Usage

```python
import jellyfish

# String distance and similarity
distance = jellyfish.levenshtein_distance('jellyfish', 'smellyfish')
print(distance)  # 2

similarity = jellyfish.jaro_similarity('jellyfish', 'smellyfish')
print(similarity)  # 0.896...

# Phonetic encoding
code = jellyfish.soundex('Jellyfish')
print(code)  # J412

metaphone_code = jellyfish.metaphone('Jellyfish')
print(metaphone_code)  # JLFX
```

## Capabilities

### String Distance and Similarity Functions

Distance and similarity metrics for comparing strings, useful for fuzzy matching, data deduplication, and record linkage applications.
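
To illustrate the fuzzy-matching use case, here is a minimal sketch that picks the closest candidate for a query. It works with any similarity function of the `(s1, s2) -> float` shape documented below, such as `jellyfish.jaro_winkler_similarity`. The helper name `best_match` and the 0.85 default threshold are illustrative assumptions and are not part of jellyfish.

```python
from typing import Callable, Iterable, Optional


def best_match(
    query: str,
    candidates: Iterable[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.85,  # hypothetical cut-off; tune it for your data
) -> Optional[str]:
    """Return the candidate most similar to query, or None if none reaches threshold."""
    scored = [(similarity(query, candidate), candidate) for candidate in candidates]
    if not scored:
        return None
    # max() keeps the first candidate when several tie for the top score.
    score, candidate = max(scored, key=lambda pair: pair[0])
    return candidate if score >= threshold else None
```

For example, you could call `best_match("jelyfish", ["jellyfish", "starfish"], jellyfish.jaro_winkler_similarity)`. Distance metrics such as `levenshtein_distance` work the other way round (smaller means closer), so they need the comparison inverted or the distance converted to a score first.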

#### Levenshtein Distance

Calculates the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another.

```python { .api }
def levenshtein_distance(s1: str, s2: str) -> int:
    """
    Calculate the Levenshtein distance between two strings.

    Parameters:
    - s1: First string to compare
    - s2: Second string to compare

    Returns:
    int: Number of edits required to transform s1 to s2

    Raises:
    TypeError: If either argument is not a string
    """
```
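
The dynamic-programming recurrence behind this metric can be sketched in pure Python. This is an illustrative reference, not jellyfish's Rust implementation. `levenshtein` is a hypothetical helper name, and it is mainly useful for cross-checking results on small inputs.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Two-row dynamic-programming Levenshtein distance."""
    # The distance is symmetric, so make s2 the shorter string to keep the rows small.
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    previous = list(range(len(s2) + 1))  # distance from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (c1 != c2),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]
```

Keeping only two rows reduces memory from O(mn) to O(min(m, n)). The running time stays O(mn).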

#### Damerau-Levenshtein Distance

Calculates distance allowing insertions, deletions, substitutions, and transpositions (swapping of adjacent characters).

```python { .api }
def damerau_levenshtein_distance(s1: str, s2: str) -> int:
    """
    Calculate the Damerau-Levenshtein distance between two strings.

    Parameters:
    - s1: First string to compare
    - s2: Second string to compare

    Returns:
    int: Number of edits (including transpositions) required to transform s1 to s2

    Raises:
    TypeError: If either argument is not a string
    """
```
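
The transposition rule is what separates this metric from plain Levenshtein. Below is a pure-Python sketch of the unrestricted Damerau-Levenshtein distance (the Lowrance-Wagner dynamic program). `damerau_levenshtein` is a hypothetical helper. We assume, without having checked jellyfish's Rust source, that the library computes this unrestricted variant rather than the restricted one.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance (Lowrance-Wagner algorithm)."""
    max_dist = len(a) + len(b)
    # d is shifted by one row and column so that index 0 can hold a max_dist sentinel.
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = max_dist
    for i in range(len(a) + 1):
        d[i + 1][0] = max_dist
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[0][j + 1] = max_dist
        d[1][j + 1] = j
    last_row = {}  # last row (1-based) in which each character of a was seen
    for i in range(1, len(a) + 1):
        last_match_col = 0  # last column in this row where a[i-1] == b[j-1]
        for j in range(1, len(b) + 1):
            k_row = last_row.get(b[j - 1], 0)
            l_col = last_match_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_match_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,      # substitution or match
                d[i + 1][j] + 1,     # insertion
                d[i][j + 1] + 1,     # deletion
                # transposition, plus the characters skipped between the swapped pair
                d[k_row][l_col] + (i - k_row - 1) + 1 + (j - l_col - 1),
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]
```

The restricted "optimal string alignment" variant forbids further edits to a transposed pair. It would return 3 for `'CA'` vs `'ABC'`, whereas this function returns 2.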

#### Hamming Distance

Calculates the number of positions at which corresponding characters are different. Handles strings of different lengths by including the length difference.

```python { .api }
def hamming_distance(s1: str, s2: str) -> int:
    """
    Calculate the Hamming distance between two strings.

    Parameters:
    - s1: First string to compare
    - s2: Second string to compare

    Returns:
    int: Number of differing positions plus length difference

    Raises:
    TypeError: If either argument is not a string
    """
```
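
The behaviour described above (positional mismatches plus the length difference) fits in a few lines of pure Python. `hamming` is a hypothetical helper used for illustration, not jellyfish's implementation.

```python
def hamming(s1: str, s2: str) -> int:
    """Count differing positions, plus the length difference for unequal strings."""
    # zip() stops at the shorter string; the unmatched tail is added back below.
    mismatches = sum(c1 != c2 for c1, c2 in zip(s1, s2))
    return mismatches + abs(len(s1) - len(s2))
```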

#### Jaro Similarity

Calculates Jaro similarity, which is based on how many characters the two strings share within a limited distance of each other and how many of those shared characters appear in a different order (transpositions).

```python { .api }
def jaro_similarity(s1: str, s2: str) -> float:
    """
    Calculate the Jaro similarity between two strings.

    Parameters:
    - s1: First string to compare
    - s2: Second string to compare

    Returns:
    float: Similarity score between 0.0 (no similarity) and 1.0 (identical)

    Raises:
    TypeError: If either argument is not a string
    """
```
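
The standard definition is sim = (m/|s1| + m/|s2| + (m - t)/m) / 3. Here m is the number of characters that match within a window of max(|s1|, |s2|) // 2 - 1 positions, and t is half the number of matched characters that are out of order. The pure-Python sketch below follows that definition. `jaro` is a hypothetical helper, and its edge-case conventions (such as returning 0.0 for empty input) are assumptions that may not match jellyfish.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: windowed character matches, penalised by transpositions."""
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0  # convention chosen for this sketch
    # Characters match only if they are equal and no further apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters in a different order, halved (integer division).
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3
```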

#### Jaro-Winkler Similarity

A variant of Jaro similarity that gives higher scores to strings sharing a common prefix (of up to four characters), with an optional adjustment for long strings.

```python { .api }
def jaro_winkler_similarity(s1: str, s2: str, long_tolerance: Optional[bool] = None) -> float:
    """
    Calculate the Jaro-Winkler similarity between two strings.

    Parameters:
    - s1: First string to compare
    - s2: Second string to compare
    - long_tolerance: Apply long string tolerance adjustment for extended similarity calculation (None and False behave identically)

    Returns:
    float: Similarity score between 0.0 (no similarity) and 1.0 (identical)

    Raises:
    TypeError: If either argument is not a string
    """
```
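
The Winkler adjustment can be expressed as a small function applied to an existing Jaro score. This sketch makes three assumptions: the conventional scaling factor of 0.1, a four-character prefix cap, and the commonly used 0.7 boost threshold. `winkler_adjust` is a hypothetical helper, and it does not model the `long_tolerance` correction.

```python
def winkler_adjust(jaro_score: float, s1: str, s2: str,
                   prefix_scale: float = 0.1, max_prefix: int = 4,
                   boost_threshold: float = 0.7) -> float:
    """Apply the Winkler common-prefix boost to an existing Jaro score."""
    # Only pairs that are already fairly similar receive the boost.
    if jaro_score <= boost_threshold:
        return jaro_score
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    # Each shared leading character adds prefix_scale of the remaining gap to 1.0.
    return jaro_score + prefix * prefix_scale * (1.0 - jaro_score)
```

For `'MARTHA'` vs `'MARHTA'`, the Jaro score of 17/18 and the three-letter prefix `MAR` give about 0.9611.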

#### Jaccard Similarity

Calculates the Jaccard similarity (Jaccard index) using either word-level or character n-gram comparison.

```python { .api }
def jaccard_similarity(s1: str, s2: str, ngram_size: Optional[int] = None) -> float:
    """
    Calculate the Jaccard similarity between two strings.

    Parameters:
    - s1: First string to compare
    - s2: Second string to compare
    - ngram_size: Size for character n-grams; if None, uses word-level comparison

    Returns:
    float: Similarity score between 0.0 (no similarity) and 1.0 (identical)

    Raises:
    TypeError: If either argument is not a string
    """
```
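
The Jaccard index is the size of the intersection divided by the size of the union of two sets. The sketch below applies it to either word sets or character n-gram sets. The tokenization (whitespace splitting for words, a sliding window for n-grams) and the empty-input convention are our assumptions and may differ from jellyfish in edge cases. `jaccard_sketch` is hypothetical.

```python
from typing import Optional, Set


def _tokens(s: str, ngram_size: Optional[int]) -> Set[str]:
    if ngram_size is None:
        return set(s.split())  # assumption: words are whitespace-separated
    # Every contiguous run of ngram_size characters (empty if s is shorter).
    return {s[i:i + ngram_size] for i in range(len(s) - ngram_size + 1)}


def jaccard_sketch(s1: str, s2: str, ngram_size: Optional[int] = None) -> float:
    """Jaccard index |A & B| / |A | B| over word or character n-gram sets."""
    a, b = _tokens(s1, ngram_size), _tokens(s2, ngram_size)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty token sets count as identical
    return len(a & b) / len(union)
```

With bigrams, `'night'` and `'nacht'` share only `ht` out of seven distinct bigrams, giving 1/7.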

#### Match Rating Comparison

Compares two strings using the Match Rating Approach algorithm. Returns a boolean match result, or None if the strings are too different in length to be compared.

```python { .api }
def match_rating_comparison(s1: str, s2: str) -> Optional[bool]:
    """
    Compare two strings using Match Rating Approach algorithm.

    Parameters:
    - s1: First string to compare
    - s2: Second string to compare

    Returns:
    Optional[bool]: True if strings match, False if they don't, None if the lengths of their Match Rating codexes differ by 3 or more

    Raises:
    TypeError: If either argument is not a string
    """
```
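
The comparison step can be sketched as a function that works on two codexes that have already been computed (for example with `match_rating_codex`). It follows the published MRA comparison rules as we understand them:

1. Look up a minimum rating from the combined codex length.
2. Remove characters that agree position by position, first from the left and then from the right.
3. Subtract the remaining unmatched count from 6.

`mra_compare_codexes` is a hypothetical helper and may differ from jellyfish in edge cases.

```python
from itertools import zip_longest
from typing import Optional


def mra_compare_codexes(codex1: str, codex2: str) -> Optional[bool]:
    """Compare two Match Rating codexes; None when their lengths differ by 3 or more."""
    if abs(len(codex1) - len(codex2)) >= 3:
        return None
    # The rating needed for a match depends on the combined codex length.
    total = len(codex1) + len(codex2)
    if total <= 4:
        min_rating = 5
    elif total <= 7:
        min_rating = 4
    elif total <= 11:
        min_rating = 3
    else:
        min_rating = 2
    # Pass 1: drop characters that agree position by position from the left.
    rest1, rest2 = [], []
    for c1, c2 in zip_longest(codex1, codex2):
        if c1 != c2:
            if c1 is not None:
                rest1.append(c1)
            if c2 is not None:
                rest2.append(c2)
    # Pass 2: repeat from the right and count what is still unmatched.
    unmatched1 = unmatched2 = 0
    for c1, c2 in zip_longest(reversed(rest1), reversed(rest2)):
        if c1 != c2:
            if c1 is not None:
                unmatched1 += 1
            if c2 is not None:
                unmatched2 += 1
    similarity = 6 - max(unmatched1, unmatched2)
    return similarity >= min_rating
```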

### Phonetic Encoding Functions

Phonetic encoding algorithms that convert strings to phonetic codes, enabling "sounds-like" matching for names and words.
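
A common way to use these encoders for "sounds-like" matching is to bucket names by their phonetic code. The minimal sketch below accepts any of the `(s: str) -> str` encoders documented in this section. `group_by_sound` is a hypothetical helper, not part of jellyfish.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_sound(names: Iterable[str], encode: Callable[[str], str]) -> Dict[str, List[str]]:
    """Bucket names that share the same phonetic code, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[encode(name)].append(name)
    return dict(groups)
```

You might call `group_by_sound(names, jellyfish.soundex)`. With American Soundex, for instance, "Robert" and "Rupert" both encode to `R163` and land in the same bucket.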

#### Soundex

American Soundex algorithm that encodes strings based on their English pronunciation.

```python { .api }
def soundex(s: str) -> str:
    """
    Calculate the American Soundex code for a string.

    Parameters:
    - s: String to encode

    Returns:
    str: 4-character soundex code (letter followed by 3 digits)

    Raises:
    TypeError: If argument is not a string
    """
```
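
The American Soundex rules fit in a short pure-Python sketch:

- Keep the first letter.
- Map the remaining consonants to digits.
- Code repeated digits only once. This also applies when the repeats are separated by H or W, but not when a vowel separates them.
- Pad or truncate the result to four characters.

`soundex_sketch` is hypothetical. Silently skipping non-letters is our assumption, not documented jellyfish behaviour.

```python
# American Soundex letter-to-digit table; vowels, Y, H and W have no digit.
_SOUNDEX_DIGITS = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letter in letters
}


def soundex_sketch(s: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in s.upper() if c.isalpha()]  # assumption: ignore non-letters
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _SOUNDEX_DIGITS.get(first, "")  # the first letter's digit still blocks repeats
    for c in letters[1:]:
        digit = _SOUNDEX_DIGITS.get(c, "")
        if digit and digit != prev:
            digits.append(digit)
        # H and W are transparent, so equal digits on either side are coded once.
        # Vowels (and Y) reset prev, so a digit repeated after a vowel is coded again.
        if c not in "HW":
            prev = digit
    return (first + "".join(digits) + "000")[:4]
```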

#### Metaphone

Metaphone phonetic encoding algorithm for English pronunciation matching.

```python { .api }
def metaphone(s: str) -> str:
    """
    Calculate the Metaphone code for a string.

    Parameters:
    - s: String to encode

    Returns:
    str: Metaphone phonetic code

    Raises:
    TypeError: If argument is not a string
    """
```

#### NYSIIS

New York State Identification and Intelligence System phonetic encoding.

```python { .api }
def nysiis(s: str) -> str:
    """
    Calculate the NYSIIS (New York State Identification and Intelligence System) code.

    Parameters:
    - s: String to encode

    Returns:
    str: NYSIIS phonetic code

    Raises:
    TypeError: If argument is not a string
    """
```

#### Match Rating Codex

Match Rating Approach codex encoding for string comparison preparation.

```python { .api }
def match_rating_codex(s: str) -> str:
    """
    Calculate the Match Rating Approach codex for a string.

    Parameters:
    - s: String to encode (must contain only alphabetic characters)

    Returns:
    str: Match Rating codex (up to 6 characters)

    Raises:
    TypeError: If argument is not a string
    ValueError: If string contains non-alphabetic characters
    """
```
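
The published Match Rating Approach encoding rules are short enough to sketch directly:

1. Keep the first letter and drop later vowels.
2. Drop the second letter of a doubled consonant.
3. Reduce codexes longer than six letters to their first three and last three letters.

`mra_codex` is a hypothetical helper. It mirrors the documented ValueError for non-alphabetic input, but we have not verified that it reproduces jellyfish's output for every input.

```python
def mra_codex(s: str) -> str:
    """Match Rating Approach codex following the published encoding rules."""
    if any(not c.isalpha() for c in s):
        raise ValueError("string must contain only alphabetic characters")
    s = s.upper()
    codex = []
    for i, c in enumerate(s):
        if i == 0:
            codex.append(c)  # rule 1: always keep the first letter
        elif c in "AEIOU":
            continue  # rule 1: drop vowels after the first letter
        elif c == s[i - 1]:
            continue  # rule 2: drop the second letter of a doubled consonant
        else:
            codex.append(c)
    # Rule 3: codexes longer than 6 keep their first three and last three letters.
    if len(codex) > 6:
        codex = codex[:3] + codex[-3:]
    return "".join(codex)
```

Under these rules, "Byrne" encodes to `BYRN` and "Boern" to `BRN`. These are the codexes the classic MRA example compares.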

## Types

```python { .api }
from typing import Optional

# String inputs are str; the optional long_tolerance (bool) and ngram_size (int)
# parameters and all return types are as documented above.
# No custom classes or complex types are exposed in the public API.
```