# Utilities and Helpers

String preprocessing, validation functions, and utility classes for handling edge cases and optimizing fuzzy string matching operations.

## Capabilities

### String Processing and Normalization

Core string preprocessing function that normalizes strings for consistent fuzzy matching.

```python { .api }
def full_process(s: str, force_ascii: bool = False) -> str:
    """
    Process string by removing non-alphanumeric characters, trimming, and lowercasing.

    Processing steps:
    1. Convert to ASCII if force_ascii=True
    2. Replace non-letters and non-numbers with whitespace
    3. Convert to lowercase
    4. Strip leading and trailing whitespace

    Parameters:
        s: String to process
        force_ascii: Force conversion to ASCII, removing non-ASCII characters

    Returns:
        str: Processed string ready for fuzzy matching
    """
```

**Usage Example:**

```python
from fuzzywuzzy import utils

# Standard processing
processed = utils.full_process(" Hello, World! 123 ")
print(processed)  # "hello  world  123" (each punctuation mark becomes its own space)

# With ASCII forcing
processed = utils.full_process("Café naïve résumé", force_ascii=True)
print(processed)  # "caf nave rsum" (accented characters are dropped, not transliterated)

# Remove punctuation and normalize
processed = utils.full_process("user@example.com")
print(processed)  # "user example com"
```
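
The four processing steps can be sketched with the standard library alone. This is a minimal illustration, not the library's code: the name `full_process_sketch`, the `\W` pattern, and the 128-255 filter table are assumptions made for the example.

```python
import re

# Assumptions for illustration: a per-character \W pattern (which keeps "_")
# and an ASCII filter that drops code points 128-255.
_NON_WORD = re.compile(r"\W")
_HIGH_RANGE = {cp: None for cp in range(128, 256)}


def full_process_sketch(s: str, force_ascii: bool = False) -> str:
    """Hypothetical stand-in for utils.full_process, for illustration only."""
    if force_ascii:
        s = s.translate(_HIGH_RANGE)  # step 1: drop code points 128-255
    s = _NON_WORD.sub(" ", s)         # step 2: one space per non-word character
    s = s.lower()                     # step 3: lowercase
    return s.strip()                  # step 4: trim both ends


print(full_process_sketch("user@example.com"))             # user example com
print(full_process_sketch("Caf\u00e9", force_ascii=True))  # caf
```

Because step 2 runs before step 4, punctuation at either end of the input turns into whitespace that the final strip then removes.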

### String Validation

Validate that strings are suitable for fuzzy matching operations.

```python { .api }
def validate_string(s) -> bool:
    """
    Check input has length and that length > 0.

    Parameters:
        s: Input to validate

    Returns:
        bool: True if len(s) > 0, False otherwise or if TypeError
    """
```

**Usage Example:**

```python
from fuzzywuzzy import utils

print(utils.validate_string("hello"))  # True
print(utils.validate_string(""))       # False
print(utils.validate_string(None))     # False (TypeError)
print(utils.validate_string(123))      # False (TypeError)
```
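
The documented rule (`len(s) > 0`, with `TypeError` treated as failure) is easy to reproduce when pre-filtering candidate lists. The helper name `has_length` below is hypothetical; the sketch only mirrors the behaviour described in the docstring above.

```python
def has_length(value) -> bool:
    """Hypothetical equivalent of the documented check: len(value) > 0."""
    try:
        return len(value) > 0
    except TypeError:  # None, ints and other unsized objects
        return False


candidates = ["apple", "", None, 42, "pear"]
usable = [c for c in candidates if has_length(c)]
print(usable)  # ['apple', 'pear']

# The check is about length, not type: a non-empty list also passes.
print(has_length(["not", "a", "string"]))  # True
```

Since the check does not test for `str`, callers that need strings specifically may want an additional `isinstance` test.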

### Type Consistency

Ensure both strings are the same type (str or unicode) for consistent comparison.

```python { .api }
def make_type_consistent(s1, s2):
    """
    If both objects aren't either both string or unicode instances, force them to unicode.

    Parameters:
        s1: First string
        s2: Second string

    Returns:
        tuple: (s1, s2) with consistent types
    """
```
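
As a rough illustration of the rule above on Python 3, where `unicode` is an alias for `str`, the following sketch returns matching `str` inputs unchanged and converts anything else with `str()`. The function name `make_type_consistent_sketch` is hypothetical and the body is an assumed reading of the docstring, not the library source.

```python
def make_type_consistent_sketch(s1, s2):
    """Hypothetical Python 3 reading of the documented rule."""
    if isinstance(s1, str) and isinstance(s2, str):
        return s1, s2        # already consistent, returned unchanged
    return str(s1), str(s2)  # otherwise both are forced to text


print(make_type_consistent_sketch("abc", "abd"))   # ('abc', 'abd')
print(make_type_consistent_sketch("abc", 123))     # ('abc', '123')
print(make_type_consistent_sketch("abc", b"abc"))  # ('abc', "b'abc'")
```

The last call shows why `bytes` values are better decoded explicitly before comparison: `str()` on a bytes object yields its repr, not its decoded text.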

### ASCII Conversion Functions

Functions for handling ASCII conversion and character filtering.

```python { .api }
def asciidammit(s) -> str:
    """
    Force string to ASCII by removing or converting non-ASCII characters.

    Parameters:
        s: String to convert

    Returns:
        str: ASCII-only version of input string
    """

def asciionly(s) -> str:
    """
    Remove non-ASCII characters from string.

    Parameters:
        s: String to filter

    Returns:
        str: String with non-ASCII characters removed
    """
```

**Usage Example:**

```python
from fuzzywuzzy import utils

# Force ASCII conversion
ascii_str = utils.asciidammit("Café naïve résumé")
print(ascii_str)  # "Caf nave rsum" (accented characters are removed, not replaced)

# Remove non-ASCII characters
filtered = utils.asciionly("Hello Wörld")
print(filtered)  # "Hello Wrld"
```
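
How much gets removed depends on how the filter is built. The sketch below contrasts two standard-library approaches: a translation table covering code points 128-255 (an assumption based on the `bad_chars` constant described in this document) and a strict encode/decode round trip, which is a swapped-in technique and not something the library is known to use. Both helper names are hypothetical.

```python
# Assumption: a filter table built from code points 128-255 only.
HIGH_RANGE = {cp: None for cp in range(128, 256)}


def strip_high_range(s: str) -> str:
    """Drop only code points 128-255 (hypothetical helper)."""
    return s.translate(HIGH_RANGE)


def strict_ascii(s: str) -> str:
    """Drop every non-ASCII code point (swapped-in stdlib technique)."""
    return s.encode("ascii", "ignore").decode("ascii")


text = "na\u00efve \u4e16\u754c"  # "naïve" followed by two CJK characters
print(strip_high_range(text) == "nave \u4e16\u754c")  # True: CJK survives
print(strict_ascii(text) == "nave ")                  # True: CJK removed too
```

If strict ASCII output matters for an application, it may be worth checking which of the two behaviours the installed version actually exhibits.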

### Mathematical Utilities

Helper functions for numerical operations in fuzzy matching.

```python { .api }
def intr(n) -> int:
    """
    Return a correctly rounded integer.

    Parameters:
        n: Number to round

    Returns:
        int: Rounded integer value
    """
```

**Usage Example:**

```python
from fuzzywuzzy import utils

print(utils.intr(97.6))  # 98
print(utils.intr(97.4))  # 97
print(utils.intr(97.5))  # 98
```
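
Exact halves deserve a closer look. Assuming `intr` is a thin wrapper around the built-in `round` (an assumption, sketched here as `intr_sketch`), Python 3 rounds ties to the nearest even integer, so 97.5 goes up while 96.5 goes down. The `round_half_up` helper is a hypothetical alternative for callers who need ties to always round away from zero.

```python
from decimal import Decimal, ROUND_HALF_UP


def intr_sketch(n) -> int:
    """Assumed implementation: int(round(n)), ties go to the even neighbour."""
    return int(round(n))


def round_half_up(n) -> int:
    """Hypothetical alternative: ties always round away from zero."""
    return int(Decimal(str(n)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(intr_sketch(97.5), intr_sketch(96.5))      # 98 96
print(round_half_up(97.5), round_half_up(96.5))  # 98 97
```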

## StringProcessor Class

Class-level helpers for the individual normalization steps: character replacement, stripping, and case conversion.

```python { .api }
class StringProcessor:
    """
    String processing utilities class with efficient methods for
    text normalization and cleaning operations.
    """

    @classmethod
    def replace_non_letters_non_numbers_with_whitespace(cls, a_string: str) -> str:
        """
        Replace non-letter, non-number characters with whitespace.
        Each such character becomes one space, so a run of symbols
        yields a run of spaces.

        Parameters:
            a_string: String to process

        Returns:
            str: String with non-alphanumeric characters replaced by spaces
        """

    @staticmethod
    def strip(s: str) -> str:
        """Remove leading and trailing whitespace."""

    @staticmethod
    def to_lower_case(s: str) -> str:
        """Convert string to lowercase."""

    @staticmethod
    def to_upper_case(s: str) -> str:
        """Convert string to uppercase."""
```

**Usage Example:**

```python
from fuzzywuzzy.string_processing import StringProcessor

# Replace symbols with spaces; split() shows the tokens that survive
text = "Hello!!! @#$ World??? 123"
processed = StringProcessor.replace_non_letters_non_numbers_with_whitespace(text)
print(processed.split())  # ['Hello', 'World', '123']

# Standard operations
lower_text = StringProcessor.to_lower_case("HELLO WORLD")
print(lower_text)  # "hello world"

stripped = StringProcessor.strip(" hello world ")
print(stripped)  # "hello world"
```
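
Whether symbols are replaced one by one or as whole runs changes the spacing of the result but not its tokens. The sketch below uses plain `re` calls to show both variants; treating the per-character `\W` form as the library's behaviour is an assumption.

```python
import re

text = "Hello!!! World"

per_char = re.sub(r"\W", " ", text)    # assumed behaviour: one space per symbol
collapsed = re.sub(r"\W+", " ", text)  # alternative: one space per run

print(repr(per_char))   # 'Hello    World'
print(repr(collapsed))  # 'Hello World'
print(per_char.split() == collapsed.split())  # True
```

Any code that later splits on whitespace sees the same tokens either way, while character-level comparisons are affected by the extra spaces.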

## StringMatcher Class (Optional)

High-performance string matching class available when python-Levenshtein is installed.

```python { .api }
class StringMatcher:
    """
    A SequenceMatcher-like class built on top of Levenshtein distance calculations.
    Provides significant performance improvements when python-Levenshtein is available.

    This class provides a SequenceMatcher-compatible interface while using the
    highly optimized Levenshtein library for calculations.
    """

    def __init__(self, isjunk=None, seq1: str = '', seq2: str = ''):
        """
        Initialize StringMatcher with two sequences.

        Parameters:
            isjunk: Junk function (ignored, not implemented - will show warning)
            seq1: First string to compare (default: '')
            seq2: Second string to compare (default: '')
        """

    def set_seqs(self, seq1: str, seq2: str):
        """
        Set both sequences for comparison and reset cache.

        Parameters:
            seq1: First string to compare
            seq2: Second string to compare
        """

    def set_seq1(self, seq1: str):
        """
        Set first sequence and reset cache.

        Parameters:
            seq1: First string to compare
        """

    def set_seq2(self, seq2: str):
        """
        Set second sequence and reset cache.

        Parameters:
            seq2: Second string to compare
        """

    def ratio(self) -> float:
        """
        Get similarity ratio between sequences using Levenshtein calculation.

        Returns:
            float: Similarity ratio between 0.0 and 1.0
        """

    def quick_ratio(self) -> float:
        """
        Get quick similarity ratio (same as ratio() in this implementation).

        Returns:
            float: Similarity ratio between 0.0 and 1.0
        """

    def real_quick_ratio(self) -> float:
        """
        Get a very quick similarity estimate based on string lengths.

        Returns:
            float: Quick similarity estimate between 0.0 and 1.0
        """

    def distance(self) -> int:
        """
        Get Levenshtein distance between sequences.

        Returns:
            int: Edit distance (number of operations to transform seq1 to seq2)
        """

    def get_opcodes(self):
        """
        Get operation codes for sequence comparison.

        Returns:
            List of operation codes compatible with difflib.SequenceMatcher
        """

    def get_editops(self):
        """
        Get edit operations for transforming one sequence to another.

        Returns:
            List of edit operations (insertions, deletions, substitutions)
        """

    def get_matching_blocks(self):
        """
        Get matching blocks between sequences.

        Returns:
            List of matching blocks compatible with difflib.SequenceMatcher
        """
```

**Usage Example:**

```python
# Only available if python-Levenshtein is installed
try:
    from fuzzywuzzy.StringMatcher import StringMatcher

    matcher = StringMatcher(seq1="hello world", seq2="hallo world")
    ratio = matcher.ratio()
    print(f"Similarity: {ratio}")  # High-performance ratio calculation

    distance = matcher.distance()
    print(f"Edit distance: {distance}")  # Levenshtein distance

except ImportError:
    print("python-Levenshtein not installed, using standard algorithms")
```
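
Because the class is described as SequenceMatcher-compatible, one way to keep calling code independent of the optional dependency is to fall back to `difflib.SequenceMatcher` when the import fails. This is a sketch under that assumption; the `similarity` helper and the `Matcher` alias are hypothetical names introduced for the example.

```python
from difflib import SequenceMatcher as _DifflibMatcher

try:
    # Optional accelerated matcher; needs python-Levenshtein.
    from fuzzywuzzy.StringMatcher import StringMatcher as Matcher
except ImportError:
    Matcher = _DifflibMatcher  # pure-Python fallback from the stdlib


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: ratio from whichever matcher was loaded."""
    # Both classes accept (isjunk, seq1, seq2) positionally.
    return Matcher(None, a, b).ratio()


print(similarity("hello world", "hallo world"))
```

The two matchers use different algorithms, so their scores can differ for some inputs even though both stay within 0.0 to 1.0.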

## Constants and Configuration

Internal constants used by fuzzywuzzy for compatibility and character handling.

```python { .api }
PY3: bool  # True if running Python 3, False for Python 2
bad_chars: str  # Characters with code points 128-255 (outside ASCII), used for filtering
translation_table: dict  # Translation table for removing non-ASCII chars (Python 3 only)
unicode: type  # str type in Python 3, unicode type in Python 2
```

**Usage Example:**

```python
from fuzzywuzzy import utils

# Check Python version
if utils.PY3:
    print("Running on Python 3")
else:
    print("Running on Python 2")

# Access character filtering components
print(f"Bad chars string length: {len(utils.bad_chars)}")  # 128 characters
```

## Decorator Functions (Internal)

These decorators are used internally by fuzzywuzzy but can be useful for custom scoring functions. They handle common edge cases in string comparison.

```python { .api }
def check_for_equivalence(func):
    """
    Decorator that returns 100 if both input strings are identical.

    This decorator checks if args[0] == args[1] and returns 100 (perfect match)
    if they are equal, otherwise calls the decorated function.

    Parameters:
        func: Function to decorate that takes two string arguments

    Returns:
        function: Decorated function that handles string equivalence
    """

def check_for_none(func):
    """
    Decorator that returns 0 if either input string is None.

    This decorator checks if args[0] or args[1] is None and returns 0
    (no match) if either is None, otherwise calls the decorated function.

    Parameters:
        func: Function to decorate that takes two string arguments

    Returns:
        function: Decorated function that handles None inputs
    """

def check_empty_string(func):
    """
    Decorator that returns 0 if either input string is empty.

    This decorator checks if len(args[0]) == 0 or len(args[1]) == 0 and
    returns 0 (no match) if either is empty, otherwise calls the decorated function.

    Parameters:
        func: Function to decorate that takes two string arguments

    Returns:
        function: Decorated function that handles empty string inputs
    """
```

**Usage Example:**

```python
from fuzzywuzzy import utils

@utils.check_for_none
@utils.check_for_equivalence
def custom_scorer(s1, s2):
    # Your custom scoring logic here
    return 50  # Example score

# Decorators handle edge cases automatically
print(custom_scorer("hello", "hello"))  # 100 (equivalence)
print(custom_scorer("hello", None))     # 0 (none check)
print(custom_scorer("hello", "world"))  # 50 (custom logic)
```

```