Tessl Tile for pypi/thefuzz@0.22.0

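# String Utilities

Utility functions for string preprocessing and normalization. These functions prepare strings for fuzzy matching by cleaning and standardizing their format.

## Capabilities

### Full String Processing

Normalizes a string so that differences in case, punctuation, and leading or trailing whitespace do not affect fuzzy-match scores. `full_process` is the default `processor` for the `process.extract*` functions. Scorers such as `fuzz.token_sort_ratio` and `fuzz.WRatio` also apply it when their `full_process` argument is `True`, which is the default.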

```python { .api }
def full_process(s: str, force_ascii: bool = False) -> str:
    """
    Process string for fuzzy matching by normalizing format.

    Processing steps:
    1. If force_ascii is True, apply ascii_only (deletes characters
       with code points 128-255, such as accented Latin-1 letters)
    2. Replace every non-alphanumeric character with a space
    3. Trim leading/trailing whitespace
    4. Convert to lowercase

    Runs of internal whitespace are not collapsed: "a, b" becomes "a  b".

    Args:
        s: String to process
        force_ascii: If True, delete Latin-1 accented characters (e.g. "é")
            before processing; they are not transliterated to ASCII

    Returns:
        str: Processed and normalized string
    """
```
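
The core steps above (replace non-alphanumerics, trim, lowercase) can be approximated with the standard library, which makes the whitespace behavior easy to check. This is a sketch, not thefuzz's code. thefuzz 0.22.0 delegates this work to rapidfuzz's `default_process`. The sketch assumes that "alphanumeric" matches Python's `str.isalnum()`, which may differ for a few characters. The name `approx_full_process` is hypothetical, and `force_ascii` is left out.

```python
# Hypothetical stdlib approximation of the core of thefuzz.utils.full_process
# (force_ascii omitted); not the library's implementation.


def approx_full_process(s: str) -> str:
    """Replace non-alphanumerics with spaces, trim, and lowercase."""
    # str.isalnum() stands in for the library's notion of "alphanumeric".
    replaced = "".join(ch if ch.isalnum() else " " for ch in s)
    # Trim both ends and lowercase; internal runs of spaces are left as-is.
    return replaced.strip().lower()


for text in ["Hello, World!", "  Multiple   Spaces  ", "Café Münchën"]:
    print(repr(approx_full_process(text)))
# 'hello  world'
# 'multiple   spaces'
# 'café münchën'
```

Each punctuation character becomes its own space, so `"Hello, World!"` keeps two spaces between the words.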

### ASCII Conversion

Deletes characters in the Latin-1 Supplement block (code points 128-255), such as `é`, `ü`, and `ß`. Despite the name, the result is not guaranteed to be ASCII. Characters above code point 255 pass through unchanged, for example `Ł`, CJK ideographs, or emoji. Nothing is transliterated.

```python { .api }
def ascii_only(s: str) -> str:
    """
    Remove characters with code points 128-255 from a string.

    Deletes Latin-1 Supplement characters (e.g. "é", "ü", "ß") via
    str.translate(translation_table). Characters above U+00FF, such as
    emoji or most non-Latin scripts, are left unchanged.

    Args:
        s: String to convert

    Returns:
        str: The string with code points 128-255 removed
    """
```

### Module Constants

```python { .api }
# str.translate table mapping code points 128-255 to None (i.e. deleting them);
# used by ascii_only
translation_table: dict
```
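
To check which characters `ascii_only` actually removes, you can rebuild the table with the standard library. This sketch assumes that `translation_table` maps each code point from 128 to 255 to `None`, as the comment above states. `TABLE` and `ascii_only_sketch` are hypothetical names.

```python
# Hypothetical re-creation of thefuzz.utils.translation_table / ascii_only,
# assuming the table maps code points 128-255 to None.
TABLE = {code_point: None for code_point in range(128, 256)}


def ascii_only_sketch(s: str) -> str:
    # str.translate deletes every character whose code point maps to None.
    return s.translate(TABLE)


for text in ["Café", "Straße", "Łódź", "naïve 😀"]:
    print(f"{text!r} -> {ascii_only_sketch(text)!r}")
# 'Café' -> 'Caf'
# 'Straße' -> 'Strae'
# 'Łódź' -> 'Łdź'          (Ł and ź are above U+00FF, so they survive)
# 'naïve 😀' -> 'nave 😀'  (emoji survive too)
```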

## Usage Examples

### Basic String Processing

```python
from thefuzz import utils

# Standard text normalization
text = "  Hello, World!  "
processed = utils.full_process(text)
print(repr(processed))  # 'hello  world' (the comma becomes a second space)

# Handle special characters
text = "New York Mets vs. Atlanta Braves"
processed = utils.full_process(text)
print(repr(processed))  # 'new york mets vs  atlanta braves'
```
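
`full_process` keeps internal runs of spaces. Scorers that split on whitespace, such as `fuzz.token_sort_ratio`, are unaffected by this. Character-level scorers such as `fuzz.ratio` do see the extra spaces. If single spacing matters, you can add a post-processing step. `collapse_spaces` is a hypothetical helper, not part of thefuzz.

```python
# Hypothetical post-processing step: collapse internal whitespace runs.
def collapse_spaces(s: str) -> str:
    # str.split() with no argument splits on any whitespace run and drops empties.
    return " ".join(s.split())


print(repr(collapse_spaces("hello  world")))                      # 'hello world'
print(repr(collapse_spaces("new york mets vs  atlanta braves")))  # 'new york mets vs atlanta braves'
```

With thefuzz installed, you would apply it as `collapse_spaces(utils.full_process(text))`.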

### ASCII Conversion

```python
from thefuzz import utils

# Latin-1 accented characters are deleted, not transliterated
text = "Café Münchën"
ascii_text = utils.ascii_only(text)
print(ascii_text)  # "Caf Mnchn"

# Full processing with ASCII conversion
processed = utils.full_process("Café Münchën", force_ascii=True)
print(processed)  # "caf mnchn"
```
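
Sometimes accented letters should map to their base letters ("Café" to "Cafe") rather than disappear. A common standard-library approach is Unicode NFKD decomposition followed by dropping the combining marks. This is not what `force_ascii` does. `transliterate` is a hypothetical helper, and you could pass its output to `utils.full_process`. Letters with no decomposition, such as `Ł` or `ß`, are still dropped entirely by this approach.

```python
import unicodedata


def transliterate(s: str) -> str:
    """Hypothetical helper: strip diacritics via NFKD, then keep ASCII only."""
    # NFKD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", s)
    # errors="ignore" drops the combining marks and any other non-ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(transliterate("Café Münchën"))  # Cafe Munchen
print(transliterate("Łódź"))          # odz (Ł has no decomposition, so it is dropped)
```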

### Integration with Fuzzy Matching

```python
from thefuzz import fuzz, utils

# Manual preprocessing before comparison
s1 = utils.full_process("New York Mets!")
s2 = utils.full_process("new york mets")
score = fuzz.ratio(s1, s2)
print(score)  # 100 (perfect match after processing)

# Compare with and without processing; fuzz.ratio does not preprocess its
# inputs, so case and punctuation differences lower the raw score
raw_score = fuzz.ratio("New York Mets!", "new york mets")
processed_score = fuzz.ratio(
    utils.full_process("New York Mets!"),
    utils.full_process("new york mets")
)
print(f"Raw: {raw_score}, Processed: {processed_score}")
```
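
To see why the raw score drops, the sketch below recomputes a similarity ratio in pure Python. It assumes that `fuzz.ratio` in thefuzz 0.22.0 returns rapidfuzz's normalized Indel similarity, `2 * LCS / (len(s1) + len(s2))`, scaled to 0-100 and rounded. `lcs_length` and `indel_ratio` are hypothetical stand-ins, not library functions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via a rolling 1-D DP table."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(s1: str, s2: str) -> int:
    """Hypothetical stand-in for fuzz.ratio: 100 * 2 * LCS / (len1 + len2), rounded."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100  # this sketch's choice for two empty strings
    return round(100 * 2 * lcs_length(s1, s2) / total)


print(indel_ratio("New York Mets!", "new york mets"))  # 74 (LCS 10, total length 27)
print(indel_ratio("new york mets", "new york mets"))   # 100
```

Here `N`, `Y`, `M`, and `!` have no counterpart in the lowercase string, so only 10 of 27 characters pair up.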

### Custom Processing Pipeline

```python
from thefuzz import process, utils


def custom_processor(text):
    """Custom processing for street addresses."""
    # First apply standard processing (lowercase, punctuation removed,
    # Latin-1 accented characters deleted)
    processed = utils.full_process(text, force_ascii=True)

    # Then normalize common abbreviations; other rules (stop-word removal, etc.)
    # could be added here
    replacements = {
        "street": "st",
        "avenue": "ave",
        "boulevard": "blvd",
    }

    # Replace whole tokens only, so "streetcar" is not turned into "stcar"
    tokens = [replacements.get(token, token) for token in processed.split()]
    return " ".join(tokens)


# Use with fuzzy matching
addresses = ["123 Main Street", "456 Oak Avenue", "789 First Boulevard"]
result = process.extractOne("main st", addresses, processor=custom_processor)
print(result)  # best match as a (choice, score) tuple
```

### Performance Considerations

```python
from thefuzz import fuzz, utils

# For batch processing, preprocess each string once
texts = ["Text 1", "Text 2", "Text 3"]
processed_texts = [utils.full_process(text) for text in texts]

# Then reuse the processed texts for multiple comparisons. fuzz.ratio does not
# preprocess its inputs; token-based scorers such as fuzz.token_sort_ratio
# accept full_process=False to skip their own (now redundant) processing.
score = fuzz.token_sort_ratio(processed_texts[0], processed_texts[1], full_process=False)
```
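
When the same choices are matched against many queries, one option is to process them once. You then keep a mapping back to the originals, so results can report the unprocessed text. The sketch below is hypothetical: `build_index` and `simple_processor` are illustrative names. The processor is a parameter. In real use you would pass `utils.full_process`, while the demo uses a trivial stand-in so it runs without thefuzz.

```python
from typing import Callable, Dict, List, Tuple


def build_index(
    choices: List[str], processor: Callable[[str], str]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Process each choice once; map each processed form back to its originals."""
    processed: List[str] = []
    originals: Dict[str, List[str]] = {}
    for choice in choices:
        key = processor(choice)
        if key not in originals:
            processed.append(key)  # keep each distinct processed form once
            originals[key] = []
        originals[key].append(choice)
    return processed, originals


# Stand-in processor for the demo; with thefuzz installed, pass utils.full_process.
def simple_processor(s: str) -> str:
    return s.strip().lower()


processed, originals = build_index(["Apple", " apple ", "Banana"], simple_processor)
print(processed)           # ['apple', 'banana']
print(originals["apple"])  # ['Apple', ' apple ']
```

Deduplicating processed forms also reduces the number of comparisons. `originals` recovers the input strings for whichever processed choice scores best.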

## Processing Behavior

### Character Handling

- **Alphanumeric**: Preserved (Unicode letters and digits, including accented letters such as `é`)
- **Whitespace**: Trimmed at both ends; tabs and newlines become spaces, and internal runs are kept as-is (not collapsed)
- **Punctuation and symbols**: Each character replaced with a space
- **Accented characters**: Kept by default; with `force_ascii=True`, those in the range U+0080-U+00FF are deleted rather than converted to ASCII equivalents
- **Case**: Converted to lowercase

### Examples of Processing Results

```python
from thefuzz import utils

examples = [
    "Hello, World!",           # → 'hello  world'
    "  Multiple   Spaces  ",   # → 'multiple   spaces' (internal spaces kept)
    "New York Mets vs. ATL",   # → 'new york mets vs  atl'
    "Café Münchën",            # → 'café münchën' ('caf mnchn' with force_ascii=True)
    "user@email.com",          # → 'user email com'
    "1st & 2nd Avenue",        # → '1st   2nd avenue'
]

for text in examples:
    processed = utils.full_process(text)
    processed_ascii = utils.full_process(text, force_ascii=True)
    print(f"{text!r} → {processed!r} → {processed_ascii!r}")
```
