# Utilities and Debugging

Debugging and utility functions for understanding Unicode text and applying transformation plans manually.

## Capabilities

### Unicode Text Analysis

Debugging utility for analyzing Unicode text character by character.

```python { .api }
def explain_unicode(text: str) -> None:
    """
    Debug utility showing detailed Unicode information for each character.

    Prints a character-by-character breakdown showing the Unicode codepoint,
    glyph, category, and name, for debugging mysterious Unicode text.
    Output goes to stdout and is intended for interactive debugging.

    Args:
        text: Unicode string to analyze

    Returns:
        None (prints to stdout)

    Examples:
        >>> explain_unicode('café')
        U+0063 c [Ll] LATIN SMALL LETTER C
        U+0061 a [Ll] LATIN SMALL LETTER A
        U+0066 f [Ll] LATIN SMALL LETTER F
        U+00E9 é [Ll] LATIN SMALL LETTER E WITH ACUTE

        >>> explain_unicode('😀🎉')
        U+1F600 😀 [So] GRINNING FACE
        U+1F389 🎉 [So] PARTY POPPER
    """
```

### Transformation Plan Application

Function for manually applying transformation plans generated by ftfy's explanation system.

```python { .api }
# Dictionary mapping fixer names to functions for use with apply_plan
FIXERS: dict[str, Callable] = {
    "unescape_html": fixes.unescape_html,
    "remove_terminal_escapes": fixes.remove_terminal_escapes,
    "restore_byte_a0": fixes.restore_byte_a0,
    "replace_lossy_sequences": fixes.replace_lossy_sequences,
    "decode_inconsistent_utf8": fixes.decode_inconsistent_utf8,
    "fix_c1_controls": fixes.fix_c1_controls,
    "fix_latin_ligatures": fixes.fix_latin_ligatures,
    "fix_character_width": fixes.fix_character_width,
    "uncurl_quotes": fixes.uncurl_quotes,
    "fix_line_breaks": fixes.fix_line_breaks,
    "fix_surrogates": fixes.fix_surrogates,
    "remove_control_chars": fixes.remove_control_chars,
}

def apply_plan(text: str, plan: list[tuple[str, str]]) -> str:
    """
    Apply a sequence of text transformations from an explanation plan.

    Takes a transformation plan (a list of operation/parameter tuples) and
    applies each step in sequence. Useful for replaying ftfy fixes or
    applying custom transformation sequences.

    Args:
        text: Initial text to transform
        plan: List of (operation, parameter) tuples

    Returns:
        Final transformed text

    Operations:
        "encode": Convert string to bytes using parameter as encoding
        "decode": Convert bytes to string using parameter as encoding
        "transcode": Apply bytes→bytes function named in parameter
        "apply": Apply string→string function named in parameter

    Examples:
        >>> plan = [('encode', 'latin-1'), ('decode', 'utf-8')]
        >>> apply_plan('sÃ³', plan)
        'só'

        >>> plan = [('apply', 'uncurl_quotes'), ('apply', 'fix_line_breaks')]
        >>> apply_plan('“curly quotes”\\r\\n', plan)
        '"curly quotes"\\n'
    """
```
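The four operations listed above can be sketched as a small interpreter. The dispatch logic below mirrors the documented behavior; `TOY_FIXERS` is a hypothetical stand-in (stdlib only) for ftfy's FIXERS table, so this is an illustration, not ftfy's implementation:

```python
import html

# Hypothetical stdlib-only stand-ins for two entries in the FIXERS table
TOY_FIXERS = {
    "unescape_html": html.unescape,
    "uncurl_quotes": lambda s: s.translate(
        str.maketrans({'\u2018': "'", '\u2019': "'", '\u201c': '"', '\u201d': '"'})
    ),
}

def apply_plan_sketch(text: str, plan: list[tuple[str, str]]) -> str:
    """Interpret an (operation, parameter) plan, mirroring apply_plan's dispatch."""
    obj = text
    for operation, parameter in plan:
        if operation == "encode":
            obj = obj.encode(parameter)        # str -> bytes
        elif operation == "decode":
            obj = obj.decode(parameter)        # bytes -> str
        elif operation in ("apply", "transcode"):
            obj = TOY_FIXERS[parameter](obj)   # look up the named fixer function
        else:
            raise ValueError(f"Unknown operation: {operation!r}")
    return obj

# Undo a classic UTF-8-read-as-Latin-1 mix-up
print(apply_plan_sketch('sÃ³', [('encode', 'latin-1'), ('decode', 'utf-8')]))  # prints 'só'
```

The encode/decode pair is what makes mojibake reversible: re-encoding with the wrong codec recovers the original bytes, which can then be decoded correctly.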

### Heuristic Text Analysis

Functions for detecting whether text contains mojibake or other problems.

```python { .api }
def badness(text: str) -> int:
    """
    Count the number of unlikely character sequences in text.

    Returns a numerical badness score by counting mojibake patterns.
    Higher scores indicate more likely encoding problems; a score > 0
    indicates the text likely contains mojibake.

    Args:
        text: Unicode string to analyze

    Returns:
        Number of unlikely character sequences found

    Examples:
        >>> from ftfy.badness import badness
        >>> badness("normal text")
        0
        >>> badness("âœ“ broken")  # Multiple mojibake patterns
        2
    """

def is_bad(text: str) -> bool:
    """
    Heuristic detection of likely mojibake in text.

    Uses statistical analysis of Unicode character patterns to detect
    text that likely contains encoding problems. Designed to minimize
    false positives while catching common mojibake patterns.

    Args:
        text: Unicode string to analyze

    Returns:
        True if text likely contains mojibake, False otherwise

    Examples:
        >>> from ftfy.badness import is_bad
        >>> is_bad("normal text")
        False
        >>> is_bad("âœ“ broken")  # Mojibake pattern
        True
    """
```
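As a rough intuition for what these heuristics look for, the toy counter below flags one telltale pattern: a character like `Ã`, `Â`, or `â` followed by another high Latin-1/punctuation character, which commonly appears when UTF-8 bytes are misread as Latin-1 or Windows-1252. This is a hypothetical illustration, far cruder than ftfy's real model:

```python
import re

# Toy stand-in for badness(): count suspicious two-character sequences.
# (Illustration only - the real heuristic weighs many more patterns.)
MOJIBAKE_HINT = re.compile(
    r'[\u00c2\u00c3\u00e2]'                      # 'Â', 'Ã', or 'â' lead byte artifact
    r'[\u0080-\u00ff\u0152\u0153\u2013-\u201e\u20ac\u2122]'  # typical trailing artifact
)

def toy_badness(text: str) -> int:
    return len(MOJIBAKE_HINT.findall(text))

def toy_is_bad(text: str) -> bool:
    return toy_badness(text) > 0
```

To generate test input, note that mojibake can be manufactured deliberately: `'café'.encode('utf-8').decode('latin-1')` yields `'cafÃ©'`, which the toy counter (and ftfy's real one) flags, while plain `'café'` passes.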

## Usage Examples

### Unicode Text Debugging

```python
from ftfy import explain_unicode

# Debug mysterious characters
mysterious_text = "Weird chars: \u00a0\u200b\u2019"
print("Analyzing mysterious text:")
explain_unicode(mysterious_text)

# Debug emoji and special characters
emoji_text = "🎉🔥💯"
print("\nAnalyzing emoji:")
explain_unicode(emoji_text)

# Debug potential mojibake
mojibake = "cafÃ©"  # This might be mojibake
print("\nAnalyzing potential mojibake:")
explain_unicode(mojibake)
```

### Transformation Plan Replay

```python
from ftfy import fix_and_explain, apply_plan

# Get explanation for a fix
broken_text = "sÃ³"
result = fix_and_explain(broken_text)
print(f"Original: {broken_text}")
print(f"Fixed: {result.text}")
print(f"Plan: {result.explanation}")

# Convert ExplanationStep objects to tuples for apply_plan
plan_tuples = [(step.action, step.parameter) for step in result.explanation]

# Replay the same transformation on similar text
similar_text = "JosÃ©"  # Same type of mojibake
replayed = apply_plan(similar_text, plan_tuples)
print(f"Replayed fix: {similar_text} → {replayed}")
```

### Custom Transformation Sequences

```python
from ftfy import apply_plan, FIXERS

# Check available transformations
print("Available fixers:", list(FIXERS.keys()))

# Build custom transformation plan
custom_plan = [
    ('apply', 'remove_terminal_escapes'),
    ('apply', 'unescape_html'),
    ('apply', 'uncurl_quotes'),
    ('apply', 'fix_character_width')
]

# Apply custom sequence: terminal escapes, HTML entities,
# curly quotes, and fullwidth characters all in one string
messy_text = '\x1b[31m&lt;“curly”&gt;\x1b[0m ＷＩＤＥ'
cleaned = apply_plan(messy_text, custom_plan)
print(f"Custom clean: {messy_text} → {cleaned}")
```

### Encoding Transformation Plans

```python
from ftfy import apply_plan

# Manually specify encoding transformations
encoding_plan = [
    ('encode', 'latin-1'),  # String → bytes as latin-1
    ('decode', 'utf-8')     # Bytes → string as utf-8
]

mojibake_texts = ['cafÃ©', 'naÃ¯ve', 'rÃ©sumÃ©']
for text in mojibake_texts:
    try:
        fixed = apply_plan(text, encoding_plan)
        print(f"{text} → {fixed}")
    except UnicodeError as e:
        print(f"{text} → Error: {e}")
```

### Mojibake Detection

```python
from ftfy.badness import is_bad, badness
from ftfy import fix_text

test_strings = [
    "Normal English text",
    "Regular café",
    "âœ“ mojibake pattern",
    "Broken textâ„¢ with weird chars",
    "Standard Unicode: 你好世界",
    "Currency symbols: €£¥",
    "sÃ³ definite mojibake"
]

print("Mojibake detection results:")
for text in test_strings:
    bad = is_bad(text)
    score = badness(text)
    if bad:
        fixed = fix_text(text)
        print(f"😱 BAD (score {score}): '{text}' → '{fixed}'")
    else:
        print(f"✅ OK (score {score}): '{text}'")
```

### Debugging Text Processing Pipeline

```python
from ftfy import fix_and_explain, apply_plan, explain_unicode
from ftfy.badness import is_bad, badness

def debug_text_processing(text):
    """Comprehensive text debugging pipeline."""

    print(f"=== Debugging: '{text}' ===")

    # Check if text looks problematic
    bad_score = badness(text)
    print(f"Looks bad: {is_bad(text)} (badness score: {bad_score})")

    # Show character details
    print("\nCharacter analysis:")
    explain_unicode(text)

    # Try fixing and get explanation
    result = fix_and_explain(text)
    print(f"\nFixed: '{result.text}'")

    if result.explanation:
        print(f"Transformations applied: {len(result.explanation)}")
        for i, step in enumerate(result.explanation, 1):
            print(f"  {i}. {step.action}: {step.parameter}")

        # Test plan replay
        plan_tuples = [(s.action, s.parameter) for s in result.explanation]
        replayed = apply_plan(text, plan_tuples)
        print(f"Plan replay result: '{replayed}'")
        print(f"Replay matches: {replayed == result.text}")
    else:
        print("No transformations needed")

    print()

# Debug various problematic texts
debug_texts = [
    "âœ“ Check mark mojibake",
    "Normal text",
    "sÃ³ encoding issue",
    '\x1b[31mTerminal\x1b[0m escapes'
]

for text in debug_texts:
    debug_text_processing(text)
```

### Plan Composition and Analysis

```python
from ftfy import apply_plan, fix_and_explain

def analyze_transformation_effects(text, individual_plans):
    """Test individual transformations vs combined effect."""

    print(f"Original: '{text}'")

    # Apply individual transformations
    print("\nIndividual transformations:")
    current = text
    for plan_name, plan in individual_plans.items():
        try:
            result = apply_plan(current, plan)
            if result != current:
                print(f"  {plan_name}: '{current}' → '{result}'")
                current = result
            else:
                print(f"  {plan_name}: no change")
        except Exception as e:
            print(f"  {plan_name}: ERROR {e}")

    print(f"Sequential result: '{current}'")

    # Compare with ftfy's automatic processing
    auto_result = fix_and_explain(text)
    print(f"ftfy result: '{auto_result.text}'")
    print(f"Results match: {current == auto_result.text}")

# Test transformation composition
plans = {
    'html_unescape': [('apply', 'unescape_html')],
    'terminal_clean': [('apply', 'remove_terminal_escapes')],
    'quote_fix': [('apply', 'uncurl_quotes')],
    'encoding_fix': [('encode', 'latin-1'), ('decode', 'utf-8')]
}

complex_text = '\x1b[32m<“Problematic”>\x1b[0m text with sÃ³me issues'
analyze_transformation_effects(complex_text, plans)
```