# Text Fixing Functions

Core functions for detecting and fixing text encoding problems: the main fix_text function and variants that also explain the transformations they apply.

## Capabilities

### Main Text Fixing

Detects and fixes Unicode text problems, including mojibake, HTML entities, character formatting issues, and other common forms of text corruption.

```python { .api }
def fix_text(text: str, config: TextFixerConfig | None = None, **kwargs) -> str:
    """
    Fix inconsistencies and glitches in Unicode text.

    Applies multiple text fixes in sequence, processing text in segments
    for performance. Handles mojibake, HTML entities, character width,
    quotes, line breaks, and other common text problems.

    Args:
        text: Unicode string to fix
        config: Configuration object, or None for defaults
        **kwargs: Individual config options (e.g., uncurl_quotes=False)

    Returns:
        Fixed Unicode string

    Examples:
        >>> fix_text('âœ” No problems')
        '✔ No problems'
        >>> fix_text('ＬＯＵＤ　ＮＯＩＳＥＳ')
        'LOUD NOISES'
    """
```
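
To make the categories of fixes listed above more concrete, here is a rough standard-library approximation of three of them: HTML entities, character width, and curly quotes. It is not ftfy's implementation. The helper name `rough_cleanup` is hypothetical, NFKC normalization is a blunter tool than a dedicated width fix, and no mojibake detection is attempted.

```python
import html
import unicodedata

# Hypothetical helper, not part of ftfy: a blunt stdlib imitation of a few fixes.
CURLY_TO_STRAIGHT = str.maketrans({"‘": "'", "’": "'", "“": '"', "”": '"'})


def rough_cleanup(text: str) -> str:
    text = html.unescape(text)                  # "&lt;" -> "<"
    text = unicodedata.normalize("NFKC", text)  # fullwidth letters -> ASCII letters
    return text.translate(CURLY_TO_STRAIGHT)    # curly quotes -> straight quotes


print(rough_cleanup("ＬＯＵＤ　ＮＯＩＳＥＳ"))  # LOUD NOISES
print(rough_cleanup("&lt;3 “quoted”"))        # <3 "quoted"
```

A real fixer has to decide when each of these steps is safe to apply, which is the part this sketch leaves out.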

### Text Fixing with Explanation

Fixes text and returns a step-by-step explanation of the transformations applied, which is useful for debugging and for understanding what was fixed.

```python { .api }
def fix_and_explain(text: str, config: TextFixerConfig | None = None, **kwargs) -> ExplainedText:
    """
    Fix text as a single segment and return an explanation of the changes.

    Processes the text with one consistent sequence of fixes and returns
    both the fixed text and the list of transformation steps applied.

    Args:
        text: Unicode string to fix
        config: Configuration object, or None for defaults
        **kwargs: Individual config options

    Returns:
        ExplainedText with fixed text and explanation steps

    Examples:
        >>> result = fix_and_explain("sÃ³")
        >>> result.text
        'só'
        >>> result.explanation
        [ExplanationStep(action='encode', parameter='latin-1'),
         ExplanationStep(action='decode', parameter='utf-8')]
    """
```
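
The explanation in the docstring example reads as a recipe: encode as Latin-1, then decode as UTF-8. As a sketch of what those two steps mean, the hypothetical helper below (`replay_steps`, not an ftfy function) replays a list of (action, parameter) pairs using only Python's built-in codecs. It assumes only `encode` and `decode` actions; other step types such as `transcode` are out of scope here.

```python
def replay_steps(text, steps):
    """Apply ('encode', codec) and ('decode', codec) steps in order."""
    value = text
    for action, parameter in steps:
        if action == "encode":
            value = value.encode(parameter)  # str -> bytes
        elif action == "decode":
            value = value.decode(parameter)  # bytes -> str
        else:
            raise ValueError(f"unsupported action: {action!r}")
    return value


steps = [("encode", "latin-1"), ("decode", "utf-8")]
print(replay_steps("sÃ³", steps))  # só
```

Reading an explanation this way makes it possible to check by hand that a reported fix is the one you expected.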

### Encoding-Only Fixing

Applies only the encoding detection and correction steps, skipping character formatting and normalization fixes.

```python { .api }
def fix_encoding(text: str, config: TextFixerConfig | None = None, **kwargs) -> str:
    """
    Apply only the encoding-fixing steps of ftfy.

    Detects mojibake and attempts to fix it by decoding the text in a
    different encoding standard, without applying character formatting fixes.

    Args:
        text: Unicode string to fix
        config: Configuration object, or None for defaults
        **kwargs: Individual config options

    Returns:
        Text with encoding problems fixed

    Examples:
        >>> fix_encoding("Ã³")
        'ó'
        >>> fix_encoding("&ATILDE;&SUP3;")  # HTML entities not fixed
        '&ATILDE;&SUP3;'
    """


def fix_encoding_and_explain(text: str, config: TextFixerConfig | None = None, **kwargs) -> ExplainedText:
    """
    Apply encoding fixes and return an explanation.

    Detects and fixes mojibake, with a detailed explanation of the encoding
    transformations applied, including subordinate fixes.

    Args:
        text: Unicode string to fix
        config: Configuration object, or None for defaults
        **kwargs: Individual config options

    Returns:
        ExplainedText with encoding fixes and explanation

    Examples:
        >>> result = fix_encoding_and_explain("voilÃ le travail")
        >>> result.text
        'voilà le travail'
        >>> result.explanation
        [ExplanationStep(action='encode', parameter='latin-1'),
         ExplanationStep(action='transcode', parameter='restore_byte_a0'),
         ExplanationStep(action='decode', parameter='utf-8')]
    """
```
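
The `restore_byte_a0` step in the last example is easier to follow at the byte level. The sketch below is one plausible reconstruction using only built-in codecs. It assumes the no-break space produced by the bad decode was later collapsed, together with the following space, into one ordinary space. The single `bytes.replace` call is a crude stand-in for ftfy's heuristic, not its implementation.

```python
original = "voilà le travail"

# "à" is the UTF-8 byte pair C3 A0; read as Latin-1 it becomes "Ã" + U+00A0.
garbled = original.encode("utf-8").decode("latin-1")

# Assumed damage: a later process merged the no-break space and the
# following space into one ordinary space.
damaged = garbled.replace("\xa0 ", " ")
print(damaged)  # voilÃ le travail

raw = damaged.encode("latin-1")  # b'voil\xc3 le travail'
try:
    raw.decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False  # byte C3 must be followed by a continuation byte

# Crude stand-in for the transcode step: put byte A0 back after C3.
restored = raw.replace(b"\xc3 ", b"\xc3\xa0 ")
print(restored.decode("utf-8"))  # voilà le travail
```

The plain encode/decode round trip fails on the damaged text, which is why an extra byte-level repair appears between the `encode` and `decode` steps in the explanation.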

### Single Segment Processing

Fixes text as a single segment with one consistent sequence of transformations, which is useful when segment boundaries matter.

```python { .api }
def fix_text_segment(text: str, config: TextFixerConfig | None = None, **kwargs) -> str:
    """
    Fix text as a single segment with a consistent sequence of steps.

    Unlike fix_text, which may process text in multiple segments, this
    applies a single consistent sequence of transformations to the entire text.

    Args:
        text: Unicode string to fix
        config: Configuration object, or None for defaults
        **kwargs: Individual config options

    Returns:
        Fixed text processed as a single segment
    """
```
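
To illustrate why segmentation can change the result, here is a toy comparison. It assumes segments are lines, and `toy_fix` is a deliberately simplistic, hypothetical fixer (a bare Latin-1/UTF-8 round trip), so it shows the shape of the difference rather than ftfy's actual behavior.

```python
def toy_fix(segment: str) -> str:
    """Toy fixer: undo UTF-8-read-as-Latin-1 mojibake, or return the input unchanged."""
    try:
        return segment.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return segment


def toy_fix_by_lines(text: str) -> str:
    """Apply toy_fix to each line separately, keeping line endings."""
    return "".join(toy_fix(line) for line in text.splitlines(keepends=True))


text = "sÃ³ um teste\n✔ done\n"

whole = toy_fix(text)            # one decision for the entire string
by_line = toy_fix_by_lines(text)  # one decision per line

print(repr(whole))    # 'sÃ³ um teste\n✔ done\n'
print(repr(by_line))  # 'só um teste\n✔ done\n'
```

With this toy fixer the whole-text pass gives up because one character cannot be encoded as Latin-1, while the per-line pass still repairs the first line. Real fixers are far more tolerant, but the unit of text that a fix sequence is chosen for can still affect the outcome.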

## Usage Examples

### Basic Text Fixing

```python
import ftfy

# Fix common mojibake
broken = "âœ” No problems"
fixed = ftfy.fix_text(broken)
print(fixed)  # ✔ No problems

# Fix a mojibake apostrophe; the restored curly quote is straightened by default
mojibake_quote = "The Mona Lisa doesn’t have eyebrows."
fixed = ftfy.fix_text(mojibake_quote)
print(fixed)  # The Mona Lisa doesn't have eyebrows.
```
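
For context on where strings like these come from, the following standard-library snippet produces this kind of input by encoding the intended text as UTF-8 and decoding the bytes as Windows-1252, then reverses the mistake by hand. It is a sketch of the failure mode only. The helper names are hypothetical, and choosing `cp1252` is an assumption that happens to match these examples.

```python
def garble(text: str) -> str:
    """Simulate UTF-8 bytes being misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def ungarble(text: str) -> str:
    """Reverse that specific mistake (no detection, no safety checks)."""
    return text.encode("cp1252").decode("utf-8")


print(garble("✔ No problems"))      # âœ” No problems
print(garble("doesn’t"))            # doesnâ€™t
print(ungarble("âœ” No problems"))  # ✔ No problems
```

Unlike this hand-written reversal, a fixing function has to work out which wrong decoding happened, if any, before undoing it.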

### Configuration Options

```python
from ftfy import fix_text, TextFixerConfig

# Disable quote uncurling so curly quotes are preserved
config = TextFixerConfig(uncurl_quotes=False)
text_with_quotes = "It’s “quoted” text"
result = fix_text(text_with_quotes, config)

# Pass the same option as a keyword argument
result = fix_text(text_with_quotes, uncurl_quotes=False)

# Disable HTML entity decoding so "&amp;" is left as-is
result = fix_text("&amp; symbols", unescape_html=False)
```
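
The two call styles above (a config object versus keyword arguments) suggest a common pattern: start from defaults and override individual fields. The sketch below shows that pattern with a hypothetical `ToyConfig` named tuple and `resolve_config` helper. The field names are borrowed from the options used above, the default values are placeholders, and none of this is ftfy's own code.

```python
from typing import NamedTuple, Optional


class ToyConfig(NamedTuple):
    # Hypothetical stand-in for a config object; defaults are placeholders.
    uncurl_quotes: bool = True
    unescape_html: bool = True


def resolve_config(config: Optional[ToyConfig] = None, **kwargs) -> ToyConfig:
    """Use defaults when config is None, then apply keyword overrides."""
    base = config if config is not None else ToyConfig()
    return base._replace(**kwargs)  # raises ValueError on unknown field names


print(resolve_config(uncurl_quotes=False))
print(resolve_config(ToyConfig(unescape_html=False), uncurl_quotes=False))
```

An immutable config with per-call overrides keeps shared settings reusable across calls while still allowing one-off changes.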

### Getting Explanations

```python
from ftfy import fix_and_explain

# Understand what was fixed
# ("\xad" is the soft hyphen that completes the mojibake form of "í")
text, explanation = fix_and_explain("Ã¡Ã©Ã\xadÃ³Ãº")
print(f"Fixed: {text}")
print(f"Steps: {explanation}")

# Check if any fixes were applied
result = fix_and_explain("normal text")
if result.explanation:
    print("Fixes applied:", result.explanation)
else:
    print("No fixes needed")
```

### Encoding-Only Processing

```python
from ftfy import fix_encoding, fix_encoding_and_explain

# Fix only encoding problems
mojibake = "cafÃ©"  # "café" whose UTF-8 bytes were decoded as Latin-1
fixed = fix_encoding(mojibake)

# Get encoding fix explanation
result = fix_encoding_and_explain(mojibake)
print(f"Encoding steps: {result.explanation}")
```