# Text Fixing Functions

Core functions for detecting and fixing text encoding problems, including the main fix_text function and variants that provide explanations of applied transformations.

## Capabilities

### Main Text Fixing

Detects and fixes Unicode text problems including mojibake, HTML entities, character formatting issues, and other common text corruptions.

```python { .api }
def fix_text(text: str, config: TextFixerConfig | None = None, **kwargs) -> str:
    """
    Fix inconsistencies and glitches in Unicode text.

    Applies multiple text fixes in sequence, processing text in segments
    for performance. Handles mojibake, HTML entities, character width,
    quotes, line breaks, and other common text problems.

    Args:
        text: Unicode string to fix
        config: Configuration object, or None for defaults
        **kwargs: Individual config options (e.g., uncurl_quotes=False)

    Returns:
        Fixed Unicode string

    Examples:
        >>> fix_text('âœ” No problems')
        '✔ No problems'
        >>> fix_text('ＬＯＵＤ　ＮＯＩＳＥＳ')
        'LOUD NOISES'
    """
```

### Text Fixing with Explanation

Fixes text and returns a step-by-step explanation of the transformations applied, which is useful for debugging and for understanding what was changed.

```python { .api }
def fix_and_explain(text: str, config: TextFixerConfig | None = None, **kwargs) -> ExplainedText:
    """
    Fix text as a single segment and return an explanation of the changes.

    Processes text with a consistent sequence of fixes and returns both
    the fixed text and the list of transformation steps applied.

    Args:
        text: Unicode string to fix
        config: Configuration object, or None for defaults
        **kwargs: Individual config options

    Returns:
        ExplainedText with fixed text and explanation steps

    Examples:
        >>> result = fix_and_explain("sÃ³")
        >>> result.text
        'só'
        >>> result.explanation
        [ExplanationStep(action='encode', parameter='latin-1'),
         ExplanationStep(action='decode', parameter='utf-8')]
    """
```
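
The `encode` and `decode` steps in an explanation correspond to ordinary codec round-trips. As a rough illustration, the sketch below replays such steps with the standard library only. The helper `replay_steps` and its `(action, parameter)` tuple format are hypothetical and are not part of ftfy. Other actions, such as `transcode`, are deliberately left unsupported here.

```python
def replay_steps(text, steps):
    """Replay (action, parameter) pairs using str.encode / bytes.decode.

    Hypothetical helper for illustration; it only understands the
    'encode' and 'decode' actions.
    """
    data = text
    for action, parameter in steps:
        if action == "encode":
            data = data.encode(parameter)   # str -> bytes
        elif action == "decode":
            data = data.decode(parameter)   # bytes -> str
        else:
            raise ValueError(f"unsupported action in this sketch: {action!r}")
    return data


# The two steps reported for the mojibake "sÃ³"
steps = [("encode", "latin-1"), ("decode", "utf-8")]
fixed = replay_steps("s\u00c3\u00b3", steps)
print(fixed)  # só
```

Reading an explanation this way makes it easier to see which codec mix-up produced the garbled text in the first place.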

### Encoding-Only Fixing

Applies only the encoding detection and correction steps, skipping character formatting and normalization fixes.

```python { .api }
def fix_encoding(text: str, config: TextFixerConfig | None = None, **kwargs) -> str:
    """
    Apply only the encoding-fixing steps of ftfy.

    Detects mojibake and attempts to fix it by decoding the text in a
    different encoding standard, without applying character formatting fixes.

    Args:
        text: Unicode string to fix
        config: Configuration object, or None for defaults
        **kwargs: Individual config options

    Returns:
        Text with encoding problems fixed

    Examples:
        >>> fix_encoding("Ã³")
        'ó'
        >>> fix_encoding("&oacute;")  # HTML entities not fixed
        '&oacute;'
    """


def fix_encoding_and_explain(text: str, config: TextFixerConfig | None = None, **kwargs) -> ExplainedText:
    """
    Apply encoding fixes and return an explanation.

    Detects and fixes mojibake, with a detailed explanation of the encoding
    transformations applied, including subordinate fixes.

    Args:
        text: Unicode string to fix
        config: Configuration object, or None for defaults
        **kwargs: Individual config options

    Returns:
        ExplainedText with encoding fixes and explanation

    Examples:
        >>> result = fix_encoding_and_explain("voilÃ le travail")
        >>> result.text
        'voilà le travail'
        >>> result.explanation
        [ExplanationStep(action='encode', parameter='latin-1'),
         ExplanationStep(action='transcode', parameter='restore_byte_a0'),
         ExplanationStep(action='decode', parameter='utf-8')]
    """
```
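
To make "decoding text in a different encoding standard" concrete, the sketch below produces mojibake deliberately and then reverses it, using only the standard library. It illustrates the underlying mechanism and is not how ftfy detects the problem. The choice of `windows-1252` as the wrong codec is an assumption made for this example.

```python
original = "\u2714 No problems"  # "✔ No problems"

# Mojibake arises when UTF-8 bytes are decoded with the wrong codec.
garbled = original.encode("utf-8").decode("windows-1252")
print(garbled)  # âœ” No problems

# Reversing the two steps (encode with the wrong codec, decode as UTF-8)
# recovers the original text.
restored = garbled.encode("windows-1252").decode("utf-8")
print(restored == original)  # True
```

The hard part, which this sketch skips, is deciding whether a given string is mojibake at all and which codec pair was involved.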

### Single Segment Processing

Fixes text as a single segment with one consistent sequence of transformations, which is useful when segment boundaries matter.

```python { .api }
def fix_text_segment(text: str, config: TextFixerConfig | None = None, **kwargs) -> str:
    """
    Fix text as a single segment with a consistent sequence of steps.

    Unlike fix_text, which may process text in multiple segments, this
    applies a single consistent sequence of transformations to the entire text.

    Args:
        text: Unicode string to fix
        config: Configuration object, or None for defaults
        **kwargs: Individual config options

    Returns:
        Fixed text processed as a single segment
    """
```
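
The difference between segmented and single-segment processing can be sketched without ftfy. The example below assumes that segments are lines and uses a hypothetical `toy_fix` that repairs a segment only if the whole segment round-trips through Latin-1 and UTF-8. ftfy's real segmentation rules and heuristics are more involved.

```python
def toy_fix(segment):
    """Hypothetical stand-in for a fixer: all-or-nothing per segment."""
    try:
        return segment.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return segment  # leave the segment untouched if it does not round-trip


def fix_in_segments(text, fix_segment):
    """Apply fix_segment to each line separately, keeping line endings."""
    return "".join(fix_segment(line) for line in text.splitlines(keepends=True))


# First line is mojibake for "café"; second line is already correct.
mixed = "caf\u00c3\u00a9\ncaf\u00e9\n"

per_line = fix_in_segments(mixed, toy_fix)
whole = toy_fix(mixed)

print(per_line == "caf\u00e9\ncaf\u00e9\n")  # True: each line handled on its own
print(whole == mixed)                        # True: the mixed text fails as one unit
```

With this toy fixer, the per-line result repairs the broken line while the whole-text call changes nothing, which is one reason the choice of segment boundaries can affect the outcome.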

## Usage Examples

### Basic Text Fixing

```python
import ftfy

# Fix common mojibake (the quote after "œ" is U+201D, not an ASCII quote)
broken = "âœ” No problems"
fixed = ftfy.fix_text(broken)
print(fixed)  # "✔ No problems"

# Fix multiple encoding layers
multilayer = "The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows."
fixed = ftfy.fix_text(multilayer)
print(fixed)  # "The Mona Lisa doesn't have eyebrows."
```

### Configuration Options

```python
from ftfy import fix_text, TextFixerConfig

# Disable quote uncurling
config = TextFixerConfig(uncurl_quotes=False)
text_with_quotes = "It’s “quoted” text"  # curly quotes, which are kept as-is
result = fix_text(text_with_quotes, config)

# Use keyword arguments
result = fix_text(text_with_quotes, uncurl_quotes=False)

# Disable HTML entity decoding
result = fix_text("&amp; symbols", unescape_html=False)
```

### Getting Explanations

```python
from ftfy import fix_and_explain

# Understand what was fixed: the input is "áéíóú" as mojibake
# (\xad is the soft hyphen that the second byte of "í" turns into)
text, explanation = fix_and_explain("Ã¡Ã©Ã\xadÃ³Ãº")
print(f"Fixed: {text}")
print(f"Steps: {explanation}")

# Check if any fixes were applied
result = fix_and_explain("normal text")
if result.explanation:
    print("Fixes applied:", result.explanation)
else:
    print("No fixes needed")
```

### Encoding-Only Processing

```python
from ftfy import fix_encoding, fix_encoding_and_explain

# Fix only encoding problems
mojibake = "cafÃ©"  # "café" whose UTF-8 bytes were misread as Latin-1
fixed = fix_encoding(mojibake)

# Get encoding fix explanation
result = fix_encoding_and_explain(mojibake)
print(f"Encoding steps: {result.explanation}")
```