tessl/pypi-ftfy

Fixes mojibake and other problems with Unicode, after the fact

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/ftfy@6.3.x

To install, run

npx @tessl/cli install tessl/pypi-ftfy@6.3.0

# ftfy

Fixes mojibake and other problems with Unicode text after the fact. Detects and corrects common encoding issues, normalizes character formatting, and provides robust text cleaning utilities for handling text from unreliable sources with mixed or unknown encodings.

## Package Information

- **Package Name**: ftfy
- **Language**: Python
- **Installation**: `pip install ftfy`

## Core Imports

```python
import ftfy
```

Common import patterns:

```python
from ftfy import fix_text, fix_and_explain, TextFixerConfig
```

For individual text fixers:

```python
from ftfy.fixes import unescape_html, remove_terminal_escapes, uncurl_quotes
```

For formatting utilities:

```python
from ftfy.formatting import display_ljust, character_width
```

## Basic Usage

```python
import ftfy

# Fix encoding problems (mojibake)
broken_text = "âœ” No problems"
fixed = ftfy.fix_text(broken_text)
print(fixed)  # "✔ No problems"

# Fix multiple layers of mojibake
broken = "The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows."
fixed = ftfy.fix_text(broken)
print(fixed)  # "The Mona Lisa doesn't have eyebrows."

# Get explanation of what was fixed
text, explanation = ftfy.fix_and_explain("sÃ³")
print(text)  # "só"
print(explanation)  # [('encode', 'latin-1'), ('decode', 'utf-8')]

# Configure specific fixes
from ftfy import TextFixerConfig
config = TextFixerConfig(uncurl_quotes=False)
result = ftfy.fix_text(text, config)
```

## Architecture

ftfy operates through a multi-step pipeline that detects and corrects text problems:

- **Heuristic Detection**: Uses heuristics to identify mojibake patterns while keeping false positives rare
- **Encoding Analysis**: Systematically tests encoding combinations to reverse encoding errors
- **Character Normalization**: Applies format fixes for quotes, ligatures, width, and line breaks
- **Configurable Pipeline**: Each fix step can be individually enabled or disabled via `TextFixerConfig`
- **Explanation System**: Provides detailed transformation logs for debugging and understanding

This design enables ftfy to safely process text from unknown sources while avoiding overcorrection of correctly-encoded text.

## Capabilities

### Text Fixing Functions

Core functions for detecting and fixing text encoding problems, including the main fix_text function and variants that provide explanations of applied transformations.

```python { .api }
def fix_text(text: str, config: TextFixerConfig | None = None, **kwargs) -> str: ...
def fix_and_explain(text: str, config: TextFixerConfig | None = None, **kwargs) -> ExplainedText: ...
def fix_encoding(text: str, config: TextFixerConfig | None = None, **kwargs) -> str: ...
def fix_encoding_and_explain(text: str, config: TextFixerConfig | None = None, **kwargs) -> ExplainedText: ...

# Alias for fix_text
ftfy = fix_text
```

[Text Fixing Functions](./text-fixing.md)

### Configuration and Types

Configuration classes and types for controlling ftfy behavior, including comprehensive options for each fix step and explanation data structures.

```python { .api }
class TextFixerConfig(NamedTuple): ...
class ExplainedText(NamedTuple): ...
class ExplanationStep(NamedTuple): ...
```

[Configuration and Types](./configuration.md)

### Individual Text Fixes

Individual transformation functions for specific text problems like HTML entities, terminal escapes, character width, quotes, and line breaks.

```python { .api }
def unescape_html(text: str) -> str: ...
def remove_terminal_escapes(text: str) -> str: ...
def uncurl_quotes(text: str) -> str: ...
def fix_character_width(text: str) -> str: ...
def fix_line_breaks(text: str) -> str: ...
```

[Individual Text Fixes](./individual-fixes.md)

### File and Byte Processing

Functions for processing files and handling bytes of unknown encoding, including streaming file processing and encoding detection utilities.

```python { .api }
def fix_file(input_file, encoding: str | None = None, config: TextFixerConfig | None = None, **kwargs) -> Iterator[str]: ...
def guess_bytes(bstring: bytes) -> tuple[str, str]: ...
```

[File and Byte Processing](./file-processing.md)

### Display and Formatting

Unicode-aware text formatting for terminal display, including width calculation and justification functions that handle fullwidth characters and zero-width characters correctly.

```python { .api }
def character_width(char: str) -> int: ...
def display_ljust(text: str, width: int, fillchar: str = " ") -> str: ...
def display_center(text: str, width: int, fillchar: str = " ") -> str: ...
```

[Display and Formatting](./formatting.md)

### Utilities and Debugging

Debugging and utility functions for understanding Unicode text and applying transformation plans manually.

```python { .api }
def explain_unicode(text: str) -> None: ...
def apply_plan(text: str, plan: list[tuple[str, str]]) -> str: ...
def badness(text: str) -> int: ...
def is_bad(text: str) -> bool: ...
```

[Utilities and Debugging](./utilities.md)

### Command Line Interface

Command-line tool for batch text processing with configurable options for encoding, normalization, and entity handling.

```python { .api }
def main() -> None: ...
```

[Command Line Interface](./cli.md)

## Constants

```python { .api }
__version__ = "6.3.1"  # Package version string
```

## Core Types

```python { .api }
class TextFixerConfig(NamedTuple):
    """Configuration for all ftfy text processing options."""
    unescape_html: str | bool = "auto"
    remove_terminal_escapes: bool = True
    fix_encoding: bool = True
    restore_byte_a0: bool = True
    replace_lossy_sequences: bool = True
    decode_inconsistent_utf8: bool = True
    fix_c1_controls: bool = True
    fix_latin_ligatures: bool = True
    fix_character_width: bool = True
    uncurl_quotes: bool = True
    fix_line_breaks: bool = True
    fix_surrogates: bool = True
    remove_control_chars: bool = True
    normalization: Literal["NFC", "NFD", "NFKC", "NFKD"] | None = "NFC"
    max_decode_length: int = 1000000
    explain: bool = True

class ExplainedText(NamedTuple):
    """Result containing fixed text and explanation of changes."""
    text: str
    explanation: list[ExplanationStep] | None

class ExplanationStep(NamedTuple):
    """Single step in text transformation explanation."""
    action: str  # "encode", "decode", "transcode", "apply", "normalize"
    parameter: str  # encoding name or function name
```