tessl/pypi-ftfy

Fixes mojibake and other problems with Unicode, after the fact

  • Workspace: tessl
  • Visibility: Public
  • Describes: pkg:pypi/ftfy@6.3.x

To install, run

npx @tessl/cli install tessl/pypi-ftfy@6.3.0

ftfy

Fixes mojibake and other problems with Unicode text after the fact. Detects and corrects common encoding issues, normalizes character formatting, and provides robust text cleaning utilities for handling text from unreliable sources with mixed or unknown encodings.

Package Information

  • Package Name: ftfy
  • Language: Python
  • Installation: pip install ftfy

Core Imports

import ftfy

Common import patterns:

from ftfy import fix_text, fix_and_explain, TextFixerConfig

For individual text fixers:

from ftfy.fixes import unescape_html, remove_terminal_escapes, uncurl_quotes

For formatting utilities:

from ftfy.formatting import display_ljust, character_width

Basic Usage

import ftfy

# Fix encoding problems (mojibake)
broken_text = "âœ” No problems"
fixed = ftfy.fix_text(broken_text)
print(fixed)  # ✔ No problems

# Fix multiple layers of mojibake
broken = "The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows."
fixed = ftfy.fix_text(broken)
print(fixed)  # The Mona Lisa doesn't have eyebrows.

# Get an explanation of what was fixed
text, explanation = ftfy.fix_and_explain("sÃ³")
print(text)        # só
print(explanation) # [('encode', 'latin-1'), ('decode', 'utf-8')]

# Disable a specific fix (here, leave curly quotes untouched)
from ftfy import TextFixerConfig
config = TextFixerConfig(uncurl_quotes=False)
result = ftfy.fix_text("“Quoted” text", config)

Architecture

ftfy operates through a multi-step pipeline that detects and corrects text problems:

  • Heuristic Detection: Uses pattern-based heuristics to identify likely mojibake while keeping false positives rare
  • Encoding Analysis: Systematically tests encoding combinations to reverse encoding errors
  • Character Normalization: Applies format fixes for quotes, ligatures, width, and line breaks
  • Configurable Pipeline: Each fix step can be individually enabled/disabled via TextFixerConfig
  • Explanation System: Provides detailed transformation logs for debugging and understanding

This design enables ftfy to safely process text from unknown sources while avoiding overcorrection of correctly-encoded text.
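The encoding-analysis step can be pictured with a standard-library sketch. This is a toy, not ftfy's implementation: the `undo_one_layer` helper and the `CANDIDATE_ENCODINGS` shortlist are names assumed for illustration, and the real library applies far more careful heuristics before deciding that text is broken.

```python
# Toy illustration only: ftfy's real detection logic is more conservative.
CANDIDATE_ENCODINGS = ["latin-1", "windows-1252"]  # assumed shortlist


def undo_one_layer(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Try to reverse one 'UTF-8 bytes decoded as a legacy encoding' error."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            repaired = text.encode(encoding).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this encoding cannot explain the text
        if repaired != text:
            # Record the steps taken, in the same shape as an ftfy plan.
            return repaired, [("encode", encoding), ("decode", "utf-8")]
    return text, []  # nothing to reverse, so leave the text alone


fixed, plan = undo_one_layer("sÃ³")    # mojibake input
clean, no_plan = undo_one_layer("só")  # already correct input
print(fixed, plan)
print(clean, no_plan)
```

Correctly encoded text falls through untouched because re-encoding it does not produce valid UTF-8, which is the same property that lets a fixer avoid overcorrecting.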

Capabilities

Text Fixing Functions

Core functions for detecting and fixing text encoding problems, including the main fix_text function and variants that provide explanations of applied transformations.

def fix_text(text: str, config: TextFixerConfig | None = None, **kwargs) -> str: ...
def fix_and_explain(text: str, config: TextFixerConfig | None = None, **kwargs) -> ExplainedText: ...
def fix_encoding(text: str, config: TextFixerConfig | None = None, **kwargs) -> str: ...
def fix_encoding_and_explain(text: str, config: TextFixerConfig | None = None, **kwargs) -> ExplainedText: ...

# Alias for fix_text
ftfy = fix_text

Configuration and Types

Configuration classes and types for controlling ftfy behavior, including comprehensive options for each fix step and explanation data structures.

class TextFixerConfig(NamedTuple): ...
class ExplainedText(NamedTuple): ...
class ExplanationStep(NamedTuple): ...

Individual Text Fixes

Individual transformation functions for specific text problems like HTML entities, terminal escapes, character width, quotes, and line breaks.

def unescape_html(text: str) -> str: ...
def remove_terminal_escapes(text: str) -> str: ...
def uncurl_quotes(text: str) -> str: ...
def fix_character_width(text: str) -> str: ...
def fix_line_breaks(text: str) -> str: ...

File and Byte Processing

Functions for processing files and handling bytes of unknown encoding, including streaming file processing and encoding detection utilities.

def fix_file(input_file, encoding: str | None = None, config: TextFixerConfig | None = None, **kwargs) -> Iterator[str]: ...
def guess_bytes(bstring: bytes) -> tuple[str, str]: ...

Display and Formatting

Unicode-aware text formatting for terminal display, including width calculation and justification functions that handle fullwidth characters and zero-width characters correctly.

def character_width(char: str) -> int: ...
def display_ljust(text: str, width: int, fillchar: str = " ") -> str: ...
def display_center(text: str, width: int, fillchar: str = " ") -> str: ...

Utilities and Debugging

Debugging and utility functions for understanding Unicode text and applying transformation plans manually.

def explain_unicode(text: str) -> None: ...
def apply_plan(text: str, plan: list[tuple[str, str]]) -> str: ...
def badness(text: str) -> int: ...
def is_bad(text: str) -> bool: ...

Command Line Interface

Command-line tool for batch text processing with configurable options for encoding, normalization, and entity handling.

def main() -> None: ...

Constants

__version__ = "6.3.1"  # Package version string

Core Types

class TextFixerConfig(NamedTuple):
    """Configuration for all ftfy text processing options."""
    unescape_html: str | bool = "auto"
    remove_terminal_escapes: bool = True  
    fix_encoding: bool = True
    restore_byte_a0: bool = True
    replace_lossy_sequences: bool = True
    decode_inconsistent_utf8: bool = True
    fix_c1_controls: bool = True
    fix_latin_ligatures: bool = True
    fix_character_width: bool = True
    uncurl_quotes: bool = True
    fix_line_breaks: bool = True
    fix_surrogates: bool = True
    remove_control_chars: bool = True
    normalization: Literal["NFC", "NFD", "NFKC", "NFKD"] | None = "NFC"
    max_decode_length: int = 1000000
    explain: bool = True

class ExplainedText(NamedTuple):
    """Result containing fixed text and explanation of changes."""
    text: str
    explanation: list[ExplanationStep] | None

class ExplanationStep(NamedTuple):
    """Single step in text transformation explanation."""
    action: str  # "encode", "decode", "transcode", "apply", "normalize"
    parameter: str  # encoding name or function name