Character encoding aliases for legacy web content implementing the WHATWG Encoding standard
npx @tessl/cli install tessl/pypi-webencodings@0.5.0

A Python implementation of the WHATWG Encoding standard that provides character encoding aliases for legacy web content. It addresses compatibility issues by providing standardized encoding labels, BOM detection, and proper handling of encoding declarations that follow web standards.
pip install webencodings

import webencodings

Common specific imports:
from webencodings import lookup, decode, encode, UTF8

All encoding classes and streaming interfaces:
from webencodings import IncrementalDecoder, IncrementalEncoder

import webencodings
# Look up an encoding by label
utf8_encoding = webencodings.lookup('utf-8')
windows_encoding = webencodings.lookup('windows-1252')
# Decode bytes with BOM detection
text, encoding_used = webencodings.decode(b'\xef\xbb\xbfHello', 'utf-8')
print(text) # "Hello"
print(encoding_used.name) # "utf-8"
# Encode text to bytes
data = webencodings.encode("Hello", webencodings.UTF8)
print(data) # b'Hello'
# Handle legacy web content encoding
legacy_data = b'caf\xe9' # Latin-1 encoded "café"
text, encoding = webencodings.decode(legacy_data, 'iso-8859-1')
print(text) # "café"

The webencodings package follows the WHATWG Encoding standard architecture.
This design ensures consistent cross-implementation behavior for handling legacy web content.
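The BOM detection mentioned above can be illustrated with a minimal sketch that uses only the standard library. This is not the webencodings implementation: `sniff_and_decode` is a hypothetical helper, it takes plain Python codec names rather than WHATWG labels, and it assumes only the UTF-8 and UTF-16 byte order marks are of interest. The point it shows is that a leading BOM takes precedence over the caller's fallback encoding.

```python
import codecs

# BOM prefixes checked in this sketch, mapped to Python codec names.
# Assumption: only these three BOMs are considered.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def sniff_and_decode(data: bytes, fallback: str, errors: str = "replace"):
    """Decode data, letting a leading BOM override the fallback codec.

    Hypothetical helper for illustration; returns (text, codec_name_used).
    """
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            # Strip the BOM and decode with the codec it indicates.
            return data[len(bom):].decode(codec_name, errors), codec_name
    # No BOM found: use the caller's fallback codec.
    return data.decode(fallback, errors), fallback


text, used = sniff_and_decode(b"\xef\xbb\xbfHello", "cp1252")
print(text, used)
```

Returning the codec that was actually used, alongside the text, mirrors the `(text, encoding)` pair shown in the usage example above and lets the caller see when the fallback was overridden.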
Core functionality for looking up encodings by label and the fundamental Encoding class that wraps Python codecs with WHATWG-compliant names and behavior.
def lookup(label: str) -> Encoding | None: ...
class Encoding:
    name: str
    codec_info: codecs.CodecInfo

Simple encoding and decoding functions for processing individual strings with BOM detection and WHATWG-compliant encoding resolution.
def decode(input: bytes, fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[str, Encoding]: ...
def encode(input: str, encoding: Encoding | str = UTF8, errors: str = 'strict') -> bytes: ...

Streaming interfaces for processing large amounts of data incrementally, supporting both "pull"-based (iterator) and "push"-based (incremental) processing patterns.
def iter_decode(input: Iterable[bytes], fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[Iterator[str], Encoding]: ...
def iter_encode(input: Iterable[str], encoding: Encoding | str = UTF8, errors: str = 'strict') -> Iterator[bytes]: ...
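To show why streaming decoders need to keep state between chunks, here is a rough pull-based sketch built on the standard library's `codecs.getincrementaldecoder`. It is not how webencodings implements `iter_decode`: `iter_decode_utf8` is a hypothetical helper, it is hard-wired to UTF-8, and it skips BOM detection and label lookup entirely.

```python
import codecs
from typing import Iterable, Iterator


def iter_decode_utf8(chunks: Iterable[bytes], errors: str = "replace") -> Iterator[str]:
    """Yield decoded text per chunk, buffering incomplete byte sequences.

    Hypothetical helper for illustration only.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors)
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush: an incomplete trailing sequence is handled according to `errors`.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# "é" is two bytes in UTF-8 (0xC3 0xA9) and is split across two chunks here.
pieces = list(iter_decode_utf8([b"caf\xc3", b"\xa9 au lait"]))
print("".join(pieces))
```

Decoding each chunk independently with `bytes.decode` would corrupt the split character; the incremental decoder holds the dangling byte until the next chunk arrives.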
class IncrementalDecoder: ...
class IncrementalEncoder: ...

Utility functions and pre-defined constants including the recommended UTF-8 encoding object and ASCII case-insensitive string operations.
def ascii_lower(string: str) -> str: ...
UTF8: Encoding
VERSION: str
LABELS: dict[str, str]
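The difference between ASCII case-insensitive matching and Python's Unicode-aware `str.lower()` can be sketched as follows. `ascii_lower_sketch` is a hypothetical stand-in written for this example, not the library's `ascii_lower`; it assumes that only the letters A to Z should be folded.

```python
def ascii_lower_sketch(string: str) -> str:
    """Lowercase only A-Z, leaving every other character untouched.

    Hypothetical helper for illustration only.
    """
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in string
    )


print(ascii_lower_sketch("UTF-8"))

# The Kelvin sign (U+212A) lowercases to "k" under str.lower(), so a
# Unicode-aware comparison would treat this label as "koi8-r".
kelvin_label = "\u212aOI8-R"
print(kelvin_label.lower() == "koi8-r")
print(ascii_lower_sketch(kelvin_label) == "koi8-r")
```

Restricting case folding to ASCII keeps label matching from accepting look-alike characters outside the ASCII range.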