Character encoding aliases for legacy web content implementing the WHATWG Encoding standard
Simple encoding and decoding functions for processing individual strings. They cover the most common use case: one-shot encoding or decoding with BOM detection and WHATWG-compliant behavior.
Decode a byte string to Unicode. A BOM, if present, takes precedence over the fallback encoding declaration.
def decode(input: bytes, fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[str, Encoding]:
"""
Decode a single byte string with BOM detection.
Args:
input: Byte string to decode
fallback_encoding: Encoding object or label string to use if no BOM detected
errors: Error handling strategy ('replace', 'strict', 'ignore', etc.)
Returns:
Tuple of (decoded_unicode_string, encoding_used)
Raises:
LookupError: If fallback_encoding label is unknown
"""
The function first checks for a UTF-8, UTF-16LE, or UTF-16BE BOM. If one is found, the BOM is removed and the detected encoding is used. Otherwise, the fallback encoding is used for decoding.
Encode a Unicode string to bytes using the specified encoding.
def encode(input: str, encoding: Encoding | str = UTF8, errors: str = 'strict') -> bytes:
"""
Encode a Unicode string to bytes.
Args:
input: Unicode string to encode
encoding: Encoding object or label string (defaults to UTF-8)
errors: Error handling strategy ('strict', 'replace', 'ignore', etc.)
Returns:
Encoded byte string
Raises:
LookupError: If encoding label is unknown
"""
import webencodings
# Decode with BOM detection
utf8_bom_data = b'\xef\xbb\xbfHello World'
text, encoding = webencodings.decode(utf8_bom_data, 'iso-8859-1')
print(text) # 'Hello World'
print(encoding.name) # 'utf-8' (BOM detected, fallback ignored)
# Decode without BOM uses fallback
latin_data = b'caf\xe9' # 'café' in latin-1
text, encoding = webencodings.decode(latin_data, 'iso-8859-1')
print(text) # 'café'
print(encoding.name) # 'windows-1252' (iso-8859-1 maps to windows-1252)
# Handle UTF-16 BOM
utf16_data = b'\xff\xfeH\x00e\x00l\x00l\x00o\x00' # UTF-16LE BOM + 'Hello'
text, encoding = webencodings.decode(utf16_data, 'utf-8')
print(text) # 'Hello'
print(encoding.name) # 'utf-16le'
# Encoding strings
text = "Hello World"
data = webencodings.encode(text, 'utf-8')
print(data) # b'Hello World'
# Use predefined UTF8 constant
data = webencodings.encode(text, webencodings.UTF8)
print(data) # b'Hello World'
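# Handle encoding errors
# Note: the WHATWG label 'ascii' maps to windows-1252, so 'é' encodes cleanly;
# a character outside windows-1252 (here '☃') is needed to trigger 'replace'
text = "café ☃"
data = webencodings.encode(text, 'ascii', errors='replace')
print(data) # b'caf\xe9 ?'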
# Encode with different encodings
text = "café"
utf8_data = webencodings.encode(text, 'utf-8')
latin1_data = webencodings.encode(text, 'latin1') # 'latin1' is the WHATWG label; 'latin-1' is not recognized
print(utf8_data) # b'caf\xc3\xa9'
print(latin1_data) # b'caf\xe9'
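The `errors` values listed for `decode` and `encode` ('strict', 'replace', 'ignore') match the names of Python's standard codec error handlers. Assuming the argument is handed to Python's codec machinery unchanged, the following standard-library sketch shows what each strategy does. It uses `cp1252`, Python's name for windows-1252, directly instead of going through webencodings.

```python
text = "caf\u00e9 \u2603"  # 'café ☃'; the snowman has no windows-1252 representation

# 'replace' substitutes '?' for each character the codec cannot encode.
replaced = text.encode("cp1252", errors="replace")

# 'ignore' silently drops characters the codec cannot encode.
ignored = text.encode("cp1252", errors="ignore")

# 'strict' raises UnicodeEncodeError on the first unencodable character.
try:
    text.encode("cp1252", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# When decoding, 'replace' yields U+FFFD for bytes that are invalid in the codec.
decoded = b"caf\xe9".decode("utf-8", errors="replace")

print(replaced)       # b'caf\xe9 ?'
print(ignored)        # b'caf\xe9 '
print(strict_failed)  # True
print(repr(decoded))  # 'caf\ufffd'
```

The differing defaults in the signatures above ('replace' for `decode`, 'strict' for `encode`) mean that decoding tolerates malformed input while encoding fails loudly unless told otherwise.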