
tessl/pypi-webencodings

Character encoding aliases for legacy web content implementing the WHATWG Encoding standard

  • Workspace: tessl
  • Visibility: Public
  • Describes: pkg:pypi/webencodings@0.5.x

To install, run

npx @tessl/cli install tessl/pypi-webencodings@0.5.0


webencodings

A Python implementation of the WHATWG Encoding standard that provides character encoding aliases for legacy web content. It addresses compatibility issues by offering standardized encoding labels, BOM detection, and web-standards-compliant handling of encoding declarations.

Package Information

  • Package Name: webencodings
  • Language: Python
  • Installation: pip install webencodings
  • Version: 0.5.1

Core Imports

import webencodings

Common specific imports:

from webencodings import lookup, decode, encode, UTF8

Streaming interfaces:

from webencodings import iter_decode, iter_encode, IncrementalDecoder, IncrementalEncoder

Basic Usage

import webencodings

# Look up an encoding by label
utf8_encoding = webencodings.lookup('utf-8')
windows_encoding = webencodings.lookup('windows-1252') 

# Decode bytes with BOM detection
text, encoding_used = webencodings.decode(b'\xef\xbb\xbfHello', 'utf-8')
print(text)  # "Hello"
print(encoding_used.name)  # "utf-8"

# Encode text to bytes
data = webencodings.encode("Hello", webencodings.UTF8)
print(data)  # b'Hello'

# Handle legacy web content encoding
legacy_data = b'caf\xe9'  # Latin-1 encoded "café"
text, encoding = webencodings.decode(legacy_data, 'iso-8859-1')
print(text)  # "café"

Architecture

The webencodings package follows the WHATWG Encoding standard architecture:

  • Encoding Objects: Canonical representations of character encodings with standardized names
  • Label Lookup: Maps encoding labels (including aliases) to canonical encoding names
  • BOM Detection: UTF-8/UTF-16 BOM detection that takes precedence over declared encodings
  • Streaming Interfaces: Both "pull" and "push" based processing for large data
  • Error Handling: Follows Python's codec error handling patterns

This design ensures consistent cross-implementation behavior for handling legacy web content.
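
As a minimal sketch of the BOM-precedence rule above, using only the public decode() function: a BOM in the input wins over whatever fallback encoding the caller declares.

import webencodings

# The input carries a UTF-8 BOM, but the caller declares windows-1252 as the fallback.
data = b'\xef\xbb\xbfcaf\xc3\xa9'
text, encoding = webencodings.decode(data, 'windows-1252')
print(text)           # "café" (decoded as UTF-8, BOM stripped)
print(encoding.name)  # "utf-8" (the BOM took precedence over the declared fallback)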

Capabilities

Encoding Lookup and Core Objects

Core functionality for looking up encodings by label and the fundamental Encoding class that wraps Python codecs with WHATWG-compliant names and behavior.

def lookup(label: str) -> Encoding | None: ...
class Encoding:
    name: str
    codec_info: codecs.CodecInfo
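
A short sketch of label resolution, assuming the standard label table (ASCII case-insensitive matching, legacy aliases, and None for unknown labels):

import webencodings

# Labels are matched ASCII case-insensitively, ignoring surrounding whitespace
print(webencodings.lookup('UTF-8').name)     # "utf-8"

# Legacy aliases resolve to their canonical encoding per the standard
print(webencodings.lookup('latin1').name)    # "windows-1252"

# Unknown labels return None instead of raising
print(webencodings.lookup('no-such-label'))  # None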

Core Objects

Single String Processing

Simple encoding and decoding functions for processing individual strings with BOM detection and WHATWG-compliant encoding resolution.

def decode(input: bytes, fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[str, Encoding]: ...
def encode(input: str, encoding: Encoding | str = UTF8, errors: str = 'strict') -> bytes: ...
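
A brief sketch of the errors parameter, following the defaults shown above ('replace' for decoding, 'strict' for encoding):

import webencodings

# Default errors='replace': undecodable bytes become U+FFFD
text, encoding = webencodings.decode(b'abc\xff', 'utf-8')
print(text)  # "abc\ufffd"

# errors='strict' raises instead of replacing
try:
    webencodings.decode(b'abc\xff', 'utf-8', errors='strict')
except UnicodeDecodeError:
    print("invalid UTF-8 byte")

# encode() defaults to errors='strict'; characters outside the target encoding raise
try:
    webencodings.encode('\u2603', 'windows-1252')  # U+2603 SNOWMAN has no windows-1252 mapping
except UnicodeEncodeError:
    print("cannot encode U+2603 as windows-1252")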

String Processing

Streaming Processing

Streaming interfaces for processing large amounts of data incrementally, supporting both "pull"-based (iterator) and "push"-based (incremental) processing patterns.

def iter_decode(input: Iterable[bytes], fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[Iterator[str], Encoding]: ...
def iter_encode(input: Iterable[str], encoding: Encoding | str = UTF8, errors: str = 'strict') -> Iterator[bytes]: ...
class IncrementalDecoder: ...
class IncrementalEncoder: ...
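
A minimal sketch of both streaming styles, assuming chunked input such as network reads (the chunk boundaries here are arbitrary):

import webencodings

# Pull-based: iter_decode() returns an iterator of text chunks plus the chosen encoding
chunks = [b'\xef\xbb', b'\xbfca', b'f\xc3', b'\xa9']
output, encoding = webencodings.iter_decode(chunks, 'windows-1252')
print(''.join(output))  # "café" (the UTF-8 BOM split across chunks is still detected)
print(encoding.name)    # "utf-8"

# Push-based: feed data to an IncrementalDecoder as it arrives, passing final=True at the end
decoder = webencodings.IncrementalDecoder('utf-8')
text = decoder.decode(b'caf\xc3') + decoder.decode(b'\xa9', final=True)
print(text)  # "café"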

Streaming Processing

Utilities and Constants

Utility functions and pre-defined constants including the recommended UTF-8 encoding object and ASCII case-insensitive string operations.

def ascii_lower(string: str) -> str: ...
UTF8: Encoding
VERSION: str
LABELS: dict[str, str]
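
A small sketch of the utilities, using only names exported by the package:

from webencodings import ascii_lower, LABELS, UTF8, VERSION

# ASCII-only lowercasing used for label matching; non-ASCII characters are left alone,
# unlike str.lower() (which would turn U+212A KELVIN SIGN into "k")
print(ascii_lower('UTF-8 \u212a'))  # "utf-8 \u212a"

# LABELS maps every label defined by the standard to its canonical encoding name
print(LABELS['ascii'])  # "windows-1252" (a legacy alias in the Encoding standard)
print(LABELS['utf8'])   # "utf-8"

# UTF8 is a ready-made Encoding object; VERSION is the package version string
print(UTF8.name)  # "utf-8"
print(VERSION)    # e.g. "0.5.1"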

Utilities