Character encoding aliases for legacy web content implementing the WHATWG Encoding standard
npx @tessl/cli install tessl/pypi-webencodings@0.5.00
# webencodings

A Python implementation of the WHATWG Encoding standard for handling legacy web content. It maps encoding labels, including legacy aliases, to the standard's canonical encodings, detects byte order marks (BOMs), and gives a BOM precedence over any declared encoding, as the standard requires.

## Package Information

- **Package Name**: webencodings
- **Language**: Python
- **Installation**: `pip install webencodings`
- **Version**: 0.5.1

## Core Imports

```python
import webencodings
```

Common specific imports:

```python
from webencodings import lookup, decode, encode, UTF8
```

Incremental ("push"-based) streaming classes:

```python
from webencodings import IncrementalDecoder, IncrementalEncoder
```

## Basic Usage

```python
import webencodings

# Look up an encoding by label
utf8_encoding = webencodings.lookup('utf-8')
windows_encoding = webencodings.lookup('windows-1252')

# Decode bytes with BOM detection
text, encoding_used = webencodings.decode(b'\xef\xbb\xbfHello', 'utf-8')
print(text)  # "Hello"
print(encoding_used.name)  # "utf-8"

# Encode text to bytes
data = webencodings.encode("Hello", webencodings.UTF8)
print(data)  # b'Hello'

# Handle legacy web content encoding
legacy_data = b'caf\xe9'  # Latin-1 encoded "café"
text, encoding = webencodings.decode(legacy_data, 'iso-8859-1')
print(text)  # "café"
print(encoding.name)  # "windows-1252" (the standard maps this label to windows-1252)
```

## Architecture

The webencodings package follows the architecture of the WHATWG Encoding standard:

- **Encoding Objects**: Canonical representations of character encodings, each carrying its standardized name
- **Label Lookup**: Maps encoding labels (including aliases) to canonical encodings
- **BOM Detection**: A UTF-8 or UTF-16 BOM at the start of the input takes precedence over the declared encoding
- **Streaming Interfaces**: Both "pull"-based (iterator) and "push"-based (incremental) processing for large data
- **Error Handling**: Uses Python's codec error handlers; decoding defaults to `'replace'` and encoding to `'strict'`

Following the standard keeps results consistent with other implementations of it, such as web browsers, when handling legacy web content.
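
The BOM precedence rule can be illustrated without the library. The following is a minimal sketch that uses only the standard `codecs` module; `sniff_bom` and `decode_with_bom` are hypothetical helper names, not part of webencodings, and the sketch does not claim to mirror the package's internals.

```python
import codecs

# BOM signatures checked by this sketch, mapped to Python codec names.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data: bytes):
    """Return (codec_name, payload_without_bom), or (None, data) if no BOM."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name, data[len(bom):]
    return None, data


def decode_with_bom(data: bytes, fallback: str, errors: str = "replace"):
    """Decode data, letting a BOM override the declared (fallback) codec."""
    codec_name, payload = sniff_bom(data)
    if codec_name is None:
        codec_name = fallback
    return payload.decode(codec_name, errors), codec_name


# The BOM says UTF-8, so the declared cp1252 is ignored.
text, used = decode_with_bom(b"\xef\xbb\xbfcaf\xc3\xa9", "cp1252")
print(text, used)
```

Without a BOM, the same call falls back to the declared codec, which is the behavior the fallback argument of the library's decoding functions is described as providing.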

## Capabilities

### Encoding Lookup and Core Objects

Core functionality for looking up encodings by label and the fundamental Encoding class that wraps Python codecs with WHATWG-compliant names and behavior.

```python { .api }
def lookup(label: str) -> Encoding | None: ...
class Encoding:
    name: str
    codec_info: codecs.CodecInfo
```

[Core Objects](./core-objects.md)
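
As a short illustration of label lookup, the sketch below assumes webencodings is installed and shows three behaviors: case-insensitive matching with surrounding whitespace ignored, legacy aliases resolving to one canonical name, and `None` for unknown labels. The variable names are illustrative only.

```python
from webencodings import lookup

# Labels are matched ASCII case-insensitively; surrounding whitespace is ignored.
utf8 = lookup("  UTF-8 ")

# Several legacy labels resolve to the same canonical encoding name.
latin1 = lookup("latin1")
iso = lookup("iso-8859-1")

# Unknown labels yield None instead of raising an exception.
unknown = lookup("no-such-encoding")

print(utf8.name, latin1.name, iso.name, unknown)
```

Comparing the `name` attribute is the safer way to check whether two lookups refer to the same encoding; this sketch does not rely on object identity.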

### Single String Processing

Simple encoding and decoding functions for processing individual strings with BOM detection and WHATWG-compliant encoding resolution.

```python { .api }
def decode(input: bytes, fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[str, Encoding]: ...
def encode(input: str, encoding: Encoding | str = UTF8, errors: str = 'strict') -> bytes: ...
```

[String Processing](./string-processing.md)
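
The different `errors` defaults in the two signatures are easy to miss. The following sketch, assuming webencodings is installed, shows a BOM overriding the fallback encoding, `decode` substituting U+FFFD for an invalid byte, and `encode` raising on a character the target encoding cannot represent. The variable names are illustrative.

```python
from webencodings import decode, encode

# A UTF-16LE BOM overrides the declared fallback encoding.
text, used = decode(b"\xff\xfeH\x00i\x00", "utf-8")

# Without a BOM the fallback applies; the invalid byte becomes U+FFFD
# because decode() defaults to errors='replace'.
replaced, _ = decode(b"ok\xff", "utf-8")

# encode() defaults to errors='strict', so unencodable text raises.
try:
    encode("\u4e2d", "windows-1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(text, used.name, repr(replaced), strict_failed)
```

Passing `errors='strict'` to `decode`, or `errors='replace'` to `encode`, swaps these behaviors when the defaults are not what the caller needs.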
90
91
### Streaming Processing
92
93
Streaming interfaces for processing large amounts of data incrementally, supporting both "pull"-based (iterator) and "push"-based (incremental) processing patterns.
94
95
```python { .api }
96
def iter_decode(input: Iterable[bytes], fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[Iterator[str], Encoding]: ...
97
def iter_encode(input: Iterable[str], encoding: Encoding | str = UTF8, errors: str = 'strict') -> Iterator[bytes]: ...
98
class IncrementalDecoder: ...
99
class IncrementalEncoder: ...
100
```
101
102
[Streaming Processing](./streaming-processing.md)
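
To make the "pull" and "push" patterns concrete, here is a minimal sketch that assumes webencodings is installed. The chunk boundaries are chosen deliberately to split the BOM and a multi-byte character; the variable names are illustrative.

```python
from webencodings import IncrementalDecoder, iter_decode, iter_encode

# "Pull": the caller iterates over decoded pieces. The chunks here are
# split in the middle of the BOM and in the middle of a character.
chunks = [b"\xef\xbb", b"\xbfcaf", b"\xc3", b"\xa9"]
pieces, used = iter_decode(chunks, "windows-1252")
pulled = "".join(pieces)

# "Push": the caller feeds bytes as they arrive and flushes with final=True.
decoder = IncrementalDecoder("utf-8")
pushed = decoder.decode(b"caf\xc3") + decoder.decode(b"\xa9", final=True)

# Encoding side: iter_encode yields byte chunks, UTF-8 by default.
encoded = b"".join(iter_encode(["caf", "\u00e9"]))

print(pulled, used.name, pushed, encoded)
```

In the pull case the UTF-8 BOM overrides the `windows-1252` fallback even though it arrives across two chunks.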

### Utilities and Constants

Utility functions and pre-defined constants including the recommended UTF-8 encoding object and ASCII case-insensitive string operations.

```python { .api }
def ascii_lower(string: str) -> str: ...
UTF8: Encoding
VERSION: str
LABELS: dict[str, str]
```

[Utilities](./utilities.md)
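
The difference between `ascii_lower` and `str.lower()` matters for label matching. The sketch below, assuming webencodings is installed, contrasts the two on U+212A KELVIN SIGN and reads one entry from `LABELS`; the variable names are illustrative.

```python
from webencodings import LABELS, UTF8, ascii_lower

sample = "UTF-8 \u212a"  # ends with U+212A KELVIN SIGN

# Only ASCII letters are lowered; the Kelvin sign is left untouched.
ascii_lowered = ascii_lower(sample)

# str.lower() applies Unicode case mapping and turns the Kelvin sign into "k".
unicode_lowered = sample.lower()

# LABELS maps a lowercase label to its canonical encoding name.
canonical = LABELS["latin1"]

print(ascii_lowered == unicode_lowered, canonical, UTF8.name)
```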