# Core Objects

The fundamental classes and lookup function at the core of the webencodings package. They provide the basic abstractions for working with character encodings as defined by the WHATWG Encoding standard.

## Capabilities

### Encoding Lookup

Look up a character encoding by its label, using the label-matching rules of the WHATWG Encoding standard. Aliases resolve to the canonical encoding, and labels are normalized before matching.

```python { .api }
def lookup(label: str) -> Encoding | None:
    """
    Look for an encoding by its label, following the WHATWG Encoding standard.

    Args:
        label: An encoding label string (matched ASCII case-insensitively, with surrounding ASCII whitespace stripped)

    Returns:
        An Encoding object for the canonical encoding, or None if the label is unknown

    Examples:
        - lookup('utf-8') -> UTF-8 Encoding
        - lookup('latin1') -> windows-1252 Encoding
        - lookup('unknown') -> None
    """
```

The lookup function matches labels ASCII case-insensitively and strips leading and trailing ASCII whitespace (tabs, newlines, form feeds, carriage returns, and spaces) before consulting the standard label mappings.
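
The normalization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's source: `normalize_label`, `sample_lookup`, and the four-entry `SAMPLE_LABELS` table are hypothetical names, and the real label table is much larger.

```python
from typing import Optional

# The five ASCII whitespace characters named in the text above.
ASCII_WHITESPACE = "\t\n\f\r "

# Hypothetical, tiny subset of the label table (label -> canonical name).
SAMPLE_LABELS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "latin1": "windows-1252",
    "iso-8859-1": "windows-1252",
}


def normalize_label(label: str) -> str:
    """Strip ASCII whitespace from both ends, then lowercase ASCII letters only."""
    stripped = label.strip(ASCII_WHITESPACE)
    # bytes.lower() only touches ASCII A-Z, so non-ASCII characters are left alone.
    return stripped.encode("utf-8").lower().decode("utf-8")


def sample_lookup(label: str) -> Optional[str]:
    """Return the canonical name for a label, or None if it is not in the sample table."""
    return SAMPLE_LABELS.get(normalize_label(label))


print(sample_lookup("\tLatin1 "))          # windows-1252
print(sample_lookup("made-up-encoding"))   # None
```

Lowercasing through bytes rather than `str.lower()` keeps the comparison ASCII-only, so a non-ASCII look-alike such as the Kelvin sign (U+212A) is not folded to `k`.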

### Encoding Class

Represents a character encoding by pairing its canonical name with the underlying Python codec implementation.

```python { .api }
import codecs

class Encoding:
    """
    Represents a character encoding such as UTF-8.

    Attributes:
        name: Canonical name of the encoding according to the WHATWG standard
        codec_info: Python CodecInfo object providing the actual implementation
    """

    def __init__(self, name: str, codec_info: codecs.CodecInfo) -> None: ...
```

The Encoding class is a thin wrapper around Python's codec system: it supplies the standardized WHATWG name and delegates the actual encoding and decoding to Python's existing codec implementations. This keeps the package consistent with both web standards and Python's codec machinery.
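
To make the wrapper idea concrete, the sketch below builds a stand-in with the same two attributes from the standard library alone. It is an illustration under stated assumptions, not the package's implementation: the `Encoding` class here is a hypothetical stand-in that mirrors only the documented `name` and `codec_info` attributes, and the objects are constructed by hand with `codecs.lookup` rather than obtained from the package.

```python
import codecs


class Encoding:
    """Stand-in mirroring the documented attributes; not the package's class."""

    def __init__(self, name: str, codec_info: codecs.CodecInfo) -> None:
        self.name = name
        self.codec_info = codec_info


# Pair a WHATWG canonical name with the Python codec that implements it.
windows_1252 = Encoding("windows-1252", codecs.lookup("cp1252"))

# CodecInfo.decode and CodecInfo.encode return (result, length_consumed) tuples.
text, consumed = windows_1252.codec_info.decode(b"caf\xe9")
data, _ = windows_1252.codec_info.encode("caf\u00e9")

# The same CodecInfo also exposes incremental decoding for chunked input.
utf8 = Encoding("utf-8", codecs.lookup("utf-8"))
decoder = utf8.codec_info.incrementaldecoder()
first = decoder.decode(b"\xc3")               # incomplete sequence, nothing emitted yet
second = decoder.decode(b"\xa9", final=True)  # completes the two-byte character

print(windows_1252.name, text, consumed)
```

Because `codec_info` is an ordinary `codecs.CodecInfo`, anything that already works with Python codecs (stateless calls, incremental coders, stream readers) should work with it unchanged.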

## Usage Examples

```python
import webencodings

# Look up encodings by various labels
utf8 = webencodings.lookup('utf-8')
print(utf8.name)  # 'utf-8'

# Handle aliases: latin1 maps to windows-1252 per the WHATWG spec
latin1 = webencodings.lookup('latin1')
print(latin1.name)  # 'windows-1252'

# Case-insensitive matching and whitespace handling
encoding = webencodings.lookup(' UTF-8 ')
print(encoding.name)  # 'utf-8'

# Unknown labels return None
unknown = webencodings.lookup('made-up-encoding')
print(unknown)  # None

# Access the underlying Python codec; decode() returns a (text, length) tuple
utf8 = webencodings.lookup('utf-8')
decoded_text = utf8.codec_info.decode(b'Hello')[0]
print(decoded_text)  # 'Hello'
```