# Core Objects

Fundamental classes and lookup functionality that form the foundation of the webencodings package. These provide the core abstractions for working with character encodings according to the WHATWG Encoding standard.

## Capabilities

### Encoding Lookup

Look up a character encoding by one of its labels, using the WHATWG label-matching rules. Aliases resolve to a single canonical encoding, and labels are normalized as the specification requires.

```python { .api }
def lookup(label: str) -> Encoding | None:
    """
    Look for an encoding by its label following the WHATWG Encoding standard.

    Args:
        label: An encoding label string (case-insensitive, whitespace-stripped)

    Returns:
        An Encoding object for the canonical encoding, or None if unknown

    Examples:
        - lookup('utf-8') -> UTF-8 Encoding
        - lookup('latin1') -> windows-1252 Encoding
        - lookup('unknown') -> None
    """
```

`lookup()` strips leading and trailing ASCII whitespace (tab, line feed, form feed, carriage return, and space) from the label and lowercases ASCII letters only, then matches the result against the standard label mappings.
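To make these matching rules concrete, here is a minimal sketch of label normalization and lookup, assuming a tiny hand-written excerpt of the WHATWG label table. `normalize_label`, `lookup_name`, and `_LABELS` are hypothetical names used only for illustration, not part of the webencodings API, and the sketch returns a canonical name string rather than an `Encoding` object. It deliberately avoids `str.strip()` and `str.lower()`: the former also removes non-ASCII whitespace such as U+00A0 NO-BREAK SPACE, and the latter maps U+212A KELVIN SIGN to an ASCII `k`, so a label spelled with a Kelvin sign in place of `K` would wrongly match `koi8-r`.

```python
import string
from typing import Optional

# Hypothetical excerpt of the WHATWG label table (label -> canonical name).
# The real table maps many more labels; these entries are for illustration only.
_LABELS = {
    'utf-8': 'utf-8',
    'utf8': 'utf-8',
    'unicode-1-1-utf-8': 'utf-8',
    'latin1': 'windows-1252',
    'iso-8859-1': 'windows-1252',
    'windows-1252': 'windows-1252',
    'koi8-r': 'koi8-r',
}

# The five ASCII whitespace characters: tab, LF, FF, CR, space.
_ASCII_WHITESPACE = '\t\n\f\r '
# Translation table that lowercases A-Z and leaves every other character alone.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def normalize_label(label: str) -> str:
    """Strip leading/trailing ASCII whitespace, then lowercase ASCII letters only."""
    return label.strip(_ASCII_WHITESPACE).translate(_ASCII_LOWER)


def lookup_name(label: str) -> Optional[str]:
    """Return the canonical encoding name for a label, or None if it is unknown."""
    return _LABELS.get(normalize_label(label))


print(lookup_name(' UTF-8 '))           # utf-8
print(lookup_name('\tLatin1\n'))        # windows-1252
print(lookup_name('made-up-encoding'))  # None
print(lookup_name('\u00a0utf-8'))       # None: NO-BREAK SPACE is not ASCII whitespace
print(lookup_name('\u212aOI8-R'))       # None: KELVIN SIGN is not ASCII 'K'
```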

### Encoding Class

Represents a character encoding with both a canonical name and the underlying Python codec implementation.

```python { .api }
import codecs


class Encoding:
    """
    Represents a character encoding such as UTF-8.

    Attributes:
        name: Canonical name of the encoding according to the WHATWG standard
        codec_info: Python CodecInfo object providing the actual implementation
    """

    def __init__(self, name: str, codec_info: codecs.CodecInfo) -> None: ...
```

The `Encoding` class wraps Python's codec system: `name` carries the standardized WHATWG name, while `codec_info` supplies Python's existing implementation of that encoding. Code can therefore use names that match web standards and still rely on Python's built-in codecs for the actual conversion between bytes and text.
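As a rough illustration of this split, the sketch below pairs a WHATWG name with a `codecs.CodecInfo` obtained from the standard library's `codecs.lookup()`. `SketchEncoding` and `decode_bytes` are hypothetical names, not the webencodings implementation. The sketch also shows that the two names can differ: Python registers its Windows-1252 codec under the name `cp1252`.

```python
import codecs
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchEncoding:
    """Hypothetical stand-in for an encoding wrapper: a WHATWG name plus a Python codec."""

    name: str                     # canonical WHATWG name, e.g. 'windows-1252'
    codec_info: codecs.CodecInfo  # Python's implementation of that encoding

    def decode_bytes(self, data: bytes) -> str:
        # CodecInfo.decode returns a (text, bytes_consumed) tuple; keep only the text.
        text, _consumed = self.codec_info.decode(data)
        return text


w1252 = SketchEncoding('windows-1252', codecs.lookup('windows-1252'))
print(w1252.name)                   # windows-1252
print(w1252.codec_info.name)        # cp1252
print(w1252.decode_bytes(b'\x80'))  # €
```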

## Usage Examples

```python
import webencodings

# Look up an encoding by its label
utf8 = webencodings.lookup('utf-8')
print(utf8.name)  # utf-8

# Aliases resolve to a canonical name: latin1 maps to windows-1252 per the WHATWG spec
latin1 = webencodings.lookup('latin1')
print(latin1.name)  # windows-1252

# Matching is ASCII case-insensitive and ignores surrounding whitespace
encoding = webencodings.lookup(' UTF-8 ')
print(encoding.name)  # utf-8

# Unknown labels return None
unknown = webencodings.lookup('made-up-encoding')
print(unknown)  # None

# Use the underlying Python codec directly; decode() returns (text, bytes_consumed)
decoded_text = utf8.codec_info.decode(b'Hello')[0]
print(decoded_text)  # Hello
```