tessl/pypi-webencodings

Character encoding aliases for legacy web content implementing the WHATWG Encoding standard

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/webencodings@0.5.x

To install, run

npx @tessl/cli install tessl/pypi-webencodings@0.5.0

# webencodings

webencodings is a Python implementation of the WHATWG Encoding standard. It resolves the encoding labels found in legacy web content (including their many aliases) to canonical encodings, detects UTF-8 and UTF-16 byte order marks (BOMs), and applies the standard's rules for choosing between a BOM and a declared encoding.

## Package Information

- **Package Name**: webencodings
- **Language**: Python
- **Installation**: `pip install webencodings`
- **Version**: 0.5.1

## Core Imports

```python
import webencodings
```

Common specific imports:

```python
from webencodings import lookup, decode, encode, UTF8
```

All encoding classes and streaming interfaces:

```python
from webencodings import Encoding, IncrementalDecoder, IncrementalEncoder
from webencodings import iter_decode, iter_encode
```

## Basic Usage

```python
import webencodings

# Look up an encoding by label
utf8_encoding = webencodings.lookup('utf-8')
windows_encoding = webencodings.lookup('windows-1252')

# Decode bytes with BOM detection
text, encoding_used = webencodings.decode(b'\xef\xbb\xbfHello', 'utf-8')
print(text)  # "Hello"
print(encoding_used.name)  # "utf-8"

# Encode text to bytes
data = webencodings.encode("Hello", webencodings.UTF8)
print(data)  # b'Hello'

# Handle legacy web content encoding
legacy_data = b'caf\xe9'  # Latin-1 encoded "café"
text, encoding = webencodings.decode(legacy_data, 'iso-8859-1')
print(text)  # "café"
```

## Architecture

The webencodings package follows the WHATWG Encoding standard architecture:

- **Encoding Objects**: Canonical representations of character encodings with standardized names
- **Label Lookup**: Maps encoding labels (including aliases) to canonical encoding names
- **BOM Detection**: UTF-8/UTF-16 BOM detection that takes precedence over declared encodings
- **Streaming Interfaces**: Both "pull"-based and "push"-based processing for large data
- **Error Handling**: Follows Python's codec error handling patterns

This design ensures consistent cross-implementation behavior for handling legacy web content.
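
As a small illustration of the BOM precedence rule listed above, the sketch below decodes the same word three ways against one declared encoding. The `declared` value and the sample byte strings are made up for the example; only `webencodings.decode` comes from the package.

```python
import webencodings

declared = "windows-1252"  # assumed declaration, e.g. taken from an HTTP header

# Illustrative inputs: the same text with a UTF-8 BOM, a UTF-16LE BOM, and no BOM.
samples = {
    "utf-8 BOM": b"\xef\xbb\xbf" + "café".encode("utf-8"),
    "utf-16le BOM": b"\xff\xfe" + "café".encode("utf-16-le"),
    "no BOM": "café".encode("cp1252"),
}

results = {}
for description, raw in samples.items():
    # A BOM, when present, wins over the declared encoding and is stripped.
    text, used = webencodings.decode(raw, declared)
    results[description] = (text, used.name)
    print(description, "->", used.name, repr(text))
```

Only the input without a BOM ends up decoded with the declared encoding.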

## Capabilities

### Encoding Lookup and Core Objects

`lookup()` resolves an encoding label to an `Encoding` object, or returns `None` when the label is unknown. The `Encoding` class pairs a canonical WHATWG name with the Python codec that implements it.

```python { .api }
def lookup(label: str) -> Encoding | None: ...

class Encoding:
    name: str
    codec_info: codecs.CodecInfo
```

[Core Objects](./core-objects.md)
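
A minimal sketch of how label lookup might be used in practice. The `resolve` helper is hypothetical (it is not part of webencodings) and simply falls back to UTF-8 when a label is not recognized.

```python
import webencodings


def resolve(label, default=webencodings.UTF8):
    """Hypothetical helper: return the Encoding for `label`, or `default` if unknown."""
    encoding = webencodings.lookup(label)
    return encoding if encoding is not None else default


# Labels are matched ASCII case-insensitively, ignoring surrounding ASCII whitespace.
padded = webencodings.lookup(" \r\nUTF8\t")
print(padded.name)  # utf-8

# Aliases collapse to one canonical name.
print(webencodings.lookup("latin1").name)  # windows-1252

# "u8" is a Python codec alias but not a WHATWG label, so lookup() returns None.
python_only = webencodings.lookup("u8")

known = resolve("Shift_JIS")
fallback = resolve("no-such-encoding")
print(known.name, fallback.name)
```

Checking for `None` explicitly keeps the fallback decision visible at the call site rather than hidden in an exception handler.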

### Single String Processing

Simple encoding and decoding functions for processing individual strings with BOM detection and WHATWG-compliant encoding resolution.

```python { .api }
def decode(input: bytes, fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[str, Encoding]: ...
def encode(input: str, encoding: Encoding | str = UTF8, errors: str = 'strict') -> bytes: ...
```

[String Processing](./string-processing.md)
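
The two functions have different `errors` defaults, which the following sketch tries to make concrete. The sample inputs and the `*_failed` flag names are illustrative only.

```python
import webencodings

bad = b"abc\xff"  # 0xFF never appears in well-formed UTF-8

# decode() defaults to errors='replace': undecodable bytes become U+FFFD.
text, _ = webencodings.decode(bad, "utf-8")

# With errors='strict' the same input raises instead.
try:
    webencodings.decode(bad, "utf-8", errors="strict")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# encode() defaults to errors='strict': unencodable characters raise.
try:
    webencodings.encode("snow \u2603", "windows-1252")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# Opting in to errors='replace' substitutes '?' for the snowman.
lossy = webencodings.encode("snow \u2603", "windows-1252", errors="replace")

# An unknown label is rejected with LookupError before any decoding happens.
try:
    webencodings.decode(b"abc", "no-such-encoding")
    unknown_label_failed = False
except LookupError:
    unknown_label_failed = True
```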

### Streaming Processing

Streaming interfaces for processing large amounts of data incrementally, supporting both "pull"-based (iterator) and "push"-based (incremental) processing patterns.

```python { .api }
def iter_decode(input: Iterable[bytes], fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[Iterator[str], Encoding]: ...
def iter_encode(input: Iterable[str], encoding: Encoding | str = UTF8, errors: str = 'strict') -> Iterator[bytes]: ...
class IncrementalDecoder: ...
class IncrementalEncoder: ...
```

[Streaming Processing](./streaming-processing.md)
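
A short sketch contrasting the "pull" and "push" styles on the same chunked input. The chunk boundaries are chosen for illustration so that both the BOM and a multi-byte character are split across chunks.

```python
import webencodings

# Illustrative input: a UTF-8 BOM and the two bytes of 'é' each straddle a chunk boundary.
chunks = [b"\xef\xbb", b"\xbfcaf\xc3", b"\xa9"]

# "Pull" style: hand over an iterable and consume the decoded pieces.
pieces, pull_encoding = webencodings.iter_decode(chunks, "windows-1252")
pulled = "".join(pieces)

# "Push" style: feed chunks as they arrive, then flush with final=True.
decoder = webencodings.IncrementalDecoder("windows-1252")
pushed = "".join(decoder.decode(chunk) for chunk in chunks)
pushed += decoder.decode(b"", final=True)

# Encoding works the same way in the pull direction.
encoded = b"".join(webencodings.iter_encode(["caf", "é"], "utf-8"))

print(pulled, pull_encoding.name, decoder.encoding.name, encoded)
```

In both styles the BOM is detected even though it arrives in two pieces, so the `windows-1252` fallback is never used here.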

### Utilities and Constants

Utility functions and pre-defined constants including the recommended UTF-8 encoding object and ASCII case-insensitive string operations.

```python { .api }
def ascii_lower(string: str) -> str: ...
UTF8: Encoding
VERSION: str
LABELS: dict[str, str]
```

[Utilities](./utilities.md)
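
The sketch below shows why ASCII-only lowercasing differs from `str.lower()`, and how `LABELS` might be inspected. The choice of the Kelvin sign as a test character is just an example.

```python
import webencodings

# ascii_lower() only touches A-Z, so non-ASCII characters pass through unchanged.
kelvin = "\u212a"  # KELVIN SIGN; str.lower() maps it to ASCII 'k'
unicode_lowered = kelvin.lower()
ascii_lowered = webencodings.ascii_lower(kelvin)

print(webencodings.ascii_lower("UTF-8"))  # utf-8

# LABELS maps each lowercase label to its canonical encoding name.
print(webencodings.LABELS["latin1"])  # windows-1252

# Collect every label that resolves to UTF-8.
utf8_labels = sorted(
    label for label, name in webencodings.LABELS.items() if name == "utf-8"
)
print(utf8_labels)
```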