Tessl Tile for pypi/webencodings@0.5.0

# String Processing

Simple encoding and decoding functions for processing individual strings. They cover the most common use case: decoding or encoding a whole string at once, with proper BOM detection and behavior that follows the WHATWG Encoding Standard.

## Capabilities

### Single String Decoding

Decode a byte string to Unicode. A BOM, if present, takes precedence over the fallback encoding declaration.

```python { .api }
def decode(input: bytes, fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[str, Encoding]:
    """
    Decode a single byte string with BOM detection.

    Args:
        input: Byte string to decode
        fallback_encoding: Encoding object or label string to use if no BOM detected
        errors: Error handling strategy ('replace', 'strict', 'ignore', etc.)

    Returns:
        Tuple of (decoded_unicode_string, encoding_used)

    Raises:
        LookupError: If fallback_encoding label is unknown
    """
```

The function first checks for UTF-8, UTF-16LE, or UTF-16BE BOMs. If found, the BOM is removed and the detected encoding is used. Otherwise, the fallback encoding is used for decoding.
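
The precedence rule described above can be sketched with the standard library alone. This is only an illustration of the logic, not the library's source: `sniff_and_decode` and `_BOMS` are hypothetical names, and the sketch returns Python codec names (such as `utf-16-le`) rather than `Encoding` objects with WHATWG names.

```python
import codecs

# Illustrative only: mirrors the BOM-over-fallback precedence, not webencodings' code.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_and_decode(data: bytes, fallback: str, errors: str = "replace") -> tuple[str, str]:
    """Return (text, codec_name); a BOM, if present, overrides `fallback`."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            # Strip the BOM and decode with the encoding it announces.
            return data[len(bom):].decode(codec_name, errors), codec_name
    # No BOM found: trust the caller-supplied fallback.
    return data.decode(fallback, errors), fallback


print(sniff_and_decode(b"\xef\xbb\xbfHello", "cp1252"))  # BOM wins over the fallback
print(sniff_and_decode(b"\xff\xfeH\x00i\x00", "utf-8"))   # UTF-16LE BOM
print(sniff_and_decode(b"caf\xe9", "cp1252"))             # no BOM, fallback used
```

In real code, prefer `webencodings.decode`, which also resolves WHATWG labels and reports the encoding as an `Encoding` object.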

### Single String Encoding

Encode a Unicode string to bytes using the specified encoding.

```python { .api }
def encode(input: str, encoding: Encoding | str = UTF8, errors: str = 'strict') -> bytes:
    """
    Encode a Unicode string to bytes.

    Args:
        input: Unicode string to encode
        encoding: Encoding object or label string (defaults to UTF-8)
        errors: Error handling strategy ('strict', 'replace', 'ignore', etc.)

    Returns:
        Encoded byte string

    Raises:
        LookupError: If encoding label is unknown
    """
```
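
The `errors` values listed in both signatures are the names of Python's standard codec error handlers. As a rough illustration of how they differ, the sketch below uses plain `str.encode` with the `cp1252` codec as a stand-in for a windows-1252 `Encoding`; that substitution, and the helper name `try_encode`, are assumptions made for the example.

```python
def try_encode(text: str, codec: str, errors: str) -> str:
    """Encode `text` and describe the outcome for the given error handler."""
    try:
        return repr(text.encode(codec, errors))
    except UnicodeEncodeError as exc:
        return f"UnicodeEncodeError at position {exc.start}"


sample = "café ☃"  # the snowman (U+2603) is not representable in cp1252

for handler in ("strict", "replace", "ignore", "xmlcharrefreplace"):
    print(f"{handler:>18}: {try_encode(sample, 'cp1252', handler)}")
```

Note the differing defaults in the signatures above: `decode` defaults to `'replace'`, so malformed input still yields text, while `encode` defaults to `'strict'`, so unencodable characters raise instead of being silently lost.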

## Usage Examples

```python
import webencodings

# Decode with BOM detection
utf8_bom_data = b'\xef\xbb\xbfHello World'
text, encoding = webencodings.decode(utf8_bom_data, 'iso-8859-1')
print(text)  # Hello World
print(encoding.name)  # utf-8 (BOM detected, fallback ignored)

# Decode without BOM uses fallback
latin_data = b'caf\xe9'  # 'café' in latin-1
text, encoding = webencodings.decode(latin_data, 'iso-8859-1')
print(text)  # café
print(encoding.name)  # windows-1252 (iso-8859-1 maps to windows-1252)

# Handle UTF-16 BOM
utf16_data = b'\xff\xfeH\x00e\x00l\x00l\x00o\x00'  # UTF-16LE BOM + 'Hello'
text, encoding = webencodings.decode(utf16_data, 'utf-8')
print(text)  # Hello
print(encoding.name)  # utf-16le

# Encoding strings
text = "Hello World"
data = webencodings.encode(text, 'utf-8')
print(data)  # b'Hello World'

# Use predefined UTF8 constant
data = webencodings.encode(text, webencodings.UTF8)
print(data)  # b'Hello World'

# Handle encoding errors
# The 'ascii' label maps to windows-1252, so 'é' is encodable;
# only characters outside windows-1252 (here the snowman) are replaced.
text = "café ☃"
data = webencodings.encode(text, 'ascii', errors='replace')
print(data)  # b'caf\xe9 ?'

# Encode with different encodings
# Use the label 'latin1'; 'latin-1' (with a hyphen) is not a WHATWG label
text = "café"
utf8_data = webencodings.encode(text, 'utf-8')
latin1_data = webencodings.encode(text, 'latin1')
print(utf8_data)  # b'caf\xc3\xa9'
print(latin1_data)  # b'caf\xe9'
```
