# String Processing

Simple encoding and decoding functions for processing individual strings. These functions cover the most common use case: encoding or decoding a whole string in one call, with BOM detection and WHATWG-compliant behavior.

## Capabilities

### Single String Decoding

Decode a byte string to Unicode. A BOM, if present, takes precedence over the fallback encoding.

```python { .api }
def decode(input: bytes, fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[str, Encoding]:
    """
    Decode a single byte string with BOM detection.

    Args:
        input: Byte string to decode
        fallback_encoding: Encoding object or label string to use if no BOM detected
        errors: Error handling strategy ('replace', 'strict', 'ignore', etc.)

    Returns:
        Tuple of (decoded_unicode_string, encoding_used)

    Raises:
        LookupError: If fallback_encoding label is unknown
    """
```

The function first checks for UTF-8, UTF-16LE, or UTF-16BE BOMs. If found, the BOM is removed and the detected encoding is used. Otherwise, the fallback encoding is used for decoding.
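
The precedence described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `sniff_bom` and `decode_sketch` are hypothetical names, and the sketch passes Python codec names straight to `bytes.decode` instead of resolving WHATWG labels to `Encoding` objects.

```python
import codecs
from typing import Optional, Tuple


def sniff_bom(data: bytes) -> Tuple[Optional[str], bytes]:
    """Hypothetical helper: return (detected codec name or None, data without BOM)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8", data[len(codecs.BOM_UTF8):]
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16le", data[len(codecs.BOM_UTF16_LE):]
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be", data[len(codecs.BOM_UTF16_BE):]
    return None, data


def decode_sketch(data: bytes, fallback: str, errors: str = "replace") -> Tuple[str, str]:
    """Hypothetical sketch: a BOM wins over the fallback; otherwise use the fallback."""
    detected, body = sniff_bom(data)
    name = detected or fallback
    return body.decode(name, errors), name


with_bom = decode_sketch(b"\xef\xbb\xbfHello", "windows-1252")
without_bom = decode_sketch(b"caf\xe9", "windows-1252")
utf16 = decode_sketch(b"\xff\xfeH\x00i\x00", "utf-8")
```

In the first and third calls the fallback is ignored because a BOM is present; in the second call no BOM is found, so the fallback decides how the bytes are interpreted.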

### Single String Encoding

Encode a Unicode string to bytes using the specified encoding.

```python { .api }
def encode(input: str, encoding: Encoding | str = UTF8, errors: str = 'strict') -> bytes:
    """
    Encode a Unicode string to bytes.

    Args:
        input: Unicode string to encode
        encoding: Encoding object or label string (defaults to UTF-8)
        errors: Error handling strategy ('strict', 'replace', 'ignore', etc.)

    Returns:
        Encoded byte string

    Raises:
        LookupError: If encoding label is unknown
    """
```
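
To illustrate what the `errors` strategies do, the sketch below uses Python's built-in `str.encode` rather than `webencodings` itself. It assumes that the `errors` argument is passed through to the underlying Python codec, so the standard handlers should behave the same way; the variable names are illustrative only.

```python
# Assumption: errors= is handed to the Python codec, so the standard
# error handlers apply. "cp1252" is Python's name for windows-1252.
text = "café ☃"  # U+2603 (snowman) has no windows-1252 byte

replaced = text.encode("cp1252", errors="replace")           # unencodable char becomes b'?'
ignored = text.encode("cp1252", errors="ignore")             # unencodable char is dropped
as_ref = text.encode("cp1252", errors="xmlcharrefreplace")   # numeric character reference

try:
    text.encode("cp1252", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    # 'strict' (the default for encode) raises instead of losing data
    strict_failed = True
```

Because `encode` defaults to `'strict'` while `decode` defaults to `'replace'`, encoding is the direction where an unhandled exception is more likely; pass `errors` explicitly when the input may contain characters outside the target encoding.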

## Usage Examples

```python
import webencodings

# Decode with BOM detection
utf8_bom_data = b'\xef\xbb\xbfHello World'
text, encoding = webencodings.decode(utf8_bom_data, 'iso-8859-1')
print(text)  # 'Hello World'
print(encoding.name)  # 'utf-8' (BOM detected, fallback ignored)

# Decode without BOM uses fallback
latin_data = b'caf\xe9'  # 'café' in latin-1
text, encoding = webencodings.decode(latin_data, 'iso-8859-1')
print(text)  # 'café'
print(encoding.name)  # 'windows-1252' (iso-8859-1 maps to windows-1252)

# Handle UTF-16 BOM
utf16_data = b'\xff\xfeH\x00e\x00l\x00l\x00o\x00'  # UTF-16LE BOM + 'Hello'
text, encoding = webencodings.decode(utf16_data, 'utf-8')
print(text)  # 'Hello'
print(encoding.name)  # 'utf-16le'

# Encoding strings
text = "Hello World"
data = webencodings.encode(text, 'utf-8')
print(data)  # b'Hello World'

# Use predefined UTF8 constant
data = webencodings.encode(text, webencodings.UTF8)
print(data)  # b'Hello World'

# Handle encoding errors ('ascii' is a WHATWG label for windows-1252)
text = "café ☃"
data = webencodings.encode(text, 'ascii', errors='replace')
print(data)  # b'caf\xe9 ?' (only the snowman is unencodable)

# Encode with different encodings
text = "café"
utf8_data = webencodings.encode(text, 'utf-8')
latin1_data = webencodings.encode(text, 'latin1')  # 'latin-1' is not a WHATWG label
print(utf8_data)  # b'caf\xc3\xa9'
print(latin1_data)  # b'caf\xe9'
```