
# Streaming Processing

Streaming interfaces for decoding and encoding large amounts of data incrementally. Both "pull"-based (iterator) and "push"-based (incremental) patterns are provided, so large files or data streams can be processed chunk by chunk instead of being loaded into memory at once.

## Capabilities

### Pull-based Decoding

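Iterator-based decoder that consumes input on demand and yields Unicode strings. If the input starts with a byte order mark (BOM), the BOM determines the encoding; otherwise `fallback_encoding` is used.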

```python { .api }
def iter_decode(input: Iterable[bytes], fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[Iterator[str], Encoding]:
    """
    Pull-based decoder for iterables of byte strings.

    Args:
        input: Iterable of byte strings. Consumed just enough to determine the
            encoding when iter_decode() is called, then on demand as the
            output iterator is read
        fallback_encoding: Encoding object or label string if no BOM detected
        errors: Error handling strategy ('replace', 'strict', 'ignore', etc.)

    Returns:
        Tuple of (output_iterator, encoding_used).
        The output iterator yields Unicode strings as input is consumed.

    Raises:
        LookupError: If fallback_encoding label is unknown
    """
```
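As a usage sketch (assuming the `webencodings` package is installed; the sample text and the 2-byte chunk size are arbitrary choices for illustration), the example below reads a BOM-less UTF-8 stream in small pieces so that the two bytes of `é` land in different chunks, then shows the default `errors='replace'` turning an invalid byte into U+FFFD. Because `encoding` is part of the return value, the call itself has to read enough input to decide it.

```python
import io

import webencodings

# Illustrative input: UTF-8 text without a BOM, read 2 bytes at a time so the
# two-byte sequence for 'é' (b'\xc3\xa9') is split across chunks.
raw = io.BytesIO('café au lait'.encode('utf-8'))
chunks = iter(lambda: raw.read(2), b'')  # stops when read() returns b''

text_iter, encoding = webencodings.iter_decode(chunks, 'utf-8')
text = ''.join(text_iter)
print(encoding.name, repr(text))

# With the default errors='replace', an invalid byte becomes U+FFFD.
bad_iter, _ = webencodings.iter_decode([b'ok \xff'], 'utf-8')
bad = ''.join(bad_iter)
print(repr(bad))
```

The decoder keeps the dangling `b'\xc3'` between chunks, so callers never need to align chunk boundaries with character boundaries.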

### Pull-based Encoding

Iterator-based encoder that consumes Unicode strings and yields bytes.

```python { .api }
def iter_encode(input: Iterable[str], encoding: Encoding | str = UTF8, errors: str = 'strict') -> Iterator[bytes]:
    """
    Pull-based encoder for iterables of Unicode strings.

    Args:
        input: Iterable of Unicode strings
        encoding: Encoding object or label string (defaults to UTF-8)
        errors: Error handling strategy ('strict', 'replace', 'ignore', etc.)

    Returns:
        Iterator yielding byte strings

    Raises:
        LookupError: If encoding label is unknown
    """
```
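A usage sketch, again assuming `webencodings` is installed: the label `'latin1'` is one of the WHATWG labels for windows-1252, and `errors='replace'` substitutes `?` for characters windows-1252 cannot represent. The encoding is resolved when `iter_encode()` is called, so an unknown label should raise `LookupError` immediately rather than on first iteration; the label `'not-a-real-label'` is a made-up example.

```python
import webencodings

# 'latin1' is a WHATWG label that resolves to windows-1252.
chunks = ['caf\u00e9 ', '\u2603 ', 'ok']  # U+2603 (snowman) has no windows-1252 byte
data = b''.join(webencodings.iter_encode(chunks, 'latin1', errors='replace'))
print(data)

# An unknown label is rejected by the iter_encode() call itself.
try:
    webencodings.iter_encode(['x'], 'not-a-real-label')
    label_rejected = False
except LookupError as exc:
    label_rejected = True
    print('LookupError:', exc)
```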

### Push-based Decoding

Stateful decoder for incremental processing where data is fed in chunks.

```python { .api }
class IncrementalDecoder:
    """
    Push-based decoder for incremental processing.

    Attributes:
        encoding: The detected/used Encoding object, or None if not yet
            determined (i.e. not enough input yet to tell whether there is a BOM)
    """

    def __init__(self, fallback_encoding: Encoding | str, errors: str = 'replace') -> None:
        """
        Initialize incremental decoder.

        Args:
            fallback_encoding: Encoding object or label string if no BOM detected
            errors: Error handling strategy ('replace', 'strict', 'ignore', etc.)

        Raises:
            LookupError: If fallback_encoding label is unknown
        """

    def decode(self, input: bytes, final: bool = False) -> str:
        """
        Decode one chunk of input.

        Args:
            input: Byte string chunk to decode
            final: True if this is the last chunk (flushes any buffered data)

        Returns:
            Decoded Unicode string for this chunk (may be empty while input
            is still being buffered)
        """
```
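The `None` state of `encoding` exists because a decoder cannot pick an encoding until it has seen enough bytes to rule a BOM in or out. The stdlib-only sketch below illustrates that buffering idea. `BomSniffingDecoder` is a hypothetical class, not webencodings' implementation; it takes Python codec names rather than WHATWG labels and assumes three bytes (the length of the UTF-8 BOM, the longest of the three BOMs checked here) are enough to decide.

```python
import codecs

# Assumed BOM table (UTF-8, UTF-16LE, UTF-16BE) mapped to Python codec names.
_BOMS = [
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]


class BomSniffingDecoder:
    """Hypothetical push decoder that buffers input until a BOM is ruled in or out."""

    def __init__(self, fallback: str, errors: str = 'replace') -> None:
        codecs.lookup(fallback)  # raises LookupError early for unknown codec names
        self._fallback = fallback
        self._errors = errors
        self._buffer = b''
        self._decode = None
        self.encoding = None  # codec name once decided, None until then

    def decode(self, data: bytes, final: bool = False) -> str:
        if self._decode is not None:  # encoding already decided: pass through
            return self._decode(data, final)
        data = self._buffer + data
        for bom, name in _BOMS:
            if data.startswith(bom):
                data = data[len(bom):]  # strip the BOM from the output
                break
        else:
            if len(data) < 3 and not final:
                # Could still be a partial BOM: keep waiting for more bytes.
                self._buffer = data
                return ''
            name = self._fallback  # no BOM: use the fallback encoding
        self._buffer = b''
        self.encoding = name
        self._decode = codecs.getincrementaldecoder(name)(self._errors).decode
        return self._decode(data, final)


d = BomSniffingDecoder('cp1252')
print(repr(d.decode(b'\xef\xbb')), d.encoding)  # 2 bytes: still undecided
print(repr(d.decode(b'\xbfHi')), d.encoding)   # BOM complete: UTF-8 chosen
```

Once the encoding is fixed, every later chunk goes straight to a standard `codecs` incremental decoder, which handles multi-byte sequences split across chunks.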

### Push-based Encoding

Stateful encoder for incremental processing where data is fed in chunks.

```python { .api }
class IncrementalEncoder:
    """Push-based encoder for incremental processing."""

    def __init__(self, encoding: Encoding | str = UTF8, errors: str = 'strict') -> None:
        """
        Initialize incremental encoder.

        Args:
            encoding: Encoding object or label string (defaults to UTF-8)
            errors: Error handling strategy ('strict', 'replace', 'ignore', etc.)

        Raises:
            LookupError: If encoding label is unknown
        """

    def encode(self, input: str, final: bool = False) -> bytes:
        """
        Encode one chunk of input.

        Args:
            input: Unicode string chunk to encode
            final: True if this is the last chunk (flushes any buffered data)

        Returns:
            Encoded byte string for this chunk
        """
```
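`final=True` matters most for stateful encodings. As an illustration (assuming `webencodings` is installed and that its `iso-2022-jp` encoding is backed by Python's codec of the same name), ISO-2022-JP switches into JIS X 0208 mode with an escape sequence and must switch back to ASCII (`ESC ( B`) at the end of the output; the encoder emits that trailing escape only when told the chunk is final.

```python
import webencodings

encoder = webencodings.IncrementalEncoder('iso-2022-jp')

part1 = encoder.encode('日本')             # enters JIS X 0208 mode and stays there
part2 = encoder.encode('語', final=True)  # final=True also switches back to ASCII
data = part1 + part2
print(part1, part2)

# The chunked output should match a one-shot encode with Python's codec.
print(data == '日本語'.encode('iso-2022-jp'))
```

Forgetting `final=True` on the last call would leave the output in JIS X 0208 mode, which a strict decoder may reject.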

## Usage Examples

```python
import webencodings

# Pull-based decoding with an iterator
data_chunks = [b'\xef\xbb\xbf', b'Hello ', b'World']
output_iter, encoding = webencodings.iter_decode(data_chunks, 'utf-8')

print(f"Detected encoding: {encoding.name}")  # Detected encoding: utf-8
for text_chunk in output_iter:
    print(repr(text_chunk))  # 'Hello ', then 'World'

# Pull-based encoding
text_chunks = ['Hello ', 'World', '!']
byte_iter = webencodings.iter_encode(text_chunks, 'utf-8')

for byte_chunk in byte_iter:
    print(repr(byte_chunk))  # b'Hello ', then b'World', then b'!'

# Push-based incremental decoding
decoder = webencodings.IncrementalDecoder('utf-8')

# Feed data in chunks
result1 = decoder.decode(b'\xef\xbb\xbfHel')
print(repr(result1))  # 'Hel'
print(decoder.encoding.name)  # utf-8

result2 = decoder.decode(b'lo Wor')
print(repr(result2))  # 'lo Wor'

result3 = decoder.decode(b'ld', final=True)
print(repr(result3))  # 'ld'

# Push-based incremental encoding
encoder = webencodings.IncrementalEncoder('utf-8')

data1 = encoder.encode('Hello ')
print(repr(data1))  # b'Hello '

data2 = encoder.encode('World', final=True)
print(repr(data2))  # b'World'

# Handle BOM detection with streaming
decoder = webencodings.IncrementalDecoder('iso-8859-1')

# Feed only the first byte of the UTF-16LE BOM (b'\xff\xfe')
result1 = decoder.decode(b'\xff')
print(repr(result1))  # ''
print(decoder.encoding)  # None (not enough data yet)

# Feed the rest of the BOM plus some data to complete BOM detection
result2 = decoder.decode(b'\xfeH\x00e\x00')
print(repr(result2))  # 'He'
print(decoder.encoding.name)  # utf-16le
```
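The pull-based functions also compose: the output iterator of `iter_decode()` can be fed straight into `iter_encode()`. The sketch below uses a hypothetical helper, `transcode_to_utf8`, to re-encode a byte stream to UTF-8 chunk by chunk; the sample UTF-16LE input and the 3-byte chunk size are illustrative assumptions chosen so that chunk boundaries fall inside 2-byte code units.

```python
import codecs
import io

import webencodings


def transcode_to_utf8(stream, fallback_encoding, chunk_size=4096):
    """Hypothetical helper: lazily re-encode a binary stream as UTF-8.

    Returns (utf8_chunk_iterator, detected_encoding).
    """
    chunks = iter(lambda: stream.read(chunk_size), b'')
    text_iter, encoding = webencodings.iter_decode(chunks, fallback_encoding)
    return webencodings.iter_encode(text_iter, 'utf-8'), encoding


# Illustrative input: UTF-16LE with a BOM, read 3 bytes at a time.
source = io.BytesIO(codecs.BOM_UTF16_LE + 'héllo wörld'.encode('utf-16-le'))
out_iter, detected = transcode_to_utf8(source, 'windows-1252', chunk_size=3)
utf8_bytes = b''.join(out_iter)
print(detected.name, utf8_bytes)
```

The BOM decides the encoding here, so the `windows-1252` fallback is never used, and the BOM itself does not appear in the UTF-8 output.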