# Streaming Processing

Streaming interfaces for processing large amounts of data incrementally. Both pull-based (iterator) and push-based (incremental) patterns are provided, so large files or data streams can be handled chunk by chunk.

## Capabilities

### Pull-based Decoding

Iterator-based decoder that consumes input on demand and yields Unicode strings. The input is read only far enough to determine the encoding when `iter_decode` is called; the rest is consumed as the returned iterator is advanced.

```python { .api }
def iter_decode(input: Iterable[bytes], fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[Iterator[str], Encoding]:
    """
    Pull-based decoder for iterables of byte strings.

    Args:
        input: Iterable of byte strings (consumed on demand)
        fallback_encoding: Encoding object or label string, used if no BOM is detected
        errors: Error handling strategy ('replace', 'strict', 'ignore', etc.)

    Returns:
        Tuple of (output_iterator, encoding_used)
        The output iterator yields Unicode strings as input is consumed

    Raises:
        LookupError: If fallback_encoding label is unknown
    """
```
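
The fallback encoding applies only when the input does not start with a byte order mark. As a rough illustration of that precedence, the sketch below checks for UTF-8 and UTF-16 BOMs using only the standard library. `sniff_bom` and `decode_with_fallback` are hypothetical helpers written for this example, not part of webencodings, and the library's own detection logic may differ in detail.

```python
import codecs
from typing import Optional, Tuple

# (BOM bytes, encoding name) pairs checked in order; an illustrative table.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
)


def sniff_bom(data: bytes) -> Tuple[Optional[str], bytes]:
    """Return (encoding_name, data_without_bom), or (None, data) if no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


def decode_with_fallback(data: bytes, fallback: str) -> Tuple[str, str]:
    """Decode data, letting a BOM override the fallback encoding."""
    name, body = sniff_bom(data)
    used = name or fallback
    return body.decode(used, errors="replace"), used


print(decode_with_fallback(b"\xef\xbb\xbfHello", "latin-1"))  # ('Hello', 'utf-8')
print(decode_with_fallback(b"Hello", "latin-1"))              # ('Hello', 'latin-1')
```

This one-shot version needs the whole input in memory; the streaming interfaces exist precisely to avoid that.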

### Pull-based Encoding

Iterator-based encoder that consumes Unicode strings and yields bytes.

```python { .api }
def iter_encode(input: Iterable[str], encoding: Encoding | str = UTF8, errors: str = 'strict') -> Iterator[bytes]:
    """
    Pull-based encoder for iterables of Unicode strings.

    Args:
        input: Iterable of Unicode strings
        encoding: Encoding object or label string (defaults to UTF-8)
        errors: Error handling strategy ('strict', 'replace', 'ignore', etc.)

    Returns:
        Iterator yielding byte strings

    Raises:
        LookupError: If encoding label is unknown
    """
```
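
To make the pull semantics concrete, here is a minimal standard-library sketch of a lazy encoder with the same shape. `pull_encode` is a hypothetical stand-in built on `codecs.getincrementalencoder`, not the library's implementation; it only shows that the source iterable is consumed as the caller asks for output.

```python
import codecs
from typing import Iterable, Iterator, List


def pull_encode(chunks: Iterable[str], encoding: str = "utf-8",
                errors: str = "strict") -> Iterator[bytes]:
    """Lazily encode text chunks; input is read only as output is pulled."""
    # Look the codec up immediately so an unknown name raises LookupError here.
    encode = codecs.getincrementalencoder(encoding)(errors).encode

    def generate() -> Iterator[bytes]:
        for chunk in chunks:
            data = encode(chunk)
            if data:  # skip empty results
                yield data
        tail = encode("", final=True)  # flush any state held by the encoder
        if tail:
            yield tail

    return generate()


consumed: List[str] = []


def source() -> Iterator[str]:
    """Text source that records each chunk as it is handed out."""
    for chunk in ("Hello ", "World", "!"):
        consumed.append(chunk)
        yield chunk


encoded = pull_encode(source())
print(consumed)          # [] (nothing has been read yet)
first = next(encoded)
print(first, consumed)   # b'Hello ' ['Hello ']
```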

### Push-based Decoding

Stateful decoder for incremental processing where data is fed in chunks.

```python { .api }
class IncrementalDecoder:
    """
    Push-based decoder for incremental processing.

    Attributes:
        encoding: The detected/used Encoding object, or None if not yet determined
    """

    def __init__(self, fallback_encoding: Encoding | str, errors: str = 'replace') -> None:
        """
        Initialize incremental decoder.

        Args:
            fallback_encoding: Encoding object or label string, used if no BOM is detected
            errors: Error handling strategy ('replace', 'strict', 'ignore', etc.)

        Raises:
            LookupError: If fallback_encoding label is unknown
        """

    def decode(self, input: bytes, final: bool = False) -> str:
        """
        Decode one chunk of input.

        Args:
            input: Byte string chunk to decode
            final: True if this is the last chunk (flushes any buffered data)

        Returns:
            Decoded Unicode string for this chunk
        """
```
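
The decoder has to be stateful because a chunk boundary can fall in the middle of a multi-byte character, and `final` tells it when no more bytes will arrive. The sketch below shows both effects with the standard library's incremental UTF-8 decoder as a stand-in; it assumes `IncrementalDecoder` treats split sequences in a comparable way once the encoding is known, which this section does not spell out.

```python
import codecs

# Standard-library incremental decoder, used here only as a stand-in.
decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

# The euro sign is three bytes in UTF-8: e2 82 ac. Split it across chunks.
part1 = decoder.decode(b"Price: \xe2\x82")  # incomplete tail is held back
part2 = decoder.decode(b"\xac5")            # the sequence completes here
print(repr(part1), repr(part2))             # 'Price: ' '€5'

# A stream that ends mid-character: final=True forces the leftover byte out,
# and errors='replace' turns it into U+FFFD.
truncated = codecs.getincrementaldecoder("utf-8")(errors="replace")
head = truncated.decode(b"caf\xc3")
tail = truncated.decode(b"", final=True)
print(repr(head), repr(tail))               # 'caf' '\ufffd'
```

Forgetting `final=True` on the last call would silently drop such a trailing fragment instead of reporting or replacing it.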

### Push-based Encoding

Stateful encoder for incremental processing where data is fed in chunks.

```python { .api }
class IncrementalEncoder:
    """Push-based encoder for incremental processing."""

    def __init__(self, encoding: Encoding | str = UTF8, errors: str = 'strict') -> None:
        """
        Initialize incremental encoder.

        Args:
            encoding: Encoding object or label string (defaults to UTF-8)
            errors: Error handling strategy ('strict', 'replace', 'ignore', etc.)

        Raises:
            LookupError: If encoding label is unknown
        """

    def encode(self, input: str, final: bool = False) -> bytes:
        """
        Encode one chunk of input.

        Args:
            input: Unicode string chunk to encode
            final: True if this is the last chunk (flushes any buffered data)

        Returns:
            Encoded byte string for this chunk
        """
```
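
The `errors` argument decides what happens to characters the target encoding cannot represent. The sketch below compares a few standard Python error handlers using the standard library's incremental ASCII encoder; `encode_with` is a hypothetical helper, and the example assumes the strategy names are handed to Python's codec machinery unchanged.

```python
import codecs


def encode_with(errors: str, text: str = "caf\u00e9") -> bytes:
    """Encode text to ASCII with the given error handling strategy."""
    encoder = codecs.getincrementalencoder("ascii")(errors)
    return encoder.encode(text, final=True)


for strategy in ("replace", "ignore", "xmlcharrefreplace"):
    print(strategy, encode_with(strategy))
# replace b'caf?'
# ignore b'caf'
# xmlcharrefreplace b'caf&#233;'

try:
    encode_with("strict")
except UnicodeEncodeError as exc:
    print("strict raised:", exc.reason)
```

With `'strict'` as the encoder default, unencodable input surfaces as an exception rather than as silently altered output.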

## Usage Examples

```python
import webencodings

# Pull-based decoding with iterator
data_chunks = [b'\xef\xbb\xbf', b'Hello ', b'World']
output_iter, encoding = webencodings.iter_decode(data_chunks, 'utf-8')

print(f"Detected encoding: {encoding.name}")  # 'utf-8'
for text_chunk in output_iter:
    print(repr(text_chunk))  # 'Hello ', 'World'

# Pull-based encoding
text_chunks = ['Hello ', 'World', '!']
byte_iter = webencodings.iter_encode(text_chunks, 'utf-8')

for byte_chunk in byte_iter:
    print(repr(byte_chunk))  # b'Hello ', b'World', b'!'

# Push-based incremental decoding
decoder = webencodings.IncrementalDecoder('utf-8')

# Feed data in chunks
result1 = decoder.decode(b'\xef\xbb\xbfHel')
print(repr(result1))  # 'Hel'
print(decoder.encoding.name)  # 'utf-8'

result2 = decoder.decode(b'lo Wor')
print(repr(result2))  # 'lo Wor'

result3 = decoder.decode(b'ld', final=True)
print(repr(result3))  # 'ld'

# Push-based incremental encoding
encoder = webencodings.IncrementalEncoder('utf-8')

data1 = encoder.encode('Hello ')
print(repr(data1))  # b'Hello '

data2 = encoder.encode('World', final=True)
print(repr(data2))  # b'World'

# Handle a BOM that is split across chunks
decoder = webencodings.IncrementalDecoder('iso-8859-1')

# Feed only the first byte of the UTF-16LE BOM
result1 = decoder.decode(b'\xff')
print(repr(result1))  # ''
print(decoder.encoding)  # None (not enough data yet)

# Feed the rest of the BOM plus data to complete detection
result2 = decoder.decode(b'\xfeH\x00e\x00')
print(repr(result2))  # 'He'
print(decoder.encoding.name)  # 'utf-16le'
```
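
The examples above use in-memory lists, but the pull-based functions accept any iterable of byte strings. One way to feed them from a large file is a small generator that reads fixed-size blocks. `read_chunks` below is a hypothetical helper for illustration (the name and the default block size are assumptions), demonstrated on an in-memory stream so it runs without external files.

```python
import io
from typing import BinaryIO, Iterator


def read_chunks(stream: BinaryIO, size: int = 8192) -> Iterator[bytes]:
    """Yield successive blocks of up to `size` bytes until the stream ends."""
    while True:
        block = stream.read(size)
        if not block:
            break
        yield block


# Demonstration with an in-memory stream; a file opened with
# open(path, "rb") exposes the same read() interface.
stream = io.BytesIO(b"\xef\xbb\xbfHello World")
chunks = list(read_chunks(stream, size=4))
print(chunks)  # [b'\xef\xbb\xbfH', b'ello', b' Wor', b'ld']
```

Passed without `list()`, for example as `webencodings.iter_decode(read_chunks(f), 'utf-8')`, the generator would let the file be read only as decoded text is requested.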