Tessl Tile for pypi/webencodings@0.5.0

# Streaming Processing
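
Streaming interfaces for processing large amounts of data incrementally. Both "pull"-based (iterator) and "push"-based (incremental) patterns are available, so large files or data streams can be decoded and encoded chunk by chunk instead of all at once.

## Capabilities

### Pull-based Decoding

Iterator-based decoder that consumes input lazily and yields Unicode strings. Because the detected encoding is returned together with the iterator, `iter_decode` reads input up front until it has determined the encoding (from a BOM or the fallback) and decoded the first non-empty piece of text.
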
```python { .api }
def iter_decode(input: Iterable[bytes], fallback_encoding: Encoding | str, errors: str = 'replace') -> tuple[Iterator[str], Encoding]:
    """
    Pull-based decoder for iterables of byte strings.

    Args:
        input: Iterable of byte strings (consumed on-demand)
        fallback_encoding: Encoding object or label string if no BOM detected
        errors: Error handling strategy ('replace', 'strict', 'ignore', etc.)

    Returns:
        Tuple of (output_iterator, encoding_used)
        The output iterator yields Unicode strings as input is consumed

    Raises:
        LookupError: If fallback_encoding label is unknown
    """
```
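
Chunk boundaries can fall in the middle of a multi-byte character, which is why a pull-based decoder has to carry state from one chunk to the next instead of decoding each chunk on its own. The sketch below illustrates this with the standard library only: `codecs.getincrementaldecoder` stands in for the decoding step, there is no BOM sniffing or fallback logic, and the helper names `iter_file_chunks` and `decode_chunks` are hypothetical, not part of webencodings.

```python
import codecs
import functools
import io
from typing import BinaryIO, Iterable, Iterator


def iter_file_chunks(fileobj: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Hypothetical helper: yield chunks of at most chunk_size bytes until EOF."""
    # iter(callable, sentinel) keeps calling fileobj.read(chunk_size) until it returns b''
    return iter(functools.partial(fileobj.read, chunk_size), b'')


def decode_chunks(chunks: Iterable[bytes], encoding: str) -> Iterator[str]:
    """Hypothetical helper: decode lazily, keeping partial sequences between chunks."""
    decoder = codecs.getincrementaldecoder(encoding)('strict')
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:  # skip empty results, e.g. while a multi-byte sequence is incomplete
            yield text
    tail = decoder.decode(b'', final=True)  # raises if the input ends mid-sequence
    if tail:
        yield tail


data = 'naïve café'.encode('utf-8')  # 'ï' and 'é' take two bytes each
chunks = list(iter_file_chunks(io.BytesIO(data), chunk_size=3))

# Decoding chunks independently fails: the first chunk ends with half of 'ï'.
try:
    for chunk in chunks:
        chunk.decode('utf-8')
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

text = ''.join(decode_chunks(chunks, 'utf-8'))
```

The hypothetical `iter_file_chunks` helper is also one way to turn a binary file opened with `open(path, 'rb')` into the `Iterable[bytes]` that `iter_decode` expects, e.g. `webencodings.iter_decode(iter_file_chunks(f), 'windows-1252')`.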

### Pull-based Encoding

Iterator-based encoder that consumes Unicode strings and yields byte strings.

```python { .api }
def iter_encode(input: Iterable[str], encoding: Encoding | str = UTF8, errors: str = 'strict') -> Iterator[bytes]:
    """
    Pull-based encoder for iterables of Unicode strings.

    Args:
        input: Iterable of Unicode strings
        encoding: Encoding object or label string (defaults to UTF-8)
        errors: Error handling strategy ('strict', 'replace', 'ignore', etc.)

    Returns:
        Iterator yielding byte strings

    Raises:
        LookupError: If encoding label is unknown (raised when iter_encode
            is called, not on the first iteration)
    """
```
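
The final flush matters for stateful encodings. ISO-2022-JP, for example, switches into a two-byte mode with an escape sequence and must emit another escape to return to ASCII when the text ends. The following standard-library sketch of the pull-based encoding pattern (the name `iter_encode_sketch` is hypothetical, and it takes Python codec names rather than webencodings `Encoding` objects) creates the encoder eagerly, so an unknown encoding raises `LookupError` at call time, and only then hands the work to a generator.

```python
import codecs
from typing import Callable, Iterable, Iterator


def iter_encode_sketch(chunks: Iterable[str], encoding: str = 'utf-8',
                       errors: str = 'strict') -> Iterator[bytes]:
    """Hypothetical pull-based encoder built on the stdlib codecs module."""
    # Looked up here, outside the generator, so a bad name fails immediately.
    encode = codecs.getincrementalencoder(encoding)(errors).encode
    return _encode_generator(chunks, encode)


def _encode_generator(chunks: Iterable[str],
                      encode: Callable[..., bytes]) -> Iterator[bytes]:
    for chunk in chunks:
        output = encode(chunk)
        if output:  # skip chunks that produced no bytes
            yield output
    # final=True flushes encoder state, e.g. ISO-2022-JP's switch back to ASCII.
    output = encode('', final=True)
    if output:
        yield output


pieces = list(iter_encode_sketch(['日本', '語'], 'iso-2022-jp'))
encoded = b''.join(pieces)

try:
    iter_encode_sketch(['x'], 'no-such-encoding')
    failed_early = False
except LookupError:
    failed_early = True
```

Had the generator function been called directly, the lookup would only run on the first `next()`, which is why the sketch splits eager setup from lazy iteration.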

### Push-based Decoding

Stateful decoder for incremental processing where data is fed in chunks.

```python { .api }
class IncrementalDecoder:
    """
    Push-based decoder for incremental processing.

    Attributes:
        encoding: The detected/used Encoding object, or None if not yet determined
            (no BOM has been seen, fewer than 3 bytes have been received, and
            decode() has not yet been called with final=True)
    """

    def __init__(self, fallback_encoding: Encoding | str, errors: str = 'replace') -> None:
        """
        Initialize incremental decoder.

        Args:
            fallback_encoding: Encoding object or label string if no BOM detected
            errors: Error handling strategy ('replace', 'strict', 'ignore', etc.)

        Raises:
            LookupError: If fallback_encoding label is unknown
        """

    def decode(self, input: bytes, final: bool = False) -> str:
        """
        Decode one chunk of input.

        Args:
            input: Byte string chunk to decode
            final: True if this is the last chunk (flushes any buffered data)

        Returns:
            Decoded Unicode string for this chunk; may be empty while the
            encoding is still undetermined or a multi-byte sequence is incomplete
        """
```
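
The `encoding` attribute can only be set once the decoder knows whether the input starts with a byte order mark. The sketch below models that sniff-then-delegate behaviour with the standard library. It assumes, as the usage examples suggest, that a UTF-8 or UTF-16 BOM takes precedence over the fallback and is stripped from the output; the class name `BomSniffingDecoder`, the BOM table, and the use of Python codec names instead of `Encoding` objects are assumptions of this illustration, not the webencodings implementation. Since the longest BOM checked is three bytes, up to three bytes are buffered before the fallback is chosen.

```python
import codecs
from typing import Optional

# Hypothetical BOM table: (BOM bytes, Python codec name), checked in order.
_BOMS = [
    (b'\xff\xfe', 'utf-16-le'),
    (b'\xfe\xff', 'utf-16-be'),
    (b'\xef\xbb\xbf', 'utf-8'),
]


class BomSniffingDecoder:
    """Hypothetical push-based decoder that sniffs a BOM before decoding."""

    def __init__(self, fallback_encoding: str, errors: str = 'replace') -> None:
        codecs.lookup(fallback_encoding)  # fail early: raises LookupError if unknown
        self._fallback = fallback_encoding
        self._errors = errors
        self._buffer = b''
        self._decode = None  # set once the encoding is known
        self.encoding: Optional[str] = None

    def decode(self, data: bytes, final: bool = False) -> str:
        if self._decode is not None:
            return self._decode(data, final)
        data = self._buffer + data
        for bom, name in _BOMS:
            if data.startswith(bom):
                encoding, data = name, data[len(bom):]  # strip the BOM
                break
        else:
            if len(data) < 3 and not final:
                # Might still be the start of a 3-byte UTF-8 BOM: keep buffering.
                self._buffer = data
                return ''
            encoding = self._fallback
        self._buffer = b''
        self.encoding = encoding
        self._decode = codecs.getincrementaldecoder(encoding)(self._errors).decode
        return self._decode(data, final)


partial_bom = BomSniffingDecoder('windows-1252')
first = partial_bom.decode(b'\xef\xbb')  # '' and encoding still None
second = partial_bom.decode(b'\xbfHi')   # BOM completed, decoded as UTF-8
```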

### Push-based Encoding

Stateful encoder for incremental processing where data is fed in chunks.

```python { .api }
class IncrementalEncoder:
    """Push-based encoder for incremental processing."""

    def __init__(self, encoding: Encoding | str = UTF8, errors: str = 'strict') -> None:
        """
        Initialize incremental encoder.

        Args:
            encoding: Encoding object or label string (defaults to UTF-8)
            errors: Error handling strategy ('strict', 'replace', 'ignore', etc.)

        Raises:
            LookupError: If encoding label is unknown
        """

    def encode(self, input: str, final: bool = False) -> bytes:
        """
        Encode one chunk of input.

        Args:
            input: Unicode string chunk to encode
            final: True if this is the last chunk (flushes any buffered data)

        Returns:
            Encoded byte string for this chunk
        """
```
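
A typical push-based use is an output sink that receives text piece by piece, for example while a template renders. The hypothetical `EncodingWriter` below shows that pattern with the standard library's `codecs.getincrementalencoder` playing the role of `IncrementalEncoder`. It also shows one choice for `errors` when the output is HTML: assuming the value is handed to Python's codec machinery like the listed `'strict'`/`'replace'`/`'ignore'` options, `'xmlcharrefreplace'` turns characters the target encoding cannot represent into numeric character references instead of raising `UnicodeEncodeError`.

```python
import codecs
import io
from typing import BinaryIO


class EncodingWriter:
    """Hypothetical push-based text sink over a binary stream (stdlib only)."""

    def __init__(self, raw: BinaryIO, encoding: str = 'utf-8',
                 errors: str = 'strict') -> None:
        self._raw = raw
        # One stateful encoder for the whole stream.
        self._encode = codecs.getincrementalencoder(encoding)(errors).encode

    def write(self, text: str) -> None:
        # Each pushed chunk is encoded and written immediately.
        self._raw.write(self._encode(text))

    def close(self) -> None:
        # final=True flushes any state the encoder still holds.
        self._raw.write(self._encode('', final=True))


buf = io.BytesIO()
writer = EncodingWriter(buf, 'windows-1252', errors='xmlcharrefreplace')
writer.write('Price: 5 €, ')
writer.write('checked ✓')  # U+2713 has no windows-1252 byte
writer.close()
```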

## Usage Examples

```python
import webencodings

# Pull-based decoding with iterator
data_chunks = [b'\xef\xbb\xbf', b'Hello ', b'World']
output_iter, encoding = webencodings.iter_decode(data_chunks, 'utf-8')

print(f"Detected encoding: {encoding.name}")  # 'utf-8'
for text_chunk in output_iter:
    print(repr(text_chunk))  # 'Hello ', 'World'

# Pull-based encoding
text_chunks = ['Hello ', 'World', '!']
byte_iter = webencodings.iter_encode(text_chunks, 'utf-8')

for byte_chunk in byte_iter:
    print(repr(byte_chunk))  # b'Hello ', b'World', b'!'

# Push-based incremental decoding
decoder = webencodings.IncrementalDecoder('utf-8')

# Feed data in chunks
result1 = decoder.decode(b'\xef\xbb\xbfHel')
print(repr(result1))  # 'Hel'
print(decoder.encoding.name)  # 'utf-8'

result2 = decoder.decode(b'lo Wor')
print(repr(result2))  # 'lo Wor'

result3 = decoder.decode(b'ld', final=True)
print(repr(result3))  # 'ld'

# Push-based incremental encoding
encoder = webencodings.IncrementalEncoder('utf-8')

data1 = encoder.encode('Hello ')
print(repr(data1))  # b'Hello '

data2 = encoder.encode('World', final=True)
print(repr(data2))  # b'World'

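# Handle BOM detection with streaming
decoder = webencodings.IncrementalDecoder('iso-8859-1')

# A partial UTF-8 BOM is buffered until the decoder can tell whether it is one
result1 = decoder.decode(b'\xef\xbb')
print(repr(result1))  # ''
print(decoder.encoding)  # None (not enough data yet)

# The third BOM byte arrives: the BOM overrides the fallback and is stripped
result2 = decoder.decode(b'\xbfHi')
print(repr(result2))  # 'Hi'
print(decoder.encoding.name)  # 'utf-8'

# A complete two-byte UTF-16 BOM is recognized as soon as it arrives
decoder = webencodings.IncrementalDecoder('iso-8859-1')
print(repr(decoder.decode(b'\xff\xfe')))  # ''
print(decoder.encoding.name)  # 'utf-16le'
print(repr(decoder.decode(b'H\x00e\x00', final=True)))  # 'He'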
```
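
Pull-based decoding and encoding compose into a lazy transcoding pipeline. The standard library's `codecs.iterdecode` and `codecs.iterencode` have a similar shape, though without BOM sniffing or WHATWG label handling and with `'strict'` as the default error handler, so the sketch below uses them to convert a windows-1252 stream to UTF-8 two bytes at a time. With webencodings the analogous chain would be `webencodings.iter_encode(webencodings.iter_decode(chunks, 'windows-1252')[0], 'utf-8')`, keeping in mind that `iter_decode` reads ahead until it knows the encoding.

```python
import codecs
import functools
import io

# windows-1252 input: 'é' is 0xE9 and '€' is 0x80 in this encoding.
source = io.BytesIO('café €'.encode('windows-1252'))
chunks = iter(functools.partial(source.read, 2), b'')  # 2-byte reads until EOF

# Lazily decode windows-1252 and re-encode each decoded piece as UTF-8.
utf8_chunks = codecs.iterencode(codecs.iterdecode(chunks, 'windows-1252'), 'utf-8')
result = b''.join(utf8_chunks)
```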
