Tessl Tile for pypi/bitarray@3.7.0

or run

npx @tessl/cli init

Version

Tile

Overview

Evals

Files

docs

advanced-features.md core-operations.md creation-conversion.md index.md utility-functions.md

advanced-features.mddocs/

0
# Advanced Features
1

2
Specialized functionality for encoding/decoding variable-length prefix codes, Huffman coding, compression algorithms, and advanced bit manipulation techniques.
3

4
## Capabilities
5

6
### Variable-Length Prefix Coding
7

8
Bitarray provides efficient support for encoding and decoding variable-length prefix codes, commonly used in compression algorithms and data transmission protocols.
9

10
```python { .api }
11
def encode(self, code: dict, iterable) -> None:
12
    """
13
    Encode objects using variable-length prefix codes and append to bitarray.
14
    
15
    Args:
16
        code: Dictionary mapping objects to bitarray codes
17
        iterable: Sequence of objects to encode
18
    """
19

20
def decode(self, code: Union[dict, decodetree]) -> Iterator:
21
    """
22
    Decode bitarray using variable-length prefix codes.
23
    
24
    Args:
25
        code: Dictionary or decodetree for decoding
26
        
27
    Yields:
28
        Decoded objects from the bitarray
29
    """
30
```
31

32
**Usage Examples:**
33

34
```python
35
from bitarray import bitarray
36

37
# Create encoding dictionary
38
code = {
39
    'A': bitarray('00'),
40
    'B': bitarray('01'), 
41
    'C': bitarray('10'),
42
    'D': bitarray('11')
43
}
44

45
# Encoding
46
a = bitarray()
47
a.encode(code, 'ABCD')       # Encodes to '00011011'
48
a.encode(code, ['A', 'C'])   # Append 'AC' -> '0010'
49

50
# Decoding
51
decoded = list(a.decode(code))  # ['A', 'B', 'C', 'D', 'A', 'C']
52

53
# More complex example with text
54
text_code = {
55
    ' ': bitarray('00'),
56
    'e': bitarray('010'),
57
    't': bitarray('011'),
58
    'a': bitarray('100'),
59
    'o': bitarray('101'),
60
    'i': bitarray('110'),
61
    'n': bitarray('1110'),
62
    's': bitarray('1111')
63
}
64

65
message = bitarray()
66
message.encode(text_code, "tea")
67
decoded_text = ''.join(message.decode(text_code))  # "tea"
68
```
69

70
### Decode Trees
71

72
The decodetree class provides an optimized structure for decoding variable-length prefix codes, offering better performance than dictionary-based decoding for large code tables.
73

74
```python { .api }
75
class decodetree:
76
    """Optimized tree structure for decoding variable-length prefix codes"""
77
    
78
    def __init__(self, code: dict) -> None:
79
        """
80
        Create decode tree from code dictionary.
81
        
82
        Args:
83
            code: Dictionary mapping objects to bitarray codes
84
        """
85
    
86
    def complete(self) -> bool:
87
        """
88
        Check if decode tree is complete (prefix-free).
89
        
90
        Returns:
91
            True if tree represents a complete prefix code
92
        """
93
    
94
    def nodes(self) -> int:
95
        """
96
        Get number of internal nodes in tree.
97
        
98
        Returns:
99
            Number of internal nodes
100
        """
101
    
102
    def todict(self) -> dict:
103
        """
104
        Convert decode tree back to dictionary format.
105
        
106
        Returns:
107
            Dictionary mapping bitarray codes to objects
108
        """
109
```
110

111
**Usage Examples:**
112

113
```python
114
from bitarray import bitarray, decodetree
115

116
# Create code dictionary
117
code = {
118
    'frequent': bitarray('0'),
119
    'common': bitarray('10'),
120
    'rare': bitarray('110'),
121
    'very_rare': bitarray('111')
122
}
123

124
# Create optimized decode tree
125
tree = decodetree(code)
126

127
# Tree analysis
128
print(tree.complete())    # True (complete prefix code)
129
print(tree.nodes())       # Number of internal nodes
130

131
# Efficient decoding with tree
132
message = bitarray('0101110')
133
decoded = list(message.decode(tree))  # ['frequent', 'common', 'very_rare']
134

135
# Convert back to dictionary if needed
136
code_dict = tree.todict()  # Inverse mapping
137
```
138

139
### Huffman Coding
140

141
Functions for generating and using Huffman codes, which provide optimal variable-length encoding for known symbol frequencies.
142

143
```python { .api }
144
def huffman_code(freq_map: Union[dict, Counter], endian: Optional[str] = None) -> dict:
145
    """
146
    Generate Huffman codes from frequency map.
147
    
148
    Args:
149
        freq_map: Dictionary or Counter mapping symbols to frequencies
150
        endian: Bit-endianness for generated codes
151
        
152
    Returns:
153
        Dictionary mapping symbols to bitarray codes
154
    """
155

156
def canonical_huffman(freq_map: Union[dict, Counter]) -> tuple[dict, list, list]:
157
    """
158
    Generate canonical Huffman codes.
159
    
160
    Args:
161
        freq_map: Dictionary or Counter mapping symbols to frequencies
162
        
163
    Returns:
164
        Tuple of (code_dict, count_list, symbol_list) for canonical encoding
165
    """
166

167
def canonical_decode(a: bitarray, count: list[int], symbol: list) -> Iterator:
168
    """
169
    Decode using canonical Huffman codes.
170
    
171
    Args:
172
        a: Bitarray to decode
173
        count: List of code counts by length (from canonical_huffman)
174
        symbol: List of symbols in canonical order (from canonical_huffman)
175
        
176
    Yields:
177
        Decoded symbols
178
    """
179
```
180

181
**Usage Examples:**
182

183
```python
184
from bitarray import bitarray
185
from bitarray.util import huffman_code, canonical_huffman, canonical_decode
186
from collections import Counter
187

188
# Character frequencies in English text
189
frequencies = {
190
    'e': 127, 't': 90, 'a': 82, 'o': 75, 'i': 70, 'n': 67,
191
    's': 63, 'h': 61, 'r': 60, 'd': 43, 'l': 40, 'c': 28,
192
    'u': 28, 'm': 24, 'w': 23, 'f': 22, 'g': 20, 'y': 20,
193
    'p': 19, 'b': 13, 'v': 10, 'k': 8, 'j': 2, 'x': 2,
194
    'q': 1, 'z': 1
195
}
196

197
# Generate Huffman codes
198
code = huffman_code(frequencies)
199

200
# Most frequent characters get shorter codes
201
print(f"'e': {code['e'].to01()}")  # Short code for 'e'
202
print(f"'z': {code['z'].to01()}")  # Longer code for 'z'
203

204
# Encode text
205
text = "hello world"
206
encoded = bitarray()
207
encoded.encode(code, text)
208

209
# Decode back
210
decoded_text = ''.join(encoded.decode(code))
211
print(decoded_text == text)  # True
212

213
# Canonical Huffman (standardized format)
214
canon_code, count, symbol = canonical_huffman(frequencies)
215
encoded_canon = bitarray()
216
encoded_canon.encode(canon_code, text)
217

218
# Canonical decoding
219
decoded_canon = list(canonical_decode(encoded_canon, count, symbol))
220
print(''.join(decoded_canon) == text)  # True
221
```
222

223
### Advanced Encoding Techniques
224

225
Additional encoding and compression techniques for specialized use cases.
226

227
```python { .api }
228
# From utility module - already covered in detail in utility-functions.md
229
def sc_encode(a: bitarray) -> bytes:
230
    """Sparse compression - optimal for arrays with few set bits"""
231

232
def sc_decode(stream: Iterable[int]) -> bitarray:
233
    """Decode sparse-compressed data"""
234

235
def vl_encode(a: bitarray) -> bytes:
236
    """Variable-length encoding for general compression"""
237

238
def vl_decode(stream: Iterable[int], endian: Optional[str] = None) -> bitarray:
239
    """Variable-length decoding"""
240
```
241

242
### Practical Applications
243

244
Here are complete examples showing how these advanced features work together for real-world applications:
245

246
**Text Compression Example:**
247

248
```python
249
from bitarray import bitarray
250
from bitarray.util import huffman_code
251
from collections import Counter
252

253
def compress_text(text: str) -> tuple[bitarray, dict]:
254
    """Compress text using Huffman coding"""
255
    # Analyze character frequencies
256
    frequencies = Counter(text)
257
    
258
    # Generate optimal codes
259
    code = huffman_code(frequencies)
260
    
261
    # Encode text
262
    compressed = bitarray()
263
    compressed.encode(code, text)
264
    
265
    return compressed, code
266

267
def decompress_text(compressed: bitarray, code: dict) -> str:
268
    """Decompress Huffman-coded text"""
269
    return ''.join(compressed.decode(code))
270

271
# Example usage
272
original_text = "this is a test message for compression"
273
compressed_bits, encoding = compress_text(original_text)
274
decompressed_text = decompress_text(compressed_bits, encoding)
275

276
print(f"Original: {len(original_text * 8)} bits")  # 8 bits per ASCII char
277
print(f"Compressed: {len(compressed_bits)} bits")
278
print(f"Compression ratio: {len(compressed_bits) / (len(original_text) * 8):.2f}")
279
print(f"Match: {original_text == decompressed_text}")
280
```
281

282
**Network Protocol Example:**
283

284
```python
285
from bitarray import bitarray, decodetree
286

287
def create_protocol_decoder():
288
    """Create decoder for a hypothetical network protocol"""
289
    # Define protocol message codes
290
    protocol_codes = {
291
        'START': bitarray('000'),
292
        'DATA': bitarray('001'), 
293
        'ACK': bitarray('010'),
294
        'NACK': bitarray('011'),
295
        'END': bitarray('100'),
296
        'ERROR': bitarray('101')
297
    }
298
    
299
    # Create optimized decoder tree
300
    return decodetree(protocol_codes)
301

302
def encode_message(commands: list[str]) -> bitarray:
303
    """Encode protocol message"""
304
    code = {
305
        'START': bitarray('000'), 'DATA': bitarray('001'),
306
        'ACK': bitarray('010'), 'NACK': bitarray('011'),
307
        'END': bitarray('100'), 'ERROR': bitarray('101')
308
    }
309
    
310
    message = bitarray()
311
    message.encode(code, commands)
312
    return message
313

314
def decode_message(message: bitarray, decoder: decodetree) -> list[str]:
315
    """Decode protocol message"""
316
    return list(message.decode(decoder))
317

318
# Example protocol usage
319
decoder = create_protocol_decoder()
320
commands = ['START', 'DATA', 'DATA', 'ACK', 'END']
321
encoded = encode_message(commands)
322
decoded = decode_message(encoded, decoder)
323

324
print(f"Commands: {commands}")
325
print(f"Encoded: {encoded.to01()}")
326
print(f"Decoded: {decoded}")
327
print(f"Match: {commands == decoded}")
328
```
329

330
**Data Stream Processing:**
331

332
```python
333
from bitarray import bitarray
334
from bitarray.util import serialize, deserialize
335

336
class BitStreamProcessor:
337
    """Process streams of bit data with encoding/decoding"""
338
    
339
    def __init__(self, chunk_size: int = 1024):
340
        self.chunk_size = chunk_size
341
        self.buffer = bitarray()
342
    
343
    def add_data(self, data: bitarray) -> None:
344
        """Add data to processing buffer"""
345
        self.buffer.extend(data)
346
    
347
    def process_chunks(self, decoder: decodetree) -> list:
348
        """Process complete chunks from buffer"""
349
        results = []
350
        
351
        while len(self.buffer) >= self.chunk_size:
352
            # Extract chunk
353
            chunk = self.buffer[:self.chunk_size]
354
            self.buffer = self.buffer[self.chunk_size:]
355
            
356
            # Process chunk
357
            decoded = list(chunk.decode(decoder))
358
            results.extend(decoded)
359
        
360
        return results
361
    
362
    def save_state(self) -> bytes:
363
        """Serialize current buffer state"""
364
        return serialize(self.buffer)
365
    
366
    def restore_state(self, state: bytes) -> None:
367
        """Restore buffer from serialized state"""
368
        self.buffer = deserialize(state)
369

370
# Example usage
371
processor = BitStreamProcessor(chunk_size=32)
372
decoder = create_protocol_decoder()
373

374
# Simulate streaming data
375
stream_data = encode_message(['START', 'DATA'] * 10)
376
processor.add_data(stream_data)
377

378
# Process available chunks
379
results = processor.process_chunks(decoder)
380
print(f"Processed {len(results)} commands")
381

382
# Save and restore state
383
state = processor.save_state()
384
processor.restore_state(state)
385
```
386

387
## Performance Considerations
388

389
Advanced features are optimized for different use cases:
390

391
- **decodetree**: Faster than dictionary decoding for large code tables
392
- **Huffman coding**: Optimal compression for known symbol frequencies  
393
- **Canonical Huffman**: Standardized format for interoperability
394
- **Variable-length encoding**: General-purpose compression
395
- **Sparse compression**: Optimal for arrays with few set bits
396

397
Choose the appropriate technique based on your data characteristics and performance requirements.

Version

Tile

Files

advanced-features.md.css-3qkkll{font-size:var(--chakra-font-sizes-sm);font-weight:var(--chakra-font-weights-normal);color:var(--chakra-colors-gray-300);}docs/

advanced-features.mddocs/