0
# Advanced Features
1
2
Specialized functionality for encoding/decoding variable-length prefix codes, Huffman coding, compression algorithms, and advanced bit manipulation techniques.
3
4
## Capabilities
5
6
### Variable-Length Prefix Coding
7
8
Bitarray provides efficient support for encoding and decoding variable-length prefix codes, commonly used in compression algorithms and data transmission protocols.
9
10
```python { .api }
11
def encode(self, code: dict, iterable) -> None:
12
"""
13
Encode objects using variable-length prefix codes and append to bitarray.
14
15
Args:
16
code: Dictionary mapping objects to bitarray codes
17
iterable: Sequence of objects to encode
18
"""
19
20
def decode(self, code: Union[dict, decodetree]) -> Iterator:
21
"""
22
Decode bitarray using variable-length prefix codes.
23
24
Args:
25
code: Dictionary or decodetree for decoding
26
27
Yields:
28
Decoded objects from the bitarray
29
"""
30
```
31
32
**Usage Examples:**
33
34
```python
35
from bitarray import bitarray
36
37
# Create encoding dictionary
38
code = {
39
'A': bitarray('00'),
40
'B': bitarray('01'),
41
'C': bitarray('10'),
42
'D': bitarray('11')
43
}
44
45
# Encoding
46
a = bitarray()
47
a.encode(code, 'ABCD') # Encodes to '00011011'
48
a.encode(code, ['A', 'C']) # Append 'AC' -> '0010'
49
50
# Decoding
51
decoded = list(a.decode(code)) # ['A', 'B', 'C', 'D', 'A', 'C']
52
53
# More complex example with text
54
text_code = {
55
' ': bitarray('00'),
56
'e': bitarray('010'),
57
't': bitarray('011'),
58
'a': bitarray('100'),
59
'o': bitarray('101'),
60
'i': bitarray('110'),
61
'n': bitarray('1110'),
62
's': bitarray('1111')
63
}
64
65
message = bitarray()
66
message.encode(text_code, "tea")
67
decoded_text = ''.join(message.decode(text_code)) # "tea"
68
```
69
70
### Decode Trees
71
72
The decodetree class provides an optimized structure for decoding variable-length prefix codes, offering better performance than dictionary-based decoding for large code tables.
73
74
```python { .api }
75
class decodetree:
76
"""Optimized tree structure for decoding variable-length prefix codes"""
77
78
def __init__(self, code: dict) -> None:
79
"""
80
Create decode tree from code dictionary.
81
82
Args:
83
code: Dictionary mapping objects to bitarray codes
84
"""
85
86
def complete(self) -> bool:
87
"""
88
Check if decode tree is complete (prefix-free).
89
90
Returns:
91
True if tree represents a complete prefix code
92
"""
93
94
def nodes(self) -> int:
95
"""
96
Get number of internal nodes in tree.
97
98
Returns:
99
Number of internal nodes
100
"""
101
102
def todict(self) -> dict:
103
"""
104
Convert decode tree back to dictionary format.
105
106
Returns:
107
Dictionary mapping bitarray codes to objects
108
"""
109
```
110
111
**Usage Examples:**
112
113
```python
114
from bitarray import bitarray, decodetree
115
116
# Create code dictionary
117
code = {
118
'frequent': bitarray('0'),
119
'common': bitarray('10'),
120
'rare': bitarray('110'),
121
'very_rare': bitarray('111')
122
}
123
124
# Create optimized decode tree
125
tree = decodetree(code)
126
127
# Tree analysis
128
print(tree.complete()) # True (complete prefix code)
129
print(tree.nodes()) # Number of internal nodes
130
131
# Efficient decoding with tree
132
message = bitarray('0101110')
133
decoded = list(message.decode(tree)) # ['frequent', 'common', 'very_rare']
134
135
# Convert back to dictionary if needed
136
code_dict = tree.todict() # Inverse mapping
137
```
138
139
### Huffman Coding
140
141
Functions for generating and using Huffman codes, which provide optimal variable-length encoding for known symbol frequencies.
142
143
```python { .api }
144
def huffman_code(freq_map: Union[dict, Counter], endian: Optional[str] = None) -> dict:
145
"""
146
Generate Huffman codes from frequency map.
147
148
Args:
149
freq_map: Dictionary or Counter mapping symbols to frequencies
150
endian: Bit-endianness for generated codes
151
152
Returns:
153
Dictionary mapping symbols to bitarray codes
154
"""
155
156
def canonical_huffman(freq_map: Union[dict, Counter]) -> tuple[dict, list, list]:
157
"""
158
Generate canonical Huffman codes.
159
160
Args:
161
freq_map: Dictionary or Counter mapping symbols to frequencies
162
163
Returns:
164
Tuple of (code_dict, count_list, symbol_list) for canonical encoding
165
"""
166
167
def canonical_decode(a: bitarray, count: list[int], symbol: list) -> Iterator:
168
"""
169
Decode using canonical Huffman codes.
170
171
Args:
172
a: Bitarray to decode
173
count: List of code counts by length (from canonical_huffman)
174
symbol: List of symbols in canonical order (from canonical_huffman)
175
176
Yields:
177
Decoded symbols
178
"""
179
```
180
181
**Usage Examples:**
182
183
```python
184
from bitarray import bitarray
185
from bitarray.util import huffman_code, canonical_huffman, canonical_decode
186
from collections import Counter
187
188
# Character frequencies in English text
189
frequencies = {
190
'e': 127, 't': 90, 'a': 82, 'o': 75, 'i': 70, 'n': 67,
191
's': 63, 'h': 61, 'r': 60, 'd': 43, 'l': 40, 'c': 28,
192
'u': 28, 'm': 24, 'w': 23, 'f': 22, 'g': 20, 'y': 20,
193
'p': 19, 'b': 13, 'v': 10, 'k': 8, 'j': 2, 'x': 2,
194
'q': 1, 'z': 1
195
}
196
197
# Generate Huffman codes
198
code = huffman_code(frequencies)
199
200
# Most frequent characters get shorter codes
201
print(f"'e': {code['e'].to01()}") # Short code for 'e'
202
print(f"'z': {code['z'].to01()}") # Longer code for 'z'
203
204
# Encode text
205
text = "hello world"
206
encoded = bitarray()
207
encoded.encode(code, text)
208
209
# Decode back
210
decoded_text = ''.join(encoded.decode(code))
211
print(decoded_text == text) # True
212
213
# Canonical Huffman (standardized format)
214
canon_code, count, symbol = canonical_huffman(frequencies)
215
encoded_canon = bitarray()
216
encoded_canon.encode(canon_code, text)
217
218
# Canonical decoding
219
decoded_canon = list(canonical_decode(encoded_canon, count, symbol))
220
print(''.join(decoded_canon) == text) # True
221
```
222
223
### Advanced Encoding Techniques
224
225
Additional encoding and compression techniques for specialized use cases.
226
227
```python { .api }
228
# From utility module - already covered in detail in utility-functions.md
229
def sc_encode(a: bitarray) -> bytes:
230
"""Sparse compression - optimal for arrays with few set bits"""
231
232
def sc_decode(stream: Iterable[int]) -> bitarray:
233
"""Decode sparse-compressed data"""
234
235
def vl_encode(a: bitarray) -> bytes:
236
"""Variable-length encoding for general compression"""
237
238
def vl_decode(stream: Iterable[int], endian: Optional[str] = None) -> bitarray:
239
"""Variable-length decoding"""
240
```
241
242
### Practical Applications
243
244
Here are complete examples showing how these advanced features work together for real-world applications:
245
246
**Text Compression Example:**
247
248
```python
249
from bitarray import bitarray
250
from bitarray.util import huffman_code
251
from collections import Counter
252
253
def compress_text(text: str) -> tuple[bitarray, dict]:
254
"""Compress text using Huffman coding"""
255
# Analyze character frequencies
256
frequencies = Counter(text)
257
258
# Generate optimal codes
259
code = huffman_code(frequencies)
260
261
# Encode text
262
compressed = bitarray()
263
compressed.encode(code, text)
264
265
return compressed, code
266
267
def decompress_text(compressed: bitarray, code: dict) -> str:
268
"""Decompress Huffman-coded text"""
269
return ''.join(compressed.decode(code))
270
271
# Example usage
272
original_text = "this is a test message for compression"
273
compressed_bits, encoding = compress_text(original_text)
274
decompressed_text = decompress_text(compressed_bits, encoding)
275
276
print(f"Original: {len(original_text * 8)} bits") # 8 bits per ASCII char
277
print(f"Compressed: {len(compressed_bits)} bits")
278
print(f"Compression ratio: {len(compressed_bits) / (len(original_text) * 8):.2f}")
279
print(f"Match: {original_text == decompressed_text}")
280
```
281
282
**Network Protocol Example:**
283
284
```python
285
from bitarray import bitarray, decodetree
286
287
def create_protocol_decoder():
288
"""Create decoder for a hypothetical network protocol"""
289
# Define protocol message codes
290
protocol_codes = {
291
'START': bitarray('000'),
292
'DATA': bitarray('001'),
293
'ACK': bitarray('010'),
294
'NACK': bitarray('011'),
295
'END': bitarray('100'),
296
'ERROR': bitarray('101')
297
}
298
299
# Create optimized decoder tree
300
return decodetree(protocol_codes)
301
302
def encode_message(commands: list[str]) -> bitarray:
303
"""Encode protocol message"""
304
code = {
305
'START': bitarray('000'), 'DATA': bitarray('001'),
306
'ACK': bitarray('010'), 'NACK': bitarray('011'),
307
'END': bitarray('100'), 'ERROR': bitarray('101')
308
}
309
310
message = bitarray()
311
message.encode(code, commands)
312
return message
313
314
def decode_message(message: bitarray, decoder: decodetree) -> list[str]:
315
"""Decode protocol message"""
316
return list(message.decode(decoder))
317
318
# Example protocol usage
319
decoder = create_protocol_decoder()
320
commands = ['START', 'DATA', 'DATA', 'ACK', 'END']
321
encoded = encode_message(commands)
322
decoded = decode_message(encoded, decoder)
323
324
print(f"Commands: {commands}")
325
print(f"Encoded: {encoded.to01()}")
326
print(f"Decoded: {decoded}")
327
print(f"Match: {commands == decoded}")
328
```
329
330
**Data Stream Processing:**
331
332
```python
333
from bitarray import bitarray
334
from bitarray.util import serialize, deserialize
335
336
class BitStreamProcessor:
337
"""Process streams of bit data with encoding/decoding"""
338
339
def __init__(self, chunk_size: int = 1024):
340
self.chunk_size = chunk_size
341
self.buffer = bitarray()
342
343
def add_data(self, data: bitarray) -> None:
344
"""Add data to processing buffer"""
345
self.buffer.extend(data)
346
347
def process_chunks(self, decoder: decodetree) -> list:
348
"""Process complete chunks from buffer"""
349
results = []
350
351
while len(self.buffer) >= self.chunk_size:
352
# Extract chunk
353
chunk = self.buffer[:self.chunk_size]
354
self.buffer = self.buffer[self.chunk_size:]
355
356
# Process chunk
357
decoded = list(chunk.decode(decoder))
358
results.extend(decoded)
359
360
return results
361
362
def save_state(self) -> bytes:
363
"""Serialize current buffer state"""
364
return serialize(self.buffer)
365
366
def restore_state(self, state: bytes) -> None:
367
"""Restore buffer from serialized state"""
368
self.buffer = deserialize(state)
369
370
# Example usage
371
processor = BitStreamProcessor(chunk_size=32)
372
decoder = create_protocol_decoder()
373
374
# Simulate streaming data
375
stream_data = encode_message(['START', 'DATA'] * 10)
376
processor.add_data(stream_data)
377
378
# Process available chunks
379
results = processor.process_chunks(decoder)
380
print(f"Processed {len(results)} commands")
381
382
# Save and restore state
383
state = processor.save_state()
384
processor.restore_state(state)
385
```
386
387
## Performance Considerations
388
389
Advanced features are optimized for different use cases:
390
391
- **decodetree**: Faster than dictionary decoding for large code tables
392
- **Huffman coding**: Optimal compression for known symbol frequencies
393
- **Canonical Huffman**: Standardized format for interoperability
394
- **Variable-length encoding**: General-purpose compression
395
- **Sparse compression**: Optimal for arrays with few set bits
396
397
Choose the appropriate technique based on your data characteristics and performance requirements.