or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

advanced-features.mdcore-operations.mdcreation-conversion.mdindex.mdutility-functions.md

advanced-features.mddocs/

0

# Advanced Features

1

2

Specialized functionality for encoding/decoding variable-length prefix codes, Huffman coding, compression algorithms, and advanced bit manipulation techniques.

3

4

## Capabilities

5

6

### Variable-Length Prefix Coding

7

8

Bitarray provides efficient support for encoding and decoding variable-length prefix codes, commonly used in compression algorithms and data transmission protocols.

9

10

```python { .api }

11

def encode(self, code: dict, iterable) -> None:

12

"""

13

Encode objects using variable-length prefix codes and append to bitarray.

14

15

Args:

16

code: Dictionary mapping objects to bitarray codes

17

iterable: Sequence of objects to encode

18

"""

19

20

def decode(self, code: Union[dict, decodetree]) -> Iterator:

21

"""

22

Decode bitarray using variable-length prefix codes.

23

24

Args:

25

code: Dictionary or decodetree for decoding

26

27

Yields:

28

Decoded objects from the bitarray

29

"""

30

```

31

32

**Usage Examples:**

33

34

```python

35

from bitarray import bitarray

36

37

# Create encoding dictionary

38

code = {

39

'A': bitarray('00'),

40

'B': bitarray('01'),

41

'C': bitarray('10'),

42

'D': bitarray('11')

43

}

44

45

# Encoding

46

a = bitarray()

47

a.encode(code, 'ABCD') # Encodes to '00011011'

48

a.encode(code, ['A', 'C']) # Append 'AC' -> '0010'

49

50

# Decoding

51

decoded = list(a.decode(code)) # ['A', 'B', 'C', 'D', 'A', 'C']

52

53

# More complex example with text

54

text_code = {

55

' ': bitarray('00'),

56

'e': bitarray('010'),

57

't': bitarray('011'),

58

'a': bitarray('100'),

59

'o': bitarray('101'),

60

'i': bitarray('110'),

61

'n': bitarray('1110'),

62

's': bitarray('1111')

63

}

64

65

message = bitarray()

66

message.encode(text_code, "tea")

67

decoded_text = ''.join(message.decode(text_code)) # "tea"

68

```

69

70

### Decode Trees

71

72

The decodetree class provides an optimized structure for decoding variable-length prefix codes, offering better performance than dictionary-based decoding for large code tables.

73

74

```python { .api }

75

class decodetree:

76

"""Optimized tree structure for decoding variable-length prefix codes"""

77

78

def __init__(self, code: dict) -> None:

79

"""

80

Create decode tree from code dictionary.

81

82

Args:

83

code: Dictionary mapping objects to bitarray codes

84

"""

85

86

def complete(self) -> bool:

87

"""

88

Check if decode tree is complete (prefix-free).

89

90

Returns:

91

True if tree represents a complete prefix code

92

"""

93

94

def nodes(self) -> int:

95

"""

96

Get number of internal nodes in tree.

97

98

Returns:

99

Number of internal nodes

100

"""

101

102

def todict(self) -> dict:

103

"""

104

Convert decode tree back to dictionary format.

105

106

Returns:

107

Dictionary mapping bitarray codes to objects

108

"""

109

```

110

111

**Usage Examples:**

112

113

```python

114

from bitarray import bitarray, decodetree

115

116

# Create code dictionary

117

code = {

118

'frequent': bitarray('0'),

119

'common': bitarray('10'),

120

'rare': bitarray('110'),

121

'very_rare': bitarray('111')

122

}

123

124

# Create optimized decode tree

125

tree = decodetree(code)

126

127

# Tree analysis

128

print(tree.complete()) # True (complete prefix code)

129

print(tree.nodes()) # Number of internal nodes

130

131

# Efficient decoding with tree

132

message = bitarray('0101110')

133

decoded = list(message.decode(tree)) # ['frequent', 'common', 'very_rare']

134

135

# Convert back to dictionary if needed

136

code_dict = tree.todict() # Inverse mapping

137

```

138

139

### Huffman Coding

140

141

Functions for generating and using Huffman codes, which provide optimal variable-length encoding for known symbol frequencies.

142

143

```python { .api }

144

def huffman_code(freq_map: Union[dict, Counter], endian: Optional[str] = None) -> dict:

145

"""

146

Generate Huffman codes from frequency map.

147

148

Args:

149

freq_map: Dictionary or Counter mapping symbols to frequencies

150

endian: Bit-endianness for generated codes

151

152

Returns:

153

Dictionary mapping symbols to bitarray codes

154

"""

155

156

def canonical_huffman(freq_map: Union[dict, Counter]) -> tuple[dict, list, list]:

157

"""

158

Generate canonical Huffman codes.

159

160

Args:

161

freq_map: Dictionary or Counter mapping symbols to frequencies

162

163

Returns:

164

Tuple of (code_dict, count_list, symbol_list) for canonical encoding

165

"""

166

167

def canonical_decode(a: bitarray, count: list[int], symbol: list) -> Iterator:

168

"""

169

Decode using canonical Huffman codes.

170

171

Args:

172

a: Bitarray to decode

173

count: List of code counts by length (from canonical_huffman)

174

symbol: List of symbols in canonical order (from canonical_huffman)

175

176

Yields:

177

Decoded symbols

178

"""

179

```

180

181

**Usage Examples:**

182

183

```python

184

from bitarray import bitarray

185

from bitarray.util import huffman_code, canonical_huffman, canonical_decode

186

from collections import Counter

187

188

# Character frequencies in English text

189

frequencies = {

190

'e': 127, 't': 90, 'a': 82, 'o': 75, 'i': 70, 'n': 67,

191

's': 63, 'h': 61, 'r': 60, 'd': 43, 'l': 40, 'c': 28,

192

'u': 28, 'm': 24, 'w': 23, 'f': 22, 'g': 20, 'y': 20,

193

'p': 19, 'b': 13, 'v': 10, 'k': 8, 'j': 2, 'x': 2,

194

'q': 1, 'z': 1

195

}

196

197

# Generate Huffman codes

198

code = huffman_code(frequencies)

199

200

# Most frequent characters get shorter codes

201

print(f"'e': {code['e'].to01()}") # Short code for 'e'

202

print(f"'z': {code['z'].to01()}") # Longer code for 'z'

203

204

# Encode text

205

text = "hello world"

206

encoded = bitarray()

207

encoded.encode(code, text)

208

209

# Decode back

210

decoded_text = ''.join(encoded.decode(code))

211

print(decoded_text == text) # True

212

213

# Canonical Huffman (standardized format)

214

canon_code, count, symbol = canonical_huffman(frequencies)

215

encoded_canon = bitarray()

216

encoded_canon.encode(canon_code, text)

217

218

# Canonical decoding

219

decoded_canon = list(canonical_decode(encoded_canon, count, symbol))

220

print(''.join(decoded_canon) == text) # True

221

```

222

223

### Advanced Encoding Techniques

224

225

Additional encoding and compression techniques for specialized use cases.

226

227

```python { .api }

228

# From utility module - already covered in detail in utility-functions.md

229

def sc_encode(a: bitarray) -> bytes:

230

"""Sparse compression - optimal for arrays with few set bits"""

231

232

def sc_decode(stream: Iterable[int]) -> bitarray:

233

"""Decode sparse-compressed data"""

234

235

def vl_encode(a: bitarray) -> bytes:

236

"""Variable-length encoding for general compression"""

237

238

def vl_decode(stream: Iterable[int], endian: Optional[str] = None) -> bitarray:

239

"""Variable-length decoding"""

240

```

241

242

### Practical Applications

243

244

Here are complete examples showing how these advanced features work together for real-world applications:

245

246

**Text Compression Example:**

247

248

```python

249

from bitarray import bitarray

250

from bitarray.util import huffman_code

251

from collections import Counter

252

253

def compress_text(text: str) -> tuple[bitarray, dict]:

254

"""Compress text using Huffman coding"""

255

# Analyze character frequencies

256

frequencies = Counter(text)

257

258

# Generate optimal codes

259

code = huffman_code(frequencies)

260

261

# Encode text

262

compressed = bitarray()

263

compressed.encode(code, text)

264

265

return compressed, code

266

267

def decompress_text(compressed: bitarray, code: dict) -> str:

268

"""Decompress Huffman-coded text"""

269

return ''.join(compressed.decode(code))

270

271

# Example usage

272

original_text = "this is a test message for compression"

273

compressed_bits, encoding = compress_text(original_text)

274

decompressed_text = decompress_text(compressed_bits, encoding)

275

276

print(f"Original: {len(original_text * 8)} bits") # 8 bits per ASCII char

277

print(f"Compressed: {len(compressed_bits)} bits")

278

print(f"Compression ratio: {len(compressed_bits) / (len(original_text) * 8):.2f}")

279

print(f"Match: {original_text == decompressed_text}")

280

```

281

282

**Network Protocol Example:**

283

284

```python

285

from bitarray import bitarray, decodetree

286

287

def create_protocol_decoder():

288

"""Create decoder for a hypothetical network protocol"""

289

# Define protocol message codes

290

protocol_codes = {

291

'START': bitarray('000'),

292

'DATA': bitarray('001'),

293

'ACK': bitarray('010'),

294

'NACK': bitarray('011'),

295

'END': bitarray('100'),

296

'ERROR': bitarray('101')

297

}

298

299

# Create optimized decoder tree

300

return decodetree(protocol_codes)

301

302

def encode_message(commands: list[str]) -> bitarray:

303

"""Encode protocol message"""

304

code = {

305

'START': bitarray('000'), 'DATA': bitarray('001'),

306

'ACK': bitarray('010'), 'NACK': bitarray('011'),

307

'END': bitarray('100'), 'ERROR': bitarray('101')

308

}

309

310

message = bitarray()

311

message.encode(code, commands)

312

return message

313

314

def decode_message(message: bitarray, decoder: decodetree) -> list[str]:

315

"""Decode protocol message"""

316

return list(message.decode(decoder))

317

318

# Example protocol usage

319

decoder = create_protocol_decoder()

320

commands = ['START', 'DATA', 'DATA', 'ACK', 'END']

321

encoded = encode_message(commands)

322

decoded = decode_message(encoded, decoder)

323

324

print(f"Commands: {commands}")

325

print(f"Encoded: {encoded.to01()}")

326

print(f"Decoded: {decoded}")

327

print(f"Match: {commands == decoded}")

328

```

329

330

**Data Stream Processing:**

331

332

```python

333

from bitarray import bitarray

334

from bitarray.util import serialize, deserialize

335

336

class BitStreamProcessor:

337

"""Process streams of bit data with encoding/decoding"""

338

339

def __init__(self, chunk_size: int = 1024):

340

self.chunk_size = chunk_size

341

self.buffer = bitarray()

342

343

def add_data(self, data: bitarray) -> None:

344

"""Add data to processing buffer"""

345

self.buffer.extend(data)

346

347

def process_chunks(self, decoder: decodetree) -> list:

348

"""Process complete chunks from buffer"""

349

results = []

350

351

while len(self.buffer) >= self.chunk_size:

352

# Extract chunk

353

chunk = self.buffer[:self.chunk_size]

354

self.buffer = self.buffer[self.chunk_size:]

355

356

# Process chunk

357

decoded = list(chunk.decode(decoder))

358

results.extend(decoded)

359

360

return results

361

362

def save_state(self) -> bytes:

363

"""Serialize current buffer state"""

364

return serialize(self.buffer)

365

366

def restore_state(self, state: bytes) -> None:

367

"""Restore buffer from serialized state"""

368

self.buffer = deserialize(state)

369

370

# Example usage

371

processor = BitStreamProcessor(chunk_size=32)

372

decoder = create_protocol_decoder()

373

374

# Simulate streaming data

375

stream_data = encode_message(['START', 'DATA'] * 10)

376

processor.add_data(stream_data)

377

378

# Process available chunks

379

results = processor.process_chunks(decoder)

380

print(f"Processed {len(results)} commands")

381

382

# Save and restore state

383

state = processor.save_state()

384

processor.restore_state(state)

385

```

386

387

## Performance Considerations

388

389

Advanced features are optimized for different use cases:

390

391

- **decodetree**: Faster than dictionary decoding for large code tables

392

- **Huffman coding**: Optimal compression for known symbol frequencies

393

- **Canonical Huffman**: Standardized format for interoperability

394

- **Variable-length encoding**: General-purpose compression

395

- **Sparse compression**: Optimal for arrays with few set bits

396

397

Choose the appropriate technique based on your data characteristics and performance requirements.