# Array Processing

Utilities for array transformation, bit manipulation, byte shuffling, and data preparation for compression algorithms. These functions optimize data layout and remove redundancy to improve compression efficiency or prepare data for specific processing requirements.

## Capabilities

### Delta Encoding

Compute differences between adjacent elements to remove trends and improve compressibility.

```python { .api }
def delta_encode(data, *, axis=-1, dist=1, out=None):
    """
    Return delta encoded data.

    Parameters:
    - data: NDArray - Input array to encode (any numeric dtype)
    - axis: int - Axis along which to compute differences (default -1, last axis)
    - dist: int - Distance for delta computation (default 1, adjacent elements)
    - out: NDArray | None - Pre-allocated output buffer (same shape as input)

    Returns:
    NDArray: Delta encoded array (first element unchanged, rest are differences)
    """

def delta_decode(data, *, axis=-1, dist=1, out=None):
    """
    Return delta decoded data.

    Parameters:
    - data: NDArray - Delta encoded array
    - axis: int - Axis along which delta was computed (default -1)
    - dist: int - Distance used for delta computation (default 1)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded array (reconstructed from differences)
    """

def delta_check(data):
    """
    Check if data is delta encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Data to check

    Returns:
    None: Always returns None (delta is a transform, not a format)
    """
```
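To make the transform concrete, here is a minimal pure-Python sketch of delta encoding on a one-dimensional sequence. The helpers `delta_encode_1d` and `delta_decode_1d` are hypothetical names used only for illustration; they are not part of imagecodecs, and they work on Python integers, so they do not model the wraparound a fixed-width unsigned dtype would show.

```python
def delta_encode_1d(values, dist=1):
    """Keep the first `dist` values; replace each later value with its
    difference from the value `dist` positions earlier."""
    out = list(values)
    for i in range(len(values) - 1, dist - 1, -1):
        out[i] = values[i] - values[i - dist]
    return out


def delta_decode_1d(deltas, dist=1):
    """Invert delta_encode_1d by accumulating the differences."""
    out = list(deltas)
    for i in range(dist, len(out)):
        out[i] = out[i] + out[i - dist]
    return out


ramp = [100, 102, 104, 107, 110, 110]
encoded = delta_encode_1d(ramp)      # [100, 2, 2, 3, 3, 0]
decoded = delta_decode_1d(encoded)   # back to the original ramp
print(encoded, decoded == ramp)
```

The encoded values cluster near zero, which is what lets a general-purpose compressor find more redundancy than in the raw ramp.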

### Bit Shuffling

Reorganize bits to group similar bit positions together, improving compression of typed data.

```python { .api }
def bitshuffle_encode(data, *, itemsize=1, blocksize=0, out=None):
    """
    Return bit-shuffled data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Input data
    - itemsize: int - Size of data items in bytes (default 1)
        Common values: 1 (uint8), 2 (uint16), 4 (uint32/float32), 8 (uint64/float64)
    - blocksize: int - Block size for shuffling in bytes (default 0 = auto-determine)
    - out: bytes | bytearray | NDArray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray | NDArray: Bit-shuffled data
    """

def bitshuffle_decode(data, *, itemsize=1, blocksize=0, out=None):
    """
    Return un-bit-shuffled data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Bit-shuffled data
    - itemsize: int - Size of data items in bytes (must match encoding)
    - blocksize: int - Block size used for shuffling (must match encoding)
    - out: bytes | bytearray | NDArray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray | NDArray: Reconstructed data
    """

def bitshuffle_check(data):
    """
    Check if data is bit-shuffled.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if bitshuffle signature detected
    """
```
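The idea behind bit shuffling can be sketched as a bit-plane transpose: emit bit 0 of every item, then bit 1 of every item, and so on. The code below is a conceptual pure-Python illustration only. The function names are hypothetical, the bit ordering inside each output byte is an arbitrary choice for this sketch, and the real codec works in blocks and may lay out its output differently.

```python
def bitshuffle_sketch(items, nbits):
    """Emit bit 0 of every item, then bit 1 of every item, and so on.
    len(items) must be a multiple of 8 so each bit-plane fills whole bytes."""
    if len(items) % 8:
        raise ValueError("item count must be a multiple of 8")
    out = bytearray()
    for bit in range(nbits):
        for start in range(0, len(items), 8):
            byte = 0
            for k, item in enumerate(items[start:start + 8]):
                byte |= ((item >> bit) & 1) << k
            out.append(byte)
    return bytes(out)


def bitunshuffle_sketch(data, nbits, count):
    """Invert bitshuffle_sketch for `count` items of `nbits` bits each."""
    items = [0] * count
    bytes_per_plane = count // 8
    for bit in range(nbits):
        for j in range(bytes_per_plane):
            byte = data[bit * bytes_per_plane + j]
            for k in range(8):
                items[j * 8 + k] |= ((byte >> k) & 1) << bit
    return items


# 16-bit samples that only ever use their two lowest bits
small = [3, 1, 2, 0, 1, 3, 2, 1]
planes = bitshuffle_sketch(small, nbits=16)
restored = bitunshuffle_sketch(planes, nbits=16, count=len(small))
print(planes.hex(), restored == small)
```

Only the first two bit-planes carry information here; the remaining fourteen bytes are zero, which is the kind of long run a compressor handles well.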

### Byte Shuffling

Reorder bytes to group similar byte positions together, useful for multi-byte data types.

```python { .api }
def byteshuffle_encode(data, *, axis=-1, dist=1, delta=False, reorder=False, out=None):
    """
    Return byte-shuffled data.

    Parameters:
    - data: NDArray - Input array to shuffle
    - axis: int - Axis along which to shuffle (default -1)
    - dist: int - Distance for shuffling pattern (default 1)
    - delta: bool - Apply delta encoding before shuffling (default False)
    - reorder: bool - Reorder dimensions for better locality (default False)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Byte-shuffled array
    """

def byteshuffle_decode(data, *, axis=-1, dist=1, delta=False, reorder=False, out=None):
    """
    Return un-byte-shuffled data.

    Parameters:
    - data: NDArray - Byte-shuffled array
    - axis: int - Axis along which shuffling was applied (default -1)
    - dist: int - Distance used for shuffling (default 1)
    - delta: bool - Reverse delta encoding after unshuffling (default False)
    - reorder: bool - Reverse dimension reordering (default False)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Reconstructed array
    """

def byteshuffle_check(data):
    """
    Check if data is byte-shuffled.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (byte shuffle is a transform, not a format)
    """
```
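As a rough illustration of byte shuffling, the sketch below regroups a buffer of fixed-size items so that byte 0 of every item comes first, then byte 1, and so on. It uses only the standard library; `byteshuffle_sketch` and `byteunshuffle_sketch` are hypothetical helper names, and the byte order the actual codec emits may differ from the one chosen here.

```python
import struct


def byteshuffle_sketch(data: bytes, itemsize: int) -> bytes:
    """Emit byte 0 of every item, then byte 1 of every item, and so on."""
    return b"".join(data[k::itemsize] for k in range(itemsize))


def byteunshuffle_sketch(data: bytes, itemsize: int) -> bytes:
    """Invert byteshuffle_sketch by interleaving the byte planes again."""
    count = len(data) // itemsize
    out = bytearray(len(data))
    for k in range(itemsize):
        out[k::itemsize] = data[k * count:(k + 1) * count]
    return bytes(out)


# Four little-endian uint16 values that share the same high byte
raw = struct.pack("<4H", 0x0101, 0x0102, 0x0103, 0x0104)
shuffled = byteshuffle_sketch(raw, itemsize=2)
print(raw.hex())       # 0101020103010401
print(shuffled.hex())  # 0102030401010101
```

After shuffling, the slowly varying high bytes sit next to each other, so a run-length or dictionary coder sees them as one repeated run instead of isolated bytes.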

### Integer Packing

Pack integer arrays by removing unused high-order bits to reduce storage requirements.

```python { .api }
def packints_encode(data, *, out=None):
    """
    Return packed integer array.

    Parameters:
    - data: NDArray - Integer array to pack (uint8, uint16, uint32, uint64)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Packed integer data with reduced bit width
    """

def packints_decode(data, dtype=None, *, out=None):
    """
    Return unpacked integer array.

    Parameters:
    - data: NDArray - Packed integer data
    - dtype: numpy.dtype | None - Target dtype for unpacking (required)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Unpacked integer array
    """

def packints_check(data):
    """
    Check if data is packed integers.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (packints is a transform, not a format)
    """
```
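The general technique of dropping unused high-order bits can be sketched in pure Python as follows. This is an illustration of fixed-width bit packing, not the codec's implementation: the helper names, the explicit `bits` argument, and the most-significant-bit-first layout are all assumptions made for this example.

```python
def pack_bits_msb(values, bits):
    """Pack each value into `bits` bits, most significant bit first.
    The final byte is zero-padded on the right."""
    acc = 0    # bit accumulator
    nacc = 0   # number of bits in the accumulator not yet written out
    out = bytearray()
    for v in values:
        if v < 0 or v >> bits:
            raise ValueError(f"{v} does not fit in {bits} bits")
        acc = (acc << bits) | v
        nacc += bits
        while nacc >= 8:
            nacc -= 8
            out.append((acc >> nacc) & 0xFF)
        acc &= (1 << nacc) - 1  # keep only the unwritten bits
    if nacc:
        out.append((acc << (8 - nacc)) & 0xFF)
    return bytes(out)


def unpack_bits_msb(data, bits, count):
    """Read `count` values of `bits` bits each from packed bytes."""
    acc = 0
    nacc = 0
    out = []
    mask = (1 << bits) - 1
    for byte in data:
        acc = (acc << 8) | byte
        nacc += 8
        while nacc >= bits and len(out) < count:
            nacc -= bits
            out.append((acc >> nacc) & mask)
        acc &= (1 << nacc) - 1
    return out


values = [1, 2, 3, 4, 5]               # every value fits in 3 bits
packed = pack_bits_msb(values, bits=3)  # 15 bits, stored in 2 bytes
unpacked = unpack_bits_msb(packed, bits=3, count=len(values))
print(list(packed), unpacked)
```

Five values that would occupy five bytes as `uint8` fit in two bytes once only their three significant bits are stored. The decoder needs the bit width and item count supplied separately, because the packed stream carries no header.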

### PackBits Compression

Simple run-length encoding compression used in TIFF and other formats.

```python { .api }
def packbits_encode(data, *, out=None):
    """
    Return PackBits encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to encode
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: PackBits encoded data
    """

def packbits_decode(data, *, out=None):
    """
    Return PackBits decoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - PackBits encoded data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: Decoded data
    """

def packbits_check(data):
    """
    Check if data is PackBits encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (no reliable magic number)
    """
```
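For readers unfamiliar with the scheme, the following pure-Python sketch shows the PackBits byte layout: a header byte below 128 introduces that many plus one literal bytes, a header above 128 repeats the next byte `257 - header` times, and 128 is a no-op. The function names are hypothetical, and the encoder is a simple greedy one that is not guaranteed to match the codec's output byte for byte, although any conforming decoder should accept it.

```python
def packbits_decode_sketch(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n < 128:            # literal run: copy the next n + 1 bytes
            out += data[i:i + n + 1]
            i += n + 1
        elif n > 128:          # repeat run: next byte repeated 257 - n times
            out += data[i:i + 1] * (257 - n)
            i += 1
        # n == 128 is a no-op
    return bytes(out)


def packbits_encode_sketch(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        # Measure the run of identical bytes starting at i (at most 128)
        run = 1
        while i + run < len(data) and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out += bytes([257 - run, data[i]])
            i += run
        else:
            # Collect literals until a repeat starts or 128 bytes are gathered
            start = i
            i += 1
            while (i < len(data) and i - start < 128
                   and not (i + 1 < len(data) and data[i] == data[i + 1])):
                i += 1
            out.append(i - start - 1)
            out += data[start:i]
    return bytes(out)


sample = b"\x00" * 6 + b"abc" + b"\xff" * 3
encoded = packbits_encode_sketch(sample)
print(encoded.hex(), packbits_decode_sketch(encoded) == sample)
```

Runs of identical bytes shrink to two bytes each, while literal stretches cost one extra header byte, which is why the scheme suits data with long constant runs and does poorly on noise.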

### XOR Encoding

Apply XOR transformation to remove correlation between adjacent values.

```python { .api }
def xor_encode(data, *, out=None):
    """
    Return XOR encoded data.

    Parameters:
    - data: NDArray - Input array to encode (integer types)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: XOR encoded array
    """

def xor_decode(data, *, out=None):
    """
    Return XOR decoded data.

    Parameters:
    - data: NDArray - XOR encoded array
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded array
    """

def xor_check(data):
    """
    Check if data is XOR encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (XOR is a transform, not a format)
    """
```
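A minimal sketch of the XOR idea, assuming the transform combines each value with its immediate predecessor (the description above says "adjacent values" but does not spell out the exact rule). The helper names are hypothetical and the code is plain Python rather than the codec itself.

```python
def xor_encode_1d(values):
    """Keep the first value; replace each later value with value XOR previous."""
    out = list(values)
    for i in range(len(values) - 1, 0, -1):
        out[i] = values[i] ^ values[i - 1]
    return out


def xor_decode_1d(encoded):
    """Invert xor_encode_1d with a running XOR."""
    out = list(encoded)
    for i in range(1, len(out)):
        out[i] ^= out[i - 1]
    return out


readings = [0x4000, 0x4001, 0x4003, 0x4002, 0x4002]
encoded = xor_encode_1d(readings)    # [0x4000, 1, 2, 1, 0]
decoded = xor_decode_1d(encoded)
print([hex(v) for v in encoded], decoded == readings)
```

Unlike subtraction, XOR never overflows or borrows, so shared high bits cancel to zero regardless of the integer width.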

### Bit Order Reversal

Reverse the bit order within each byte, for compatibility with formats or protocols that use the opposite bit ordering.

```python { .api }
def bitorder_encode(data, *, out=None):
    """
    Return data with reversed bit-order.

    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Input data
    - out: bytes | bytearray | NDArray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray | NDArray: Data with bits reversed in each byte
    """

def bitorder_decode(data, *, out=None):
    """
    Return data with restored bit-order (same as encode).

    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Bit-reversed data
    - out: bytes | bytearray | NDArray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray | NDArray: Data with original bit order
    """

def bitorder_check(data):
    """
    Check if data has reversed bit-order.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (bit order reversal is a transform)
    """
```
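Bit-order reversal is its own inverse, which is why the decode function is described as the same operation as encode. The standard-library sketch below shows this with a 256-entry lookup table; `REVERSED` and `reverse_bits` are names chosen for this example and are not part of imagecodecs.

```python
# REVERSED[b] is the byte b with its 8 bits mirrored (bit 0 <-> bit 7, ...)
REVERSED = bytes(int(f"{b:08b}"[::-1], 2) for b in range(256))


def reverse_bits(data: bytes) -> bytes:
    """Mirror the bits inside every byte; byte positions are unchanged."""
    return data.translate(REVERSED)


sample = bytes([0b00000001, 0b11110000, 0b10100000])
flipped = reverse_bits(sample)
print([f"{b:08b}" for b in flipped])   # ['10000000', '00001111', '00000101']
print(reverse_bits(flipped) == sample)  # True: applying it twice restores the input
```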

### Quantization

Reduce the precision of floating-point data by quantizing to fewer levels.

```python { .api }
def quantize_encode(data, *, levels=None, out=None):
    """
    Return quantized data.

    Parameters:
    - data: NDArray - Floating-point data to quantize
    - levels: int | None - Number of quantization levels (default 256)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Quantized data (typically integer type)
    """

def quantize_decode(data, *, levels=None, out=None):
    """
    Return dequantized data.

    Parameters:
    - data: NDArray - Quantized data
    - levels: int | None - Number of quantization levels used (default 256)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Dequantized floating-point data
    """

def quantize_check(data):
    """
    Check if data is quantized.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (quantization is a transform)
    """
```
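To show what quantizing to a fixed number of levels means in practice, here is a sketch of uniform scalar quantization in plain Python. It is an assumption-laden illustration: the explicit `lo` and `hi` range arguments and both helper names are invented for this example, and the codec may use a different quantization scheme internally.

```python
def quantize_uniform(values, levels, lo, hi):
    """Map floats in [lo, hi] to integer codes 0 .. levels - 1."""
    step = (hi - lo) / (levels - 1)
    codes = [round((v - lo) / step) for v in values]
    return codes, step


def dequantize_uniform(codes, step, lo):
    """Map integer codes back to the centre value of each level."""
    return [lo + c * step for c in codes]


samples = [0.0, 0.12, 0.5, 0.77, 1.0]
codes, step = quantize_uniform(samples, levels=5, lo=0.0, hi=1.0)
restored = dequantize_uniform(codes, step, lo=0.0)
max_error = max(abs(a - b) for a, b in zip(samples, restored))
print(codes, restored, max_error)
```

With uniform levels the reconstruction error is bounded by half a step, so doubling the number of levels roughly halves the worst-case error at the cost of one more bit per sample.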

## Usage Examples

### Image Data Preprocessing

```python
import imagecodecs
import numpy as np

# Simulate 16-bit sensor data
sensor_data = np.random.randint(0, 65536, (1024, 1024), dtype=np.uint16)

# Apply delta encoding to remove gradients
delta_encoded = imagecodecs.delta_encode(sensor_data, axis=1)  # Row-wise differences

# Apply bit shuffling optimized for 16-bit data
bit_shuffled = imagecodecs.bitshuffle_encode(
    delta_encoded,
    itemsize=2,     # 16-bit = 2 bytes
    blocksize=8192  # 8KB blocks
)

# Compress the preprocessed data
compressed = imagecodecs.zlib_encode(bit_shuffled.tobytes(), level=9)

# Compare with direct compression
direct_compressed = imagecodecs.zlib_encode(sensor_data.tobytes(), level=9)

print(f"Original size: {sensor_data.nbytes} bytes")
print(f"Direct compression: {len(direct_compressed)} bytes ({len(direct_compressed)/sensor_data.nbytes:.2%})")
print(f"Preprocessed compression: {len(compressed)} bytes ({len(compressed)/sensor_data.nbytes:.2%})")
print(f"Improvement: {len(direct_compressed) / len(compressed):.1f}x")

# Decompress and decode
decompressed_bytes = imagecodecs.zlib_decode(compressed)
decompressed_array = np.frombuffer(decompressed_bytes, dtype=np.uint16).reshape(sensor_data.shape)
bit_unshuffled = imagecodecs.bitshuffle_decode(decompressed_array, itemsize=2, blocksize=8192)
reconstructed = imagecodecs.delta_decode(bit_unshuffled, axis=1)

assert np.array_equal(sensor_data, reconstructed)
```

### Scientific Data Optimization

```python
import imagecodecs
import numpy as np

# Simulate time-series scientific measurements
time_points, sensors = 10000, 128
measurements = np.cumsum(np.random.normal(0, 0.1, (time_points, sensors)), axis=0).astype(np.float32)

# Apply floating-point predictor along time axis
predicted = imagecodecs.floatpred_encode(measurements, axis=0)

# Apply byte shuffling for better compression
shuffled = imagecodecs.byteshuffle_encode(predicted, axis=1, delta=False)

# Compress with high-performance algorithm
compressed = imagecodecs.blosc_encode(
    shuffled.tobytes(),
    level=5,
    compressor='zstd',
    shuffle=1,    # Additional byte shuffle at BLOSC level
    typesize=4,   # float32 = 4 bytes
    numthreads=4
)

print(f"Original: {measurements.nbytes} bytes")
print(f"Compressed: {len(compressed)} bytes ({len(compressed)/measurements.nbytes:.2%})")

# Decompress and reconstruct
decompressed_bytes = imagecodecs.blosc_decode(compressed, numthreads=4)
decompressed_array = np.frombuffer(decompressed_bytes, dtype=np.float32).reshape(measurements.shape)
unshuffled = imagecodecs.byteshuffle_decode(decompressed_array, axis=1, delta=False)
reconstructed = imagecodecs.floatpred_decode(unshuffled, axis=0)

# Verify reconstruction within float32 tolerance
assert np.allclose(measurements, reconstructed, rtol=1e-7, atol=1e-7)
```

### Integer Data Optimization

```python
import imagecodecs
import numpy as np

# Simulate sparse integer data (many small values)
data = np.random.choice([0, 1, 2, 3, 4, 255, 65535], size=(1000, 1000),
                        p=[0.4, 0.2, 0.15, 0.1, 0.1, 0.04, 0.01]).astype(np.uint16)

# Pack integers to remove unused high bits
packed = imagecodecs.packints_encode(data)
print(f"Original dtype: {data.dtype}, packed dtype: {packed.dtype}")

# Apply XOR encoding to remove correlation
xor_encoded = imagecodecs.xor_encode(packed)

# Apply run-length encoding for sparse data
packbits_compressed = imagecodecs.packbits_encode(xor_encoded.tobytes())

print(f"Original: {data.nbytes} bytes")
print(f"After packing: {packed.nbytes} bytes")
print(f"After PackBits: {len(packbits_compressed)} bytes")
print(f"Total compression: {data.nbytes / len(packbits_compressed):.1f}x")

# Reconstruct
packbits_decompressed = imagecodecs.packbits_decode(packbits_compressed)
packed_array = np.frombuffer(packbits_decompressed, dtype=packed.dtype).reshape(packed.shape)
xor_decoded = imagecodecs.xor_decode(packed_array)
unpacked = imagecodecs.packints_decode(xor_decoded, dtype=data.dtype)

assert np.array_equal(data, unpacked)
```

### Multi-dimensional Data Processing

```python
import imagecodecs
import numpy as np

# 3D medical or scientific dataset
depth, height, width = 64, 512, 512
volume = np.random.randint(0, 4096, (depth, height, width), dtype=np.uint16)

# Apply delta encoding along different axes
z_delta = imagecodecs.delta_encode(volume, axis=0)    # Slice-to-slice differences
xy_delta = imagecodecs.delta_encode(z_delta, axis=2)  # Column differences

# Byte shuffle optimized for 3D data
shuffled = imagecodecs.byteshuffle_encode(xy_delta, axis=1, reorder=True)

# Compress with algorithm suitable for 3D data
compressed = imagecodecs.lzma_encode(shuffled.tobytes(), level=6)

print(f"3D volume: {volume.shape}")
print(f"Original: {volume.nbytes} bytes")
print(f"Compressed: {len(compressed)} bytes ({len(compressed)/volume.nbytes:.2%})")

# Reconstruct
decompressed_bytes = imagecodecs.lzma_decode(compressed)
decompressed_array = np.frombuffer(decompressed_bytes, dtype=volume.dtype).reshape(volume.shape)
unshuffled = imagecodecs.byteshuffle_decode(decompressed_array, axis=1, reorder=True)
xy_reconstructed = imagecodecs.delta_decode(unshuffled, axis=2)
z_reconstructed = imagecodecs.delta_decode(xy_reconstructed, axis=0)

assert np.array_equal(volume, z_reconstructed)
```

### Quantization for Lossy Compression

```python
import imagecodecs
import numpy as np

# High-precision floating-point data
data = np.random.normal(0, 1, (256, 256)).astype(np.float64)

# Quantize to reduce precision
quantized = imagecodecs.quantize_encode(data, levels=1024)  # 10-bit quantization
print(f"Original dtype: {data.dtype}, quantized dtype: {quantized.dtype}")

# Compress quantized data (integers compress better)
compressed = imagecodecs.zlib_encode(quantized.tobytes(), level=9)

# Compare with direct float compression
direct_compressed = imagecodecs.zlib_encode(data.tobytes(), level=9)

print(f"Original: {data.nbytes} bytes")
print(f"Direct compression: {len(direct_compressed)} bytes")
print(f"Quantized compression: {len(compressed)} bytes")
print(f"Improvement: {len(direct_compressed) / len(compressed):.1f}x")

# Reconstruct (lossy)
decompressed_bytes = imagecodecs.zlib_decode(compressed)
quantized_restored = np.frombuffer(decompressed_bytes, dtype=quantized.dtype).reshape(data.shape)
dequantized = imagecodecs.quantize_decode(quantized_restored, levels=1024)

# Measure quantization error
max_error = np.max(np.abs(data - dequantized))
mse = np.mean((data - dequantized) ** 2)
print(f"Max quantization error: {max_error:.6f}")
print(f"MSE: {mse:.6f}")
```

## Performance Considerations

### Transform Selection

- **Delta encoding**: Best for data with trends or gradients
- **Bit shuffling**: Optimal for typed numerical data before compression
- **Byte shuffling**: Good for multi-byte data types and multi-dimensional arrays
- **PackBits**: Effective for sparse data with runs of identical values
- **XOR encoding**: Removes correlation between adjacent integer values
- **Quantization**: Trade precision for compression ratio
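One practical way to choose among these transforms is to measure compressed size with and without each candidate on a representative sample. The sketch below does this for delta encoding on a synthetic ramp, using only the standard library (`struct` and `zlib`) rather than imagecodecs; the data and variable names are illustrative.

```python
import struct
import zlib

# A steady ramp: the kind of trend delta encoding is meant for
values = [1000 + 3 * i for i in range(4096)]
raw = struct.pack(f"<{len(values)}H", *values)

# Delta transform: keep the first value, then store successive differences
deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
delta_bytes = struct.pack(f"<{len(deltas)}H", *deltas)

raw_size = len(zlib.compress(raw, 9))
delta_size = len(zlib.compress(delta_bytes, 9))

print(f"uncompressed:        {len(raw)} bytes")
print(f"zlib only:           {raw_size} bytes")
print(f"delta, then zlib:    {delta_size} bytes")
```

The same harness can be pointed at real samples and other transforms; the sizes it prints depend entirely on the data, so results on a synthetic ramp should not be read as typical.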

### Optimization Guidelines

- Chain transforms for maximum benefit (e.g., delta → shuffle → compress)
- Match itemsize parameter to your data type for bit/byte shuffling
- Use appropriate axis for delta encoding based on data structure
- Consider data distribution when choosing quantization levels
- Pre-allocate output buffers for large datasets
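The chaining guideline can be sketched end to end with the standard library: delta, then byte shuffle, then a general-purpose compressor, with decoding applying the inverse steps in reverse order. The helpers below (`delta`, `undelta`, `shuffle`, `unshuffle`) are simplified stand-ins written for this example, not imagecodecs functions, and the drifting signal is synthetic.

```python
import random
import struct
import zlib

ITEMSIZE = 2  # bytes per uint16 sample; must be the same for encode and decode


def delta(values):
    """First value unchanged, then differences modulo 2**16."""
    return [values[0]] + [(b - a) & 0xFFFF for a, b in zip(values, values[1:])]


def undelta(deltas):
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append((out[-1] + d) & 0xFFFF)
    return out


def shuffle(data, itemsize):
    """Group byte 0 of every item, then byte 1, and so on."""
    return b"".join(data[k::itemsize] for k in range(itemsize))


def unshuffle(data, itemsize):
    count = len(data) // itemsize
    out = bytearray(len(data))
    for k in range(itemsize):
        out[k::itemsize] = data[k * count:(k + 1) * count]
    return bytes(out)


# Slowly drifting signal: small random steps on top of a large offset
rng = random.Random(0)
signal = [20000]
for _ in range(4095):
    signal.append(signal[-1] + rng.randint(0, 7))

# Encode: delta -> byte shuffle -> compress
packed = struct.pack(f"<{len(signal)}H", *delta(signal))
compressed = zlib.compress(shuffle(packed, ITEMSIZE), 9)

# Decode: undo each step in reverse order
unpacked = unshuffle(zlib.decompress(compressed), ITEMSIZE)
restored = undelta(list(struct.unpack(f"<{len(signal)}H", unpacked)))

print(f"{len(packed)} bytes -> {len(compressed)} bytes, lossless: {restored == signal}")
```

Keeping the item size in one constant is a small guard against the most common chaining mistake, which is decoding with parameters that differ from those used to encode.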

### Memory Management

- Transforms operate in place where possible
- Use appropriate block sizes for bit shuffling with large datasets
- Consider memory usage when chaining multiple transforms

## Constants and Configuration

### Bit Shuffle Constants

```python { .api }
class BITSHUFFLE:
    available: bool

    # Common item sizes
    ITEMSIZE_UINT8 = 1
    ITEMSIZE_UINT16 = 2
    ITEMSIZE_UINT32 = 4
    ITEMSIZE_UINT64 = 8
    ITEMSIZE_FLOAT32 = 4
    ITEMSIZE_FLOAT64 = 8
```

### Delta Encoding Constants

```python { .api }
class DELTA:
    available: bool = True  # Pure Python implementation always available

    # Common distance values
    DISTANCE_ADJACENT = 1  # Adjacent elements
    DISTANCE_ROW = None    # Width of 2D array (context-dependent)
    DISTANCE_PLANE = None  # Area of 2D slice in 3D array
```
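Because the row and plane distances are listed as `None`, they have to be computed from the array shape. The pure-Python sketch below illustrates what a row-width distance means on a flattened, row-major 2D array: each element is differenced against the one directly above it. The helper `delta_encode_flat` is hypothetical, and whether the codec's `dist` parameter behaves this way on flattened input is an assumption of this example.

```python
def delta_encode_flat(values, dist):
    """Difference each element against the one `dist` positions earlier;
    the first `dist` elements are left unchanged."""
    out = list(values)
    for i in range(len(values) - 1, dist - 1, -1):
        out[i] = values[i] - values[i - dist]
    return out


width = 4
rows = [
    [10, 20, 30, 40],
    [11, 21, 31, 41],
    [13, 23, 33, 43],
]
flat = [v for row in rows for v in row]

# dist == width compares each pixel with the pixel one row up
encoded = delta_encode_flat(flat, dist=width)
print(encoded)
```

For a 3D volume stored the same way, the analogous plane distance would be `height * width`, so each voxel is compared with the voxel at the same position in the previous slice.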

## Error Handling

All array processing functions use the base `ImcdError` exception class:

```python { .api }
class ImcdError(Exception):
    """Base IMCD codec exception."""

# Specific aliases for array processing
DeltaError = ImcdError
BitshuffleError = Exception  # Uses standard bitshuffle exceptions
ByteshuffleError = ImcdError
PackintsError = ImcdError
PackbitsError = ImcdError
XorError = ImcdError
BitorderError = ImcdError
QuantizeError = ImcdError
```