# Array Processing

Utilities for array transformation, bit manipulation, byte shuffling, and data preparation for compression algorithms. These functions optimize data layout and remove redundancy to improve compression efficiency or prepare data for specific processing requirements.

## Capabilities

### Delta Encoding

Compute differences between adjacent elements to remove trends and improve compressibility.

```python { .api }
def delta_encode(data, *, axis=-1, dist=1, out=None):
    """
    Return delta encoded data.

    Parameters:
    - data: NDArray - Input array to encode (any numeric dtype)
    - axis: int - Axis along which to compute differences (default -1, last axis)
    - dist: int - Distance for delta computation (default 1, adjacent elements)
    - out: NDArray | None - Pre-allocated output buffer (same shape as input)

    Returns:
    NDArray: Delta encoded array (first element unchanged, rest are differences)
    """

def delta_decode(data, *, axis=-1, dist=1, out=None):
    """
    Return delta decoded data.

    Parameters:
    - data: NDArray - Delta encoded array
    - axis: int - Axis along which delta was computed (default -1)
    - dist: int - Distance used for delta computation (default 1)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded array (reconstructed from differences)
    """

def delta_check(data):
    """
    Check if data is delta encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Data to check

    Returns:
    None: Always returns None (delta is a transform, not a format)
    """
```
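
The transform itself can be illustrated without the library. The following is a minimal pure-Python sketch on a 1-D list, assuming `dist` counts elements along the chosen axis. `delta_encode_1d` and `delta_decode_1d` are hypothetical helper names, not part of imagecodecs, and Python's unbounded integers hide the wraparound that fixed-width dtypes would show.

```python
def delta_encode_1d(values, dist=1):
    """Keep the first `dist` values; replace the rest with differences."""
    return [v if i < dist else v - values[i - dist] for i, v in enumerate(values)]


def delta_decode_1d(deltas, dist=1):
    """Invert delta_encode_1d by adding each difference to the value `dist` back."""
    out = []
    for i, d in enumerate(deltas):
        out.append(d if i < dist else d + out[i - dist])
    return out


ramp = [100, 102, 104, 107, 109]
encoded = delta_encode_1d(ramp)
print(encoded)  # [100, 2, 2, 3, 2]
assert delta_decode_1d(encoded) == ramp
```

The encoded list holds one large value followed by small, similar numbers, which is the property a downstream compressor can exploit.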

### Bit Shuffling

Reorganize bits to group similar bit positions together, improving compression of typed data.

```python { .api }
def bitshuffle_encode(data, *, itemsize=1, blocksize=0, out=None):
    """
    Return bit-shuffled data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Input data
    - itemsize: int - Size of data items in bytes (default 1)
      Common values: 1 (uint8), 2 (uint16), 4 (uint32/float32), 8 (uint64/float64)
    - blocksize: int - Block size for shuffling in bytes (default 0 = auto-determine)
    - out: bytes | bytearray | NDArray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray | NDArray: Bit-shuffled data
    """

def bitshuffle_decode(data, *, itemsize=1, blocksize=0, out=None):
    """
    Return un-bit-shuffled data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Bit-shuffled data
    - itemsize: int - Size of data items in bytes (must match encoding)
    - blocksize: int - Block size used for shuffling (must match encoding)
    - out: bytes | bytearray | NDArray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray | NDArray: Reconstructed data
    """

def bitshuffle_check(data):
    """
    Check if data is bit-shuffled.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if bitshuffle signature detected
    """
```
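
To show what "grouping bit positions" means, here is a conceptual pure-Python sketch of a bit transpose. It is an illustration only: the helper names are hypothetical, and the real bitshuffle layout (block handling and bit/byte ordering inside each block) is not modeled here.

```python
def bit_transpose(items, nbits):
    """Return `nbits` bit planes; plane k holds bit k (LSB = 0) of every item."""
    return [[(item >> k) & 1 for item in items] for k in range(nbits)]


def bit_untranspose(planes):
    """Rebuild the original items from the planes made by bit_transpose."""
    count = len(planes[0])
    return [sum(planes[k][i] << k for k in range(len(planes))) for i in range(count)]


samples = [3, 2, 3, 1, 0, 2, 1, 3]  # small values stored as 8-bit items
planes = bit_transpose(samples, nbits=8)
print(planes[0])  # [1, 0, 1, 1, 0, 0, 1, 1]
print(planes[1])  # [1, 1, 1, 0, 0, 1, 0, 1]
print(any(bit for plane in planes[2:] for bit in plane))  # False
assert bit_untranspose(planes) == samples
```

Because the samples never use bits 2 to 7, six of the eight planes are entirely zero, and long zero runs are cheap for almost any compressor.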

### Byte Shuffling

Reorder bytes to group similar byte positions together, useful for multi-byte data types.

```python { .api }
def byteshuffle_encode(data, *, axis=-1, dist=1, delta=False, reorder=False, out=None):
    """
    Return byte-shuffled data.

    Parameters:
    - data: NDArray - Input array to shuffle
    - axis: int - Axis along which to shuffle (default -1)
    - dist: int - Distance for shuffling pattern (default 1)
    - delta: bool - Apply delta encoding before shuffling (default False)
    - reorder: bool - Reorder dimensions for better locality (default False)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Byte-shuffled array
    """

def byteshuffle_decode(data, *, axis=-1, dist=1, delta=False, reorder=False, out=None):
    """
    Return un-byte-shuffled data.

    Parameters:
    - data: NDArray - Byte-shuffled array
    - axis: int - Axis along which shuffling was applied (default -1)
    - dist: int - Distance used for shuffling (default 1)
    - delta: bool - Reverse delta encoding after unshuffling (default False)
    - reorder: bool - Reverse dimension reordering (default False)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Reconstructed array
    """

def byteshuffle_check(data):
    """
    Check if data is byte-shuffled.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (byte shuffle is a transform, not a format)
    """
```
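
The core idea can be sketched on a flat byte string with the standard library alone. The helpers below are hypothetical and deliberately ignore the `axis`, `dist`, `delta`, and `reorder` options listed above; the byte-plane order produced by the library may also differ.

```python
import struct


def byte_shuffle(raw, itemsize):
    """Emit byte 0 of every item, then byte 1 of every item, and so on."""
    return b"".join(raw[i::itemsize] for i in range(itemsize))


def byte_unshuffle(shuffled, itemsize):
    """Interleave the byte planes back into whole items."""
    count = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for i in range(itemsize):
        out[i::itemsize] = shuffled[i * count:(i + 1) * count]
    return bytes(out)


# Four little-endian uint16 values that share the same high byte (0x01)
raw = struct.pack("<4H", 256, 257, 258, 259)
shuffled = byte_shuffle(raw, itemsize=2)
print(shuffled.hex())  # 0001020301010101
assert byte_unshuffle(shuffled, itemsize=2) == raw
```

After shuffling, the identical high bytes sit next to each other as a run instead of being interleaved with the changing low bytes.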

### Integer Packing

Pack integer arrays by removing unused high-order bits to reduce storage requirements.

```python { .api }
def packints_encode(data, *, out=None):
    """
    Return packed integer array.

    Parameters:
    - data: NDArray - Integer array to pack (uint8, uint16, uint32, uint64)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Packed integer data with reduced bit width
    """

def packints_decode(data, dtype=None, *, out=None):
    """
    Return unpacked integer array.

    Parameters:
    - data: NDArray - Packed integer data
    - dtype: numpy.dtype | None - Target dtype for unpacking (required)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Unpacked integer array
    """

def packints_check(data):
    """
    Check if data is packed integers.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (packints is a transform, not a format)
    """
```
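
As an illustration of removing unused high-order bits, the sketch below packs small integers at a fixed bit width, most significant bit first. The explicit `bits` and `count` parameters and both helper names are assumptions made for this example; the signatures above do not expose a bit width, and the library's packed layout is not described in this section.

```python
def pack_ints(values, bits):
    """Pack non-negative ints into bytes using `bits` bits per value, MSB first."""
    acc = 0
    for v in values:
        if v >> bits:
            raise ValueError(f"{v} does not fit in {bits} bits")
        acc = (acc << bits) | v
    total = len(values) * bits
    pad = -total % 8  # zero bits appended to reach a byte boundary
    return (acc << pad).to_bytes((total + pad) // 8, "big")


def unpack_ints(packed, bits, count):
    """Recover `count` values of `bits` bits each from pack_ints output."""
    acc = int.from_bytes(packed, "big") >> (len(packed) * 8 - count * bits)
    mask = (1 << bits) - 1
    return [(acc >> (bits * (count - 1 - i))) & mask for i in range(count)]


values = [1, 2, 3, 4, 5]
packed = pack_ints(values, bits=3)
print(packed.hex())  # 29ca
assert unpack_ints(packed, bits=3, count=len(values)) == values
```

Five values that would occupy five bytes as `uint8` fit in two bytes at 3 bits each; the decoder needs the bit width and element count to undo the packing.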

### PackBits Compression

Simple run-length encoding compression used in TIFF and other formats.

```python { .api }
def packbits_encode(data, *, out=None):
    """
    Return PackBits encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to encode
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: PackBits encoded data
    """

def packbits_decode(data, *, out=None):
    """
    Return PackBits decoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - PackBits encoded data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: Decoded data
    """

def packbits_check(data):
    """
    Check if data is PackBits encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (no reliable magic number)
    """
```
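
For readers unfamiliar with the scheme, the following pure-Python sketch implements the PackBits rules as used in TIFF: a header byte below 128 introduces that many plus one literal bytes, a header above 128 repeats the next byte `257 - header` times, and 128 is a no-op. The function names are hypothetical, and the encoder uses a simple greedy strategy that may choose different (equally valid) runs than the library does.

```python
def packbits_encode_sketch(data):
    """Greedy PackBits encoder: runs of 2+ equal bytes become replicate runs."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out += bytes([257 - run, data[i]])  # replicate run
            i += run
        else:
            start = i
            i += 1
            # Extend the literal run until a repeat starts or 128 bytes are taken
            while i < n and i - start < 128 and not (i + 1 < n and data[i] == data[i + 1]):
                i += 1
            out.append(i - start - 1)  # literal header: count - 1
            out += data[start:i]
    return bytes(out)


def packbits_decode_sketch(encoded):
    """Expand literal and replicate runs back into the original bytes."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        header = encoded[i]
        i += 1
        if header < 128:  # copy the next header + 1 bytes verbatim
            out += encoded[i:i + header + 1]
            i += header + 1
        elif header > 128:  # repeat the next byte 257 - header times
            out += encoded[i:i + 1] * (257 - header)
            i += 1
    return bytes(out)


data = b"aaaaaabcd" + b"\x00" * 10
encoded = packbits_encode_sketch(data)
print(list(encoded))  # [251, 97, 2, 98, 99, 100, 247, 0]
assert packbits_decode_sketch(encoded) == data
```

The 19 input bytes shrink to 8 because both runs collapse to two bytes each, while the three literal bytes cost one extra header byte.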

### XOR Encoding

Apply XOR transformation to remove correlation between adjacent values.

```python { .api }
def xor_encode(data, *, out=None):
    """
    Return XOR encoded data.

    Parameters:
    - data: NDArray - Input array to encode (integer types)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: XOR encoded array
    """

def xor_decode(data, *, out=None):
    """
    Return XOR decoded data.

    Parameters:
    - data: NDArray - XOR encoded array
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded array
    """

def xor_check(data):
    """
    Check if data is XOR encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (XOR is a transform, not a format)
    """
```
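
A minimal sketch of the idea, assuming the transform XORs each value with its immediate predecessor as the description above suggests. The helper names are hypothetical and operate on a plain Python list rather than an array.

```python
def xor_encode_1d(values):
    """Keep the first value; XOR every later value with its predecessor."""
    return [v if i == 0 else v ^ values[i - 1] for i, v in enumerate(values)]


def xor_decode_1d(encoded):
    """Undo xor_encode_1d by XORing each code with the previous decoded value."""
    out = []
    for i, e in enumerate(encoded):
        out.append(e if i == 0 else e ^ out[i - 1])
    return out


readings = [0b1011_0000, 0b1011_0001, 0b1011_0011, 0b1011_0010]
encoded = xor_encode_1d(readings)
print(encoded)  # [176, 1, 2, 1]
assert xor_decode_1d(encoded) == readings
```

Neighbouring values that share their high bits XOR to small numbers, and unlike subtraction the operation never overflows or needs a sign.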

### Bit Order Reversal

Reverse the bit order within each byte for compatibility with formats or protocols that use the opposite bit numbering, such as TIFF data written with `FillOrder=2`. Bit order is independent of byte-order endianness.

```python { .api }
def bitorder_encode(data, *, out=None):
    """
    Return data with reversed bit-order.

    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Input data
    - out: bytes | bytearray | NDArray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray | NDArray: Data with bits reversed in each byte
    """

def bitorder_decode(data, *, out=None):
    """
    Return data with restored bit-order (same as encode).

    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Bit-reversed data
    - out: bytes | bytearray | NDArray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray | NDArray: Data with original bit order
    """

def bitorder_check(data):
    """
    Check if data has reversed bit-order.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (bit order reversal is a transform)
    """
```
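
Per-byte bit reversal is commonly done with a 256-entry lookup table. The sketch below builds such a table with the standard library; `REVERSE_TABLE` and `reverse_bits` are hypothetical names, and this is not a description of how imagecodecs implements the transform.

```python
# Entry b is byte b with its 8 bits mirrored (bit 0 <-> bit 7, and so on)
REVERSE_TABLE = bytes(int(f"{b:08b}"[::-1], 2) for b in range(256))


def reverse_bits(data):
    """Reverse the bit order inside every byte of `data`."""
    return bytes(data).translate(REVERSE_TABLE)


original = bytes([0x01, 0xF0, 0x16])
flipped = reverse_bits(original)
print(flipped.hex())  # 800f68
assert reverse_bits(flipped) == original
```

Applying the function twice restores the input, which is why the decode function above is documented as being the same operation as encode.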

### Quantization

Reduce the precision of floating-point data by quantizing to fewer levels.

```python { .api }
def quantize_encode(data, *, levels=None, out=None):
    """
    Return quantized data.

    Parameters:
    - data: NDArray - Floating-point data to quantize
    - levels: int | None - Number of quantization levels (default 256)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Quantized data (typically integer type)
    """

def quantize_decode(data, *, levels=None, out=None):
    """
    Return dequantized data.

    Parameters:
    - data: NDArray - Quantized data
    - levels: int | None - Number of quantization levels used (default 256)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Dequantized floating-point data
    """

def quantize_check(data):
    """
    Check if data is quantized.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (quantization is a transform)
    """
```
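
The trade-off between level count and error can be seen in a small uniform quantizer. This is a sketch under stated assumptions: the explicit `lo` and `hi` range arguments and both function names are invented for the example, since this section does not say how the library maps values onto levels.

```python
def quantize(values, levels, lo, hi):
    """Map floats in [lo, hi] to integer codes 0 .. levels - 1 (values are clamped)."""
    scale = (levels - 1) / (hi - lo)
    return [min(levels - 1, max(0, round((v - lo) * scale))) for v in values]


def dequantize(codes, levels, lo, hi):
    """Map integer codes back to the centre value of each level."""
    step = (hi - lo) / (levels - 1)
    return [lo + c * step for c in codes]


values = [0.0, 0.1, 0.5, 0.9, 1.0]
codes = quantize(values, levels=5, lo=0.0, hi=1.0)
restored = dequantize(codes, levels=5, lo=0.0, hi=1.0)
print(codes)     # [0, 0, 2, 4, 4]
print(restored)  # [0.0, 0.0, 0.5, 1.0, 1.0]

# For in-range inputs the error is bounded by half a quantization step
half_step = (1.0 - 0.0) / (5 - 1) / 2
assert max(abs(a - b) for a, b in zip(values, restored)) <= half_step
```

Doubling the number of levels halves the step and therefore the worst-case error, at the cost of one more bit per code.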

## Usage Examples

### Image Data Preprocessing

```python
import imagecodecs
import numpy as np

# Simulate 16-bit sensor data: a smooth gradient plus low-amplitude noise
# (uniformly random data has no trend for delta encoding to remove)
rows, cols = np.indices((1024, 1024))
noise = np.random.randint(0, 16, (1024, 1024))
sensor_data = ((rows + cols) * 16 + noise).astype(np.uint16)

# Apply delta encoding to remove gradients
delta_encoded = imagecodecs.delta_encode(sensor_data, axis=1)  # Row-wise differences

# Apply bit shuffling optimized for 16-bit data
bit_shuffled = imagecodecs.bitshuffle_encode(
    delta_encoded,
    itemsize=2,  # 16-bit = 2 bytes
    blocksize=8192  # 8KB blocks
)

# Compress the preprocessed data
compressed = imagecodecs.zlib_encode(bit_shuffled.tobytes(), level=9)

# Compare with direct compression
direct_compressed = imagecodecs.zlib_encode(sensor_data.tobytes(), level=9)

print(f"Original size: {sensor_data.nbytes} bytes")
print(f"Direct compression: {len(direct_compressed)} bytes ({len(direct_compressed)/sensor_data.nbytes:.2%})")
print(f"Preprocessed compression: {len(compressed)} bytes ({len(compressed)/sensor_data.nbytes:.2%})")
print(f"Improvement: {len(direct_compressed) / len(compressed):.1f}x")

# Decompress and decode (reverse the transforms in the opposite order)
decompressed_bytes = imagecodecs.zlib_decode(compressed)
decompressed_array = np.frombuffer(decompressed_bytes, dtype=np.uint16).reshape(sensor_data.shape)
bit_unshuffled = imagecodecs.bitshuffle_decode(decompressed_array, itemsize=2, blocksize=8192)
reconstructed = imagecodecs.delta_decode(bit_unshuffled, axis=1)

assert np.array_equal(sensor_data, reconstructed)
```

### Scientific Data Optimization

```python
import imagecodecs
import numpy as np

# Simulate time-series scientific measurements
time_points, sensors = 10000, 128
measurements = np.cumsum(np.random.normal(0, 0.1, (time_points, sensors)), axis=0).astype(np.float32)

# Apply floating-point predictor along time axis
predicted = imagecodecs.floatpred_encode(measurements, axis=0)

# Apply byte shuffling for better compression
shuffled = imagecodecs.byteshuffle_encode(predicted, axis=1, delta=False)

# Compress with high-performance algorithm
compressed = imagecodecs.blosc_encode(
    shuffled.tobytes(),
    level=5,
    compressor='zstd',
    shuffle=1,  # Additional byte shuffle at BLOSC level
    typesize=4,  # float32 = 4 bytes
    numthreads=4
)

print(f"Original: {measurements.nbytes} bytes")
print(f"Compressed: {len(compressed)} bytes ({len(compressed)/measurements.nbytes:.2%})")

# Decompress and reconstruct
decompressed_bytes = imagecodecs.blosc_decode(compressed, numthreads=4)
decompressed_array = np.frombuffer(decompressed_bytes, dtype=np.float32).reshape(measurements.shape)
unshuffled = imagecodecs.byteshuffle_decode(decompressed_array, axis=1, delta=False)
reconstructed = imagecodecs.floatpred_decode(unshuffled, axis=0)

# Verify reconstruction within float32 tolerance
assert np.allclose(measurements, reconstructed, rtol=1e-7, atol=1e-7)
```

### Integer Data Optimization

```python
import imagecodecs
import numpy as np

# Simulate sparse integer data (many small values)
data = np.random.choice([0, 1, 2, 3, 4, 255, 65535], size=(1000, 1000),
                        p=[0.4, 0.2, 0.15, 0.1, 0.1, 0.04, 0.01]).astype(np.uint16)

# Pack integers to remove unused high bits
packed = imagecodecs.packints_encode(data)
print(f"Original dtype: {data.dtype}, packed dtype: {packed.dtype}")

# Apply XOR encoding to remove correlation
xor_encoded = imagecodecs.xor_encode(packed)

# Apply run-length encoding for sparse data
packbits_compressed = imagecodecs.packbits_encode(xor_encoded.tobytes())

print(f"Original: {data.nbytes} bytes")
print(f"After packing: {packed.nbytes} bytes")
print(f"After PackBits: {len(packbits_compressed)} bytes")
print(f"Total compression: {data.nbytes / len(packbits_compressed):.1f}x")

# Reconstruct
packbits_decompressed = imagecodecs.packbits_decode(packbits_compressed)
packed_array = np.frombuffer(packbits_decompressed, dtype=packed.dtype).reshape(packed.shape)
xor_decoded = imagecodecs.xor_decode(packed_array)
unpacked = imagecodecs.packints_decode(xor_decoded, dtype=data.dtype)

assert np.array_equal(data, unpacked)
```

### Multi-dimensional Data Processing

```python
import imagecodecs
import numpy as np

# 3D medical or scientific dataset
depth, height, width = 64, 512, 512
volume = np.random.randint(0, 4096, (depth, height, width), dtype=np.uint16)

# Apply delta encoding along different axes
z_delta = imagecodecs.delta_encode(volume, axis=0)  # Slice-to-slice differences
xy_delta = imagecodecs.delta_encode(z_delta, axis=2)  # Column differences

# Byte shuffle optimized for 3D data
shuffled = imagecodecs.byteshuffle_encode(xy_delta, axis=1, reorder=True)

# Compress with a high-ratio general-purpose codec
compressed = imagecodecs.lzma_encode(shuffled.tobytes(), level=6)

print(f"3D volume: {volume.shape}")
print(f"Original: {volume.nbytes} bytes")
print(f"Compressed: {len(compressed)} bytes ({len(compressed)/volume.nbytes:.2%})")

# Reconstruct (undo the transforms in reverse order)
decompressed_bytes = imagecodecs.lzma_decode(compressed)
decompressed_array = np.frombuffer(decompressed_bytes, dtype=volume.dtype).reshape(volume.shape)
unshuffled = imagecodecs.byteshuffle_decode(decompressed_array, axis=1, reorder=True)
xy_reconstructed = imagecodecs.delta_decode(unshuffled, axis=2)
z_reconstructed = imagecodecs.delta_decode(xy_reconstructed, axis=0)

assert np.array_equal(volume, z_reconstructed)
```

### Quantization for Lossy Compression

```python
import imagecodecs
import numpy as np

# High-precision floating-point data
data = np.random.normal(0, 1, (256, 256)).astype(np.float64)

# Quantize to reduce precision
quantized = imagecodecs.quantize_encode(data, levels=1024)  # 10-bit quantization
print(f"Original dtype: {data.dtype}, quantized dtype: {quantized.dtype}")

# Compress quantized data (integers compress better)
compressed = imagecodecs.zlib_encode(quantized.tobytes(), level=9)

# Compare with direct float compression
direct_compressed = imagecodecs.zlib_encode(data.tobytes(), level=9)

print(f"Original: {data.nbytes} bytes")
print(f"Direct compression: {len(direct_compressed)} bytes")
print(f"Quantized compression: {len(compressed)} bytes")
print(f"Improvement: {len(direct_compressed) / len(compressed):.1f}x")

# Reconstruct (lossy)
decompressed_bytes = imagecodecs.zlib_decode(compressed)
quantized_restored = np.frombuffer(decompressed_bytes, dtype=quantized.dtype).reshape(data.shape)
dequantized = imagecodecs.quantize_decode(quantized_restored, levels=1024)

# Measure quantization error
max_error = np.max(np.abs(data - dequantized))
mse = np.mean((data - dequantized) ** 2)
print(f"Max quantization error: {max_error:.6f}")
print(f"MSE: {mse:.6f}")
```

## Performance Considerations

### Transform Selection
- **Delta encoding**: Best for data with trends or gradients
- **Bit shuffling**: Optimal for typed numerical data before compression
- **Byte shuffling**: Good for multi-byte data types and multi-dimensional arrays
- **PackBits**: Effective for sparse data with runs of identical values
- **XOR encoding**: Removes correlation between adjacent integer values
- **Quantization**: Trade precision for compression ratio
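
As a rough illustration of the first bullet, the sketch below uses only the standard library (`struct`, `zlib`) to compare compressing a slowly rising 16-bit signal directly against compressing its delta-encoded form. The synthetic signal and the modulo-65536 wraparound are assumptions chosen for the demonstration; exact sizes depend on the zlib build, so none are quoted.

```python
import struct
import zlib

# Synthetic 16-bit signal: a steady ramp with a small repeating wobble
values = [1000 + 3 * i + (i % 7) for i in range(4096)]

# Delta encode with uint16 wraparound so negative steps still fit in 16 bits
deltas = [values[0]] + [(b - a) % 65536 for a, b in zip(values, values[1:])]


def pack_u16(seq):
    """Serialize a sequence of ints as little-endian uint16."""
    return struct.pack(f"<{len(seq)}H", *seq)


direct = zlib.compress(pack_u16(values), 9)
preprocessed = zlib.compress(pack_u16(deltas), 9)
print(f"direct: {len(direct)} bytes, delta first: {len(preprocessed)} bytes")

# Decoding accumulates the differences, again modulo 2**16
restored = [deltas[0]]
for d in deltas[1:]:
    restored.append((restored[-1] + d) % 65536)
assert restored == values
```

The ramp rarely repeats a byte pattern, so the general-purpose compressor finds few matches in the raw data; after delta encoding the stream becomes a short repeating pattern that compresses far better.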

### Optimization Guidelines
- Chain transforms for maximum benefit (e.g., delta → shuffle → compress)
- Match itemsize parameter to your data type for bit/byte shuffling
- Use appropriate axis for delta encoding based on data structure
- Consider data distribution when choosing quantization levels
- Pre-allocate output buffers for large datasets

### Memory Management
- Transforms return a new buffer unless a pre-allocated `out` buffer is supplied
- Use appropriate block sizes for bit shuffling with large datasets
- Each transform in a chain produces an intermediate copy, so account for peak memory when chaining

## Constants and Configuration

### Bit Shuffle Constants

```python { .api }
class BITSHUFFLE:
    available: bool

    # Common item sizes
    ITEMSIZE_UINT8 = 1
    ITEMSIZE_UINT16 = 2
    ITEMSIZE_UINT32 = 4
    ITEMSIZE_UINT64 = 8
    ITEMSIZE_FLOAT32 = 4
    ITEMSIZE_FLOAT64 = 8
```

### Delta Encoding Constants

```python { .api }
class DELTA:
    available: bool = True  # Pure Python implementation always available

    # Common distance values
    DISTANCE_ADJACENT = 1  # Adjacent elements
    DISTANCE_ROW = None  # Width of 2D array (context-dependent)
    DISTANCE_PLANE = None  # Area of 2D slice in 3D array
```
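
The row distance is context-dependent because it equals the array width. The pure-Python sketch below shows why: on a flattened row-major image, a distance equal to the width yields differences between vertically adjacent pixels. `delta_encode_flat` is a hypothetical helper, and the example assumes the distance is counted in elements.

```python
def delta_encode_flat(values, dist):
    """Subtract the element `dist` positions earlier; keep the first `dist` as-is."""
    return [v if i < dist else v - values[i - dist] for i, v in enumerate(values)]


width = 4
image = [10, 20, 30, 40,   # row 0
         11, 21, 31, 41,   # row 1
         13, 23, 33, 43]   # row 2

vertical = delta_encode_flat(image, dist=width)
print(vertical)  # [10, 20, 30, 40, 1, 1, 1, 1, 2, 2, 2, 2]
```

With a distance of 1 the same data would produce steps of 10 inside each row and a large negative jump at every row boundary, so the choice of distance should follow the direction in which the data varies least.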

## Error Handling

Most array processing functions raise the base `ImcdError` exception class or an alias of it; bitshuffle is the exception and uses its own error type:

```python { .api }
class ImcdError(Exception):
    """Base IMCD codec exception."""

# Specific aliases for array processing
DeltaError = ImcdError
BitshuffleError = Exception  # Uses standard bitshuffle exceptions
ByteshuffleError = ImcdError
PackintsError = ImcdError
PackbitsError = ImcdError
XorError = ImcdError
BitorderError = ImcdError
QuantizeError = ImcdError
```