# Scientific Data Compression

Specialized codecs for scientific computing: floating-point compression, error-bounded lossy compression, and array preprocessing filters. These algorithms are designed around numerical accuracy, performance, and the characteristics of scientific data.

## Capabilities

### ZFP Floating-Point Compression

Compressed floating-point arrays with configurable precision, rate, or error tolerance for scientific datasets.

```python { .api }
def zfp_encode(data, *, rate=None, precision=None, tolerance=None, out=None):
    """
    Return ZFP encoded floating-point array.

    Parameters:
    - data: NDArray - Floating-point array to compress (1D-4D, float32/float64)
    - rate: float | None - Target compression rate in bits per value
    - precision: int | None - Number of bit planes to encode (lossless if sufficient)
    - tolerance: float | None - Absolute error tolerance (error-bounded mode)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: ZFP compressed data

    Note: Exactly one of rate, precision, or tolerance must be specified
    """

def zfp_decode(data, shape=None, dtype=None, *, out=None):
    """
    Return decoded ZFP floating-point array.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - ZFP compressed data
    - shape: tuple | None - Output array shape (required)
    - dtype: numpy.dtype | None - Output data type (float32 or float64, required)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded floating-point array
    """

def zfp_check(data):
    """
    Check if data is ZFP encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if ZFP header detected
    """
```
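
Because `rate` is a budget in bits per value, the size of a fixed-rate stream can be estimated before encoding. The sketch below is plain arithmetic and not part of the library: `estimate_fixed_rate_size` is a hypothetical helper, and it assumes ZFP's block layout of 4^d values per block with partial blocks padded. It ignores the stream header and any rounding of the rate, so the result is approximate.

```python
import math


def estimate_fixed_rate_size(shape, rate):
    """Estimate the fixed-rate payload size in bytes (hypothetical helper).

    Assumes every dimension is padded up to a multiple of 4, because ZFP
    codes blocks of 4**ndim values. The stream header is not counted.
    """
    if not 1 <= len(shape) <= 4:
        raise ValueError("expected a 1D to 4D shape")
    padded_values = math.prod(4 * math.ceil(n / 4) for n in shape)
    return math.ceil(padded_values * rate / 8)


shape = (365, 180, 360)
raw_bytes = math.prod(shape) * 4  # float32 input
for rate in (1.0, 4.0, 8.0):
    size = estimate_fixed_rate_size(shape, rate)
    print(f"rate={rate}: about {size} bytes, about {raw_bytes / size:.1f}x")
```

An estimate like this is mainly useful for sizing a pre-allocated `out` buffer or for picking a rate that fits a storage budget.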

### SPERR Scientific Compression

Error-bounded lossy compressor optimized for scientific floating-point data with multiple quality modes.

```python { .api }
def sperr_encode(data, *, mode=None, quality=None, tolerance=None, out=None):
    """
    Return SPERR encoded floating-point data.

    Parameters:
    - data: NDArray - Floating-point data to compress (2D/3D, float32/float64)
    - mode: str | None - Compression mode:
        'rate' = fixed bit rate, 'psnr' = peak signal-to-noise ratio, 'pwe' = point-wise error
    - quality: float | None - Quality parameter for chosen mode:
        For 'rate': bits per pixel (e.g., 1.0-16.0)
        For 'psnr': target PSNR in dB (e.g., 40.0-80.0)
        For 'pwe': maximum point-wise error
    - tolerance: float | None - Alternative way to specify error tolerance
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: SPERR compressed data
    """

def sperr_decode(data, *, out=None):
    """
    Return decoded SPERR floating-point data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - SPERR compressed data
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded floating-point array
    """

def sperr_check(data):
    """
    Check if data is SPERR encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if SPERR signature detected
    """
```
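
The `'psnr'` and `'pwe'` modes target two different quality metrics, and it can help to measure both on decoded output. The following is a minimal sketch in pure Python over flat sequences; `psnr_db` and `max_pointwise_error` are hypothetical helpers, and taking the data range of the original as the PSNR peak is an assumption (conventions differ between tools).

```python
import math


def psnr_db(original, decoded):
    """PSNR in dB, using the value range of `original` as the peak (assumption)."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical data
    data_range = max(original) - min(original)
    if data_range == 0:
        return -math.inf  # constant signal with a non-zero error
    return 10 * math.log10(data_range ** 2 / mse)


def max_pointwise_error(original, decoded):
    """Largest absolute difference between corresponding values."""
    return max(abs(a - b) for a, b in zip(original, decoded))


original = [0.0, 1.0, 2.0, 3.0, 4.0]
decoded = [0.1, 0.9, 2.1, 2.9, 4.1]
print(f"PSNR: {psnr_db(original, decoded):.2f} dB")
print(f"Max point-wise error: {max_pointwise_error(original, decoded):.3f}")
```

For NumPy arrays, the same functions can be applied to `array.ravel().tolist()`, although a vectorized version would be faster.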

### SZ3 Error-Bounded Compression

High-performance error-bounded lossy compressor for scientific datasets with excellent compression ratios.

```python { .api }
def sz3_encode(data, *, tolerance=None, out=None):
    """
    Return SZ3 encoded floating-point data.

    Parameters:
    - data: NDArray - Floating-point data to compress (float32/float64)
    - tolerance: float | None - Absolute error bound (required)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: SZ3 compressed data
    """

def sz3_decode(data, shape=None, dtype=None, *, out=None):
    """
    Return decoded SZ3 floating-point data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - SZ3 compressed data
    - shape: tuple | None - Output array shape (required)
    - dtype: numpy.dtype | None - Output data type (required)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded floating-point array
    """

def sz3_check(data):
    """
    Check if data is SZ3 encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if SZ3 signature detected
    """
```
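
An absolute error bound means every decoded value lies within `tolerance` of its original. One simple way to obtain that guarantee is uniform quantization with bins of width `2 * tolerance`, sketched below in pure Python. This only illustrates the idea of an error bound; it is not SZ3's actual pipeline, and `quantize` and `dequantize` are hypothetical names.

```python
def quantize(values, tolerance):
    """Map each value to the index of a bin of width 2 * tolerance."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    step = 2.0 * tolerance
    return [round(v / step) for v in values]


def dequantize(codes, tolerance):
    """Reconstruct each value as the centre of its bin."""
    step = 2.0 * tolerance
    return [c * step for c in codes]


values = [15.03, 15.11, 14.98, 15.27, 15.40]
tolerance = 0.1

codes = quantize(values, tolerance)
restored = dequantize(codes, tolerance)
max_error = max(abs(a - b) for a, b in zip(values, restored))

# Nearby values collapse onto the same integer code, which is what
# makes the quantized stream easier to compress afterwards.
print(f"codes: {codes}")
print(f"max error {max_error:.3f} <= tolerance {tolerance}")
```

The integer codes repeat far more often than the raw floats do, so a subsequent entropy coder has much less information to store.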

### Floating-Point Predictor

Preprocessing filter that improves compression by removing predictable patterns in floating-point data.

```python { .api }
def floatpred_encode(data, *, axis=-1, dist=1, out=None):
    """
    Return floating-point predictor encoded data.

    Parameters:
    - data: NDArray - Floating-point data to encode (float32/float64)
    - axis: int - Axis along which to apply predictor (default -1)
    - dist: int - Predictor distance (default 1)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Predictor encoded data (same shape and dtype as input)
    """

def floatpred_decode(data, *, axis=-1, dist=1, out=None):
    """
    Return floating-point predictor decoded data.

    Parameters:
    - data: NDArray - Predictor encoded data
    - axis: int - Axis along which predictor was applied (default -1)
    - dist: int - Predictor distance used (default 1)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded floating-point data
    """

def floatpred_check(data):
    """
    Check if data is floating-point predictor encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Data to check

    Returns:
    None: Always returns None (predictor is a transform, not a format)
    """
```
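
To show why such a filter helps, here is a minimal pure-Python sketch of the TIFF-style floating-point predictor (predictor 3), which this filter is assumed to follow: bytes are regrouped into planes, most significant byte first, and then each byte is replaced by its difference from the previous one. It handles a single row of float32 values with a distance of 1 and is not the library's implementation; the function names are hypothetical.

```python
import struct
import zlib


def floatpred_encode_row(values):
    """Byte-plane shuffle followed by byte-wise differencing (float32 row)."""
    n = len(values)
    raw = struct.pack(f">{n}f", *values)  # big-endian: most significant byte first
    # Plane 0 holds byte 0 of every value, plane 1 holds byte 1, and so on.
    out = bytearray(raw[4 * i + b] for b in range(4) for i in range(n))
    # Walk backwards so each difference uses the unmodified predecessor.
    for k in range(len(out) - 1, 0, -1):
        out[k] = (out[k] - out[k - 1]) & 0xFF
    return bytes(out)


def floatpred_decode_row(encoded):
    """Invert floatpred_encode_row and return a list of floats."""
    n = len(encoded) // 4
    planes = bytearray(encoded)
    for k in range(1, len(planes)):  # running sum undoes the differencing
        planes[k] = (planes[k] + planes[k - 1]) & 0xFF
    raw = bytes(planes[b * n + i] for i in range(n) for b in range(4))
    return list(struct.unpack(f">{n}f", raw))


# A smooth ramp whose values are exactly representable in float32.
row = [1.0 + i / 1024 for i in range(1024)]
encoded = floatpred_encode_row(row)
assert floatpred_decode_row(encoded) == row  # the transform is lossless

plain = struct.pack(f">{len(row)}f", *row)
print(f"zlib on raw bytes:       {len(zlib.compress(plain))} bytes")
print(f"zlib on predicted bytes: {len(zlib.compress(encoded))} bytes")
```

Grouping sign and exponent bytes together produces long runs of near-identical bytes on smooth data, and the differencing turns those runs into small values that a general-purpose compressor handles well.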

### JETRAW Scientific Image Compression

High-performance lossless compression specifically optimized for scientific image data including X-ray, microscopy, and other detector data.

```python { .api }
def jetraw_encode(data, *, identifier=None, out=None):
    """
    Return JETRAW encoded image data.

    Parameters:
    - data: NDArray - Image data to compress (typically uint16 detector data)
    - identifier: str | None - Optional identifier string
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: JETRAW compressed data
    """

def jetraw_decode(data, *, out=None):
    """
    Return decoded JETRAW image data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - JETRAW compressed data
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded image data
    """

def jetraw_check(data):
    """
    Check if data is JETRAW encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if JETRAW signature detected
    """
```

### LERC Limited Error Raster Compression

Lossy/lossless compression specifically designed for raster data with configurable error bounds.

```python { .api }
def lerc_encode(data, *, tolerance=None, version=None, out=None):
    """
    Return LERC encoded raster data.

    Parameters:
    - data: NDArray - Raster data to compress (integer or floating-point)
    - tolerance: float | None - Maximum error tolerance (0.0 for lossless)
    - version: int | None - LERC version (2 or 4, default 4)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: LERC compressed data
    """

def lerc_decode(data, *, out=None):
    """
    Return decoded LERC raster data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - LERC compressed data
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded raster array
    """

def lerc_check(data):
    """
    Check if data is LERC encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if LERC signature detected
    """
```

### SZIP Scientific Data Compression

NASA's adaptive entropy encoder designed for scientific datasets, particularly satellite and remote sensing data.

```python { .api }
def szip_encode(data, *, coding=None, pixels_per_block=None, bits_per_pixel=None, out=None):
    """
    Return SZIP encoded scientific data.

    Parameters:
    - data: NDArray - Scientific data to compress (integer types)
    - coding: str | None - Coding method ('ec' for entropy coding, 'nn' for nearest neighbor)
    - pixels_per_block: int | None - Pixels per compression block (8, 16, 32)
    - bits_per_pixel: int | None - Bits per pixel in input data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: SZIP compressed data
    """

def szip_decode(data, *, out=None):
    """
    Return decoded SZIP scientific data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - SZIP compressed data
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded scientific data array
    """

def szip_check(data):
    """
    Check if data is SZIP encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if SZIP signature detected
    """
```
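
SZIP's entropy coder belongs to the Rice coding family. As a rough illustration of the principle, the sketch below implements basic Golomb-Rice coding of non-negative integers with a fixed parameter `k`, writing bits as a string of `'0'` and `'1'` characters for readability. It is a teaching sketch with hypothetical function names, not the SZIP bitstream: the real codec chooses its coding option adaptively per block and preprocesses samples first.

```python
def rice_encode(values, k):
    """Encode non-negative integers: unary quotient, then a k-bit remainder."""
    bits = []
    for v in values:
        if v < 0:
            raise ValueError("Rice coding needs non-negative integers")
        quotient, remainder = v >> k, v & ((1 << k) - 1)
        bits.append("1" * quotient + "0")  # unary part, terminated by a 0
        if k:
            bits.append(format(remainder, f"0{k}b"))
    return "".join(bits)


def rice_decode(bits, k, count):
    """Decode `count` integers produced by rice_encode with the same k."""
    values, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating 0
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append((quotient << k) | remainder)
    return values


samples = [0, 3, 7, 12, 1]
encoded = rice_encode(samples, k=2)
assert rice_decode(encoded, k=2, count=len(samples)) == samples
print(f"{len(encoded)} bits instead of {16 * len(samples)} for 16-bit samples")
```

Small values cost few bits and large values cost many, which is why this family of codes works best on data that has first been reduced to small residuals.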

### PCODEC Parquet Codec

Compression codec designed for columnar data formats, optimized for analytical workloads.

```python { .api }
def pcodec_encode(data, *, level=None, out=None):
    """
    Return PCODEC encoded columnar data.

    Parameters:
    - data: NDArray - Columnar data to compress
    - level: int | None - Compression level (0-12, default 8)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: PCODEC compressed data
    """

def pcodec_decode(data, *, out=None):
    """
    Return decoded PCODEC columnar data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - PCODEC compressed data
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded columnar data array
    """

def pcodec_check(data):
    """
    Check if data is PCODEC encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if PCODEC signature detected
    """
```

## Usage Examples

### Climate Data Compression

```python
import imagecodecs
import numpy as np

# Simulate climate model output (temperature data)
time_steps, lat, lon = 365, 180, 360
temperature = np.random.normal(15.0, 20.0, (time_steps, lat, lon)).astype(np.float32)

# Error-bounded compression with 0.1°C tolerance
zfp_compressed = imagecodecs.zfp_encode(temperature, tolerance=0.1)
zfp_decoded = imagecodecs.zfp_decode(
    zfp_compressed,
    shape=temperature.shape,
    dtype=temperature.dtype
)

# Verify error bound
max_error = np.max(np.abs(temperature - zfp_decoded))
print(f"Max error: {max_error:.3f}°C (tolerance: 0.1°C)")
print(f"Compression ratio: {temperature.nbytes / len(zfp_compressed):.1f}x")

# Alternative with SPERR
sperr_compressed = imagecodecs.sperr_encode(
    temperature,
    mode='pwe',
    quality=0.1  # 0.1°C point-wise error
)
sperr_decoded = imagecodecs.sperr_decode(sperr_compressed)
```

### Medical Imaging Data

```python
import imagecodecs
import numpy as np

# Simulate 3D medical scan (CT or MRI)
scan = np.random.randint(0, 4096, (256, 256, 128), dtype=np.uint16)

# Lossless compression with LERC
lerc_lossless = imagecodecs.lerc_encode(scan, tolerance=0.0)
lerc_decoded = imagecodecs.lerc_decode(lerc_lossless)
assert np.array_equal(scan, lerc_decoded)

# Near-lossless with small tolerance
lerc_lossy = imagecodecs.lerc_encode(scan, tolerance=1.0)  # 1 HU tolerance
lerc_lossy_decoded = imagecodecs.lerc_decode(lerc_lossy)

print(f"Original size: {scan.nbytes} bytes")
print(f"Lossless LERC: {len(lerc_lossless)} bytes ({len(lerc_lossless)/scan.nbytes:.2%})")
print(f"Lossy LERC: {len(lerc_lossy)} bytes ({len(lerc_lossy)/scan.nbytes:.2%})")
```

### Satellite Data Processing

```python
import imagecodecs
import numpy as np

# Simulate satellite imagery (multispectral)
bands, height, width = 8, 1024, 1024
# The upper bound of randint is exclusive, so 65536 covers the full uint16 range
satellite_data = np.random.randint(0, 65536, (bands, height, width), dtype=np.uint16)

# SZIP compression optimized for remote sensing
compressed_bands = []
for band in satellite_data:
    compressed = imagecodecs.szip_encode(
        band,
        coding='ec',  # Entropy coding
        pixels_per_block=16,
        bits_per_pixel=16
    )
    compressed_bands.append(compressed)

# Calculate total compression
original_size = satellite_data.nbytes
compressed_size = sum(len(band) for band in compressed_bands)
print(f"SZIP compression ratio: {original_size / compressed_size:.1f}x")

# Decode bands
decoded_bands = []
for compressed in compressed_bands:
    decoded = imagecodecs.szip_decode(compressed)
    decoded_bands.append(decoded)

reconstructed = np.stack(decoded_bands)
assert np.array_equal(satellite_data, reconstructed)
```

### Floating-Point Predictor Usage

```python
import imagecodecs
import numpy as np

# Scientific simulation data with smooth gradients
x = np.linspace(0, 10, 1000)
y = np.linspace(0, 10, 1000)
X, Y = np.meshgrid(x, y)
field = np.sin(X) * np.cos(Y) + 0.1 * np.random.random((1000, 1000))
field = field.astype(np.float32)

# Apply floating-point predictor before compression
predicted = imagecodecs.floatpred_encode(field, axis=1)  # Predict along rows

# Compress the predicted data
compressed = imagecodecs.zlib_encode(predicted.tobytes(), level=9)

# Compare with direct compression
direct_compressed = imagecodecs.zlib_encode(field.tobytes(), level=9)

print(f"Direct compression: {len(direct_compressed)} bytes")
print(f"With predictor: {len(compressed)} bytes")
print(f"Improvement: {len(direct_compressed) / len(compressed):.1f}x")

# Decompress and decode
decompressed_bytes = imagecodecs.zlib_decode(compressed)
predicted_restored = np.frombuffer(decompressed_bytes, dtype=np.float32).reshape(field.shape)
field_restored = imagecodecs.floatpred_decode(predicted_restored, axis=1)

# Verify exact reconstruction (lossless)
assert np.array_equal(field, field_restored)
```

### Quality vs Compression Trade-offs

```python
import imagecodecs
import numpy as np

# Generate test scientific dataset
data = np.random.exponential(2.0, (512, 512, 64)).astype(np.float32)

# Test different error tolerances with ZFP
tolerances = [0.001, 0.01, 0.1, 1.0]
for tol in tolerances:
    compressed = imagecodecs.zfp_encode(data, tolerance=tol)
    decoded = imagecodecs.zfp_decode(compressed, shape=data.shape, dtype=data.dtype)

    compression_ratio = data.nbytes / len(compressed)
    max_error = np.max(np.abs(data - decoded))
    mse = np.mean((data - decoded) ** 2)

    print(f"Tolerance {tol:5.3f}: {compression_ratio:5.1f}x compression, "
          f"max error {max_error:.3f}, MSE {mse:.6f}")

# Test different bit rates with ZFP
rates = [1.0, 2.0, 4.0, 8.0]
for rate in rates:
    compressed = imagecodecs.zfp_encode(data, rate=rate)
    decoded = imagecodecs.zfp_decode(compressed, shape=data.shape, dtype=data.dtype)

    actual_rate = len(compressed) * 8 / data.size
    max_error = np.max(np.abs(data - decoded))

    print(f"Target rate {rate:3.1f} bpv: actual {actual_rate:.1f} bpv, "
          f"max error {max_error:.3f}")
```

## Performance Considerations

### Algorithm Selection
- **ZFP**: Best for regular grids; supports fixed-rate, fixed-precision, and error-tolerance modes
- **SPERR**: Optimized for 2D/3D scientific datasets; supports bit-rate, PSNR, and point-wise error targets
- **SZ3**: Absolute error bound; a good fit for large floating-point datasets
- **LERC**: Designed for raster/GIS data; accepts integer and floating-point input, lossless or error-bounded
- **SZIP**: Lossless coding for integer data; a common choice for satellite/remote sensing data

### Optimization Guidelines
- Apply the floating-point predictor before a general-purpose compressor (for example zlib) when the data is smooth
- Choose the error tolerance based on the precision of the underlying measurements
- Consider data characteristics (smooth vs noisy, regular vs irregular)
- Balance compression ratio against reconstruction speed for your use case
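
One possible rule of thumb for tying the tolerance to measurement precision, offered here as an assumption rather than guidance from any codec: if values are recorded in steps of a known resolution, an absolute tolerance strictly below half that resolution lets the original reading be recovered by rounding the decoded value back to the nearest step. The helper names below are hypothetical.

```python
def tolerance_for_resolution(resolution, safety=0.9):
    """Return an absolute tolerance strictly below half the resolution."""
    if resolution <= 0 or not 0 < safety < 1:
        raise ValueError("need resolution > 0 and 0 < safety < 1")
    return 0.5 * resolution * safety


def to_ticks(value, resolution):
    """Express a value as an integer count of resolution steps."""
    return round(value / resolution)


resolution = 0.1  # e.g. a thermometer that reports tenths of a degree
tolerance = tolerance_for_resolution(resolution)

ticks = [151, 152, 149, 150]
readings = [t * resolution for t in ticks]

# Worst case: every decoded value is off by the full tolerance.
decoded = [
    r + tolerance if i % 2 else r - tolerance
    for i, r in enumerate(readings)
]
recovered = [to_ticks(d, resolution) for d in decoded]
print(f"tolerance={tolerance:.3f}, readings recovered: {recovered == ticks}")
```

A looser tolerance compresses better but gives up this recoverability, so the choice depends on whether downstream analysis needs the original readings or only values within the instrument's noise.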

### Memory Management
- Pre-allocate output buffers (the `out` parameter) for large datasets
- Process data in chunks in memory-constrained environments
- Use the data type (float32 vs float64) that matches your precision needs
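
Chunked processing can be sketched independently of any particular codec. The example below uses the standard library's `zlib` as a stand-in for an encode/decode pair so that it runs anywhere; `encode_in_chunks` and `decode_chunks` are hypothetical helpers, and the input is assumed to be a flat bytes-like object.

```python
import zlib


def encode_in_chunks(data, chunk_size, encode):
    """Yield encoded chunks covering `data`, each from at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    view = memoryview(data)  # slicing a memoryview does not copy the input
    for start in range(0, len(view), chunk_size):
        yield encode(view[start:start + chunk_size])


def decode_chunks(chunks, decode):
    """Decode chunks in order and concatenate the results."""
    return b"".join(decode(chunk) for chunk in chunks)


data = bytes(range(256)) * 1000  # 256,000 bytes of sample input
chunks = list(encode_in_chunks(data, 65536, zlib.compress))
restored = decode_chunks(chunks, zlib.decompress)

assert restored == data
print(f"{len(chunks)} chunks, {sum(map(len, chunks))} compressed bytes")
```

For multi-dimensional arrays the same pattern applies when slicing along the first axis, so that each chunk remains a valid sub-array for the codec. Smaller chunks lower peak memory but usually cost some compression ratio, because each chunk is compressed without knowledge of the others.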

## Constants and Configuration

### ZFP Constants

```python { .api }
class ZFP:
    available: bool

    class EXEC:
        SERIAL = 0
        OMP = 1  # OpenMP parallel execution
        CUDA = 2  # CUDA GPU execution

    class MODE:
        # Values follow the zfp_mode enum of the zfp C library
        NULL = 0  # Invalid or unset configuration
        EXPERT = 1  # Expert mode with custom parameters
        FIXED_RATE = 2  # Fixed bit rate mode
        FIXED_PRECISION = 3  # Fixed precision mode
        FIXED_ACCURACY = 4  # Fixed accuracy/tolerance mode
        REVERSIBLE = 5  # Reversible (lossless) mode
```

### SPERR Constants

```python { .api }
class SPERR:
    available: bool

    class MODE:
        RATE = 'rate'  # Fixed bit rate
        PSNR = 'psnr'  # Peak signal-to-noise ratio
        PWE = 'pwe'  # Point-wise error bound
```

## Error Handling

```python { .api }
class ZfpError(Exception):
    """ZFP codec exception."""

class SperrError(Exception):
    """SPERR codec exception."""

class Sz3Error(Exception):
    """SZ3 codec exception."""

class FloatpredError(Exception):
    """Floating-point predictor exception."""

class LercError(Exception):
    """LERC codec exception."""

class SzipError(Exception):
    """SZIP codec exception."""

class PcodecError(Exception):
    """PCODEC codec exception."""
```
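
Since each codec raises its own exception type, a caller can catch failures per codec and fall back to another one. The following is a minimal sketch of that pattern. It uses stand-ins (`DemoCodecError`, `failing_encode`, and the standard library's `zlib`) so that it runs without any optional codec installed; `encode_with_fallback` is a hypothetical helper, not a library function.

```python
import zlib


def encode_with_fallback(data, encoders):
    """Try (name, encode, error_type) entries in order.

    Returns (name, payload) from the first encoder that succeeds and
    raises RuntimeError if every encoder fails.
    """
    failures = []
    for name, encode, error_type in encoders:
        try:
            return name, encode(data)
        except error_type as exc:
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all encoders failed: " + "; ".join(failures))


class DemoCodecError(Exception):
    """Stand-in for a codec-specific exception such as ZfpError."""


def failing_encode(data):
    """Stand-in encoder that always rejects its input."""
    raise DemoCodecError("unsupported input")


name, payload = encode_with_fallback(
    b"abc" * 100,
    [
        ("demo", failing_encode, DemoCodecError),
        ("zlib", zlib.compress, zlib.error),
    ],
)
print(f"encoded with {name}: {len(payload)} bytes")
```

In real use, each entry would pair an encoder with its matching exception class from the list above. Catching only the codec's own exception type keeps unrelated bugs, such as a `TypeError` from a wrong argument, visible instead of silently triggering the fallback.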