# Scientific Data Compression

Specialized codecs optimized for scientific computing, including floating-point data compression, error-bounded compression, and array processing utilities. These algorithms are designed for numerical accuracy, performance, and specific scientific data characteristics.

## Capabilities

### ZFP Floating-Point Compression

Compressed floating-point arrays with configurable precision, rate, or error tolerance for scientific datasets.

```python { .api }
def zfp_encode(data, *, rate=None, precision=None, tolerance=None, out=None):
    """
    Return ZFP encoded floating-point array.

    Parameters:
    - data: NDArray - Floating-point array to compress (1D-4D, float32/float64)
    - rate: float | None - Target compression rate in bits per value
    - precision: int | None - Number of bit planes to encode (lossless if sufficient)
    - tolerance: float | None - Absolute error tolerance (error-bounded mode)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: ZFP compressed data

    Note: Exactly one of rate, precision, or tolerance must be specified
    """

def zfp_decode(data, shape=None, dtype=None, *, out=None):
    """
    Return decoded ZFP floating-point array.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - ZFP compressed data
    - shape: tuple | None - Output array shape (required)
    - dtype: numpy.dtype | None - Output data type (float32 or float64, required)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded floating-point array
    """

def zfp_check(data):
    """
    Check if data is ZFP encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if ZFP header detected
    """
```
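The rule that exactly one of `rate`, `precision`, or `tolerance` must be given can be checked before calling the encoder, and in fixed-rate mode the payload size follows directly from the rate. The sketch below is plain Python and does not call imagecodecs; `pick_zfp_mode` and `fixed_rate_size` are hypothetical helpers introduced here for illustration, and the size estimate ignores the stream header and any block padding.

```python
import math


def pick_zfp_mode(*, rate=None, precision=None, tolerance=None):
    """Return (name, value) for the single option that was set (hypothetical helper)."""
    options = {"rate": rate, "precision": precision, "tolerance": tolerance}
    chosen = [(name, value) for name, value in options.items() if value is not None]
    if len(chosen) != 1:
        raise ValueError("specify exactly one of rate, precision, or tolerance")
    return chosen[0]


def fixed_rate_size(num_values, rate):
    """Approximate payload size in bytes at `rate` bits per value.

    Assumption: header bytes and block padding are ignored.
    """
    return math.ceil(num_values * rate / 8)


print(pick_zfp_mode(tolerance=0.1))          # ('tolerance', 0.1)
print(fixed_rate_size(365 * 180 * 360, 8.0))  # 23652000
```

At 8 bits per value, a float32 array shrinks to roughly a quarter of its original size, which is why the rate is often the easiest knob to reason about when storage budgets are fixed.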

### SPERR Scientific Compression

Error-bounded lossy compressor optimized for scientific floating-point data with multiple quality modes.

```python { .api }
def sperr_encode(data, *, mode=None, quality=None, tolerance=None, out=None):
    """
    Return SPERR encoded floating-point data.

    Parameters:
    - data: NDArray - Floating-point data to compress (2D/3D, float32/float64)
    - mode: str | None - Compression mode:
        'rate' = fixed bit rate, 'psnr' = peak signal-to-noise ratio, 'pwe' = point-wise error
    - quality: float | None - Quality parameter for chosen mode:
        For 'rate': bits per pixel (e.g., 1.0-16.0)
        For 'psnr': target PSNR in dB (e.g., 40.0-80.0)
        For 'pwe': maximum point-wise error
    - tolerance: float | None - Alternative way to specify error tolerance
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: SPERR compressed data
    """

def sperr_decode(data, *, out=None):
    """
    Return decoded SPERR floating-point data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - SPERR compressed data
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded floating-point array
    """

def sperr_check(data):
    """
    Check if data is SPERR encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if SPERR signature detected
    """
```
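The `'psnr'` mode takes a target in decibels, so it helps to be able to measure the PSNR of a decoded result. The sketch below uses one common definition for floating-point data, with the data range (max minus min) as the peak; this is an assumption, and SPERR's internal definition may differ. `psnr_db` is a hypothetical helper and works on plain Python sequences.

```python
import math


def psnr_db(original, decoded):
    """Peak signal-to-noise ratio in dB, using the original's value range as peak."""
    if len(original) != len(decoded):
        raise ValueError("sequences must have the same length")
    peak = max(original) - min(original)
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical data: no noise at all
    if peak == 0:
        raise ValueError("PSNR is undefined for constant data with nonzero error")
    return 10.0 * math.log10(peak * peak / mse)


original = [0.0, 1.0, 2.0, 4.0]
decoded = [v + 0.01 for v in original]  # uniform error of 0.01
print(f"{psnr_db(original, decoded):.2f} dB")
```

With a value range of 4 and a uniform error of 0.01, the result is about 52 dB; each additional 20 dB corresponds to a tenfold reduction in root-mean-square error.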

### SZ3 Error-Bounded Compression

High-performance error-bounded lossy compressor for scientific datasets with excellent compression ratios.

```python { .api }
def sz3_encode(data, *, tolerance=None, out=None):
    """
    Return SZ3 encoded floating-point data.

    Parameters:
    - data: NDArray - Floating-point data to compress (float32/float64)
    - tolerance: float | None - Absolute error bound (required)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: SZ3 compressed data
    """

def sz3_decode(data, shape=None, dtype=None, *, out=None):
    """
    Return decoded SZ3 floating-point data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - SZ3 compressed data
    - shape: tuple | None - Output array shape (required)
    - dtype: numpy.dtype | None - Output data type (required)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded floating-point array
    """

def sz3_check(data):
    """
    Check if data is SZ3 encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if SZ3 signature detected
    """
```
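An absolute error bound means that every decoded value lies within `tolerance` of the original. The sketch below shows the simplest way to obtain such a guarantee: uniform quantization with a step of twice the tolerance. It is an illustration of the concept only, not SZ3's algorithm, which combines quantization with prediction and entropy coding. `quantize` and `dequantize` are hypothetical helpers.

```python
def quantize(values, tolerance):
    """Map each value to an integer bin index; bins are 2 * tolerance wide."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    step = 2.0 * tolerance
    return [round(v / step) for v in values]


def dequantize(codes, tolerance):
    """Reconstruct each value as the centre of its bin."""
    step = 2.0 * tolerance
    return [c * step for c in codes]


values = [15.03, -2.47, 0.0, 99.99]
tolerance = 0.1
codes = quantize(values, tolerance)
restored = dequantize(codes, tolerance)

# Rounding to the nearest bin centre keeps the error within half a step,
# which equals the tolerance (up to floating-point rounding).
max_error = max(abs(a - b) for a, b in zip(values, restored))
print(codes, f"max error {max_error:.3f}")
```

The integer codes are what a real compressor would go on to encode; smooth data yields small, repetitive codes that compress well.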

### Floating-Point Predictor

Preprocessing filter that improves compression by removing predictable patterns in floating-point data.

```python { .api }
def floatpred_encode(data, *, axis=-1, dist=1, out=None):
    """
    Return floating-point predictor encoded data.

    Parameters:
    - data: NDArray - Floating-point data to encode (float32/float64)
    - axis: int - Axis along which to apply predictor (default -1)
    - dist: int - Predictor distance (default 1)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Predictor encoded data (same shape and dtype as input)
    """

def floatpred_decode(data, *, axis=-1, dist=1, out=None):
    """
    Return floating-point predictor decoded data.

    Parameters:
    - data: NDArray - Predictor encoded data
    - axis: int - Axis along which predictor was applied (default -1)
    - dist: int - Predictor distance used (default 1)
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded floating-point data
    """

def floatpred_check(data):
    """
    Check if data is floating-point predictor encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Data to check

    Returns:
    None: Always returns None (predictor is a transform, not a format)
    """
```
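To make the idea of a predictor concrete, the sketch below applies a byte-shuffle followed by byte-wise differencing to one row of float32 values, loosely modelled on the TIFF floating-point predictor. It uses only the standard library, and it is an assumption that the library's own transform resembles it; details such as `axis` and `dist` handling are not reproduced. `predict_row` and `restore_row` are hypothetical names.

```python
import math
import struct
import zlib


def predict_row(row):
    """Byte-shuffle, then delta-encode, one row of float32 values."""
    n = len(row)
    raw = struct.pack(f">{n}f", *row)  # big-endian float32 bytes
    # Group bytes by significance: all first bytes, then all second bytes, ...
    shuffled = bytes(raw[4 * i + b] for b in range(4) for i in range(n))
    # Store each byte as the difference from its predecessor, modulo 256.
    return bytes(
        (shuffled[k] - (shuffled[k - 1] if k else 0)) % 256
        for k in range(len(shuffled))
    )


def restore_row(deltas):
    """Invert predict_row and return the float32 values as Python floats."""
    n = len(deltas) // 4
    shuffled = bytearray(len(deltas))
    previous = 0
    for k, delta in enumerate(deltas):
        previous = (previous + delta) % 256  # running sum undoes the differencing
        shuffled[k] = previous
    raw = bytes(shuffled[b * n + i] for i in range(n) for b in range(4))
    return list(struct.unpack(f">{n}f", raw))


row = [math.sin(0.01 * i) for i in range(1000)]  # smooth test signal
as_float32 = list(struct.unpack(">1000f", struct.pack(">1000f", *row)))

encoded = predict_row(row)
assert restore_row(encoded) == as_float32  # the transform itself is lossless

print("direct:   ", len(zlib.compress(struct.pack(">1000f", *row), 9)), "bytes")
print("predicted:", len(zlib.compress(encoded, 9)), "bytes")
```

The transform does not shrink the data by itself; it rearranges it so that a general-purpose compressor sees long runs of similar bytes. For smooth signals the predicted stream usually compresses better, but the gain depends on the data.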

### JETRAW Scientific Image Compression

High-performance lossless compression specifically optimized for scientific image data including X-ray, microscopy, and other detector data.

```python { .api }
def jetraw_encode(data, *, identifier=None, out=None):
    """
    Return JETRAW encoded image data.

    Parameters:
    - data: NDArray - Image data to compress (typically uint16 detector data)
    - identifier: str | None - Optional identifier string
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: JETRAW compressed data
    """

def jetraw_decode(data, *, out=None):
    """
    Return decoded JETRAW image data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - JETRAW compressed data
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded image data
    """

def jetraw_check(data):
    """
    Check if data is JETRAW encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if JETRAW signature detected
    """
```

### LERC Limited Error Raster Compression

Lossy/lossless compression specifically designed for raster data with configurable error bounds.

```python { .api }
def lerc_encode(data, *, tolerance=None, version=None, out=None):
    """
    Return LERC encoded raster data.

    Parameters:
    - data: NDArray - Raster data to compress (integer or floating-point)
    - tolerance: float | None - Maximum error tolerance (0.0 for lossless)
    - version: int | None - LERC version (2 or 4, default 4)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: LERC compressed data
    """

def lerc_decode(data, *, out=None):
    """
    Return decoded LERC raster data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - LERC compressed data
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded raster array
    """

def lerc_check(data):
    """
    Check if data is LERC encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if LERC signature detected
    """
```

### SZIP Scientific Data Compression

NASA's adaptive entropy encoder designed for scientific datasets, particularly satellite and remote sensing data.

```python { .api }
def szip_encode(data, *, coding=None, pixels_per_block=None, bits_per_pixel=None, out=None):
    """
    Return SZIP encoded scientific data.

    Parameters:
    - data: NDArray - Scientific data to compress (integer types)
    - coding: str | None - Coding method ('ec' for entropy coding, 'nn' for nearest neighbor)
    - pixels_per_block: int | None - Pixels per compression block (8, 16, 32)
    - bits_per_pixel: int | None - Bits per pixel in input data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: SZIP compressed data
    """

def szip_decode(data, *, out=None):
    """
    Return decoded SZIP scientific data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - SZIP compressed data
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded scientific data array
    """

def szip_check(data):
    """
    Check if data is SZIP encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if SZIP signature detected
    """
```

### PCODEC Parquet Codec

Compression codec designed for columnar data formats, optimized for analytical workloads.

```python { .api }
def pcodec_encode(data, *, level=None, out=None):
    """
    Return PCODEC encoded columnar data.

    Parameters:
    - data: NDArray - Columnar data to compress
    - level: int | None - Compression level (0-12, default 8)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: PCODEC compressed data
    """

def pcodec_decode(data, *, out=None):
    """
    Return decoded PCODEC columnar data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - PCODEC compressed data
    - out: NDArray | None - Pre-allocated output buffer

    Returns:
    NDArray: Decoded columnar data array
    """

def pcodec_check(data):
    """
    Check if data is PCODEC encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if PCODEC signature detected
    """
```

## Usage Examples

### Climate Data Compression

```python
import imagecodecs
import numpy as np

# Simulate climate model output (temperature data)
time_steps, lat, lon = 365, 180, 360
temperature = np.random.normal(15.0, 20.0, (time_steps, lat, lon)).astype(np.float32)

# Error-bounded compression with 0.1°C tolerance
zfp_compressed = imagecodecs.zfp_encode(temperature, tolerance=0.1)
zfp_decoded = imagecodecs.zfp_decode(
    zfp_compressed,
    shape=temperature.shape,
    dtype=temperature.dtype
)

# Verify error bound
max_error = np.max(np.abs(temperature - zfp_decoded))
print(f"Max error: {max_error:.3f}°C (tolerance: 0.1°C)")
print(f"Compression ratio: {temperature.nbytes / len(zfp_compressed):.1f}x")

# Alternative with SPERR
sperr_compressed = imagecodecs.sperr_encode(
    temperature,
    mode='pwe',
    quality=0.1  # 0.1°C point-wise error
)
sperr_decoded = imagecodecs.sperr_decode(sperr_compressed)
```

### Medical Imaging Data

```python
import imagecodecs
import numpy as np

# Simulate 3D medical scan (CT or MRI)
scan = np.random.randint(0, 4096, (256, 256, 128), dtype=np.uint16)

# Lossless compression with LERC
lerc_lossless = imagecodecs.lerc_encode(scan, tolerance=0.0)
lerc_decoded = imagecodecs.lerc_decode(lerc_lossless)
assert np.array_equal(scan, lerc_decoded)

# Near-lossless with small tolerance
lerc_lossy = imagecodecs.lerc_encode(scan, tolerance=1.0)  # 1 HU tolerance
lerc_lossy_decoded = imagecodecs.lerc_decode(lerc_lossy)

print(f"Original size: {scan.nbytes} bytes")
print(f"Lossless LERC: {len(lerc_lossless)} bytes ({len(lerc_lossless)/scan.nbytes:.2%})")
print(f"Lossy LERC: {len(lerc_lossy)} bytes ({len(lerc_lossy)/scan.nbytes:.2%})")
```

### Satellite Data Processing

```python
import imagecodecs
import numpy as np

# Simulate satellite imagery (multispectral)
bands, height, width = 8, 1024, 1024
satellite_data = np.random.randint(0, 65535, (bands, height, width), dtype=np.uint16)

# SZIP compression optimized for remote sensing
compressed_bands = []
for band in satellite_data:
    compressed = imagecodecs.szip_encode(
        band,
        coding='ec',  # Entropy coding
        pixels_per_block=16,
        bits_per_pixel=16
    )
    compressed_bands.append(compressed)

# Calculate total compression
original_size = satellite_data.nbytes
compressed_size = sum(len(band) for band in compressed_bands)
print(f"SZIP compression ratio: {original_size / compressed_size:.1f}x")

# Decode bands
decoded_bands = []
for compressed in compressed_bands:
    decoded = imagecodecs.szip_decode(compressed)
    decoded_bands.append(decoded)

reconstructed = np.stack(decoded_bands)
assert np.array_equal(satellite_data, reconstructed)
```

### Floating-Point Predictor Usage

```python
import imagecodecs
import numpy as np

# Scientific simulation data with smooth gradients
x = np.linspace(0, 10, 1000)
y = np.linspace(0, 10, 1000)
X, Y = np.meshgrid(x, y)
field = np.sin(X) * np.cos(Y) + 0.1 * np.random.random((1000, 1000))
field = field.astype(np.float32)

# Apply floating-point predictor before compression
predicted = imagecodecs.floatpred_encode(field, axis=1)  # Predict along rows

# Compress the predicted data
compressed = imagecodecs.zlib_encode(predicted.tobytes(), level=9)

# Compare with direct compression
direct_compressed = imagecodecs.zlib_encode(field.tobytes(), level=9)

print(f"Direct compression: {len(direct_compressed)} bytes")
print(f"With predictor: {len(compressed)} bytes")
print(f"Improvement: {len(direct_compressed) / len(compressed):.1f}x")

# Decompress and decode
decompressed_bytes = imagecodecs.zlib_decode(compressed)
predicted_restored = np.frombuffer(decompressed_bytes, dtype=np.float32).reshape(field.shape)
field_restored = imagecodecs.floatpred_decode(predicted_restored, axis=1)

# Verify exact reconstruction (lossless)
assert np.array_equal(field, field_restored)
```

### Quality vs Compression Trade-offs

```python
import imagecodecs
import numpy as np

# Generate test scientific dataset
data = np.random.exponential(2.0, (512, 512, 64)).astype(np.float32)

# Test different error tolerances with ZFP
tolerances = [0.001, 0.01, 0.1, 1.0]
for tol in tolerances:
    compressed = imagecodecs.zfp_encode(data, tolerance=tol)
    decoded = imagecodecs.zfp_decode(compressed, shape=data.shape, dtype=data.dtype)

    compression_ratio = data.nbytes / len(compressed)
    max_error = np.max(np.abs(data - decoded))
    mse = np.mean((data - decoded) ** 2)

    print(f"Tolerance {tol:5.3f}: {compression_ratio:5.1f}x compression, "
          f"max error {max_error:.3f}, MSE {mse:.6f}")

# Test different bit rates with ZFP
rates = [1.0, 2.0, 4.0, 8.0]
for rate in rates:
    compressed = imagecodecs.zfp_encode(data, rate=rate)
    decoded = imagecodecs.zfp_decode(compressed, shape=data.shape, dtype=data.dtype)

    actual_rate = len(compressed) * 8 / data.size
    max_error = np.max(np.abs(data - decoded))

    print(f"Target rate {rate:3.1f} bpv: actual {actual_rate:.1f} bpv, "
          f"max error {max_error:.3f}")
```

## Performance Considerations

### Algorithm Selection

- **ZFP**: Best for regular grids, configurable precision/rate/tolerance
- **SPERR**: Optimized for 2D/3D scientific datasets, excellent compression ratios
- **SZ3**: High performance, good for large datasets
- **LERC**: Designed for raster/GIS data, wide format support
- **SZIP**: NASA standard, excellent for satellite/remote sensing data

### Optimization Guidelines

- Use floating-point predictor before general compression for smooth data
- Choose error tolerance based on measurement precision
- Consider data characteristics (smooth vs noisy, regular vs irregular)
- Balance compression ratio vs reconstruction speed for your use case

### Memory Management

- Pre-allocate output buffers for large datasets
- Process data in chunks for memory-constrained environments
- Use appropriate data types (float32 vs float64) based on precision needs
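The first two points can be combined: encode the data chunk by chunk, then decode each chunk directly into one pre-allocated buffer. The sketch below uses `zlib` from the standard library as a stand-in codec so that it runs anywhere; `encode_in_chunks` and `decode_into` are hypothetical helpers, and any encoder/decoder pair that maps bytes to bytes could be passed in instead.

```python
import zlib


def encode_in_chunks(buffer, chunk_bytes, encode=zlib.compress):
    """Compress a bytes-like object in independent chunks of `chunk_bytes`."""
    view = memoryview(buffer).cast("B")  # byte view, no copy of the input
    return [
        encode(view[start:start + chunk_bytes])
        for start in range(0, len(view), chunk_bytes)
    ]


def decode_into(chunks, out, decode=zlib.decompress):
    """Decode chunks into the pre-allocated, writable buffer `out`.

    Returns the number of bytes written.
    """
    view = memoryview(out).cast("B")
    offset = 0
    for chunk in chunks:
        piece = decode(chunk)
        view[offset:offset + len(piece)] = piece  # raises if `out` is too small
        offset += len(piece)
    return offset


data = bytes(range(256)) * 1000           # 256,000 bytes of sample data
chunks = encode_in_chunks(data, 65536)    # only one chunk is encoded at a time

restored = bytearray(len(data))           # allocate the output once
written = decode_into(chunks, restored)
assert written == len(data) and bytes(restored) == data
```

Independent chunks cost a little compression ratio, because each chunk starts without context, but they bound peak memory and allow chunks to be decoded selectively or in parallel.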

## Constants and Configuration

### ZFP Constants

```python { .api }
class ZFP:
    available: bool

    class EXEC:
        SERIAL = 0
        OMP = 1  # OpenMP parallel execution
        CUDA = 2  # CUDA GPU execution

    class MODE:
        EXPERT = 0  # Expert mode with custom parameters
        FIXED_RATE = 1  # Fixed bit rate mode
        FIXED_PRECISION = 2  # Fixed precision mode
        FIXED_ACCURACY = 3  # Fixed accuracy/tolerance mode
```

### SPERR Constants

```python { .api }
class SPERR:
    available: bool

    class MODE:
        RATE = 'rate'  # Fixed bit rate
        PSNR = 'psnr'  # Peak signal-to-noise ratio
        PWE = 'pwe'  # Point-wise error bound
```

## Error Handling

```python { .api }
class ZfpError(Exception):
    """ZFP codec exception."""

class SperrError(Exception):
    """SPERR codec exception."""

class Sz3Error(Exception):
    """SZ3 codec exception."""

class FloatpredError(Exception):
    """Floating-point predictor exception."""

class LercError(Exception):
    """LERC codec exception."""

class SzipError(Exception):
    """SZIP codec exception."""

class PcodecError(Exception):
    """PCODEC codec exception."""
```
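Because each codec raises its own exception type, a caller can try a preferred codec and fall back to another when it fails. The sketch below shows one possible pattern; `encode_with_fallback`, `DemoCodecError`, and `broken_encode` are hypothetical and exist only for this illustration. The demo uses `zlib` from the standard library as the fallback so that it runs without imagecodecs installed.

```python
import zlib


def encode_with_fallback(data, attempts):
    """Try each (name, encode, exception_types) in order; return (name, payload)."""
    errors = []
    for name, encode, exception_types in attempts:
        try:
            return name, encode(data)
        except exception_types as exc:
            errors.append(f"{name}: {exc}")  # remember why this codec was skipped
    raise RuntimeError("all codecs failed: " + "; ".join(errors))


class DemoCodecError(Exception):
    """Stand-in for a codec-specific exception such as the ones listed above."""


def broken_encode(data):
    # Simulates a codec that is unavailable or rejects the input.
    raise DemoCodecError("codec not available")


payload = b"abc" * 100
name, encoded = encode_with_fallback(
    payload,
    [
        ("demo", broken_encode, DemoCodecError),
        ("zlib", zlib.compress, zlib.error),
    ],
)
print(name, len(encoded), "bytes")
```

With imagecodecs, the same list could pair each encoder with its exception class as documented above, for example an entry built from `zfp_encode` and `ZfpError`. Catching only the codec-specific types keeps unrelated bugs, such as a `TypeError` from a wrong argument, visible instead of silently triggering the fallback.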