
tessl/pypi-imagecodecs

Image transformation, compression, and decompression codecs for scientific computing

docs/scientific-compression.md

Scientific Data Compression

Specialized codecs optimized for scientific computing, including floating-point data compression, error-bounded compression, and array processing utilities. These algorithms are designed for numerical accuracy, performance, and specific scientific data characteristics.

Capabilities

ZFP Floating-Point Compression

Compress floating-point arrays with configurable precision, rate, or error tolerance for scientific datasets.

def zfp_encode(data, *, rate=None, precision=None, tolerance=None, out=None):
    """
    Return ZFP encoded floating-point array.
    
    Parameters:
    - data: NDArray - Floating-point array to compress (1D-4D, float32/float64)
    - rate: float | None - Target compression rate in bits per value
    - precision: int | None - Number of bit planes to encode (lossless if sufficient)
    - tolerance: float | None - Absolute error tolerance (error-bounded mode)
    - out: bytes | bytearray | None - Pre-allocated output buffer
    
    Returns:
    bytes | bytearray: ZFP compressed data
    
    Note: Exactly one of rate, precision, or tolerance must be specified
    """

def zfp_decode(data, shape=None, dtype=None, *, out=None):
    """
    Return decoded ZFP floating-point array.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - ZFP compressed data
    - shape: tuple | None - Output array shape (required)
    - dtype: numpy.dtype | None - Output data type (float32 or float64, required)
    - out: NDArray | None - Pre-allocated output buffer
    
    Returns:
    NDArray: Decoded floating-point array
    """

def zfp_check(data):
    """
    Check if data is ZFP encoded.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check
    
    Returns:
    bool | None: True if ZFP header detected
    """
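A minimal round trip in the error-bounded mode, written against the signatures above (the rate and precision modes follow the same pattern). Since ZFP support varies by build, the sketch degrades gracefully rather than assuming the codec is present:

```python
import numpy as np
import imagecodecs

# Smooth 2D field, the kind of regular-grid data ZFP is designed for
x = np.linspace(0.0, 1.0, 64, dtype=np.float32)
data = np.outer(x, x)

roundtrip_ok = None  # None means ZFP is unavailable in this build
try:
    # Exactly one of rate, precision, or tolerance per call
    encoded = imagecodecs.zfp_encode(data, tolerance=1e-4)
    decoded = imagecodecs.zfp_decode(encoded, shape=data.shape, dtype=data.dtype)
    # Fixed-accuracy mode bounds the absolute error by the tolerance
    roundtrip_ok = bool(np.max(np.abs(data - decoded)) <= 1e-4)
except Exception:
    pass  # codec not compiled in, or this build exposes a different signature
```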

SPERR Scientific Compression

Error-bounded lossy compressor optimized for scientific floating-point data with multiple quality modes.

def sperr_encode(data, *, mode=None, quality=None, tolerance=None, out=None):
    """
    Return SPERR encoded floating-point data.
    
    Parameters:
    - data: NDArray - Floating-point data to compress (2D/3D, float32/float64)
    - mode: str | None - Compression mode:
        'rate' = fixed bit rate, 'psnr' = peak signal-to-noise ratio, 'pwe' = point-wise error
    - quality: float | None - Quality parameter for chosen mode:
        For 'rate': bits per pixel (e.g., 1.0-16.0)
        For 'psnr': target PSNR in dB (e.g., 40.0-80.0)  
        For 'pwe': maximum point-wise error
    - tolerance: float | None - Alternative way to specify error tolerance
    - out: bytes | bytearray | None - Pre-allocated output buffer
    
    Returns:
    bytes | bytearray: SPERR compressed data
    """

def sperr_decode(data, *, out=None):
    """
    Return decoded SPERR floating-point data.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - SPERR compressed data
    - out: NDArray | None - Pre-allocated output buffer
    
    Returns:
    NDArray: Decoded floating-point array
    """

def sperr_check(data):
    """
    Check if data is SPERR encoded.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check
    
    Returns:
    bool | None: True if SPERR signature detected
    """
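The PWE mode appears in the climate example below; as a sketch of the PSNR mode, assuming the signatures above and guarding against builds where SPERR is not compiled in:

```python
import numpy as np
import imagecodecs

rng = np.random.default_rng(0)
field = rng.standard_normal((128, 128)).astype(np.float32)

shape_ok = None  # None means SPERR is unavailable in this build
try:
    # Target a 60 dB peak signal-to-noise ratio
    encoded = imagecodecs.sperr_encode(field, mode='psnr', quality=60.0)
    decoded = imagecodecs.sperr_decode(encoded)
    shape_ok = decoded.shape == field.shape
except Exception:
    pass  # codec not compiled in, or this build exposes a different signature
```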

SZ3 Error-Bounded Compression

High-performance error-bounded lossy compressor for scientific datasets with excellent compression ratios.

def sz3_encode(data, *, tolerance=None, out=None):
    """
    Return SZ3 encoded floating-point data.
    
    Parameters:
    - data: NDArray - Floating-point data to compress (float32/float64)
    - tolerance: float | None - Absolute error bound (required)
    - out: bytes | bytearray | None - Pre-allocated output buffer
    
    Returns:
    bytes | bytearray: SZ3 compressed data
    """

def sz3_decode(data, shape=None, dtype=None, *, out=None):
    """
    Return decoded SZ3 floating-point data.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - SZ3 compressed data
    - shape: tuple | None - Output array shape (required)
    - dtype: numpy.dtype | None - Output data type (required)
    - out: NDArray | None - Pre-allocated output buffer
    
    Returns:
    NDArray: Decoded floating-point array
    """

def sz3_check(data):
    """
    Check if data is SZ3 encoded.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check
    
    Returns:
    bool | None: True if SZ3 signature detected
    """
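SZ3 has no example in the usage section below, so here is a round-trip sketch against the signatures above. The absolute error bound is the one required `tolerance` parameter; the try/except lets the sketch degrade when SZ3 is not compiled into the build:

```python
import numpy as np
import imagecodecs

# Smooth 1D signal reshaped to 2D, typical of simulation output
data = np.cumsum(np.random.default_rng(1).random(10000)).reshape(100, 100)

max_error = None  # None means SZ3 is unavailable in this build
try:
    encoded = imagecodecs.sz3_encode(data, tolerance=1e-6)
    decoded = imagecodecs.sz3_decode(encoded, shape=data.shape, dtype=data.dtype)
    # Error-bounded mode guarantees the absolute error stays within tolerance
    max_error = float(np.max(np.abs(data - decoded)))
except Exception:
    pass  # codec not compiled in, or this build exposes a different signature
```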

Floating-Point Predictor

Preprocessing filter that improves compression by removing predictable patterns in floating-point data.

def floatpred_encode(data, *, axis=-1, dist=1, out=None):
    """
    Return floating-point predictor encoded data.
    
    Parameters:
    - data: NDArray - Floating-point data to encode (float32/float64)
    - axis: int - Axis along which to apply predictor (default -1)
    - dist: int - Predictor distance (default 1)
    - out: NDArray | None - Pre-allocated output buffer
    
    Returns:
    NDArray: Predictor encoded data (same shape and dtype as input)
    """

def floatpred_decode(data, *, axis=-1, dist=1, out=None):
    """
    Return floating-point predictor decoded data.
    
    Parameters:
    - data: NDArray - Predictor encoded data
    - axis: int - Axis along which predictor was applied (default -1)
    - dist: int - Predictor distance used (default 1)
    - out: NDArray | None - Pre-allocated output buffer
    
    Returns:
    NDArray: Decoded floating-point data
    """

def floatpred_check(data):
    """
    Check if data is floating-point predictor encoded.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap | NDArray - Data to check
    
    Returns:
    None: Always returns None (predictor is a transform, not a format)
    """

JETRAW Scientific Image Compression

High-performance lossless compression specifically optimized for scientific image data including X-ray, microscopy, and other detector data.

def jetraw_encode(data, *, identifier=None, out=None):
    """
    Return JETRAW encoded image data.
    
    Parameters:
    - data: NDArray - Image data to compress (typically uint16 detector data)
    - identifier: str | None - Optional identifier string
    - out: bytes | bytearray | None - Pre-allocated output buffer
    
    Returns:
    bytes | bytearray: JETRAW compressed data
    """

def jetraw_decode(data, *, out=None):
    """
    Return decoded JETRAW image data.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - JETRAW compressed data
    - out: NDArray | None - Pre-allocated output buffer
    
    Returns:
    NDArray: Decoded image data
    """

def jetraw_check(data):
    """
    Check if data is JETRAW encoded.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check
    
    Returns:
    bool | None: True if JETRAW signature detected
    """
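JETRAW normally requires a licensed camera calibration, so a full round trip will not run on most systems. In the sketch below, `"calibration-id"` is a hypothetical placeholder, not a real identifier; expect the encode step to report that the codec is unavailable or unlicensed:

```python
import numpy as np
import imagecodecs

# Detector frame; JETRAW targets uint16 sensor data
frame = np.random.default_rng(2).integers(0, 4096, (256, 256), dtype=np.uint16)

ok = None  # None means JETRAW is unavailable or unlicensed
try:
    # "calibration-id" is a placeholder, not a valid calibration identifier
    encoded = imagecodecs.jetraw_encode(frame, identifier="calibration-id")
    decoded = imagecodecs.jetraw_decode(encoded)
    ok = decoded.shape == frame.shape and decoded.dtype == frame.dtype
except Exception as exc:
    print(f"JETRAW unavailable or unlicensed: {exc}")
```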

LERC Limited Error Raster Compression

Lossy/lossless compression specifically designed for raster data with configurable error bounds.

def lerc_encode(data, *, tolerance=None, version=None, out=None):
    """
    Return LERC encoded raster data.
    
    Parameters:
    - data: NDArray - Raster data to compress (integer or floating-point)
    - tolerance: float | None - Maximum error tolerance (0.0 for lossless)
    - version: int | None - LERC version (2 or 4, default 4)
    - out: bytes | bytearray | None - Pre-allocated output buffer
    
    Returns:
    bytes | bytearray: LERC compressed data
    """

def lerc_decode(data, *, out=None):
    """
    Return decoded LERC raster data.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - LERC compressed data
    - out: NDArray | None - Pre-allocated output buffer
    
    Returns:
    NDArray: Decoded raster array
    """

def lerc_check(data):
    """
    Check if data is LERC encoded.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check
    
    Returns:
    bool | None: True if LERC signature detected
    """

SZIP Scientific Data Compression

NASA's adaptive entropy encoder designed for scientific datasets, particularly satellite and remote sensing data.

def szip_encode(data, *, coding=None, pixels_per_block=None, bits_per_pixel=None, out=None):
    """
    Return SZIP encoded scientific data.
    
    Parameters:
    - data: NDArray - Scientific data to compress (integer types)
    - coding: str | None - Coding method ('ec' for entropy coding, 'nn' for nearest neighbor)
    - pixels_per_block: int | None - Pixels per compression block (8, 16, 32)
    - bits_per_pixel: int | None - Bits per pixel in input data
    - out: bytes | bytearray | None - Pre-allocated output buffer
    
    Returns:
    bytes | bytearray: SZIP compressed data
    """

def szip_decode(data, *, out=None):
    """
    Return decoded SZIP scientific data.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - SZIP compressed data
    - out: NDArray | None - Pre-allocated output buffer
    
    Returns:
    NDArray: Decoded scientific data array
    """

def szip_check(data):
    """
    Check if data is SZIP encoded.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check
    
    Returns:
    bool | None: True if SZIP signature detected
    """

PCODEC Numerical Data Compression

Lossless compression codec designed for columnar numerical data, optimized for analytical workloads.

def pcodec_encode(data, *, level=None, out=None):
    """
    Return PCODEC encoded columnar data.
    
    Parameters:
    - data: NDArray - Columnar data to compress
    - level: int | None - Compression level (0-12, default 8)
    - out: bytes | bytearray | None - Pre-allocated output buffer
    
    Returns:
    bytes | bytearray: PCODEC compressed data
    """

def pcodec_decode(data, *, out=None):
    """
    Return decoded PCODEC columnar data.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - PCODEC compressed data
    - out: NDArray | None - Pre-allocated output buffer
    
    Returns:
    NDArray: Decoded columnar data array
    """

def pcodec_check(data):
    """
    Check if data is PCODEC encoded.
    
    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check
    
    Returns:
    bool | None: True if PCODEC signature detected
    """
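A round-trip sketch against the signatures above, using the kind of numeric column PCODEC targets. It is guarded because codec availability varies by build:

```python
import numpy as np
import imagecodecs

# A numeric column: sorted timestamps with small jitter
rng = np.random.default_rng(3)
column = np.arange(100_000, dtype=np.int64) + rng.integers(0, 8, 100_000)

lossless = None  # None means PCODEC is unavailable in this build
try:
    encoded = imagecodecs.pcodec_encode(column, level=8)
    decoded = imagecodecs.pcodec_decode(encoded)
    lossless = bool(np.array_equal(column, decoded))
except Exception:
    pass  # codec not compiled in, or this build exposes a different signature
```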

Usage Examples

Climate Data Compression

import imagecodecs
import numpy as np

# Simulate climate model output (temperature data)
time_steps, lat, lon = 365, 180, 360
temperature = np.random.normal(15.0, 20.0, (time_steps, lat, lon)).astype(np.float32)

# Error-bounded compression with 0.1°C tolerance
zfp_compressed = imagecodecs.zfp_encode(temperature, tolerance=0.1)
zfp_decoded = imagecodecs.zfp_decode(
    zfp_compressed, 
    shape=temperature.shape, 
    dtype=temperature.dtype
)

# Verify error bound
max_error = np.max(np.abs(temperature - zfp_decoded))
print(f"Max error: {max_error:.3f}°C (tolerance: 0.1°C)")
print(f"Compression ratio: {temperature.nbytes / len(zfp_compressed):.1f}x")

# Alternative with SPERR
sperr_compressed = imagecodecs.sperr_encode(
    temperature, 
    mode='pwe', 
    quality=0.1  # 0.1°C point-wise error
)
sperr_decoded = imagecodecs.sperr_decode(sperr_compressed)

Medical Imaging Data

import imagecodecs
import numpy as np

# Simulate 3D medical scan (CT or MRI)
scan = np.random.randint(0, 4096, (256, 256, 128), dtype=np.uint16)

# Lossless compression with LERC
lerc_lossless = imagecodecs.lerc_encode(scan, tolerance=0.0)
lerc_decoded = imagecodecs.lerc_decode(lerc_lossless)
assert np.array_equal(scan, lerc_decoded)

# Near-lossless with small tolerance
lerc_lossy = imagecodecs.lerc_encode(scan, tolerance=1.0)  # 1 HU tolerance
lerc_lossy_decoded = imagecodecs.lerc_decode(lerc_lossy)

print(f"Original size: {scan.nbytes} bytes")
print(f"Lossless LERC: {len(lerc_lossless)} bytes ({len(lerc_lossless)/scan.nbytes:.2%})")
print(f"Lossy LERC: {len(lerc_lossy)} bytes ({len(lerc_lossy)/scan.nbytes:.2%})")

Satellite Data Processing

import imagecodecs
import numpy as np

# Simulate satellite imagery (multispectral)
bands, height, width = 8, 1024, 1024
satellite_data = np.random.randint(0, 65535, (bands, height, width), dtype=np.uint16)

# SZIP compression optimized for remote sensing
compressed_bands = []
for band in satellite_data:
    compressed = imagecodecs.szip_encode(
        band,
        coding='ec',  # Entropy coding
        pixels_per_block=16,
        bits_per_pixel=16
    )
    compressed_bands.append(compressed)

# Calculate total compression
original_size = satellite_data.nbytes
compressed_size = sum(len(band) for band in compressed_bands)
print(f"SZIP compression ratio: {original_size / compressed_size:.1f}x")

# Decode bands
decoded_bands = []
for compressed in compressed_bands:
    decoded = imagecodecs.szip_decode(compressed)
    decoded_bands.append(decoded)

reconstructed = np.stack(decoded_bands)
assert np.array_equal(satellite_data, reconstructed)

Floating-Point Predictor Usage

import imagecodecs
import numpy as np

# Scientific simulation data with smooth gradients
x = np.linspace(0, 10, 1000)
y = np.linspace(0, 10, 1000)
X, Y = np.meshgrid(x, y)
field = np.sin(X) * np.cos(Y) + 0.1 * np.random.random((1000, 1000))
field = field.astype(np.float32)

# Apply floating-point predictor before compression
predicted = imagecodecs.floatpred_encode(field, axis=1)  # Predict along rows

# Compress the predicted data
compressed = imagecodecs.zlib_encode(predicted.tobytes(), level=9)

# Compare with direct compression
direct_compressed = imagecodecs.zlib_encode(field.tobytes(), level=9)

print(f"Direct compression: {len(direct_compressed)} bytes")
print(f"With predictor: {len(compressed)} bytes")
print(f"Improvement: {len(direct_compressed) / len(compressed):.1f}x")

# Decompress and decode
decompressed_bytes = imagecodecs.zlib_decode(compressed)
predicted_restored = np.frombuffer(decompressed_bytes, dtype=np.float32).reshape(field.shape)
field_restored = imagecodecs.floatpred_decode(predicted_restored, axis=1)

# Verify exact reconstruction (lossless)
assert np.array_equal(field, field_restored)

Quality vs Compression Trade-offs

import imagecodecs
import numpy as np

# Generate test scientific dataset
data = np.random.exponential(2.0, (512, 512, 64)).astype(np.float32)

# Test different error tolerances with ZFP
tolerances = [0.001, 0.01, 0.1, 1.0]
for tol in tolerances:
    compressed = imagecodecs.zfp_encode(data, tolerance=tol)
    decoded = imagecodecs.zfp_decode(compressed, shape=data.shape, dtype=data.dtype)
    
    compression_ratio = data.nbytes / len(compressed)
    max_error = np.max(np.abs(data - decoded))
    mse = np.mean((data - decoded) ** 2)
    
    print(f"Tolerance {tol:5.3f}: {compression_ratio:5.1f}x compression, "
          f"max error {max_error:.3f}, MSE {mse:.6f}")

# Test different bit rates with ZFP  
rates = [1.0, 2.0, 4.0, 8.0]
for rate in rates:
    compressed = imagecodecs.zfp_encode(data, rate=rate)
    decoded = imagecodecs.zfp_decode(compressed, shape=data.shape, dtype=data.dtype)
    
    actual_rate = len(compressed) * 8 / data.size
    max_error = np.max(np.abs(data - decoded))
    
    print(f"Target rate {rate:3.1f} bpv: actual {actual_rate:.1f} bpv, "
          f"max error {max_error:.3f}")

Performance Considerations

Algorithm Selection

  • ZFP: Best for regular grids, configurable precision/rate/tolerance
  • SPERR: Optimized for 2D/3D scientific datasets, excellent compression ratios
  • SZ3: High performance, good for large datasets
  • LERC: Designed for raster/GIS data, wide format support
  • SZIP: NASA standard, excellent for satellite/remote sensing data

Optimization Guidelines

  • Use floating-point predictor before general compression for smooth data
  • Choose error tolerance based on measurement precision
  • Consider data characteristics (smooth vs noisy, regular vs irregular)
  • Balance compression ratio vs reconstruction speed for your use case

Memory Management

  • Pre-allocate output buffers for large datasets
  • Process data in chunks for memory-constrained environments
  • Use appropriate data types (float32 vs float64) based on precision needs
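The chunking advice can be sketched with the library's zlib codec (the same one used in the predictor example above): compress a volume one slice at a time so that only a single uncompressed slice is resident at once. Pre-allocated `out=` buffers follow the same per-call pattern documented in the signatures above.

```python
import numpy as np
import imagecodecs

# Compress a volume slice by slice to bound peak memory usage
volume = np.random.default_rng(4).random((16, 128, 128)).astype(np.float32)

chunks = [imagecodecs.zlib_encode(plane.tobytes(), level=6) for plane in volume]

# Decompress each slice independently and reassemble the volume
restored = np.stack([
    np.frombuffer(imagecodecs.zlib_decode(chunk), dtype=np.float32).reshape(128, 128)
    for chunk in chunks
])

assert np.array_equal(volume, restored)  # zlib round trip is lossless
```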

Constants and Configuration

ZFP Constants

class ZFP:
    available: bool
    
    class EXEC:
        SERIAL = 0
        OMP = 1     # OpenMP parallel execution
        CUDA = 2    # CUDA GPU execution
    
    class MODE:
        EXPERT = 0      # Expert mode with custom parameters
        FIXED_RATE = 1  # Fixed bit rate mode
        FIXED_PRECISION = 2  # Fixed precision mode  
        FIXED_ACCURACY = 3   # Fixed accuracy/tolerance mode

SPERR Constants

class SPERR:
    available: bool
    
    class MODE:
        RATE = 'rate'   # Fixed bit rate
        PSNR = 'psnr'   # Peak signal-to-noise ratio  
        PWE = 'pwe'     # Point-wise error bound

Error Handling

class ZfpError(Exception):
    """ZFP codec exception."""

class SperrError(Exception):
    """SPERR codec exception."""

class Sz3Error(Exception):
    """SZ3 codec exception."""

class FloatpredError(Exception):
    """Floating-point predictor exception."""

class LercError(Exception):
    """LERC codec exception."""

class SzipError(Exception):
    """SZIP codec exception."""

class PcodecError(Exception):
    """PCODEC codec exception."""
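Corrupt input raises the codec-specific exception. A defensive sketch: the `getattr` fallback covers builds where a codec, and therefore its exception class, is absent.

```python
import numpy as np
import imagecodecs

# Look the class up defensively: builds without ZFP may not define it
ZfpError = getattr(imagecodecs, "ZfpError", Exception)

caught = False
try:
    # 16 zero bytes are not a valid ZFP stream
    imagecodecs.zfp_decode(b"\x00" * 16, shape=(2, 2), dtype=np.float32)
except ZfpError:
    caught = True  # corrupt stream rejected by the codec
except Exception:
    caught = True  # codec missing from this build, or a differing signature
```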

Install with Tessl CLI

npx tessl i tessl/pypi-imagecodecs
