
# Lossless Compression


General-purpose lossless compression algorithms optimized for different data types and use cases. These codecs provide high-performance compression without data loss, making them ideal for scientific computing, data archival, and scenarios where exact data reconstruction is required.

## Capabilities

### ZLIB/DEFLATE Compression

Industry-standard deflate compression with zlib wrapper, widely compatible and efficient for general-purpose data compression.

```python { .api }
def zlib_encode(data, level=None, *, out=None):
    """
    Return ZLIB encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (0-9, default 6). Higher values = better compression, slower speed
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: ZLIB compressed data with header and checksum
    """

def zlib_decode(data, *, out=None):
    """
    Return decoded ZLIB data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - ZLIB compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: Decompressed data
    """

def zlib_check(data):
    """
    Check if data is ZLIB encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
        bool | None: True if ZLIB format detected, None if uncertain
    """

def zlib_crc32(data, value=None):
    """
    Return CRC32 checksum.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to checksum
    - value: int | None - Initial CRC value for incremental calculation

    Returns:
        int: CRC32 checksum value
    """

def zlib_adler32(data, value=None):
    """
    Return Adler-32 checksum.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to checksum
    - value: int | None - Initial Adler-32 value for incremental calculation

    Returns:
        int: Adler-32 checksum value
    """
```
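
The "header and checksum" in the return value are defined by the zlib container format (RFC 1950): a two-byte header, the raw DEFLATE stream, then a big-endian Adler-32 of the uncompressed data. The sketch below illustrates that layout with Python's standard-library `zlib` module standing in for `zlib_encode` and `zlib_decode`. `looks_like_zlib` is a hypothetical helper, and it is an assumption that `zlib_check` applies a similar heuristic.

```python
import struct
import zlib


def looks_like_zlib(data: bytes) -> bool:
    """Heuristic test of the two-byte zlib header (hypothetical helper)."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    method_is_deflate = (cmf & 0x0F) == 8          # CM field: 8 means DEFLATE
    window_is_valid = (cmf >> 4) <= 7              # CINFO field: at most a 32 KiB window
    check_bits_ok = ((cmf << 8) | flg) % 31 == 0   # FCHECK makes the header a multiple of 31
    return method_is_deflate and window_is_valid and check_bits_ok


payload = b"lossless compression " * 500
encoded = zlib.compress(payload, 6)

# The last four bytes hold the Adler-32 of the original data, big-endian.
stored_adler = struct.unpack(">I", encoded[-4:])[0]

print(looks_like_zlib(encoded))                # True
print(stored_adler == zlib.adler32(payload))   # True
print(zlib.decompress(encoded) == payload)     # True
```

A header match is only a hint, since arbitrary bytes can satisfy all three conditions by chance. That is consistent with a check function that reports `None` when it cannot be sure.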

### GZIP Compression

GZIP format compression compatible with gzip command-line tool and HTTP compression.

```python { .api }
def gzip_encode(data, level=None, *, out=None):
    """
    Return GZIP encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (0-9, default 6)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: GZIP compressed data with header and trailer
    """

def gzip_decode(data, *, out=None):
    """
    Return decoded GZIP data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - GZIP compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: Decompressed data
    """

def gzip_check(data):
    """
    Check if data is GZIP encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
        bool: True if GZIP magic number detected
    """
```
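
To make the "magic number" and "trailer" concrete, here is a minimal sketch using Python's standard-library `gzip` module as a stand-in for `gzip_encode`. Per RFC 1952, a gzip member starts with the bytes `1F 8B` and ends with the CRC-32 and the length (modulo 2**32) of the uncompressed data, both little-endian. `has_gzip_magic` is a hypothetical helper, not part of imagecodecs.

```python
import gzip
import struct
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # ID1 and ID2 bytes from RFC 1952


def has_gzip_magic(data: bytes) -> bool:
    """Return True if data starts with the gzip magic number (hypothetical helper)."""
    return data[:2] == GZIP_MAGIC


payload = b"GZIP wraps a DEFLATE stream. " * 200
encoded = gzip.compress(payload, compresslevel=6)

# Trailer: CRC-32 of the uncompressed data, then its length modulo 2**32,
# both stored as little-endian 32-bit integers.
stored_crc, stored_size = struct.unpack("<II", encoded[-8:])

print(has_gzip_magic(encoded))               # True
print(stored_crc == zlib.crc32(payload))     # True
print(stored_size == len(payload) % 2**32)   # True
```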

### BLOSC High-Performance Compression

Blocking, shuffling meta-compressor optimized for binary numerical data, with multi-threading and a choice of underlying compression algorithms.

```python { .api }
def blosc_encode(data, level=None, *, compressor=None, shuffle=None, typesize=None, blocksize=None, numthreads=None, out=None):
    """
    Return BLOSC encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (0-9, default 5)
    - compressor: str | None - Compression algorithm:
        'blosclz' (default), 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd'
    - shuffle: int | None - Shuffle filter:
        0 = no shuffle, 1 = byte shuffle, 2 = bit shuffle
    - typesize: int | None - Element size in bytes for shuffle optimization
    - blocksize: int | None - Block size in bytes (default auto-determined)
    - numthreads: int | None - Number of threads for compression
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: BLOSC compressed data
    """

def blosc_decode(data, *, numthreads=None, out=None):
    """
    Return decoded BLOSC data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - BLOSC compressed data
    - numthreads: int | None - Number of threads for decompression
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: Decompressed data
    """

def blosc_check(data):
    """
    Check if data is BLOSC encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
        None: Always returns None (format detected by attempting decompression)
    """
```
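
The `shuffle` and `typesize` parameters are easier to reason about with a small model of what byte shuffling does. The following is a simplified pure-Python illustration of the idea, not Blosc's implementation (which works per block in optimized native code). `byte_shuffle` and `byte_unshuffle` are hypothetical helpers, and standard-library `zlib` stands in for the compressor that would run after the filter.

```python
import struct
import zlib


def byte_shuffle(buf: bytes, typesize: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on."""
    if len(buf) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    return b"".join(buf[i::typesize] for i in range(typesize))


def byte_unshuffle(buf: bytes, typesize: int) -> bytes:
    """Invert byte_shuffle by scattering each byte plane back into place."""
    if len(buf) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    count = len(buf) // typesize
    out = bytearray(len(buf))
    for i in range(typesize):
        out[i::typesize] = buf[i * count:(i + 1) * count]
    return bytes(out)


# 10,000 slowly increasing 32-bit integers, little-endian (typesize = 4).
values = range(10_000)
raw = struct.pack(f"<{len(values)}I", *values)
shuffled = byte_shuffle(raw, typesize=4)

plain_size = len(zlib.compress(raw, 6))
shuffled_size = len(zlib.compress(shuffled, 6))
print(f"without shuffle: {plain_size} bytes, with shuffle: {shuffled_size} bytes")
```

After shuffling, the high-order bytes that barely change from one element to the next sit next to each other as long runs, which a general-purpose compressor handles far better than the interleaved layout. This only works when `typesize` matches the real element width.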

### ZSTD (ZStandard) Compression

Modern compression algorithm providing excellent compression ratios with fast decompression speeds.

```python { .api }
def zstd_encode(data, level=None, *, out=None):
    """
    Return ZSTD encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (1-22, default 3).
        Higher values = better compression, slower speed
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: ZSTD compressed data
    """

def zstd_decode(data, *, out=None):
    """
    Return decoded ZSTD data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - ZSTD compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: Decompressed data
    """

def zstd_check(data):
    """
    Check if data is ZSTD encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
        bool | None: True if ZSTD magic number detected
    """
```

### LZ4 Fast Compression

Ultra-fast compression algorithm optimized for speed over compression ratio.

```python { .api }
def lz4_encode(data, level=None, *, out=None):
    """
    Return LZ4 encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (1-12, default 1).
        Higher values = better compression, slower speed
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: LZ4 compressed data
    """

def lz4_decode(data, *, out=None):
    """
    Return decoded LZ4 data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - LZ4 compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer (size must be known)

    Returns:
        bytes | bytearray: Decompressed data
    """

def lz4_check(data):
    """
    Check if data is LZ4 encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
        bool | None: True if LZ4 magic number detected
    """
```

### LZ4F Frame Format

LZ4 compression with frame format that includes metadata and content checksums for safe streaming.

```python { .api }
def lz4f_encode(data, level=None, *, out=None):
    """
    Return LZ4F (LZ4 Frame format) encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (0-12, default 0)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: LZ4F compressed data with frame header and footer
    """

def lz4f_decode(data, *, out=None):
    """
    Return decoded LZ4F data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - LZ4F compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: Decompressed data
    """

def lz4f_check(data):
    """
    Check if data is LZ4F encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
        bool | None: True if LZ4F magic number detected
    """
```

### LZMA/XZ Compression

High compression ratio algorithm used in 7-Zip and XZ utilities.

```python { .api }
def lzma_encode(data, level=None, *, out=None):
    """
    Return LZMA encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (0-9, default 6)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: LZMA compressed data
    """

def lzma_decode(data, *, out=None):
    """
    Return decoded LZMA data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - LZMA compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: Decompressed data
    """

def lzma_check(data):
    """
    Check if data is LZMA encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
        bool | None: True if LZMA signature detected
    """
```
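
"LZMA/XZ" covers two containers around the same algorithm: the modern `.xz` stream, which begins with a six-byte signature, and the legacy `.lzma` layout, which has no such signature. This section does not state which one `lzma_encode` emits, so the sketch below uses Python's standard-library `lzma` module purely to show the difference. `XZ_MAGIC` is a name chosen for this example.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # six-byte signature at the start of every .xz stream

payload = b"LZMA trades speed for compression ratio. " * 400

# FORMAT_XZ is the modern container with integrity checks;
# FORMAT_ALONE is the legacy .lzma layout without a signature.
xz_stream = lzma.compress(payload, format=lzma.FORMAT_XZ, preset=6)
legacy_stream = lzma.compress(payload, format=lzma.FORMAT_ALONE, preset=6)

print(xz_stream.startswith(XZ_MAGIC))      # True
print(legacy_stream.startswith(XZ_MAGIC))  # False

# With the default FORMAT_AUTO, lzma.decompress accepts either container.
restored_from_xz = lzma.decompress(xz_stream)
restored_from_legacy = lzma.decompress(legacy_stream)
print(restored_from_xz == payload and restored_from_legacy == payload)  # True
```

A signature-based check can therefore confirm an `.xz` stream but can say little about legacy `.lzma` data.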

### BROTLI Compression

Google's compression algorithm optimized for web content and text compression.

```python { .api }
def brotli_encode(data, level=None, *, mode=None, lgwin=None, out=None):
    """
    Return BROTLI encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (0-11, default 6)
    - mode: int | None - Compression mode (0=generic, 1=text, 2=font)
    - lgwin: int | None - Window size (10-24, default 22)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: BROTLI compressed data
    """

def brotli_decode(data, *, out=None):
    """
    Return decoded BROTLI data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - BROTLI compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: Decompressed data
    """

def brotli_check(data):
    """
    Check if data is BROTLI encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
        None: Always returns None (no reliable magic number)
    """
```

### SNAPPY Compression

Fast compression algorithm developed by Google for high-speed compression/decompression.

```python { .api }
def snappy_encode(data, *, out=None):
    """
    Return SNAPPY encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: SNAPPY compressed data
    """

def snappy_decode(data, *, out=None):
    """
    Return decoded SNAPPY data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - SNAPPY compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
        bytes | bytearray: Decompressed data
    """

def snappy_check(data):
    """
    Check if data is SNAPPY encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
        None: Always returns None (no magic number)
    """
```
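
Several of the `*_check` functions in this section report a match by looking for a format's magic number, while Brotli and Snappy have no signature to look for. As a rough sketch of how such detection can work, the hypothetical `sniff_format` helper below compares the leading bytes against the published signatures of gzip, Zstandard frames, LZ4 frames, and XZ streams. It is not imagecodecs code, and a `None` result means "unknown", not "uncompressed".

```python
import gzip
import lzma
from typing import Optional

# Leading signatures of container formats that define one.
MAGIC_NUMBERS = {
    "gzip": b"\x1f\x8b",
    "zstd": b"\x28\xb5\x2f\xfd",   # 0xFD2FB528 stored little-endian
    "lz4f": b"\x04\x22\x4d\x18",   # 0x184D2204 stored little-endian
    "xz": b"\xfd7zXZ\x00",
}


def sniff_format(data: bytes) -> Optional[str]:
    """Return the name of the first matching signature, or None if none match."""
    for name, magic in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return None


print(sniff_format(gzip.compress(b"example")))   # gzip
print(sniff_format(lzma.compress(b"example")))   # xz
print(sniff_format(b"no signature here"))        # None
```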

## Usage Patterns

### Basic Compression

```python
import imagecodecs
import numpy as np

# Compress array data
# Note: uniform random bytes are nearly incompressible, so expect sizes close to 100%
data = np.random.randint(0, 256, 10000, dtype=np.uint8).tobytes()

# Try different algorithms
zlib_compressed = imagecodecs.zlib_encode(data, level=9)
zstd_compressed = imagecodecs.zstd_encode(data, level=3)
lz4_compressed = imagecodecs.lz4_encode(data, level=1)

print(f"Original size: {len(data)}")
print(f"ZLIB size: {len(zlib_compressed)} ({len(zlib_compressed)/len(data):.2%})")
print(f"ZSTD size: {len(zstd_compressed)} ({len(zstd_compressed)/len(data):.2%})")
print(f"LZ4 size: {len(lz4_compressed)} ({len(lz4_compressed)/len(data):.2%})")
```

### High-Performance Scientific Data

```python
import imagecodecs
import numpy as np

# Scientific array compression with BLOSC
data = np.random.random((1000, 1000)).astype(np.float32)
data_bytes = data.tobytes()

# Optimize for floating-point data
compressed = imagecodecs.blosc_encode(
    data_bytes,
    level=5,
    compressor='zstd',
    shuffle=1,      # Byte shuffle for better compression
    typesize=4,     # float32 = 4 bytes
    numthreads=4    # Multi-threaded compression
)

# Decompress with multi-threading
decompressed = imagecodecs.blosc_decode(compressed, numthreads=4)
recovered = np.frombuffer(decompressed, dtype=np.float32).reshape(1000, 1000)

assert np.array_equal(data, recovered)
print(f"Compression ratio: {len(compressed)/len(data_bytes):.2%}")
```

### Stream Processing

```python
import imagecodecs

# Incremental checksum calculation
crc = 0
adler = 1

data_chunks = [b"chunk1", b"chunk2", b"chunk3"]
for chunk in data_chunks:
    crc = imagecodecs.zlib_crc32(chunk, crc)
    adler = imagecodecs.zlib_adler32(chunk, adler)

print(f"Final CRC32: {crc:08x}")
print(f"Final Adler32: {adler:08x}")
```
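
The starting values in the incremental pattern matter: CRC-32 starts from 0 and Adler-32 from 1. Feeding chunks through a running value then gives the same result as checksumming the concatenated data in one call. The check below demonstrates this with Python's standard-library `zlib`, whose `crc32(data, value)` and `adler32(data, value)` take the same arguments. It is an assumption that the imagecodecs functions behave identically.

```python
import zlib

chunks = [b"chunk1", b"chunk2", b"chunk3"]

crc = 0     # CRC-32 starts from 0
adler = 1   # Adler-32 starts from 1
for chunk in chunks:
    crc = zlib.crc32(chunk, crc)
    adler = zlib.adler32(chunk, adler)

# Checksumming the concatenated data in one call gives the same values.
whole = b"".join(chunks)
print(crc == zlib.crc32(whole))      # True
print(adler == zlib.adler32(whole))  # True
```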

## Constants and Configuration

### ZLIB Constants

```python { .api }
class ZLIB:
    available: bool = True

    class COMPRESSION:
        NO_COMPRESSION = 0
        BEST_SPEED = 1
        BEST_COMPRESSION = 9
        DEFAULT_COMPRESSION = 6

    class STRATEGY:
        DEFAULT_STRATEGY = 0
        FILTERED = 1
        HUFFMAN_ONLY = 2
        RLE = 3
        FIXED = 4
```

### BLOSC Constants

```python { .api }
class BLOSC:
    available: bool

    class SHUFFLE:
        NOSHUFFLE = 0
        SHUFFLE = 1
        BITSHUFFLE = 2

    class COMPRESSOR:
        BLOSCLZ = 'blosclz'
        LZ4 = 'lz4'
        LZ4HC = 'lz4hc'
        SNAPPY = 'snappy'
        ZLIB = 'zlib'
        ZSTD = 'zstd'
```

### ZSTD Constants

```python { .api }
class ZSTD:
    available: bool

    class STRATEGY:
        FAST = 1
        DFAST = 2
        GREEDY = 3
        LAZY = 4
        LAZY2 = 5
        BTLAZY2 = 6
        BTOPT = 7
        BTULTRA = 8
        BTULTRA2 = 9
```

## Performance Guidelines

### Algorithm Selection
- **LZ4**: Fastest compression/decompression, moderate compression ratio
- **SNAPPY**: Very fast, good for real-time applications
- **ZLIB**: Balanced speed and compression, widely compatible
- **ZSTD**: Excellent compression ratio with good speed
- **BLOSC**: Best for numerical/scientific data with shuffle filters
- **BROTLI**: Best for text and web content
- **LZMA**: Highest compression ratio, slower speed
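
These rankings are general tendencies, and the best choice depends on the data, so it is worth measuring on a representative sample. The harness below is a minimal sketch of such a measurement. It uses only codecs from the Python standard library (`zlib` at three levels and `lzma`) so that it runs anywhere; the `CODECS` table, `benchmark`, and the sample data are all hypothetical, and the imagecodecs encode and decode functions could be added to the table in the same `(encode, decode)` form.

```python
import lzma
import time
import zlib
from typing import Callable, Dict, Tuple

Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# name -> (encode, decode); extend with other codecs as needed.
CODECS: Dict[str, Codec] = {
    "zlib-1": (lambda d: zlib.compress(d, 1), zlib.decompress),
    "zlib-6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "zlib-9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "lzma-6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def benchmark(data: bytes) -> Dict[str, Tuple[float, float]]:
    """Return {codec name: (compressed size / original size, encode seconds)}."""
    results: Dict[str, Tuple[float, float]] = {}
    for name, (encode, decode) in CODECS.items():
        start = time.perf_counter()
        encoded = encode(data)
        elapsed = time.perf_counter() - start
        if decode(encoded) != data:  # a lossless codec must round-trip exactly
            raise RuntimeError(f"{name} failed to round-trip")
        results[name] = (len(encoded) / len(data), elapsed)
    return results


# Synthetic CSV-like sample; replace with a slice of real data.
sample = b"".join(
    f"{i},{i * i % 97},sensor-{i % 7}\n".encode() for i in range(5000)
)
results = benchmark(sample)
for name, (ratio, seconds) in results.items():
    print(f"{name:8s} {ratio:7.2%} {seconds * 1000:8.2f} ms")
```

Timings from a single run are noisy, so repeat the measurement and compare medians before drawing conclusions.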

### Optimization Tips
- Use appropriate compression levels (higher = better compression, slower speed)
- Enable shuffle filters for BLOSC with numerical data
- Use multi-threading when available (BLOSC, JPEG XL, AVIF)
- Pre-allocate output buffers to reduce memory allocations
- Choose typesize parameter in BLOSC to match your data element size

### Memory Considerations
- Pre-allocate output buffers when processing large amounts of data
- Use memory-mapped input for very large files
- Consider streaming approaches for data larger than available RAM
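
The encode and decode functions documented here each take one complete buffer, so a streaming approach needs either chunking at the application level or a codec object that keeps state between calls. As a sketch of the second option, the example below uses Python's standard-library `zlib.compressobj` and `zlib.decompressobj` to process a file in fixed-size chunks. The function names and `CHUNK_SIZE` are hypothetical, and this is not an imagecodecs API.

```python
import os
import tempfile
import zlib

CHUNK_SIZE = 64 * 1024  # bytes read per step; tune to the available memory


def compress_file_streaming(src_path: str, dst_path: str, level: int = 6) -> None:
    """Compress src_path to dst_path without loading the whole file."""
    compressor = zlib.compressobj(level)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(compressor.compress(chunk))
        dst.write(compressor.flush())  # emit buffered data and the checksum


def decompress_file_streaming(src_path: str, dst_path: str) -> None:
    """Decompress src_path to dst_path one chunk at a time."""
    decompressor = zlib.decompressobj()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(decompressor.decompress(chunk))
        dst.write(decompressor.flush())


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "data.bin")
    packed = os.path.join(tmp, "data.bin.zz")
    restored = os.path.join(tmp, "restored.bin")

    with open(original, "wb") as f:
        f.write(b"0123456789abcdef" * 100_000)  # 1.6 MB of repetitive data

    compress_file_streaming(original, packed)
    decompress_file_streaming(packed, restored)

    with open(original, "rb") as a, open(restored, "rb") as b:
        round_trip_ok = a.read() == b.read()
    packed_size = os.path.getsize(packed)

print(round_trip_ok, packed_size)
```

Peak memory on the compression side stays near `CHUNK_SIZE` regardless of file size. On the decompression side, each call can return much more than one chunk for highly compressible input, so pass `max_length` to `decompress` when output must be bounded as well.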