# Lossless Compression

General-purpose lossless compression codecs, each suited to different data types and use cases. They compress without any data loss, which makes them well suited to scientific computing, data archival, and any scenario that requires exact reconstruction of the original data.

## Capabilities

### ZLIB/DEFLATE Compression

Industry-standard DEFLATE compression in the zlib wrapper. It is widely compatible and efficient for general-purpose data.

```python { .api }
def zlib_encode(data, level=None, *, out=None):
    """
    Return ZLIB encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (0-9, default 6). Higher values = better compression, slower speed
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: ZLIB compressed data with header and checksum
    """

def zlib_decode(data, *, out=None):
    """
    Return decoded ZLIB data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - ZLIB compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: Decompressed data
    """

def zlib_check(data):
    """
    Check if data is ZLIB encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if ZLIB format detected, None if uncertain
    """

def zlib_crc32(data, value=None):
    """
    Return CRC32 checksum.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to checksum
    - value: int | None - Initial CRC value for incremental calculation

    Returns:
    int: CRC32 checksum value
    """

def zlib_adler32(data, value=None):
    """
    Return Adler-32 checksum.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to checksum
    - value: int | None - Initial Adler-32 value for incremental calculation

    Returns:
    int: Adler-32 checksum value
    """
```
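
The docstring says `zlib_encode` returns data "with header and checksum". Assuming this is the standard zlib container (RFC 1950), Python's built-in `zlib` module can show the layout: a 2-byte header, the raw DEFLATE data, and a 4-byte big-endian Adler-32 of the uncompressed input. This is a stdlib-only sketch, not a call into imagecodecs.

```python
import zlib

payload = b"lossless compression " * 100
stream = zlib.compress(payload, 6)  # level 6, the documented default

# Header byte CMF: the low nibble is the compression method, 8 = DEFLATE.
cmf, flg = stream[0], stream[1]
assert cmf & 0x0F == 8
# Header check: CMF * 256 + FLG must be a multiple of 31.
assert (cmf * 256 + flg) % 31 == 0

# Trailer: big-endian Adler-32 of the *uncompressed* payload.
assert int.from_bytes(stream[-4:], "big") == zlib.adler32(payload)

# Decoding restores the input exactly.
assert zlib.decompress(stream) == payload
```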

### GZIP Compression

GZIP format compression, compatible with the gzip command-line tool and HTTP gzip content encoding.

```python { .api }
def gzip_encode(data, level=None, *, out=None):
    """
    Return GZIP encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (0-9, default 6)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: GZIP compressed data with header and trailer
    """

def gzip_decode(data, *, out=None):
    """
    Return decoded GZIP data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - GZIP compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: Decompressed data
    """

def gzip_check(data):
    """
    Check if data is GZIP encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool: True if GZIP magic number detected
    """
```
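
A GZIP member (RFC 1952) starts with the magic bytes `1F 8B` and a method byte of 8 (DEFLATE). It ends with a CRC-32 of the input and the input length modulo 2^32, both stored little-endian. The sketch below uses the standard library's `gzip` module to show this. `looks_like_gzip` is a hypothetical helper that works in the spirit of `gzip_check`; it is not that function's actual implementation.

```python
import gzip
import zlib

def looks_like_gzip(data: bytes) -> bool:
    """Hypothetical magic-number test: ID1=0x1F, ID2=0x8B, CM=8 (DEFLATE)."""
    return len(data) >= 3 and data[:2] == b"\x1f\x8b" and data[2] == 8

payload = b"example payload " * 64
member = gzip.compress(payload, compresslevel=6)

assert looks_like_gzip(member)
assert not looks_like_gzip(zlib.compress(payload))  # zlib streams start with 0x78

# 8-byte trailer: CRC-32, then ISIZE (input length mod 2**32), both little-endian
crc = int.from_bytes(member[-8:-4], "little")
isize = int.from_bytes(member[-4:], "little")
assert crc == zlib.crc32(payload)
assert isize == len(payload) % 2**32
assert gzip.decompress(member) == payload
```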

### BLOSC High-Performance Compression

Blocking, shuffling meta-compressor optimized for binary numerical data. It supports multi-threading and offers a choice of internal compression algorithms.

```python { .api }
def blosc_encode(data, level=None, *, compressor=None, shuffle=None, typesize=None, blocksize=None, numthreads=None, out=None):
    """
    Return BLOSC encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (0-9, default 5)
    - compressor: str | None - Compression algorithm:
      'blosclz' (default), 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd'
    - shuffle: int | None - Shuffle filter:
      0 = no shuffle, 1 = byte shuffle, 2 = bit shuffle
    - typesize: int | None - Element size in bytes for shuffle optimization
    - blocksize: int | None - Block size in bytes (default auto-determined)
    - numthreads: int | None - Number of threads for compression
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: BLOSC compressed data
    """

def blosc_decode(data, *, numthreads=None, out=None):
    """
    Return decoded BLOSC data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - BLOSC compressed data
    - numthreads: int | None - Number of threads for decompression
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: Decompressed data
    """

def blosc_check(data):
    """
    Check if data is BLOSC encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (format detected by attempting decompression)
    """
```
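
The byte-shuffle filter (`shuffle=1`) regroups fixed-size elements: byte 0 of every element comes first, then byte 1, and so on. For numerical arrays this gathers the slowly varying high-order bytes into long, compressible runs. That is why `typesize` should match the element size. The pure-Python sketch below models the idea. It uses hypothetical helpers `byte_shuffle`/`byte_unshuffle` and the stdlib `zlib` as the back-end codec. Real Blosc applies the filter per block with SIMD code, so treat this as a conceptual model, not Blosc's implementation.

```python
import struct
import zlib

def byte_shuffle(buf: bytes, typesize: int) -> bytes:
    """Group byte 0 of every element, then byte 1, ...; trailing partial element kept as-is."""
    n = len(buf) // typesize
    body, tail = buf[: n * typesize], buf[n * typesize :]
    return b"".join(body[i::typesize] for i in range(typesize)) + tail

def byte_unshuffle(buf: bytes, typesize: int) -> bytes:
    """Inverse of byte_shuffle: interleave the byte planes back into elements."""
    n = len(buf) // typesize
    body, tail = buf[: n * typesize], buf[n * typesize :]
    out = bytearray(n * typesize)
    for i in range(typesize):
        out[i::typesize] = body[i * n : (i + 1) * n]
    return bytes(out) + tail

# 10,000 little-endian uint32 counters: the upper bytes are mostly zero
data = struct.pack("<10000I", *range(10000))
shuffled = byte_shuffle(data, typesize=4)

assert byte_unshuffle(shuffled, typesize=4) == data
print("plain:", len(zlib.compress(data)), "shuffled:", len(zlib.compress(shuffled)))
```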

### ZSTD (Zstandard) Compression

Modern compression algorithm offering high compression ratios with fast decompression.

```python { .api }
def zstd_encode(data, level=None, *, out=None):
    """
    Return ZSTD encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (1-22, default 3).
      Higher values = better compression, slower speed
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: ZSTD compressed data
    """

def zstd_decode(data, *, out=None):
    """
    Return decoded ZSTD data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - ZSTD compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: Decompressed data
    """

def zstd_check(data):
    """
    Check if data is ZSTD encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if ZSTD magic number detected
    """
```
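
Every Zstandard frame begins with the 4-byte little-endian magic number 0xFD2FB528 (bytes `28 B5 2F FD`). Skippable frames use 0x184D2A50 through 0x184D2A5F. A magic-number test like the one `zstd_check` describes can therefore be sketched as below. `looks_like_zstd` is a hypothetical helper. Whether imagecodecs also accepts skippable frames is not stated here, so that branch is an assumption.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528
SKIPPABLE_BASE = 0x184D2A50  # skippable frames: 0x184D2A50 .. 0x184D2A5F
SKIPPABLE_MASK = 0xFFFFFFF0

def looks_like_zstd(data: bytes) -> bool:
    """Hypothetical check: does data start with a Zstandard (or skippable) frame magic?"""
    if len(data) < 4:
        return False
    (magic,) = struct.unpack_from("<I", data, 0)  # magic numbers are little-endian
    return magic == ZSTD_MAGIC or (magic & SKIPPABLE_MASK) == SKIPPABLE_BASE

assert looks_like_zstd(bytes([0x28, 0xB5, 0x2F, 0xFD]) + b"frame data")
assert looks_like_zstd((0x184D2A53).to_bytes(4, "little") + bytes(4))
assert not looks_like_zstd(b"\x1f\x8b\x08\x00")  # gzip magic
```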

### LZ4 Fast Compression

Ultra-fast compression algorithm optimized for speed over compression ratio.

```python { .api }
def lz4_encode(data, level=None, *, out=None):
    """
    Return LZ4 encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (1-12, default 1).
      Higher values = better compression, slower speed
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: LZ4 compressed data
    """

def lz4_decode(data, *, out=None):
    """
    Return decoded LZ4 data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - LZ4 compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer (size must be known)

    Returns:
    bytes | bytearray: Decompressed data
    """

def lz4_check(data):
    """
    Check if data is LZ4 encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if LZ4 magic number detected
    """
```
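
The raw LZ4 block format stores no uncompressed-size field. That is consistent with the note that the output size for `lz4_decode` must be known. A block is a series of sequences. Each sequence has a token byte (high nibble: literal count; low nibble: match length minus 4), the literals, and a 2-byte little-endian back-reference offset. The last sequence carries literals only. The minimal pure-Python decoder below assumes `lz4_encode` emits raw blocks; the frame format is covered by LZ4F. `lz4_block_decompress` is a hypothetical helper. It is much slower than the C library and is not hardened against truncated input.

```python
def lz4_block_decompress(src: bytes, uncompressed_size: int) -> bytes:
    """Decode one raw LZ4 block; the caller supplies the expected output size."""
    out = bytearray()
    pos = 0

    def read_length(base: int) -> int:
        # A nibble of 15 means extra length bytes follow; each byte of 255 continues.
        nonlocal pos
        length = base
        if base == 15:
            while True:
                extra = src[pos]
                pos += 1
                length += extra
                if extra != 255:
                    break
        return length

    while pos < len(src):
        token = src[pos]
        pos += 1
        # High nibble: number of literal bytes copied verbatim.
        literal_len = read_length(token >> 4)
        out += src[pos : pos + literal_len]
        pos += literal_len
        if pos >= len(src):
            break  # the last sequence has literals only, no match
        # Match: 2-byte little-endian offset back into the output so far.
        offset = src[pos] | (src[pos + 1] << 8)
        pos += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")
        # Low nibble: match length minus the 4-byte minimum match.
        match_len = read_length(token & 0x0F) + 4
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])  # byte by byte, so overlapping copies work

    if len(out) != uncompressed_size:
        raise ValueError("decoded size does not match uncompressed_size")
    return bytes(out)

# Hand-built block: literals "abc", a 6-byte match at offset 3, then a final literal "!"
block = bytes([0x32]) + b"abc" + bytes([3, 0]) + bytes([0x10]) + b"!"
print(lz4_block_decompress(block, 10))
```

Matches are copied one byte at a time because an offset shorter than the match length (an overlapping copy) is how LZ4 encodes repeated runs.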

### LZ4F Frame Format

LZ4 compression in the LZ4 frame format. The frame wraps the compressed blocks in a header carrying metadata and optional block and content checksums, which makes it suitable for streaming.

```python { .api }
def lz4f_encode(data, level=None, *, out=None):
    """
    Return LZ4F (LZ4 Frame format) encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (0-12, default 0)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: LZ4F compressed data with frame header and footer
    """

def lz4f_decode(data, *, out=None):
    """
    Return decoded LZ4F data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - LZ4F compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: Decompressed data
    """

def lz4f_check(data):
    """
    Check if data is LZ4F encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if LZ4F magic number detected
    """
```
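
The metadata in an LZ4 frame header lives in two bytes that follow the little-endian magic number 0x184D2204. The FLG byte holds the version and feature flags, such as block independence, block checksum, content size, content checksum, and dictionary ID. The BD byte encodes the maximum block size. The sketch below decodes them according to the LZ4 Frame Format specification. `parse_lz4f_descriptor` is a hypothetical helper. It does not verify the header-checksum byte (an xxHash32-derived value) that follows the descriptor.

```python
import struct

LZ4F_MAGIC = 0x184D2204
# BD bits 6-4 select the maximum block size
BLOCK_MAX = {4: 64 * 1024, 5: 256 * 1024, 6: 1024 * 1024, 7: 4 * 1024 * 1024}

def parse_lz4f_descriptor(data: bytes) -> dict:
    """Hypothetical helper: decode the magic, FLG and BD bytes of an LZ4 frame."""
    if len(data) < 6:
        raise ValueError("too short for an LZ4 frame header")
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != LZ4F_MAGIC:
        raise ValueError("not an LZ4 frame")
    flg, bd = data[4], data[5]
    if flg >> 6 != 0b01:
        raise ValueError("unsupported frame version")
    size_code = (bd >> 4) & 0x07
    if size_code not in BLOCK_MAX:
        raise ValueError("invalid block maximum size")
    return {
        "block_independence": bool(flg & 0x20),
        "block_checksum": bool(flg & 0x10),
        "content_size_present": bool(flg & 0x08),
        "content_checksum": bool(flg & 0x04),
        "dict_id_present": bool(flg & 0x01),
        "block_max_size": BLOCK_MAX[size_code],
    }

# e.g. FLG 0x64 (version 01, independent blocks, content checksum), BD 0x40 (64 KB blocks)
header = struct.pack("<I", LZ4F_MAGIC) + bytes([0x64, 0x40])
info = parse_lz4f_descriptor(header)
print(info)
```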

### LZMA/XZ Compression

High compression ratio algorithm used in 7-Zip and XZ utilities.

```python { .api }
def lzma_encode(data, level=None, *, out=None):
    """
    Return LZMA encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (0-9, default 6)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: LZMA compressed data
    """

def lzma_decode(data, *, out=None):
    """
    Return decoded LZMA data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - LZMA compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: Decompressed data
    """

def lzma_check(data):
    """
    Check if data is LZMA encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    bool | None: True if LZMA signature detected
    """
```
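
The docstrings above do not say which container `lzma_encode` produces, and the heading names both LZMA and XZ. The standard library's `lzma` module shows how the two differ. The `.xz` container begins with the 6-byte magic `FD 37 7A 58 5A 00`, while the legacy `.lzma` ("alone") format has no magic at all. This is a stdlib-only sketch; which format imagecodecs emits is left open here.

```python
import lzma

payload = b"scientific data " * 500

xz_stream = lzma.compress(payload, preset=6)  # .xz container (the stdlib default)
alone_stream = lzma.compress(payload, format=lzma.FORMAT_ALONE, preset=6)  # legacy .lzma

# Only the .xz container carries a magic number.
XZ_MAGIC = b"\xfd7zXZ\x00"
assert xz_stream.startswith(XZ_MAGIC)
assert not alone_stream.startswith(XZ_MAGIC)

# FORMAT_AUTO (the default for decompress) recognizes both.
assert lzma.decompress(xz_stream) == payload
assert lzma.decompress(alone_stream) == payload
```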

### BROTLI Compression

Google's compression algorithm optimized for web content and text compression.

```python { .api }
def brotli_encode(data, level=None, *, mode=None, lgwin=None, out=None):
    """
    Return BROTLI encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - level: int | None - Compression level (0-11, default 6)
    - mode: int | None - Compression mode (0=generic, 1=text, 2=font)
    - lgwin: int | None - Window size (10-24, default 22)
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: BROTLI compressed data
    """

def brotli_decode(data, *, out=None):
    """
    Return decoded BROTLI data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - BROTLI compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: Decompressed data
    """

def brotli_check(data):
    """
    Check if data is BROTLI encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (no reliable magic number)
    """
```
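
The `lgwin` parameter is the base-2 logarithm of the sliding window. The Brotli specification (RFC 7932) defines the usable window as (1 << WBITS) - 16 bytes, so the documented default of 22 gives just under 4 MiB. The small hypothetical helper below makes that arithmetic concrete. It only computes sizes and does not call any Brotli library.

```python
def brotli_window_bytes(lgwin: int) -> int:
    """Usable Brotli sliding-window size in bytes for a given lgwin (RFC 7932)."""
    if not 10 <= lgwin <= 24:
        raise ValueError("lgwin must be in the range 10-24")
    return (1 << lgwin) - 16

for lgwin in (10, 16, 22, 24):
    print(f"lgwin={lgwin}: {brotli_window_bytes(lgwin):,} bytes")
```

A larger window lets the encoder reference matches further back in the input, at the cost of more memory in the decoder.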

### SNAPPY Compression

Compression algorithm developed by Google that favors very high compression and decompression speed over compression ratio.

```python { .api }
def snappy_encode(data, *, out=None):
    """
    Return SNAPPY encoded data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Input data to compress
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: SNAPPY compressed data
    """

def snappy_decode(data, *, out=None):
    """
    Return decoded SNAPPY data.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - SNAPPY compressed data
    - out: bytes | bytearray | None - Pre-allocated output buffer

    Returns:
    bytes | bytearray: Decompressed data
    """

def snappy_check(data):
    """
    Check if data is SNAPPY encoded.

    Parameters:
    - data: bytes | bytearray | mmap.mmap - Data to check

    Returns:
    None: Always returns None (no magic number)
    """
```
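
Snappy's raw format has no magic number, which is why `snappy_check` always returns None. It does, however, begin with the uncompressed length encoded as a little-endian base-128 varint. Assuming `snappy_encode` emits this raw format (and not Snappy's separate framing format), a reader can learn the decompressed size before decoding, for example to size an `out` buffer. `snappy_uncompressed_length` is a hypothetical helper.

```python
def snappy_uncompressed_length(data: bytes) -> tuple[int, int]:
    """Return (uncompressed_length, preamble_size) from a raw Snappy stream."""
    value = 0
    for i, byte in enumerate(data[:5]):  # a 32-bit length needs at most 5 varint bytes
        value |= (byte & 0x7F) << (7 * i)  # low 7 bits carry the payload
        if not byte & 0x80:  # high bit clear: this was the last varint byte
            return value, i + 1
    raise ValueError("truncated or invalid varint preamble")

# Length preamble 0x05, then literal tag 0x10 ((5 - 1) << 2) and the 5 literal bytes
stream = b"\x05\x10hello"
print(snappy_uncompressed_length(stream))
```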

## Usage Patterns

### Basic Compression

```python
import imagecodecs
import numpy as np

# Compressible sample data: a smooth 8-bit signal
# (uniformly random bytes are essentially incompressible and would show no gain)
signal = np.sin(np.linspace(0, 20 * np.pi, 10000))
data = (signal * 127 + 128).astype(np.uint8).tobytes()

# Try different algorithms
zlib_compressed = imagecodecs.zlib_encode(data, level=9)
zstd_compressed = imagecodecs.zstd_encode(data, level=3)
lz4_compressed = imagecodecs.lz4_encode(data, level=1)

# Lossless: decoding restores the original bytes exactly
assert imagecodecs.zlib_decode(zlib_compressed) == data
assert imagecodecs.zstd_decode(zstd_compressed) == data

print(f"Original size: {len(data)}")
print(f"ZLIB size: {len(zlib_compressed)} ({len(zlib_compressed)/len(data):.2%})")
print(f"ZSTD size: {len(zstd_compressed)} ({len(zstd_compressed)/len(data):.2%})")
print(f"LZ4 size: {len(lz4_compressed)} ({len(lz4_compressed)/len(data):.2%})")
```

### High-Performance Scientific Data

```python
import imagecodecs
import numpy as np

# Scientific array compression with BLOSC
data = np.random.random((1000, 1000)).astype(np.float32)
data_bytes = data.tobytes()

# Optimize for floating-point data
compressed = imagecodecs.blosc_encode(
    data_bytes,
    level=5,
    compressor='zstd',
    shuffle=1,  # Byte shuffle for better compression
    typesize=4,  # float32 = 4 bytes
    numthreads=4,  # Multi-threaded compression
)

# Decompress with multi-threading
decompressed = imagecodecs.blosc_decode(compressed, numthreads=4)
recovered = np.frombuffer(decompressed, dtype=np.float32).reshape(1000, 1000)

assert np.array_equal(data, recovered)
print(f"Compressed size: {len(compressed)/len(data_bytes):.2%} of original")
```

### Stream Processing

```python
import imagecodecs

# Incremental checksum calculation over a sequence of chunks
crc = 0  # CRC-32 starting value
adler = 1  # Adler-32 starting value

data_chunks = [b"chunk1", b"chunk2", b"chunk3"]
for chunk in data_chunks:
    crc = imagecodecs.zlib_crc32(chunk, crc)
    adler = imagecodecs.zlib_adler32(chunk, adler)

print(f"Final CRC32: {crc:08x}")
print(f"Final Adler32: {adler:08x}")
```
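
Chaining works because CRC-32 and Adler-32 can be continued from a previous value. Feeding the chunks one at a time yields the same result as checksumming the concatenated data once. Python's standard `zlib` module exposes the same `(data, value)` calling pattern, so the property can be checked without imagecodecs:

```python
import zlib

chunks = [b"chunk1", b"chunk2", b"chunk3"]

crc, adler = 0, 1  # standard starting values for CRC-32 and Adler-32
for chunk in chunks:
    crc = zlib.crc32(chunk, crc)
    adler = zlib.adler32(chunk, adler)

# Incremental results equal the one-shot checksums of the joined data.
whole = b"".join(chunks)
assert crc == zlib.crc32(whole)
assert adler == zlib.adler32(whole)
```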

## Constants and Configuration

### ZLIB Constants

```python { .api }
class ZLIB:
    available: bool = True

    class COMPRESSION:
        NO_COMPRESSION = 0
        BEST_SPEED = 1
        BEST_COMPRESSION = 9
        DEFAULT_COMPRESSION = -1  # zlib's Z_DEFAULT_COMPRESSION, currently equivalent to level 6

    class STRATEGY:
        DEFAULT_STRATEGY = 0
        FILTERED = 1
        HUFFMAN_ONLY = 2
        RLE = 3
        FIXED = 4
```
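
The level and strategy values match the constants in zlib's `zlib.h`, which Python's built-in `zlib` module also exposes. The stdlib-only sketch below checks several documented values against the standard library. It then compresses run-heavy data with different strategies through `zlib.compressobj`. It illustrates what the strategies do; it says nothing about how imagecodecs passes a strategy.

```python
import zlib

# The standard library exposes the same zlib.h constants.
assert (zlib.Z_NO_COMPRESSION, zlib.Z_BEST_SPEED, zlib.Z_BEST_COMPRESSION) == (0, 1, 9)
assert (zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, zlib.Z_HUFFMAN_ONLY) == (0, 1, 2)
assert (zlib.Z_RLE, zlib.Z_FIXED) == (3, 4)

# Run-heavy data: Z_RLE limits match distances to one (run-length encoding),
# Z_HUFFMAN_ONLY disables string matching entirely.
data = b"\x00" * 4000 + b"\xff" * 4000

results = {}
for name in ("Z_DEFAULT_STRATEGY", "Z_RLE", "Z_HUFFMAN_ONLY"):
    comp = zlib.compressobj(level=6, strategy=getattr(zlib, name))
    stream = comp.compress(data) + comp.flush()
    assert zlib.decompress(stream) == data  # every strategy stays lossless
    results[name] = len(stream)

print(results)
```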

### BLOSC Constants

```python { .api }
class BLOSC:
    available: bool

    class SHUFFLE:
        NOSHUFFLE = 0
        SHUFFLE = 1
        BITSHUFFLE = 2

    class COMPRESSOR:
        BLOSCLZ = 'blosclz'
        LZ4 = 'lz4'
        LZ4HC = 'lz4hc'
        SNAPPY = 'snappy'
        ZLIB = 'zlib'
        ZSTD = 'zstd'
```

### ZSTD Constants

```python { .api }
class ZSTD:
    available: bool

    class STRATEGY:
        FAST = 1
        DFAST = 2
        GREEDY = 3
        LAZY = 4
        LAZY2 = 5
        BTLAZY2 = 6
        BTOPT = 7
        BTULTRA = 8
        BTULTRA2 = 9
```

## Performance Guidelines

### Algorithm Selection
- **LZ4**: Among the fastest codecs for both compression and decompression; moderate compression ratio
- **SNAPPY**: Very fast with a moderate ratio; a good fit for real-time applications
- **ZLIB**: Balanced speed and compression; widely compatible
- **ZSTD**: High compression ratio at good speed, with a wide range of levels
- **BLOSC**: Best for numerical/scientific arrays, thanks to its shuffle filters
- **BROTLI**: Well suited to text and web content
- **LZMA**: Typically the highest compression ratio of these codecs, at the cost of speed

### Optimization Tips
- Choose the compression level deliberately: higher levels compress better but run more slowly
- Enable a BLOSC shuffle filter (byte or bit shuffle) for numerical data
- Use multi-threading where it is available (BLOSC's `numthreads`; some image codecs, such as JPEG XL and AVIF, also support threads)
- Pass pre-allocated `out` buffers to reduce memory allocations
- Set BLOSC's `typesize` to the element size of your data (e.g. 4 for `float32`)
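
To pick a level empirically, measure size and time on a representative sample of your own data. The stdlib-only sketch below does this for `zlib` levels 1, 6 and 9. `measure_levels` is a hypothetical helper. It accepts any `encode(data, level)`/`decode(data)` pair of callables, so the same loop could in principle be pointed at imagecodecs' `zstd_encode`/`zstd_decode`.

```python
import time
import zlib

def measure_levels(encode, decode, sample: bytes, levels):
    """Hypothetical helper: (level, size ratio, seconds) per level, verifying each round trip."""
    results = []
    for level in levels:
        start = time.perf_counter()
        packed = encode(sample, level)
        elapsed = time.perf_counter() - start
        assert decode(packed) == sample  # lossless check
        results.append((level, len(packed) / len(sample), elapsed))
    return results

# Stand-in CSV-like sample; replace with a representative sample of your own data.
sample = b"".join(b"%08d,%.3f\n" % (i, i * 0.001) for i in range(20000))

for level, ratio, seconds in measure_levels(zlib.compress, zlib.decompress, sample, (1, 6, 9)):
    print(f"level {level}: {ratio:.1%} of original, {seconds * 1000:.1f} ms")
```

Timings from a single run are noisy; repeat the measurement or use `timeit` before drawing conclusions.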
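
### Memory Considerations
- Pre-allocate `out` buffers when processing large amounts of data, to avoid allocating a new result buffer on every call
- Pass memory-mapped input (`mmap.mmap`, accepted by every encoder and decoder above) for very large files
- Each function compresses or decompresses one whole buffer per call; for data larger than available RAM, split it into chunks and process them independently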
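
One simple chunking scheme compresses fixed-size chunks independently and writes each with an 8-byte length prefix, so they can later be decoded one at a time. The container layout and the `compress_chunks`/`decompress_chunks` helpers below are hypothetical, not an imagecodecs format. The demo uses stdlib `zlib`, but any one-argument encode/decode pair could be passed in.

```python
import io
import struct
import zlib

def compress_chunks(src, dst, encode, chunk_size=1 << 20):
    """Read src in chunk_size pieces; write <uint64 length><compressed chunk> records."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        packed = encode(chunk)
        dst.write(struct.pack("<Q", len(packed)))
        dst.write(packed)

def decompress_chunks(src, dst, decode):
    """Inverse of compress_chunks: decode one length-prefixed record at a time."""
    while True:
        prefix = src.read(8)
        if not prefix:
            break
        if len(prefix) != 8:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack("<Q", prefix)
        packed = src.read(size)
        if len(packed) != size:
            raise ValueError("truncated chunk")
        dst.write(decode(packed))

original = bytes(range(256)) * 10000  # 2.56 MB stand-in for a large file
packed_stream, restored = io.BytesIO(), io.BytesIO()
compress_chunks(io.BytesIO(original), packed_stream, zlib.compress, chunk_size=1 << 16)
packed_stream.seek(0)
decompress_chunks(packed_stream, restored, zlib.decompress)
assert restored.getvalue() == original
```

With real files, pass objects opened in binary mode (`open(path, "rb")` and `open(path, "wb")`). Memory use then stays at roughly one chunk plus its compressed form, whatever the total file size.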