High-performance compression library wrapper for binary and numerical data with multiple algorithms and shuffle filters
npx @tessl/cli install tessl/pypi-blosc@1.11.00
# Blosc

Python bindings for the Blosc high-performance compression library. Blosc is optimized for binary and numerical data and offers several compression algorithms (blosclz, lz4, lz4hc, snappy, zlib, zstd) plus configurable shuffle filters, which work particularly well on time series, sparse data, and regularly spaced numerical arrays.

## Package Information

- **Package Name**: blosc
- **Language**: Python
- **Installation**: `pip install blosc`
- **Supported Python**: 3.9+

## Core Imports

```python
import blosc
```

## Basic Usage

```python
import blosc
import array

# Basic compression and decompression
data = b'0123456789' * 1000
compressed = blosc.compress(data, typesize=1)
decompressed = blosc.decompress(compressed)

# Working with numerical arrays
a = array.array('i', range(1000000))
a_bytes = a.tobytes()
# typesize should match the item size ('i' is typically 4 bytes)
compressed_array = blosc.compress(a_bytes, typesize=a.itemsize, cname='lz4')
decompressed_array = blosc.decompress(compressed_array)

# Configuration
blosc.set_nthreads(4)   # Use 4 threads
blosc.set_blocksize(0)  # Automatic blocksize

# Get compression information
nbytes, cbytes, blocksize = blosc.get_cbuffer_sizes(compressed)
clib = blosc.get_clib(compressed)
```

## Capabilities

### Core Compression Functions

Primary compression and decompression operations supporting bytes-like objects with configurable compression parameters.

```python { .api }
def compress(bytesobj, typesize=8, clevel=9, shuffle=blosc.SHUFFLE, cname='blosclz'):
    """
    Compress bytesobj with specified parameters.

    Parameters:
    - bytesobj: bytes-like object supporting buffer interface
    - typesize: int, data type size (1-255)
    - clevel: int, compression level 0-9 (0=no compression, 9=max)
    - shuffle: int, shuffle filter (NOSHUFFLE, SHUFFLE, BITSHUFFLE)
    - cname: str, compressor name ('blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd')

    Returns:
    bytes: Compressed data

    Raises:
    TypeError: If bytesobj doesn't support buffer interface
    ValueError: If parameters out of range or cname invalid
    """

def decompress(bytes_like, as_bytearray=False):
    """
    Decompress bytes-like compressed object.

    Parameters:
    - bytes_like: bytes-like object with compressed data
    - as_bytearray: bool, return bytearray instead of bytes

    Returns:
    bytes or bytearray: Decompressed data

    Raises:
    TypeError: If bytes_like doesn't support buffer protocol
    """
```

### Memory Pointer Functions

Low-level compression and decompression using memory addresses for integration with NumPy arrays and ctypes.

```python { .api }
def compress_ptr(address, items, typesize=8, clevel=9, shuffle=blosc.SHUFFLE, cname='blosclz'):
    """
    Compress data at memory address.

    Parameters:
    - address: int, memory pointer to data
    - items: int, number of items of typesize to compress
    - typesize: int, size of each data item
    - clevel: int, compression level 0-9
    - shuffle: int, shuffle filter
    - cname: str, compressor name

    Returns:
    bytes: Compressed data

    Raises:
    TypeError: If address not int
    ValueError: If items negative or total size exceeds limits
    """

def decompress_ptr(bytes_like, address):
    """
    Decompress data directly into memory address.

    Parameters:
    - bytes_like: bytes-like object with compressed data
    - address: int, memory pointer where to write decompressed data

    Returns:
    int: Number of bytes written
    """
```

### NumPy Array Functions

High-level functions for compressing and decompressing NumPy arrays using pickle serialization.

```python { .api }
def pack_array(array, clevel=9, shuffle=blosc.SHUFFLE, cname='blosclz'):
    """
    Pack (compress) a NumPy array.

    Parameters:
    - array: ndarray, NumPy array to compress
    - clevel: int, compression level 0-9
    - shuffle: int, shuffle filter
    - cname: str, compressor name

    Returns:
    bytes: Packed array data

    Raises:
    TypeError: If array doesn't have dtype and shape attributes
    ValueError: If array size exceeds limits or parameters invalid
    """

def unpack_array(packed_array, **kwargs):
    """
    Unpack (decompress) a packed NumPy array.

    Parameters:
    - packed_array: bytes, packed array data
    - **kwargs: Additional parameters for pickle.loads

    Returns:
    ndarray: Decompressed NumPy array

    Raises:
    TypeError: If packed_array not bytes
    """
```

### Buffer Information Functions

Functions to inspect compressed buffer properties and validate compressed data.

```python { .api }
def get_cbuffer_sizes(bytesobj):
    """
    Get information about compressed buffer.

    Parameters:
    - bytesobj: bytes, compressed buffer

    Returns:
    tuple: (uncompressed_bytes, compressed_bytes, blocksize)
    """

def cbuffer_validate(bytesobj):
    """
    Validate compressed buffer safety.

    Parameters:
    - bytesobj: bytes, compressed buffer to validate

    Returns:
    bool: True if buffer is safe to decompress
    """

def get_clib(bytesobj):
    """
    Get compression library name from compressed buffer.

    Parameters:
    - bytesobj: bytes, compressed buffer

    Returns:
    str: Name of compression library used
    """
```

### Configuration Functions

Functions to configure Blosc behavior including threading and block sizes.

```python { .api }
def set_nthreads(nthreads):
    """
    Set number of threads for Blosc operations.

    Parameters:
    - nthreads: int, number of threads (1 to MAX_THREADS)

    Returns:
    int: Previous number of threads

    Raises:
    ValueError: If nthreads exceeds MAX_THREADS
    """

def set_blocksize(blocksize):
    """
    Force specific blocksize (0 for automatic).

    Parameters:
    - blocksize: int, blocksize in bytes (0 for automatic)
    """

def get_blocksize():
    """
    Get current blocksize setting.

    Returns:
    int: Current blocksize (0 means automatic)
    """

def set_releasegil(gilstate):
    """
    Set whether to release Python GIL during operations.

    Parameters:
    - gilstate: bool, True to release GIL during compression/decompression

    Returns:
    bool: Previous GIL release state
    """
```

### Utility Functions

System detection, resource management, and version information functions.

```python { .api }
def detect_number_of_cores():
    """
    Detect number of CPU cores in system.

    Returns:
    int: Number of cores detected
    """

def free_resources():
    """
    Free memory temporaries and thread resources.

    Returns:
    None
    """

def print_versions():
    """
    Print versions of blosc and all dependencies.

    Returns:
    None
    """
```

### Compressor Information Functions

Functions to query available compressors and their properties.

```python { .api }
def compressor_list():
    """
    Get list of available compressors.

    Returns:
    list: List of compressor names
    """

def code_to_name(code):
    """
    Convert compressor code to name.

    Parameters:
    - code: int, compressor code

    Returns:
    str: Compressor name
    """

def name_to_code(name):
    """
    Convert compressor name to code.

    Parameters:
    - name: str, compressor name

    Returns:
    int: Compressor code
    """

def clib_info(cname):
    """
    Get compression library information.

    Parameters:
    - cname: str, compressor name

    Returns:
    tuple: (library_name, version)
    """
```

### Testing Function

```python { .api }
def test():
    """
    Run blosc test suite.

    Returns:
    None
    """
```

### Low-Level Functions

Functions for initializing and cleaning up Blosc resources (called automatically):

```python { .api }
def init():
    """
    Initialize Blosc library.

    Returns:
    None

    Note: Called automatically on package import
    """

def destroy():
    """
    Destroy Blosc resources and cleanup.

    Returns:
    None

    Note: Called automatically on program exit
    """
```

## Constants

### Version Information

```python { .api }
__version__: str       # Python package version
VERSION_STRING: str    # Blosc C library version
VERSION_DATE: str      # Blosc C library date
blosclib_version: str  # Combined version string
```

### Size Limits

```python { .api }
MAX_BUFFERSIZE: int  # Maximum buffer size for compression
MAX_THREADS: int     # Maximum number of threads
MAX_TYPESIZE: int    # Maximum type size (255)
```

### Shuffle Filters

```python { .api }
NOSHUFFLE: int   # No shuffle filter (0)
SHUFFLE: int     # Byte shuffle filter (1)
BITSHUFFLE: int  # Bit shuffle filter (2)
```

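To give a feel for what the byte shuffle filter does, here is a pure-Python illustration of the idea: bytes at the same position within each item are grouped together, so slowly varying high-order bytes form long runs. This is a conceptual sketch only. It uses `zlib` from the standard library as a stand-in compressor and is not how Blosc implements its filters internally; `byte_shuffle` and `byte_unshuffle` are names made up for the example.

```python
import struct
import zlib

def byte_shuffle(raw: bytes, typesize: int) -> bytes:
    """Group byte 0 of every item, then byte 1 of every item, and so on."""
    if len(raw) % typesize:
        raise ValueError("length must be a multiple of typesize")
    return b"".join(raw[i::typesize] for i in range(typesize))

def byte_unshuffle(shuffled: bytes, typesize: int) -> bytes:
    """Inverse of byte_shuffle."""
    n_items = len(shuffled) // typesize
    out = bytearray(len(shuffled))
    for i in range(typesize):
        out[i::typesize] = shuffled[i * n_items:(i + 1) * n_items]
    return bytes(out)

# 10,000 sequential little-endian 32-bit integers.
raw = struct.pack("<10000i", *range(10000))

plain_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(byte_shuffle(raw, 4)))

assert byte_unshuffle(byte_shuffle(raw, 4), 4) == raw
print(f"zlib alone: {plain_size} bytes, shuffle + zlib: {shuffled_size} bytes")
```

The grouping only lines up when `typesize` matches the real item width, which is why the `typesize` argument matters for shuffle effectiveness.
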
### Legacy Constants

Backward compatibility constants with BLOSC_ prefix:

```python { .api }
BLOSC_VERSION_STRING: str  # Alias for VERSION_STRING
BLOSC_VERSION_DATE: str    # Alias for VERSION_DATE
BLOSC_MAX_BUFFERSIZE: int  # Alias for MAX_BUFFERSIZE
BLOSC_MAX_THREADS: int     # Alias for MAX_THREADS
BLOSC_MAX_TYPESIZE: int    # Alias for MAX_TYPESIZE
```

## Runtime Variables

Current state variables updated by configuration functions:

```python { .api }
nthreads: int        # Current number of threads in use
ncores: int          # Number of cores detected on system
cnames: list         # List of available compressor names
cname2clib: dict     # Map compressor names to libraries
clib_versions: dict  # Map libraries to versions
filters: dict        # Map shuffle constants to string names
```

## Error Handling

Common exceptions raised by blosc functions:

- **TypeError**: Raised when the input doesn't support the buffer protocol or `address` is not an int
- **ValueError**: Raised when parameters are out of valid ranges:
  - `clevel` not in 0-9 range
  - `typesize` not in 1-MAX_TYPESIZE range
  - `cname` not in available compressors
  - `shuffle` not NOSHUFFLE, SHUFFLE, or BITSHUFFLE
  - `nthreads` exceeds MAX_THREADS
  - Buffer size exceeds MAX_BUFFERSIZE

## Performance Notes

- **Shuffle filters**: SHUFFLE works best for integer data, BITSHUFFLE for floating-point
- **Compressor selection**: 'lz4' for speed, 'zstd' for compression ratio, 'blosclz' for balance
- **Threading**: Optimal thread count often slightly below CPU core count
- **Block size**: Automatic sizing (0) usually optimal, manual sizing for expert use
- **GIL release**: Beneficial for large chunks with ThreadPool, small penalty for small blocks
- **Type size**: Should match actual data type size for optimal shuffle performance