An implementation of chunked, compressed, N-dimensional arrays for Python
npx @tessl/cli install tessl/pypi-zarr@3.1.00
# Zarr

Zarr is a Python library that implements chunked, compressed, N-dimensional arrays for parallel computing and large-scale data storage. It can create N-dimensional arrays with any NumPy dtype, chunk arrays along any dimension, compress and filter chunks using any NumCodecs codec, and store arrays in a range of backends including memory, disk, zip files, and cloud storage such as S3.

Zarr supports concurrent reads and writes from multiple threads or processes and organizes arrays hierarchically through groups. It is well suited to scientific computing, data analysis, and other applications that need efficient storage of and access to large multidimensional datasets.

## Package Information

- **Package Name**: zarr
- **Language**: Python
- **Installation**: `pip install zarr`
- **Version**: 3.1.2
- **Python Requirements**: >=3.11

## Core Imports

```python
import zarr
```

Common imports for array operations:

```python
from zarr import Array, Group
from zarr import open, create, save, load
```

## Basic Usage

```python
import zarr
import numpy as np
from zarr.storage import MemoryStore

# Create a zarr array from a NumPy array; from_array takes the destination
# store first and the source data as a keyword argument
data = np.random.random((1000, 1000))
z = zarr.from_array(MemoryStore(), data=data, chunks=(100, 100))

# Create an array directly
z = zarr.zeros((10, 10), chunks=(5, 5), dtype='float64')

# Store and retrieve data
z[:5, :5] = 1.0
print(z[:5, :5])

# Save to storage (z[:] reads the contents into a NumPy array first)
zarr.save('data.zarr', z[:])

# Load from storage
loaded = zarr.load('data.zarr')

# Create a group with multiple arrays (dtype is given explicitly)
grp = zarr.group()
grp.create_array('temperature', shape=(365, 100, 100), chunks=(1, 50, 50), dtype='float64')
grp.create_array('humidity', shape=(365, 100, 100), chunks=(1, 50, 50), dtype='float64')
```

## Architecture

Zarr follows a hierarchical data model with several key components:

- **Arrays**: N-dimensional chunked arrays with compression and filtering capabilities
- **Groups**: Hierarchical containers for organizing arrays and sub-groups
- **Stores**: Storage backends (memory, filesystem, cloud, etc.) that persist array data and metadata
- **Codecs**: Compression and encoding algorithms for optimizing storage and I/O
- **Chunks**: Fixed-size blocks that arrays are divided into for parallel processing

This architecture enables efficient storage and retrieval of large datasets while supporting concurrent access patterns essential for high-performance computing and cloud-native applications.

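To make the chunk component concrete, the following sketch works out the chunk grid for an array with shape `(365, 100, 100)` and chunks `(1, 50, 50)`. It uses only the standard library and assumes a regular chunk grid; the helpers `chunk_grid_shape` and `chunk_coords` are hypothetical names for illustration and are not part of zarr.

```python
import math


def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def chunk_coords(index, chunks):
    """Grid coordinates of the chunk that holds the element at `index`."""
    return tuple(i // c for i, c in zip(index, chunks))


grid = chunk_grid_shape((365, 100, 100), (1, 50, 50))
n_chunks = math.prod(grid)                       # total number of chunks
coords = chunk_coords((10, 75, 20), (1, 50, 50))
print(grid, n_chunks, coords)
```

Each chunk is stored and encoded independently, so reads or writes that touch different chunk coordinates can proceed in parallel.
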
## Capabilities

### Array Creation and Initialization

Functions for creating zarr arrays with various initialization patterns. These provide the primary entry points for creating new arrays with different fill patterns and from existing data sources.

```python { .api }
def array(data, **kwargs) -> Array: ...
def create(shape, **kwargs) -> Array: ...
def empty(shape, **kwargs) -> Array: ...
def zeros(shape, **kwargs) -> Array: ...
def ones(shape, **kwargs) -> Array: ...
def full(shape, fill_value, **kwargs) -> Array: ...
def from_array(store, *, data, **kwargs) -> Array: ...
```

[Array Creation](./array-creation.md)

### Array and Group Access

Functions for opening and accessing existing zarr arrays and groups from various storage backends. These functions provide flexible ways to load existing data structures.

```python { .api }
def open(store, **kwargs) -> Array | Group: ...
def open_array(store, **kwargs) -> Array: ...
def open_group(store, **kwargs) -> Group: ...
def open_consolidated(store, **kwargs) -> Group: ...
def open_like(a, path, **kwargs) -> Array: ...
```

[Data Access](./data-access.md)

### Data I/O Operations

High-level functions for saving and loading zarr data structures to and from storage. These provide convenient interfaces for persistence operations.

```python { .api }
def save(file, *args, **kwargs) -> None: ...
def save_array(store, arr, **kwargs) -> None: ...
def save_group(store, **kwargs) -> None: ...
def load(store, **kwargs) -> Any: ...
```

[Data I/O](./data-io.md)

### Group Management

Functions for creating and managing hierarchical group structures. Groups provide organizational capabilities for complex datasets with multiple related arrays.

```python { .api }
def group(store=None, **kwargs) -> Group: ...
def create_group(store, **kwargs) -> Group: ...
def create_hierarchy(path, **kwargs) -> None: ...
```

[Group Management](./group-management.md)

### Core Classes

The fundamental array and group classes that form the core of zarr's object-oriented interface. These classes provide comprehensive functionality for array manipulation and hierarchical data organization.

```python { .api }
class Array:
    shape: tuple[int, ...]
    dtype: np.dtype
    chunks: tuple[int, ...]
    attrs: dict
    def __getitem__(self, selection): ...
    def __setitem__(self, selection, value): ...
    def resize(self, *args): ...

class Group:
    attrs: dict
    def create_array(self, name, **kwargs) -> Array: ...
    def create_group(self, name, **kwargs) -> Group: ...
    def __getitem__(self, key): ...
    def __setitem__(self, key, value): ...
```

[Core Classes](./core-classes.md)

### Storage Backends

Storage backend classes for persisting zarr data across different storage systems. These provide the flexibility to use zarr with various storage infrastructures.

```python { .api }
class MemoryStore: ...
class LocalStore: ...
class ZipStore: ...
class FsspecStore: ...
class ObjectStore: ...
```

[Storage Backends](./storage-backends.md)

### Compression and Codecs

Codec classes for data compression, transformation, and encoding. These enable efficient storage through various compression algorithms and data transformations.

```python { .api }
class BloscCodec: ...
class GzipCodec: ...
class ZstdCodec: ...
class BytesCodec: ...
class TransposeCodec: ...
class ShardingCodec: ...
```

[Codecs](./codecs.md)

### Configuration and Utilities

Configuration system and utility functions for zarr settings, metadata management, and debugging operations.

```python { .api }
config: Config
def consolidate_metadata(store, **kwargs) -> Group: ...
def copy(source, dest, **kwargs) -> tuple[int, int, int]: ...
def tree(grp, **kwargs) -> Any: ...
def print_debug_info() -> None: ...
```

[Configuration](./configuration.md)