An implementation of chunked, compressed, N-dimensional arrays for Python
npx @tessl/cli install tessl/pypi-zarr@3.1.0
Zarr is a Python library that implements chunked, compressed, N-dimensional arrays designed for parallel computing and large-scale data storage. It can create N-dimensional arrays with any NumPy dtype, chunk arrays along any dimension, compress and filter chunks using any NumCodecs codec, and store arrays in memory, on disk, in zip files, or in cloud storage such as S3.
Zarr supports concurrent reads and writes from multiple threads or processes and organizes arrays hierarchically through groups. It is particularly useful for scientific computing, data analysis, and other applications that need efficient storage of and access to large multidimensional datasets.
pip install zarr
import zarr
Common imports for array operations:
from zarr import Array, Group
from zarr import open, create, save, load
import zarr
import numpy as np
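# Create a Zarr array from a NumPy array
# (from_array takes the target store first; the source data is a keyword argument)
data = np.random.random((1000, 1000))
z = zarr.from_array(zarr.storage.MemoryStore(), data=data, chunks=(100, 100))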
# Create an array directly
z = zarr.zeros((10, 10), chunks=(5, 5), dtype='float64')
# Store and retrieve data
z[:5, :5] = 1.0
print(z[:5, :5])
# Save to storage
zarr.save('data.zarr', z)
# Load from storage
loaded = zarr.load('data.zarr')
# Create a group with multiple arrays
grp = zarr.group()
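# dtype is given explicitly because no source data is passed
grp.create_array('temperature', shape=(365, 100, 100), chunks=(1, 50, 50), dtype='float64')
grp.create_array('humidity', shape=(365, 100, 100), chunks=(1, 50, 50), dtype='float64')
Zarr follows a hierarchical data model whose key components are arrays, groups, storage backends, and codecs.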
This architecture enables efficient storage and retrieval of large datasets while supporting concurrent access patterns essential for high-performance computing and cloud-native applications.
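To make the chunked layout concrete, here is a minimal sketch in plain Python of how a chunk grid relates to an array's shape and which chunks a selection overlaps. The helper names `chunk_grid` and `chunks_touched` are hypothetical and only illustrate the idea; they are not part of Zarr's API and do not describe its internals.

```python
import itertools
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def chunks_touched(selection, chunks):
    """Chunk indices overlapped by a half-open (start, stop) range per dimension."""
    ranges = [
        range(start // c, (stop - 1) // c + 1)
        for (start, stop), c in zip(selection, chunks)
    ]
    return list(itertools.product(*ranges))


# Same layout as the 'temperature' array in the usage example.
grid = chunk_grid((365, 100, 100), (1, 50, 50))
total_chunks = math.prod(grid)

# Reading one full day overlaps only the chunks for that day.
one_day = chunks_touched([(0, 1), (0, 100), (0, 100)], (1, 50, 50))

print(grid, total_chunks, len(one_day))
```

Because each chunk is stored and compressed independently, a read or write like the one-day selection only has to touch the chunks it overlaps, which is what makes concurrent access to different regions practical.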
Functions for creating zarr arrays with various initialization patterns. These provide the primary entry points for creating new arrays with different fill patterns and from existing data sources.
def array(data, **kwargs) -> Array: ...
def create(shape, **kwargs) -> Array: ...
def empty(shape, **kwargs) -> Array: ...
def zeros(shape, **kwargs) -> Array: ...
def ones(shape, **kwargs) -> Array: ...
def full(shape, fill_value, **kwargs) -> Array: ...
def from_array(store, *, data, **kwargs) -> Array: ...
Functions for opening and accessing existing zarr arrays and groups from various storage backends. These functions provide flexible ways to load existing data structures.
def open(store, **kwargs) -> Array | Group: ...
def open_array(store, **kwargs) -> Array: ...
def open_group(store, **kwargs) -> Group: ...
def open_consolidated(store, **kwargs) -> Group: ...
def open_like(a, path, **kwargs) -> Array: ...
High-level functions for saving and loading zarr data structures to and from storage. These provide convenient interfaces for persistence operations.
def save(file, *args, **kwargs) -> None: ...
def save_array(store, arr, **kwargs) -> None: ...
def save_group(store, **kwargs) -> None: ...
def load(store, **kwargs) -> Any: ...
Functions for creating and managing hierarchical group structures. Groups provide organizational capabilities for complex datasets with multiple related arrays.
def group(store=None, **kwargs) -> Group: ...
def create_group(store, **kwargs) -> Group: ...
def create_hierarchy(path, **kwargs) -> None: ...
The fundamental array and group classes that form the core of zarr's object-oriented interface. These classes provide comprehensive functionality for array manipulation and hierarchical data organization.
class Array:
    shape: tuple[int, ...]
    dtype: np.dtype
    chunks: tuple[int, ...]
    attrs: dict

    def __getitem__(self, selection): ...
    def __setitem__(self, selection, value): ...
    def resize(self, *args): ...

class Group:
    attrs: dict

    def create_array(self, name, **kwargs) -> Array: ...
    def create_group(self, name, **kwargs) -> Group: ...
    def __getitem__(self, key): ...
    def __setitem__(self, key, value): ...
Storage backend classes for persisting zarr data across different storage systems. These provide the flexibility to use zarr with various storage infrastructures.
class MemoryStore: ...
class LocalStore: ...
class ZipStore: ...
class FsspecStore: ...
class ObjectStore: ...
Codec classes for data compression, transformation, and encoding. These enable efficient storage through various compression algorithms and data transformations.
class BloscCodec: ...
class GzipCodec: ...
class ZstdCodec: ...
class BytesCodec: ...
class TransposeCodec: ...
class ShardingCodec: ...
Configuration system and utility functions for zarr settings, metadata management, and debugging operations.
config: Config
def consolidate_metadata(store, **kwargs) -> Group: ...
def copy(source, dest, **kwargs) -> tuple[int, int, int]: ...
def tree(grp, **kwargs) -> Any: ...
def print_debug_info() -> None: ...