tessl/pypi-zarr

An implementation of chunked, compressed, N-dimensional arrays for Python

Describes: pkg:pypi/zarr@3.1.x

To install, run:

npx @tessl/cli install tessl/pypi-zarr@3.1.0

Zarr

Zarr is a Python library that implements chunked, compressed, N-dimensional arrays for parallel computing and large-scale data storage. It can create N-dimensional arrays with any NumPy dtype, chunk arrays along any dimension for optimized performance, compress and filter chunks with any NumCodecs codec, and store arrays flexibly across a range of backends, including memory, disk, zip files, and cloud storage such as S3.

Zarr excels in concurrent operations, supporting both parallel reading and writing from multiple threads or processes, and provides hierarchical organization of arrays through groups. The library is particularly valuable for scientific computing, data analysis, and applications requiring efficient storage and access of large multidimensional datasets.

Package Information

  • Package Name: zarr
  • Language: Python
  • Installation: pip install zarr
  • Version: 3.1.2
  • Python Requirements: >=3.11

Core Imports

import zarr

Common imports for array operations:

from zarr import Array, Group
from zarr import open, create, save, load

Basic Usage

import zarr
import numpy as np

# Create a zarr array from a NumPy array
data = np.random.random((1000, 1000))
z = zarr.array(data, chunks=(100, 100))

# Create an array directly 
z = zarr.zeros((10, 10), chunks=(5, 5), dtype='float64')

# Store and retrieve data
z[:5, :5] = 1.0
print(z[:5, :5])

# Save to storage
zarr.save('data.zarr', z)

# Load from storage
loaded = zarr.load('data.zarr')

# Create a group with multiple arrays
grp = zarr.group()
grp.create_array('temperature', shape=(365, 100, 100), chunks=(1, 50, 50), dtype='float32')
grp.create_array('humidity', shape=(365, 100, 100), chunks=(1, 50, 50), dtype='float32')

Architecture

Zarr follows a hierarchical data model with several key components:

  • Arrays: N-dimensional chunked arrays with compression and filtering capabilities
  • Groups: Hierarchical containers for organizing arrays and sub-groups
  • Stores: Storage backends (memory, filesystem, cloud, etc.) that persist array data and metadata
  • Codecs: Compression and encoding algorithms for optimizing storage and I/O
  • Chunks: Fixed-size blocks that arrays are divided into for parallel processing

This architecture enables efficient storage and retrieval of large datasets while supporting concurrent access patterns essential for high-performance computing and cloud-native applications.
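
As a rough sketch of how these pieces fit together (the store type, codec, and names below are illustrative choices, not prescribed ones):

import zarr
from zarr.storage import MemoryStore
from zarr.codecs import ZstdCodec

store = MemoryStore()                # store: holds chunk data and metadata
root = zarr.group(store=store)       # group: hierarchical container
arr = root.create_array(             # array: chunked N-dimensional data
    'measurements',
    shape=(1000, 1000),
    chunks=(100, 100),               # chunks: unit of storage and parallel I/O
    dtype='float32',
    compressors=ZstdCodec(level=3),  # codec: compression applied per chunk
)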

Capabilities

Array Creation and Initialization

Functions for creating zarr arrays with various initialization patterns. These provide the primary entry points for creating new arrays with different fill patterns and from existing data sources.

def array(data, **kwargs) -> Array: ...
def create(shape, **kwargs) -> Array: ...
def empty(shape, **kwargs) -> Array: ...
def zeros(shape, **kwargs) -> Array: ...
def ones(shape, **kwargs) -> Array: ...
def full(shape, fill_value, **kwargs) -> Array: ...
def from_array(a, **kwargs) -> Array: ...
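
For example (a brief sketch; the chunk sizes and dtypes are arbitrary choices):

import numpy as np
import zarr

# Uninitialized and constant-filled arrays
e = zarr.empty((100, 100), chunks=(10, 10), dtype='float32')
f = zarr.full((100, 100), fill_value=-1, chunks=(10, 10), dtype='int32')

# From existing in-memory data
z = zarr.array(np.arange(10_000).reshape(100, 100), chunks=(10, 10))

# Explicit creation with shape and dtype
c = zarr.create(shape=(100, 100), chunks=(10, 10), dtype='float64')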

Array Creation

Array and Group Access

Functions for opening and accessing existing zarr arrays and groups from various storage backends. These functions provide flexible ways to load existing data structures.

def open(store, **kwargs) -> Array | Group: ...
def open_array(store, **kwargs) -> Array: ...
def open_group(store, **kwargs) -> Group: ...
def open_consolidated(store, **kwargs) -> Group: ...
def open_like(a, path, **kwargs) -> Array: ...
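
A minimal sketch, assuming data already exists at the paths shown (the paths are placeholders):

import zarr

# Open whatever lives at the path; returns an Array or a Group
node = zarr.open('data.zarr', mode='r')

# Open explicitly as an array or as a group when the node type is known
arr = zarr.open_array('data.zarr', mode='r')
grp = zarr.open_group('hierarchy.zarr', mode='r')

# Open a group whose metadata has been consolidated (see consolidate_metadata below)
cgrp = zarr.open_consolidated('hierarchy.zarr')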

Data Access

Data I/O Operations

High-level functions for saving and loading zarr data structures to and from storage. These provide convenient interfaces for persistence operations.

def save(file, *args, **kwargs) -> None: ...
def save_array(store, arr, **kwargs) -> None: ...
def save_group(store, **kwargs) -> None: ...
def load(store, **kwargs) -> Any: ...
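
For example (the array names and store paths below are arbitrary):

import numpy as np
import zarr

temperature = np.random.random((365, 100, 100))
humidity = np.random.random((365, 100, 100))

# Save several named arrays into one group, then load them back
zarr.save_group('weather.zarr', temperature=temperature, humidity=humidity)
loaded = zarr.load('weather.zarr')
print(loaded['temperature'].shape)

# Save and load a single array
zarr.save_array('single.zarr', temperature)
single = zarr.load('single.zarr')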

Data I/O

Group Management

Functions for creating and managing hierarchical group structures. Groups provide organizational capabilities for complex datasets with multiple related arrays.

def group(store=None, **kwargs) -> Group: ...
def create_group(store, **kwargs) -> Group: ...
def create_hierarchy(path, **kwargs) -> None: ...
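
A short sketch (the store path and group names are illustrative):

import zarr

# In-memory root group
root = zarr.group()

# Group backed by persistent storage
persistent = zarr.create_group('hierarchy.zarr')

# Build a hierarchy of sub-groups and arrays
measurements = persistent.create_group('measurements')
measurements.create_array('temperature', shape=(365, 100), chunks=(30, 100), dtype='float32')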

Group Management

Core Classes

The fundamental array and group classes that form the core of zarr's object-oriented interface. These classes provide comprehensive functionality for array manipulation and hierarchical data organization.

class Array:
    shape: tuple[int, ...]
    dtype: np.dtype
    chunks: tuple[int, ...]
    attrs: dict
    def __getitem__(self, selection): ...
    def __setitem__(self, selection, value): ...
    def resize(self, *args): ...

class Group:
    attrs: dict
    def create_array(self, name, **kwargs) -> Array: ...
    def create_group(self, name, **kwargs) -> Group: ...
    def __getitem__(self, key): ...
    def __setitem__(self, key, value): ...
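
Typical usage of these classes looks roughly like this:

import zarr

z = zarr.zeros((10, 20), chunks=(5, 10), dtype='float64')

# Inspect array metadata
print(z.shape, z.dtype, z.chunks)

# NumPy-style reads and writes
z[0, :] = 1.0
block = z[:5, :10]

# Grow the array in place; new regions take the fill value
z.resize((20, 20))

# User attributes are stored alongside the array metadata
z.attrs['units'] = 'kelvin'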

Core Classes

Storage Backends

Storage backend classes for persisting zarr data across different storage systems. These provide the flexibility to use zarr with various storage infrastructures.

class MemoryStore: ...
class LocalStore: ...
class ZipStore: ...
class FsspecStore: ...
class ObjectStore: ...
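
For example (the file names are placeholders; a store object is passed wherever a path string would be accepted):

import numpy as np
import zarr
from zarr.storage import MemoryStore, LocalStore, ZipStore

# Purely in-memory storage
mem = MemoryStore()
z = zarr.zeros((100, 100), chunks=(10, 10), store=mem)

# A directory on the local filesystem
local = LocalStore('example_data.zarr')
grp = zarr.open_group(store=local, mode='a')

# A zip archive; close it to flush the file
zip_store = ZipStore('example_data.zip', mode='w')
zarr.save_array(zip_store, np.arange(10))
zip_store.close()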

Storage Backends

Compression and Codecs

Codec classes for data compression, transformation, and encoding. These enable efficient storage through various compression algorithms and data transformations.

class BloscCodec: ...
class GzipCodec: ...
class ZstdCodec: ...
class BytesCodec: ...
class TransposeCodec: ...
class ShardingCodec: ...
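
A brief sketch, assuming the compressors argument accepted by array creation in zarr 3 (the codec parameters shown are arbitrary):

import zarr
from zarr.codecs import BloscCodec, GzipCodec, ZstdCodec

grp = zarr.group()

# Per-array choice of compression codec
grp.create_array('blosc_data', shape=(1000,), chunks=(100,), dtype='int32',
                 compressors=BloscCodec(cname='lz4', clevel=5))
grp.create_array('zstd_data', shape=(1000,), chunks=(100,), dtype='int32',
                 compressors=ZstdCodec(level=3))
grp.create_array('gzip_data', shape=(1000,), chunks=(100,), dtype='int32',
                 compressors=GzipCodec(level=6))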

Codecs

Configuration and Utilities

Configuration system and utility functions for zarr settings, metadata management, and debugging operations.

config: Config
def consolidate_metadata(store, **kwargs) -> Group: ...
def copy(source, dest, **kwargs) -> tuple[int, int, int]: ...
def tree(grp, **kwargs) -> Any: ...
def print_debug_info() -> None: ...
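
For example (the configuration key and store path below are illustrative):

import zarr

# Environment and version details useful in bug reports
zarr.print_debug_info()

# Runtime configuration is exposed as a donfig-style config object
print(zarr.config.get('array.order'))

# Consolidate per-node metadata so a group can be opened with fewer reads
zarr.consolidate_metadata('hierarchy.zarr')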

Configuration