HDMF

The Hierarchical Data Modeling Framework (HDMF) is a Python package for working with hierarchical data. It provides APIs for specifying data models, reading and writing data to different storage backends (including HDF5 and Zarr), and representing data with Python objects. HDMF serves as the foundational technology for neuroscience data standards like NWB (Neurodata Without Borders) and provides comprehensive infrastructure for creating, validating, and managing complex scientific datasets.

Package Information

  • Package Name: hdmf
  • Language: Python
  • Installation: pip install hdmf
  • Documentation: https://hdmf.readthedocs.io

Core Imports

import hdmf

Common imports for working with containers and data:

from hdmf import Container, Data, HERDManager
from hdmf import docval, getargs
from hdmf import HDMFDataset

For HDF5 I/O operations:

from hdmf.backends.hdf5 import HDF5IO, H5DataIO

For common data structures:

from hdmf.common import DynamicTable, VectorData, VectorIndex

For specifications and validation:

from hdmf.spec import GroupSpec, DatasetSpec, SpecCatalog
from hdmf.validate import ValidatorMap

For data utilities:

from hdmf.data_utils import DataChunkIterator, DataIO

Basic Usage

A minimal round trip using the built-in hdmf-common types:

import numpy as np
from hdmf.common import DynamicTable, VectorData, get_manager
from hdmf.backends.hdf5 import HDF5IO

# Create a table with a single column of data
values = VectorData(name='values', description='random measurements',
                    data=np.random.randn(100))
table = DynamicTable(name='my_table', description='an example table',
                     columns=[values])

# Write to an HDF5 file; the manager knows how to build hdmf-common types
with HDF5IO('example.h5', mode='w', manager=get_manager()) as io:
    io.write(table)

# Read back from the HDF5 file
with HDF5IO('example.h5', mode='r', manager=get_manager()) as io:
    read_table = io.read()
    print(f"Table name: {read_table.name}")
    print(read_table.to_dataframe().head())

Architecture

HDMF follows a specification-driven architecture with several key components:

  • Container System: Hierarchical containers (Container, Data) that organize and hold data with metadata
  • Specification System: Schema definitions that describe data structure and validation rules
  • Build System: Converts between container objects and storage builders for different backends
  • I/O Backends: Pluggable storage backends (HDF5, Zarr) for reading/writing data
  • Validation System: Comprehensive validation against specifications and schemas
  • Type System: Dynamic type registration and validation with ontology support

This design enables HDMF to serve as both a standalone framework and the foundation for domain-specific standards like NWB, providing strong typing, metadata preservation, and cross-platform compatibility.
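
As a minimal sketch of this layering, a container can be converted into its storage-neutral builder representation before any backend is involved, here using the prewired manager from hdmf.common:

from hdmf.common import DynamicTable, get_manager

# get_manager() returns a BuildManager wired to the hdmf-common type map
manager = get_manager()
table = DynamicTable(name='t', description='demo table')

# The build step produces a storage-neutral builder tree; an I/O backend
# (HDF5, Zarr) is what serializes that tree to disk
builder = manager.build(table)
print(type(builder).__name__)  # GroupBuilder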

Capabilities

Container System

Core container classes for organizing hierarchical data structures with metadata, parent-child relationships, and data management capabilities.

class Container:
    def __init__(self, name: str): ...
    def set_data_io(self, data_io): ...
    def get_ancestor(self, neurodata_type: str = None) -> 'AbstractContainer': ...

class Data(Container):
    def __init__(self, name: str, data): ...
    def append(self, arg): ...
    def extend(self, arg): ...
    def get(self): ...

class HERDManager:
    def __init__(self): ...
    def link_resources(self, container: Container, resources: dict): ...
    def get_linked_resources(self, container: Container) -> dict: ...
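
A minimal sketch of parent-child wiring between containers; the names are illustrative:

from hdmf.container import Container, Data

session = Container(name='session')
signal = Data(name='signal', data=[0.1, 0.2, 0.3])

# Assigning a parent registers the Data object as a child of the container
signal.parent = session
print(signal.parent.name)   # session
print(session.children)     # tuple containing the signal object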

See docs/containers.md.

Utilities and Validation

Decorators and utilities for parameter validation, argument handling, and type checking throughout the HDMF ecosystem.

def docval(*args, **kwargs):
    """Decorator for parameter validation and documentation."""

def getargs(arg_names, kwargs: dict):
    """Retrieve specified arguments from dictionary."""

def check_type(value, type_, name: str = None) -> bool:
    """Check if value matches expected type."""

def is_ragged(data) -> bool:
    """Test if array-like data is ragged."""

See docs/utils.md.

I/O Backends

Reading and writing data to different storage formats, with backend support for HDF5 and Zarr and an extensible I/O system.

class HDF5IO:
    def __init__(self, path: str, mode: str = 'r', **kwargs): ...
    def write(self, container, **kwargs): ...
    def read(self, **kwargs) -> Container: ...
    def close(self): ...

class H5DataIO:
    def __init__(self, data, **kwargs): ...
    @property
    def data(self): ...
    @property
    def io_settings(self) -> dict: ...
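
H5DataIO is commonly used to attach HDF5 filter settings to a dataset before it is written; a brief sketch:

import numpy as np
from hdmf.backends.hdf5 import H5DataIO

# Wrap an array to request chunking and gzip compression at write time
wrapped = H5DataIO(data=np.random.randn(1000, 10),
                   chunks=True, compression='gzip', compression_opts=4)
print(wrapped.io_settings)  # {'chunks': True, 'compression': 'gzip', 'compression_opts': 4}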

See docs/io-backends.md.

Specification System

Schema definition and management for data models, including namespace catalogs, specification readers/writers, and validation rules.

class SpecCatalog:
    def __init__(self): ...
    def register_spec(self, spec, source_file: str = None): ...
    def get_spec(self, neurodata_type: str) -> 'BaseStorageSpec': ...

class GroupSpec:
    def __init__(self, doc: str, name: str = None, **kwargs): ...

class DatasetSpec:
    def __init__(self, doc: str, name: str = None, **kwargs): ...
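
A sketch of defining and cataloging a new group type; the type name and source file are illustrative:

from hdmf.spec import GroupSpec, DatasetSpec, SpecCatalog

# Define a group type that contains one named float dataset
session_spec = GroupSpec(doc='A recording session', data_type_def='Session',
                         datasets=[DatasetSpec(doc='sampled voltages',
                                               name='voltages', dtype='float64')])

catalog = SpecCatalog()
catalog.register_spec(session_spec, source_file='session.yaml')
print(catalog.get_spec('Session').doc)  # A recording session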

See docs/specification.md.

Build System

Converting containers to storage representations and managing type mappings between specifications and Python classes.

class BuildManager:
    def __init__(self, type_map: 'TypeMap'): ...
    def build(self, container, source: str = None, **kwargs) -> 'Builder': ...

class TypeMap:
    def __init__(self, namespaces: 'NamespaceCatalog'): ...
    def register_container_type(self, namespace: str, data_type: str, container_cls): ...
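
hdmf.common ships a prewired manager (get_manager), but the pieces can also be assembled by hand; a minimal sketch starting from an empty namespace catalog:

from hdmf.build import BuildManager, TypeMap
from hdmf.spec import NamespaceCatalog

# An empty catalog and type map; real use loads a namespace and registers
# container classes against its data types before building
type_map = TypeMap(NamespaceCatalog())
manager = BuildManager(type_map)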

See docs/build-system.md.

Common Data Structures

Pre-built data structures for scientific data including dynamic tables, vector data, sparse matrices, and multi-container systems.

class DynamicTable(Container):
    def __init__(self, name: str, description: str, **kwargs): ...
    def add_row(self, **kwargs): ...
    def to_dataframe(self): ...

class VectorData(Data):
    def __init__(self, name: str, description: str, data, **kwargs): ...

class CSRMatrix(Container):
    def __init__(self, data, indices, indptr, shape: tuple, **kwargs): ...
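
A brief sketch of incremental table construction and conversion to a pandas DataFrame:

from hdmf.common import DynamicTable

trials = DynamicTable(name='trials', description='trial metadata')
trials.add_column(name='duration', description='trial duration in seconds')
trials.add_row(duration=1.5)
trials.add_row(duration=2.0)
print(trials.to_dataframe())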

See docs/common-data.md.

Query System

Querying and filtering capabilities for datasets and containers with reference resolution and advanced data access patterns.

class HDMFDataset:
    def __getitem__(self, key): ...
    def append(self, data): ...

class ContainerResolver:
    def __init__(self, type_map: 'TypeMap', container: Container): ...
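
A sketch wrapping an in-memory array; in practice the wrapped dataset usually comes from an I/O backend:

import numpy as np
from hdmf.query import HDMFDataset

dset = HDMFDataset(np.arange(10))
print(dset[2:5])  # indexing is delegated to the wrapped dataset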

See docs/query.md.

Term Sets and Ontologies

Integration with ontologies and controlled vocabularies through term sets, type configuration, and semantic validation.

class TermSet:
    def __init__(self, term_schema_path: str = None, **kwargs): ...
    def validate(self, value): ...

class TermSetWrapper:
    def __init__(self, value, field: str, termset: TermSet, **kwargs): ...

class TypeConfigurator:
    @staticmethod
    def get_config(): ...
    @staticmethod
    def load_type_config(config_path: str): ...
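
A sketch of term validation; the schema path here is hypothetical, and TermSet requires the optional linkml dependency to be installed:

from hdmf.term_set import TermSet

species = TermSet(term_schema_path='species_termset.yaml')
print(species.validate('Homo sapiens'))  # True only if the term is in the schema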

See docs/term-sets.md.

Validation System

Comprehensive validation of data against specifications with detailed error reporting and schema compliance checking.

class ValidatorMap:
    def __init__(self): ...
    def register_validator(self, neurodata_type: str, validator): ...

class Validator:
    def __init__(self, spec): ...
    def validate(self, builder): ...
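
For hdmf-common files, the hdmf.common.validate convenience function checks an open I/O object against its cached namespace; a sketch assuming the file written in Basic Usage exists:

from hdmf.common import validate, get_manager
from hdmf.backends.hdf5 import HDF5IO

with HDF5IO('example.h5', mode='r', manager=get_manager()) as io:
    errors = validate(io)   # a list of validation errors, empty if valid
    for error in errors:
        print(error)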

See docs/validation.md.

Data Utilities

Essential utilities for handling large datasets, including chunk iterators and I/O configuration wrappers, with efficient memory management and streaming operations.

class DataChunkIterator:
    def __init__(self, data, **kwargs): ...
    def __next__(self): ...

class DataIO:
    def __init__(self, data, **kwargs): ...

def append_data(data, new_data): ...
def extend_data(data, extension_data): ...
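
DataChunkIterator is typically fed a generator so that data of unknown final length can be streamed; a brief sketch:

import numpy as np
from hdmf.data_utils import DataChunkIterator

def chunks():
    for _ in range(5):
        yield np.random.randn(10)   # one chunk at a time

# maxshape marks the first axis as growable across chunks
dci = DataChunkIterator(data=chunks(), maxshape=(None, 10))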

See docs/data-utils.md.

Testing Utilities

Test case classes and utilities for testing HDMF extensions and applications with support for HDF5 round-trip testing.

class TestCase:
    def setUp(self): ...
    def tearDown(self): ...

class H5RoundTripMixin:
    def test_roundtrip(self): ...

def remove_test_file(filename: str): ...

Testing utilities are available from hdmf.testing for building test suites.
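
A round-trip test only needs to construct the container under test; a minimal sketch using the mixin:

from hdmf.testing import TestCase, H5RoundTripMixin
from hdmf.common import DynamicTable

class TestTableRoundTrip(H5RoundTripMixin, TestCase):
    def setUpContainer(self):
        # the mixin writes this container to HDF5, reads it back,
        # and asserts that the result matches the original
        return DynamicTable(name='test_table', description='round-trip test')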