A hierarchical data modeling framework for modern science data standards
npx @tessl/cli install tessl/pypi-hdmf@4.1.00
# HDMF

The Hierarchical Data Modeling Framework (HDMF) is a Python package for working with hierarchical data. It provides APIs for specifying data models, reading and writing data to different storage backends (including HDF5 and Zarr), and representing data with Python objects. HDMF serves as the foundational technology for neuroscience data standards like NWB (Neurodata Without Borders) and provides comprehensive infrastructure for creating, validating, and managing complex scientific datasets.

## Package Information

- **Package Name**: hdmf
- **Language**: Python
- **Installation**: `pip install hdmf`
- **Documentation**: https://hdmf.readthedocs.io

## Core Imports

```python
import hdmf
```

Common imports for working with containers and data:

```python
from hdmf import Container, Data, HERDManager
from hdmf import docval, getargs
from hdmf import HDMFDataset
```

For HDF5 I/O operations:

```python
from hdmf.backends.hdf5 import HDF5IO, H5DataIO
```

For common data structures:

```python
from hdmf.common import DynamicTable, VectorData, VectorIndex
```

For specifications and validation:

```python
from hdmf.spec import GroupSpec, DatasetSpec, SpecCatalog
from hdmf.validate import ValidatorMap
```

For data utilities:

```python
from hdmf.data_utils import DataChunkIterator, DataIO
```

## Basic Usage

```python
from hdmf.common import DynamicTable, get_manager
from hdmf.backends.hdf5 import HDF5IO
import numpy as np

# Create a DynamicTable, one of HDMF's built-in container types
table = DynamicTable(name='my_table', description='an example table')
table.add_column(name='values', description='random values')
for value in np.random.randn(5):
    table.add_row(values=float(value))

# Write to an HDF5 file; a BuildManager maps container types to storage
with HDF5IO('example.h5', manager=get_manager(), mode='w') as io:
    io.write(table)

# Read back from the HDF5 file
with HDF5IO('example.h5', manager=get_manager(), mode='r') as io:
    read_table = io.read()
    print(f"Table name: {read_table.name}")
    print(read_table.to_dataframe())
```

## Architecture

HDMF follows a specification-driven architecture with several key components:

- **Container System**: Hierarchical containers (`Container`, `Data`) that organize and hold data with metadata
- **Specification System**: Schema definitions that describe data structure and validation rules
- **Build System**: Converts between container objects and storage builders for different backends
- **I/O Backends**: Pluggable storage backends (HDF5, Zarr) for reading/writing data
- **Validation System**: Comprehensive validation against specifications and schemas
- **Type System**: Dynamic type registration and validation with ontology support

This design enables HDMF to serve as both a standalone framework and the foundation for domain-specific standards like NWB, providing strong typing, metadata preservation, and cross-platform compatibility.

## Capabilities

### Container System

Core container classes for organizing hierarchical data structures with metadata, parent-child relationships, and data management capabilities.

```python { .api }
class Container:
    def __init__(self, name: str): ...
    def set_data_io(self, data_io): ...
    def get_ancestor(self, neurodata_type: str = None) -> 'AbstractContainer': ...

class Data(Container):
    def __init__(self, name: str, data): ...
    def append(self, arg): ...
    def extend(self, arg): ...
    def get(self): ...

class HERDManager:
    def __init__(self): ...
    def link_resources(self, container: Container, resources: dict): ...
    def get_linked_resources(self, container: Container) -> dict: ...
```

[Container System](./containers.md)

### Utilities and Validation

Decorators and utilities for parameter validation, argument handling, and type checking throughout the HDMF ecosystem.

```python { .api }
def docval(*args, **kwargs):
    """Decorator for parameter validation and documentation."""

def getargs(arg_names, kwargs: dict):
    """Retrieve specified arguments from dictionary."""

def check_type(value, type_, name: str = None) -> bool:
    """Check if value matches expected type."""

def is_ragged(data) -> bool:
    """Test if array-like data is ragged."""
```

[Utilities](./utils.md)
135
136
### I/O Backends
137
138
Reading and writing data to different storage formats with comprehensive backend support for HDF5, Zarr, and extensible I/O system.
139
140
```python { .api }
141
class HDF5IO:
142
def __init__(self, path: str, mode: str = 'r', **kwargs): ...
143
def write(self, container, **kwargs): ...
144
def read(self, **kwargs) -> Container: ...
145
def close(self): ...
146
147
class H5DataIO:
148
def __init__(self, data, **kwargs): ...
149
@property
150
def data(self): ...
151
@property
152
def io_settings(self) -> dict: ...
153
```
154
155
[I/O Backends](./io-backends.md)
156
157
### Specification System
158
159
Schema definition and management for data models, including namespace catalogs, specification readers/writers, and validation rules.
160
161
```python { .api }
162
class SpecCatalog:
163
def __init__(self): ...
164
def register_spec(self, spec, source_file: str = None): ...
165
def get_spec(self, neurodata_type: str) -> 'BaseStorageSpec': ...
166
167
class GroupSpec:
168
def __init__(self, doc: str, name: str = None, **kwargs): ...
169
170
class DatasetSpec:
171
def __init__(self, doc: str, name: str = None, **kwargs): ...
172
```
173
174
[Specification System](./specification.md)

### Build System

Converting containers to storage representations and managing type mappings between specifications and Python classes.

```python { .api }
class BuildManager:
    def __init__(self, type_map: 'TypeMap'): ...
    def build(self, container, source: str = None, **kwargs) -> 'Builder': ...

class TypeMap:
    def __init__(self, namespaces: 'NamespaceCatalog'): ...
    def register_container_type(self, namespace: str, data_type: str, container_cls): ...
```

[Build System](./build-system.md)
191
192
### Common Data Structures
193
194
Pre-built data structures for scientific data including dynamic tables, vector data, sparse matrices, and multi-container systems.
195
196
```python { .api }
197
class DynamicTable(Container):
198
def __init__(self, name: str, description: str, **kwargs): ...
199
def add_row(self, **kwargs): ...
200
def to_dataframe(self): ...
201
202
class VectorData(Data):
203
def __init__(self, name: str, description: str, data, **kwargs): ...
204
205
class CSRMatrix(Container):
206
def __init__(self, data, indices, indptr, shape: tuple, **kwargs): ...
207
```
208
209
[Common Data Structures](./common-data.md)
210
211
### Query System
212
213
Querying and filtering capabilities for datasets and containers with reference resolution and advanced data access patterns.
214
215
```python { .api }
216
class HDMFDataset:
217
def __getitem__(self, key): ...
218
def append(self, data): ...
219
220
class ContainerResolver:
221
def __init__(self, type_map: 'TypeMap', container: Container): ...
222
```
223
224
[Query System](./query.md)
225
226
### Term Sets and Ontologies
227
228
Integration with ontologies and controlled vocabularies through term sets, type configuration, and semantic validation.
229
230
```python { .api }
231
class TermSet:
232
def __init__(self, term_schema_path: str = None, **kwargs): ...
233
def validate(self, value): ...
234
235
class TermSetWrapper:
236
def __init__(self, value, field: str, termset: TermSet, **kwargs): ...
237
238
class TypeConfigurator:
239
@staticmethod
240
def get_config(): ...
241
@staticmethod
242
def load_type_config(config_path: str): ...
243
```
244
245
[Term Sets](./term-sets.md)
246
247
### Validation System
248
249
Comprehensive validation of data against specifications with detailed error reporting and schema compliance checking.
250
251
```python { .api }
252
class ValidatorMap:
253
def __init__(self): ...
254
def register_validator(self, neurodata_type: str, validator): ...
255
256
class Validator:
257
def __init__(self, spec): ...
258
def validate(self, builder): ...
259
```
260
261
[Validation](./validation.md)

### Data Utilities

Utilities for handling large datasets, chunked iteration, and I/O configuration, with efficient memory management and streaming operations.

```python { .api }
class DataChunkIterator:
    def __init__(self, data, **kwargs): ...
    def __next__(self): ...

class DataIO:
    def __init__(self, data, **kwargs): ...

def append_data(data, new_data): ...
def extend_data(data, extension_data): ...
```

[Data Utilities](./data-utils.md)

### Testing Utilities

Test case classes and utilities for testing HDMF extensions and applications with support for HDF5 round-trip testing.

```python { .api }
class TestCase:
    def setUp(self): ...
    def tearDown(self): ...

class H5RoundTripMixin:
    def test_roundtrip(self): ...

def remove_test_file(filename: str): ...
```

Testing utilities are available from `hdmf.testing` for building test suites.