# HDMF

The Hierarchical Data Modeling Framework (HDMF) is a Python package for working with hierarchical data. It provides APIs for specifying data models, reading and writing data to different storage backends (including HDF5 and Zarr), and representing data with Python objects. HDMF serves as the foundational technology for neuroscience data standards like NWB (Neurodata Without Borders) and provides comprehensive infrastructure for creating, validating, and managing complex scientific datasets.

## Package Information

- **Package Name**: hdmf
- **Language**: Python
- **Installation**: `pip install hdmf`
- **Documentation**: https://hdmf.readthedocs.io

## Core Imports

```python
import hdmf
```

Common imports for working with containers and data:

```python
from hdmf import Container, Data, HERDManager
from hdmf import docval, getargs
from hdmf import HDMFDataset
```

For HDF5 I/O operations:

```python
from hdmf.backends.hdf5 import HDF5IO, H5DataIO
```

For common data structures:

```python
from hdmf.common import DynamicTable, VectorData, VectorIndex
```

For specifications and validation:

```python
from hdmf.spec import GroupSpec, DatasetSpec, SpecCatalog
from hdmf.validate import ValidatorMap
```

For data utilities:

```python
from hdmf.data_utils import DataChunkIterator, DataIO
```

## Basic Usage

The example below uses the pre-built types from `hdmf.common`, which come with a registered type map; writing custom `Container` subclasses additionally requires defining a specification and registering the new type.

```python
from hdmf.common import DynamicTable, get_manager
from hdmf.backends.hdf5 import HDF5IO
import numpy as np

# Create a table and fill it with random values
table = DynamicTable(name='my_table', description='an example table')
table.add_column(name='value', description='a measured value')
for v in np.random.randn(5):
    table.add_row(value=float(v))

# Write to an HDF5 file; the manager maps containers to storage builders
with HDF5IO('example.h5', mode='w', manager=get_manager()) as io:
    io.write(table)

# Read back from the HDF5 file
with HDF5IO('example.h5', mode='r', manager=get_manager()) as io:
    read_table = io.read()
    print(f"Table name: {read_table.name}")
    print(f"Number of rows: {len(read_table)}")
```

## Architecture

HDMF follows a specification-driven architecture with several key components:

- **Container System**: Hierarchical containers (`Container`, `Data`) that organize and hold data with metadata
- **Specification System**: Schema definitions that describe data structure and validation rules
- **Build System**: Converts between container objects and storage builders for different backends
- **I/O Backends**: Pluggable storage backends (HDF5, Zarr) for reading/writing data
- **Validation System**: Comprehensive validation against specifications and schemas
- **Type System**: Dynamic type registration and validation with ontology support

This design enables HDMF to serve as both a standalone framework and the foundation for domain-specific standards like NWB, providing strong typing, metadata preservation, and cross-platform compatibility.

## Capabilities

### Container System

Core container classes for organizing hierarchical data structures with metadata, parent-child relationships, and data management capabilities.

```python { .api }
class Container:
    def __init__(self, name: str): ...
    def set_data_io(self, data_io): ...
    def get_ancestor(self, neurodata_type: str = None) -> 'AbstractContainer': ...

class Data(Container):
    def __init__(self, name: str, data): ...
    def append(self, arg): ...
    def extend(self, arg): ...
    def get(self): ...

class HERDManager:
    def __init__(self): ...
    def link_resources(self, container: Container, resources: dict): ...
    def get_linked_resources(self, container: Container) -> dict: ...
```

[Container System](./containers.md)
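
The parent-child pattern behind these classes can be sketched in a few lines. This is an illustrative toy, not HDMF's implementation: the real containers add docval validation, type maps, and modification tracking on top of this idea, and the class and method names below are hypothetical.

```python
class ToyContainer:
    """Minimal sketch of a hierarchical container with parent links."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        # Setting the parent link is what makes ancestor traversal possible
        child.parent = self
        self.children.append(child)

    def get_ancestor(self):
        """Walk parent links up to the root container."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

root = ToyContainer('session')
block = ToyContainer('block')
signal = ToyContainer('signal')
root.add_child(block)
block.add_child(signal)
print(signal.get_ancestor().name)  # walks up to the root, 'session'
```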

### Utilities and Validation

Decorators and utilities for parameter validation, argument handling, and type checking throughout the HDMF ecosystem.

```python { .api }
def docval(*args, **kwargs):
    """Decorator for parameter validation and documentation."""

def getargs(arg_names, kwargs: dict):
    """Retrieve specified arguments from dictionary."""

def check_type(value, type_, name: str = None) -> bool:
    """Check if value matches expected type."""

def is_ragged(data) -> bool:
    """Test if array-like data is ragged."""
```

[Utilities](./utils.md)
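
The idea behind `docval` can be sketched with a toy reimplementation. The real decorator is far richer (shape checks, enums, docstring generation, positional-argument support); everything below, including `toy_docval` and the `Series` class, is a hypothetical illustration of the pattern only.

```python
import functools

def toy_docval(*specs):
    """Minimal sketch of a docval-style validator (illustrative only)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, **kwargs):
            validated = {}
            for spec in specs:
                name, typ = spec['name'], spec['type']
                if name not in kwargs:
                    if 'default' not in spec:
                        raise TypeError(f"missing required argument: {name!r}")
                    validated[name] = spec['default']
                    continue
                value = kwargs[name]
                if not isinstance(value, typ):
                    raise TypeError(f"{name!r} must be {typ.__name__}")
                validated[name] = value
            return func(self, **validated)
        return wrapper
    return decorator

class Series:
    @toy_docval({'name': 'name', 'type': str},
                {'name': 'rate', 'type': float, 'default': 1.0})
    def __init__(self, **kwargs):
        self.name = kwargs['name']
        self.rate = kwargs['rate']

s = Series(name='lfp')
print(s.rate)  # 'rate' was omitted, so the default 1.0 is filled in
```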

### I/O Backends

Reading and writing data to different storage formats, with backend support for HDF5 and Zarr and an extensible I/O system.

```python { .api }
class HDF5IO:
    def __init__(self, path: str, mode: str = 'r', **kwargs): ...
    def write(self, container, **kwargs): ...
    def read(self, **kwargs) -> Container: ...
    def close(self): ...

class H5DataIO:
    def __init__(self, data, **kwargs): ...
    @property
    def data(self): ...
    @property
    def io_settings(self) -> dict: ...
```

[I/O Backends](./io-backends.md)
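
The `H5DataIO` wrapper pattern pairs raw data with backend-specific storage options (chunking, compression) without copying the data. The sketch below is an illustrative toy of that idea; `ToyDataIO` is a hypothetical name, not an HDMF class.

```python
class ToyDataIO:
    """Sketch of a DataIO-style wrapper: data plus I/O settings."""

    def __init__(self, data, **io_settings):
        self._data = data
        self._io_settings = io_settings

    @property
    def data(self):
        # The wrapped data is exposed unchanged
        return self._data

    @property
    def io_settings(self):
        # Backend-specific options the writer should apply
        return dict(self._io_settings)

wrapped = ToyDataIO([1, 2, 3], compression='gzip', chunks=(1000,))
print(wrapped.io_settings['compression'])  # 'gzip'
```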

### Specification System

Schema definition and management for data models, including namespace catalogs, specification readers/writers, and validation rules.

```python { .api }
class SpecCatalog:
    def __init__(self): ...
    def register_spec(self, spec, source_file: str = None): ...
    def get_spec(self, neurodata_type: str) -> 'BaseStorageSpec': ...

class GroupSpec:
    def __init__(self, doc: str, name: str = None, **kwargs): ...

class DatasetSpec:
    def __init__(self, doc: str, name: str = None, **kwargs): ...
```

[Specification System](./specification.md)
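
The catalog pattern can be sketched as a simple registry: specs are stored under a type name and looked up later during build and validation. This is an illustrative toy; HDMF's `SpecCatalog` additionally tracks source files, spec hierarchies, and subtypes, and the type name and spec fields below are made up for the example.

```python
class ToySpecCatalog:
    """Minimal sketch of a spec catalog (illustrative only)."""

    def __init__(self):
        self._specs = {}

    def register_spec(self, type_name, spec):
        self._specs[type_name] = spec

    def get_spec(self, type_name):
        return self._specs[type_name]

catalog = ToySpecCatalog()
catalog.register_spec('ElectricalSeries', {
    'doc': 'voltage traces from an extracellular recording',
    'datasets': [{'name': 'data', 'dtype': 'float64', 'shape': (None, None)}],
})
print(catalog.get_spec('ElectricalSeries')['datasets'][0]['name'])  # 'data'
```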

### Build System

Converting containers to storage representations and managing type mappings between specifications and Python classes.

```python { .api }
class BuildManager:
    def __init__(self, type_map: 'TypeMap'): ...
    def build(self, container, source: str = None, **kwargs) -> 'Builder': ...

class TypeMap:
    def __init__(self, namespaces: 'NamespaceCatalog'): ...
    def register_container_type(self, namespace: str, data_type: str, container_cls): ...
```

[Build System](./build-system.md)
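
The build step converts a container into a backend-neutral tree of groups, datasets, and attributes ("builders") that an I/O backend can then serialize. The sketch below illustrates that conversion with plain dictionaries; `toy_build` and the field layout are hypothetical, whereas HDMF's `BuildManager` drives the real conversion from the registered specs and type map.

```python
def toy_build(container):
    """Convert a container-like dict into a builder-like tree (sketch)."""
    return {
        'name': container['name'],
        'attributes': {'data_type': container['data_type']},
        'datasets': {k: list(v) for k, v in container['fields'].items()},
    }

container = {
    'name': 'trials',
    'data_type': 'DynamicTable',
    'fields': {'start_time': [0.0, 1.5], 'stop_time': [1.0, 2.5]},
}
builder = toy_build(container)
print(builder['datasets']['start_time'])  # [0.0, 1.5]
```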

### Common Data Structures

Pre-built data structures for scientific data including dynamic tables, vector data, sparse matrices, and multi-container systems.

```python { .api }
class DynamicTable(Container):
    def __init__(self, name: str, description: str, **kwargs): ...
    def add_row(self, **kwargs): ...
    def to_dataframe(self): ...

class VectorData(Data):
    def __init__(self, name: str, description: str, data, **kwargs): ...

class CSRMatrix(Container):
    def __init__(self, data, indices, indptr, shape: tuple, **kwargs): ...
```

[Common Data Structures](./common-data.md)
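
A ragged column is where `VectorData` and `VectorIndex` work together: values are stored flat, and the index records the end offset of each row. The snippet below sketches that layout with plain lists; `hdmf.common` provides the real, typed versions.

```python
flat_data = ['a', 'b', 'c', 'd', 'e', 'f']  # VectorData: all values, flattened
row_ends = [2, 3, 6]                         # VectorIndex: end offset per row

def row(i):
    """Recover row i of the ragged column from the flat storage."""
    start = 0 if i == 0 else row_ends[i - 1]
    return flat_data[start:row_ends[i]]

print(row(0))  # ['a', 'b']
print(row(2))  # ['d', 'e', 'f']
```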

### Query System

Querying and filtering capabilities for datasets and containers with reference resolution and advanced data access patterns.

```python { .api }
class HDMFDataset:
    def __getitem__(self, key): ...
    def append(self, data): ...

class ContainerResolver:
    def __init__(self, type_map: 'TypeMap', container: Container): ...
```

[Query System](./query.md)
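
The dataset-wrapper idea behind `HDMFDataset` is that indexing is forwarded to the underlying storage object, so data stays on disk until it is sliced. The class below is a hypothetical toy standing in for that pattern; in real use the wrapped object would be something like an `h5py.Dataset`.

```python
class ToyDatasetWrapper:
    """Sketch of a lazy dataset wrapper (illustrative only)."""

    def __init__(self, dataset):
        self._dataset = dataset  # e.g. an on-disk dataset in real use

    def __getitem__(self, key):
        # Slicing is delegated, so only the requested region is materialized
        return self._dataset[key]

    def __len__(self):
        return len(self._dataset)

wrapped = ToyDatasetWrapper(list(range(10)))
print(wrapped[2:5])  # [2, 3, 4]
```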

### Term Sets and Ontologies

Integration with ontologies and controlled vocabularies through term sets, type configuration, and semantic validation.

```python { .api }
class TermSet:
    def __init__(self, term_schema_path: str = None, **kwargs): ...
    def validate(self, value): ...

class TermSetWrapper:
    def __init__(self, value, field: str, termset: TermSet, **kwargs): ...

class TypeConfigurator:
    @staticmethod
    def get_config(): ...
    @staticmethod
    def load_type_config(config_path: str): ...
```

[Term Sets](./term-sets.md)
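
Term-set validation boils down to accepting a value only if it belongs to a controlled vocabulary. The toy below shows that core check with an inline set; HDMF's `TermSet` loads its terms from a schema file instead, and `ToyTermSet` is a hypothetical name for this sketch.

```python
class ToyTermSet:
    """Sketch of controlled-vocabulary validation (illustrative only)."""

    def __init__(self, terms):
        self._terms = set(terms)

    def validate(self, value):
        # A value is valid only if it appears in the vocabulary
        return value in self._terms

species = ToyTermSet({'Homo sapiens', 'Mus musculus', 'Rattus norvegicus'})
print(species.validate('Mus musculus'))  # True
print(species.validate('unknown'))       # False
```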

### Validation System

Comprehensive validation of data against specifications with detailed error reporting and schema compliance checking.

```python { .api }
class ValidatorMap:
    def __init__(self): ...
    def register_validator(self, neurodata_type: str, validator): ...

class Validator:
    def __init__(self, spec): ...
    def validate(self, builder): ...
```

[Validation](./validation.md)
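
Spec-driven validation checks a built object against its spec and collects errors rather than raising on the first problem. The function below is an illustrative toy of that pattern; real HDMF validators also check dtypes, shapes, and required attributes, and the spec/builder field names here are made up.

```python
def toy_validate(spec, builder):
    """Collect validation errors for a builder against a spec (sketch)."""
    errors = []
    for name in spec.get('required_datasets', []):
        if name not in builder.get('datasets', {}):
            errors.append(f"missing required dataset: {name!r}")
    return errors

spec = {'required_datasets': ['data', 'timestamps']}
builder = {'datasets': {'data': [1, 2, 3]}}
print(toy_validate(spec, builder))  # ["missing required dataset: 'timestamps'"]
```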

### Data Utilities

Essential utilities for handling large datasets, including chunk iterators and I/O configuration wrappers, with efficient memory management and streaming operations.

```python { .api }
class DataChunkIterator:
    def __init__(self, data, **kwargs): ...
    def __next__(self): ...

class DataIO:
    def __init__(self, data, **kwargs): ...

def append_data(data, new_data): ...
def extend_data(data, extension_data): ...
```

[Data Utilities](./data-utils.md)
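
Chunked streaming means producing data one piece at a time so the full array never has to sit in memory. The generator below sketches that idea in plain Python; HDMF's `DataChunkIterator` builds on it by also reporting where each chunk belongs in the stored dataset.

```python
def toy_chunks(data, chunk_size):
    """Yield successive fixed-size chunks of a sequence (sketch)."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

chunks = list(toy_chunks(list(range(7)), chunk_size=3))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```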

### Testing Utilities

Test case classes and utilities for testing HDMF extensions and applications, with support for HDF5 round-trip testing.

```python { .api }
class TestCase:
    def setUp(self): ...
    def tearDown(self): ...

class H5RoundTripMixin:
    def test_roundtrip(self): ...

def remove_test_file(filename: str): ...
```

Testing utilities are available from `hdmf.testing` for building test suites.