# I/O Backends

HDMF provides a pluggable I/O system supporting multiple storage backends, including HDF5 and Zarr. The I/O system handles reading and writing hierarchical data structures, with support for compression, chunking, and efficient data access patterns.
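
Because every backend implements the common `HDMFIO` interface described below, application code can be written once against that interface and reused with any backend, for example `HDF5IO` from HDMF itself or the `ZarrIO` backend provided by the separate hdmf-zarr package. The helper below is an illustrative sketch rather than part of the HDMF API; it relies only on the `read()` method and context-manager support that the `HDMFIO` interface defines.

```python
def summarize_file(io):
    """Print the name of the root container read from any HDMFIO backend.

    `io` is assumed to be an already-constructed backend instance,
    e.g. HDF5IO('experiment.h5', mode='r').
    """
    with io:                   # all backends support the context-manager protocol
        container = io.read()  # returns the root container
        print(f"Loaded container: {container.name}")
```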

## Capabilities

### Base I/O Interface

Abstract base class defining the interface for all HDMF I/O backends.

```python { .api }
class HDMFIO:
    """
    Abstract base class for HDMF I/O operations.

    Provides the interface contract for all storage backend implementations.
    """

    def __init__(self, path: str, mode: str = 'r', **kwargs):
        """
        Initialize I/O backend.

        Args:
            path: Path to the file or storage location
            mode: File access mode ('r', 'w', 'a', 'r+')
        """

    def write(self, container, **kwargs):
        """
        Write container to storage backend.

        Args:
            container: Container object to write
        """

    def read(self, **kwargs):
        """
        Read data from storage backend.

        Returns:
            Container object with loaded data
        """

    def close(self):
        """Close the I/O backend and release resources."""

    def __enter__(self):
        """Context manager entry."""

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Context manager exit with cleanup."""
```
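
The sketch below illustrates how a custom backend could be shaped around this interface. It is a toy example against the simplified signatures shown above, not a real HDMF backend; the `DictIO` name and its in-memory storage are hypothetical, and an actual backend must also implement HDMF's builder-level read/write machinery.

```python
class DictIO(HDMFIO):
    """Toy backend that keeps written containers in an in-memory dict."""

    def __init__(self, path: str, mode: str = 'r', **kwargs):
        super().__init__(path, mode, **kwargs)
        self._store = {}

    def write(self, container, **kwargs):
        # Store the container under its name
        self._store[container.name] = container

    def read(self, **kwargs):
        # Return the most recently written container, if any
        return list(self._store.values())[-1] if self._store else None

    def close(self):
        # Release the in-memory storage
        self._store.clear()
```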
52
53
### HDF5 I/O Backend
54
55
Primary I/O backend for reading and writing HDF5 files with full HDMF feature support.
56
57
```python { .api }
58
class HDF5IO(HDMFIO):
59
"""
60
HDF5 I/O backend for reading and writing HDMF data to HDF5 files.
61
62
Supports all HDMF features including hierarchical containers, metadata,
63
compression, chunking, and cross-platform compatibility.
64
"""
65
66
def __init__(self, path: str, mode: str = 'r', manager=None, **kwargs):
67
"""
68
Initialize HDF5 I/O.
69
70
Args:
71
path: Path to HDF5 file
72
mode: File access mode ('r', 'w', 'a', 'r+')
73
manager: Build manager for container conversion
74
**kwargs: Additional HDF5 file options
75
"""
76
77
def write(self, container, **kwargs):
78
"""
79
Write container to HDF5 file.
80
81
Args:
82
container: Container object to write
83
**kwargs: Write options including:
84
- cache_spec: Whether to cache specification (default: True)
85
- exhaust_dci: Whether to exhaust data chunk iterators
86
- link_data: Whether to link external data
87
"""
88
89
def read(self, **kwargs):
90
"""
91
Read container from HDF5 file.
92
93
Args:
94
**kwargs: Read options
95
96
Returns:
97
Container object loaded from file
98
"""
99
100
def export(self, src_io, container, **kwargs):
101
"""
102
Export container from another I/O source to this HDF5 file.
103
104
Args:
105
src_io: Source I/O object
106
container: Container to export
107
"""
108
109
def close(self):
110
"""Close HDF5 file and release resources."""
111
112
@property
113
def file(self):
114
"""Access to underlying h5py File object."""
115
```
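
The `export` method supports copying a container read from one file into a new file, for example to produce a trimmed or re-packaged copy without modifying the original. A minimal sketch, assuming placeholder file paths and that a build manager appropriate for the stored types is configured:

```python
from hdmf.backends.hdf5 import HDF5IO

# Copy the contents of source.h5 into exported.h5 (paths are placeholders)
with HDF5IO('source.h5', mode='r') as src_io:
    container = src_io.read()
    with HDF5IO('exported.h5', mode='w') as dst_io:
        dst_io.export(src_io=src_io, container=container)
```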

### HDF5 Data I/O Configuration

Configuration wrapper for customizing how data is written to HDF5 files.

```python { .api }
class H5DataIO:
    """
    HDF5 data I/O configuration wrapper for controlling storage options.

    Provides fine-grained control over compression, chunking, filtering,
    and other HDF5 dataset creation properties.
    """

    def __init__(self, data, **kwargs):
        """
        Initialize H5DataIO wrapper.

        Args:
            data: Data to be written
            **kwargs: HDF5 dataset creation options:
                - compression: Compression filter ('gzip', 'lzf', 'szip')
                - compression_opts: Compression level (0-9 for gzip)
                - shuffle: Enable shuffle filter for better compression
                - fletcher32: Enable Fletcher32 checksum filter
                - chunks: Chunk shape for datasets
                - maxshape: Maximum shape for resizable datasets
                - fillvalue: Fill value for uninitialized data
                - track_times: Track dataset creation/modification times
        """

    @property
    def data(self):
        """Access to wrapped data."""

    @property
    def io_settings(self) -> dict:
        """Dictionary of I/O settings for this data."""
```
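
The wrapped data and the resolved storage options stay inspectable before anything is written, which is useful for verifying a configuration. A short sketch using the properties declared above:

```python
from hdmf.backends.hdf5 import H5DataIO
import numpy as np

wrapped = H5DataIO(
    data=np.zeros((100, 10)),
    compression='gzip',
    compression_opts=4,
    chunks=(10, 10),
)

print(wrapped.data.shape)   # the original array remains accessible
print(wrapped.io_settings)  # e.g. includes the compression and chunking settings
```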

### HDF5 Specification I/O

Specialized classes for reading and writing HDMF specifications to HDF5 files.

```python { .api }
class H5SpecWriter:
    """
    Writer for HDMF specifications in HDF5 format.

    Handles storage of namespace and specification information within HDF5 files.
    """

    def __init__(self, io: HDF5IO):
        """
        Initialize specification writer.

        Args:
            io: HDF5IO object for file access
        """

    def write_spec(self, spec_catalog, spec_namespace):
        """
        Write specification catalog and namespace to HDF5 file.

        Args:
            spec_catalog: Specification catalog to write
            spec_namespace: Namespace information
        """


class H5SpecReader:
    """
    Reader for HDMF specifications from HDF5 format.

    Loads namespace and specification information from HDF5 files.
    """

    def __init__(self, io: HDF5IO):
        """
        Initialize specification reader.

        Args:
            io: HDF5IO object for file access
        """

    def read_spec(self) -> tuple:
        """
        Read specification from HDF5 file.

        Returns:
            Tuple of (spec_catalog, spec_namespace)
        """
```
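
These classes are normally used indirectly: `HDF5IO.write(..., cache_spec=True)` stores the specifications inside the file, and cached namespaces can be loaded back before reading. A sketch of reading cached namespaces, assuming a placeholder file path and that the `HDF5IO.load_namespaces` classmethod is available in the installed HDMF version:

```python
from hdmf.backends.hdf5 import HDF5IO
from hdmf.spec import NamespaceCatalog

# Load namespaces that were cached in experiment.h5 (path is a placeholder)
catalog = NamespaceCatalog()
loaded = HDF5IO.load_namespaces(catalog, 'experiment.h5')
print(loaded)  # e.g. the names of the namespaces read from the file
```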

### HDF5 Utilities and Tools

Utility functions and tools for working with HDF5 files and datasets.

```python { .api }
class H5Dataset:
    """
    Wrapper for HDF5 datasets providing enhanced functionality.

    Adds HDMF-specific features to h5py dataset objects including
    lazy loading, data transformation, and metadata handling.
    """

    def __init__(self, dataset, io: HDF5IO, **kwargs):
        """
        Initialize H5Dataset wrapper.

        Args:
            dataset: h5py dataset object
            io: Parent HDF5IO object
        """

    def __getitem__(self, key):
        """Get data slice from dataset."""

    def __setitem__(self, key, value):
        """Set data slice in dataset."""

    @property
    def shape(self) -> tuple:
        """Shape of the dataset."""

    @property
    def dtype(self):
        """Data type of the dataset."""

    @property
    def size(self) -> int:
        """Total number of elements in dataset."""


# HDF5 utility functions

def get_h5_version() -> str:
    """
    Get HDF5 library version.

    Returns:
        HDF5 version string
    """

def check_h5_version(min_version: str = None) -> bool:
    """
    Check if HDF5 version meets minimum requirements.

    Args:
        min_version: Minimum required version

    Returns:
        True if version is sufficient
    """
```
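
For reference, the version information that utilities like these report is also available directly from h5py; the following is a sketch that uses h5py itself rather than an HDMF API:

```python
import h5py

# HDF5 library and h5py package versions as reported by h5py
print(h5py.version.hdf5_version)  # e.g. '1.14.3'
print(h5py.version.version)       # h5py package version
```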

## Usage Examples

### Basic HDF5 I/O Operations

```python
from hdmf.backends.hdf5 import HDF5IO
from hdmf import Container, Data
import numpy as np

# Create sample data
data_array = np.random.randn(1000, 100)
data_container = Data(name='neural_data', data=data_array)

# Attach the data as a child of a parent container
container = Container(name='experiment')
data_container.parent = container

# Write to an HDF5 file
# (writing custom container types assumes a suitable build manager is configured)
with HDF5IO('experiment.h5', mode='w') as io:
    io.write(container)

# Read from the HDF5 file
with HDF5IO('experiment.h5', mode='r') as io:
    read_container = io.read()
    print(f"Container: {read_container.name}")
    print(f"Data shape: {read_container.neural_data.data.shape}")
```

### Advanced HDF5 Data Configuration

```python
from hdmf.backends.hdf5 import HDF5IO, H5DataIO
from hdmf import Data
import numpy as np

# Create a large dataset to be written with compression
large_data = np.random.randn(10000, 1000)

# Configure compression and chunking
compressed_data = H5DataIO(
    data=large_data,
    compression='gzip',
    compression_opts=9,     # Maximum compression
    shuffle=True,           # Better compression for numeric data
    fletcher32=True,        # Checksums for data integrity
    chunks=(1000, 100),     # Chunk shape for efficient access
    maxshape=(None, 1000)   # Allow resizing along the first dimension
)

data_container = Data(name='compressed_data', data=compressed_data)

# Write with advanced options
with HDF5IO('compressed_experiment.h5', mode='w') as io:
    io.write(data_container, cache_spec=True, exhaust_dci=False)
```

### Working with External Data Links

```python
import h5py
from hdmf.backends.hdf5 import HDF5IO, H5DataIO
from hdmf import Data

# Open the external file and reference one of its datasets
# (link_data links to an existing h5py dataset instead of copying it)
external_file = h5py.File('path/to/external/data.h5', mode='r')
external_data = H5DataIO(
    data=external_file['neural_data'],  # placeholder dataset name
    link_data=True                      # Link instead of copying
)

data_container = Data(name='external_data', data=external_data)

# Write with external links
with HDF5IO('main_file.h5', mode='w') as io:
    io.write(data_container, link_data=True)
```

### Reading Subsets of Large Datasets

```python
from hdmf.backends.hdf5 import HDF5IO

# Open the file in read mode
with HDF5IO('large_experiment.h5', mode='r') as io:
    container = io.read()

    # Access the dataset lazily, without loading all of the data
    dataset = container.neural_data.data

    # Read specific slices
    first_100_samples = dataset[:100, :]
    specific_channels = dataset[:, [0, 5, 10]]
    time_window = dataset[1000:2000, :]

    print(f"Dataset shape: {dataset.shape}")
    print(f"Slice shape: {first_100_samples.shape}")
```
363
364
### Appending Data to Existing Files
365
366
```python
367
from hdmf.backends.hdf5 import HDF5IO, H5DataIO
368
import numpy as np
369
370
# Initial data with resizable configuration
371
initial_data = H5DataIO(
372
data=np.random.randn(100, 50),
373
maxshape=(None, 50), # Allow growth along first dimension
374
chunks=(10, 50)
375
)
376
377
data_container = Data(name='growing_data', data=initial_data)
378
379
# Write initial data
380
with HDF5IO('growing_experiment.h5', mode='w') as io:
381
io.write(container)
382
383
# Append new data
384
with HDF5IO('growing_experiment.h5', mode='a') as io:
385
container = io.read()
386
new_data = np.random.randn(50, 50)
387
388
# Append to existing dataset
389
container.growing_data.append(new_data)
390
391
# Write updated container
392
io.write(container)
393
```

### Cross-Platform File Operations

```python
import os

from hdmf.backends.hdf5 import HDF5IO
from hdmf import Container, Data

def process_hdmf_file(input_path: str, output_path: str):
    """Read an HDMF file, scale each child dataset, and write a new file."""
    processed = Container(name='processed_experiment')

    # Read from any platform
    with HDF5IO(input_path, mode='r') as src_io:
        container = src_io.read()

        # Build processed copies of each data child
        for child in container.children:
            if hasattr(child, 'data'):
                scaled = child.data[:] * 1.5  # load into memory and scale
                processed_child = Data(name=child.name, data=scaled)
                processed_child.parent = processed

        # Write to the new location while the source file is still open
        with HDF5IO(output_path, mode='w') as dst_io:
            dst_io.write(processed, cache_spec=True)

    print(f"Processed file written to: {output_path}")

# Cross-platform usage
if os.name == 'nt':  # Windows
    input_file = r'C:\data\experiment.h5'
    output_file = r'C:\processed\experiment_processed.h5'
else:  # Unix-like systems
    input_file = '/data/experiment.h5'
    output_file = '/processed/experiment_processed.h5'

process_hdmf_file(input_file, output_file)
```