# HDF5 Dataset Reading

Memory-mapped reading of HDF5 files, with support for multiple formats and zero-copy access patterns. The dataset readers provide efficient access to large datasets without loading entire files into memory.

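As an illustration of the zero-copy idea (standard-library `mmap` only, not vaex's actual implementation), the sketch below shows how a memory-mapped file is read through the OS page cache without copying the whole file into memory:

```python
# Minimal stdlib sketch of memory mapping: the OS pages data in on
# demand, so reading a byte does not load the whole file.
import mmap
import os
import tempfile

# Create a small file to map (stand-in for a large HDF5 file).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 4096)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Slicing the map touches only the needed pages.
    first_byte = mm[0]
    mm.close()

os.remove(path)
print(first_byte)  # 0
```

vaex layers the same mechanism over HDF5 datasets so that columns are exposed as numpy arrays backed directly by the file.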
## Capabilities

### Standard HDF5 Dataset Reading

The main class for reading HDF5 files in the vaex format, using memory mapping for optimal performance.

```python { .api }
class Hdf5MemoryMapped(DatasetMemoryMapped):
    """
    Implements the vaex hdf5 file format with memory mapping support.

    Provides zero-copy access to HDF5 datasets through memory mapping,
    supporting both read and write operations with automatic format detection.
    """

    def __init__(self, path, write=False, fs_options={}, fs=None, nommap=None, group=None, _fingerprint=None):
        """
        Initialize an HDF5 memory-mapped dataset.

        Parameters:
        - path: Path to the HDF5 file
        - write: Enable write mode (default: False)
        - fs_options: Filesystem options for remote storage
        - fs: Filesystem implementation (for remote storage)
        - nommap: Force-disable memory mapping
        - group: HDF5 group path to read from
        - _fingerprint: Cached fingerprint for testing
        """
```

#### Class Methods

```python { .api }
@classmethod
def create(cls, path, N, column_names, dtypes=None, write=True):
    """
    Create a new empty HDF5 file with the specified columns.

    Parameters:
    - path: Output file path
    - N: Number of rows to allocate
    - column_names: List of column names
    - dtypes: List of numpy dtypes (default: float64 for all)
    - write: Enable write mode

    Returns:
    Hdf5MemoryMapped instance for the created file

    Raises:
    ValueError: If N is 0 (cannot export an empty table)
    """

@classmethod
def quick_test(cls, path, fs_options={}, fs=None):
    """
    Quickly test whether the file has an HDF5 extension.

    Parameters:
    - path: File path to test
    - fs_options: Filesystem options
    - fs: Filesystem implementation

    Returns:
    bool: True if path ends with .hdf5 or .h5
    """

@classmethod
def can_open(cls, path, fs_options={}, fs=None, group=None, **kwargs):
    """
    Check whether the file can be opened as vaex HDF5 format.

    Parameters:
    - path: File path to check
    - fs_options: Filesystem options
    - fs: Filesystem implementation
    - group: Specific HDF5 group to check

    Returns:
    bool: True if the file can be opened
    """

@classmethod
def get_options(cls, path):
    """Get available options for opening the file."""

@classmethod
def option_to_args(cls, option):
    """Convert an option to constructor arguments."""
```
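Since `quick_test` is documented as a pure extension check, its behavior can be sketched in plain Python (a hypothetical approximation for illustration, not the actual vaex source):

```python
# Sketch of the documented quick_test behavior: a cheap extension check
# used to short-circuit before the more expensive can_open().
def quick_test_sketch(path):
    return path.endswith('.hdf5') or path.endswith('.h5')

print(quick_test_sketch('data.hdf5'))     # True
print(quick_test_sketch('data.h5'))       # True
print(quick_test_sketch('data.parquet'))  # False
```

`can_open` then does the real work of inspecting the file's internal structure.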

#### Instance Methods

```python { .api }
def write_meta(self):
    """
    Write metadata (units, descriptions, UCDs) as HDF5 attributes.

    UCDs, descriptions, and units are written as attributes in the HDF5 file
    itself, instead of in a separate file as in the default Dataset.write_meta().
    """

def close(self):
    """Close the HDF5 file and clean up resources."""
```

### AMUSE Format Support

Reader for HDF5 files created by the AMUSE astrophysics framework.

```python { .api }
class AmuseHdf5MemoryMapped(Hdf5MemoryMapped):
    """
    Implements reading AMUSE HDF5 files from amusecode.org.

    AMUSE (Astrophysical Multipurpose Software Environment) creates HDF5 files
    with a specific structure containing particle data and metadata.
    """

    def __init__(self, path, write=False, fs_options={}, fs=None):
        """
        Initialize an AMUSE HDF5 dataset reader.

        Parameters:
        - path: Path to the AMUSE HDF5 file
        - write: Enable write mode (default: False)
        - fs_options: Filesystem options
        - fs: Filesystem implementation
        """

    @classmethod
    def can_open(cls, path, *args, **kwargs):
        """
        Check whether the file is in AMUSE HDF5 format.

        Parameters:
        - path: File path to check

        Returns:
        bool: True if the file contains a 'particles' group
        """
```

### Gadget2 Format Support

Reader for HDF5 files created by the Gadget2 N-body simulation code.

```python { .api }
class Hdf5MemoryMappedGadget(DatasetMemoryMapped):
    """
    Implements reading Gadget2 HDF5 files.

    Gadget2 is a cosmological N-body/SPH simulation code that outputs
    HDF5 files with particle data organized by particle type.
    """

    def __init__(self, path, particle_name=None, particle_type=None, fs_options={}, fs=None):
        """
        Initialize a Gadget2 HDF5 dataset reader.

        Parameters:
        - path: Path to the Gadget2 HDF5 file (can include #<particle_type>)
        - particle_name: Name of particle type ("gas", "halo", "disk", "bulge", "stars", "dm")
        - particle_type: Numeric particle type (0-5)
        - fs_options: Filesystem options
        - fs: Filesystem implementation

        Note: One of particle_name, particle_type, or #<type> in the path must be specified.
        """

    @classmethod
    def can_open(cls, path, fs_options={}, fs=None, *args, **kwargs):
        """
        Check whether the file is in Gadget2 HDF5 format with the specified particle type.

        Parameters:
        - path: File path (may include #<particle_type>)
        - fs_options: Filesystem options
        - fs: Filesystem implementation

        Returns:
        bool: True if the file contains data for the specified particle type
        """

    @classmethod
    def get_options(cls, path):
        """Get available options for a Gadget2 file."""

    @classmethod
    def option_to_args(cls, option):
        """Convert an option to constructor arguments."""
```
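The `#<particle_type>` path suffix described above can be illustrated with a hypothetical parsing helper (not part of vaex; shown only to make the path convention concrete):

```python
# Hypothetical helper showing how a 'snapshot_001.hdf5#0'-style path
# splits into a file path and a numeric particle type.
def split_gadget_path(path):
    base, sep, frag = path.partition('#')
    particle_type = int(frag) if sep else None
    return base, particle_type

print(split_gadget_path('snapshot_001.hdf5#0'))  # ('snapshot_001.hdf5', 0)
print(split_gadget_path('snapshot_001.hdf5'))    # ('snapshot_001.hdf5', None)
```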

## Usage Examples

### Reading Standard HDF5 Files

```python
import vaex
from vaex.hdf5.dataset import Hdf5MemoryMapped

# Automatic detection via vaex.open
df = vaex.open('data.hdf5')

# Direct instantiation
dataset = Hdf5MemoryMapped('data.hdf5')
df = vaex.from_dataset(dataset)

# Reading a specific group
dataset = Hdf5MemoryMapped('data.hdf5', group='/table')

# Reading from remote storage
dataset = Hdf5MemoryMapped('s3://bucket/data.hdf5',
                           fs_options={'anon': True})
```

### Creating New HDF5 Files

```python
import numpy as np
import vaex
from vaex.hdf5.dataset import Hdf5MemoryMapped

# Create an empty file with the specified structure
dataset = Hdf5MemoryMapped.create(
    'new_data.hdf5',
    N=1000,
    column_names=['x', 'y', 'z', 'velocity'],
    dtypes=[np.float64, np.float64, np.float64, np.float32],
    write=True
)

# Populate by writing into the memory-mapped column arrays
df = vaex.from_dataset(dataset)
df.columns['x'][:] = np.random.random(1000)
df.columns['y'][:] = np.random.random(1000)
# ... continue with data population
```

### Reading AMUSE Files

```python
import vaex
from vaex.hdf5.dataset import AmuseHdf5MemoryMapped

# AMUSE files are auto-detected by vaex.open
df = vaex.open('amuse_simulation.hdf5')

# Direct instantiation
dataset = AmuseHdf5MemoryMapped('amuse_simulation.hdf5')
df = vaex.from_dataset(dataset)
```

### Reading Gadget2 Files

```python
import vaex
from vaex.hdf5.dataset import Hdf5MemoryMappedGadget

# Using a path with the particle type appended
df_gas = vaex.open('snapshot_001.hdf5#0')  # Gas particles
df_dm = vaex.open('snapshot_001.hdf5#5')   # Dark matter particles

# Using the particle name
dataset = Hdf5MemoryMappedGadget('snapshot_001.hdf5', particle_name='gas')
df = vaex.from_dataset(dataset)

# Using the numeric particle type
dataset = Hdf5MemoryMappedGadget('snapshot_001.hdf5', particle_type=0)
```

## Constants

```python { .api }
gadget_particle_names = ["gas", "halo", "disk", "bulge", "stars", "dm"]
```

List of Gadget2 particle type names; each name's index in the list (0-5) is its numeric particle type.

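Because the numeric particle type is simply a name's position in this list, the two identifiers can be converted with a plain `list.index` lookup:

```python
gadget_particle_names = ["gas", "halo", "disk", "bulge", "stars", "dm"]

# Name -> numeric type: the index of the name in the list.
print(gadget_particle_names.index("gas"))  # 0
print(gadget_particle_names.index("dm"))   # 5

# Numeric type -> name: plain list indexing.
print(gadget_particle_names[4])  # stars
```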
## Error Handling

All dataset readers may raise:

- `FileNotFoundError`: if the specified file does not exist
- `OSError`: for file permission, I/O, or low-level HDF5 errors (h5py surfaces HDF5 failures as standard Python exceptions)
- `ValueError`: for invalid parameters or unsupported data formats
- `KeyError`: if a specified group or dataset does not exist in the file
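
A hedged sketch of defensive opening based on the exceptions listed above. The `opener` callable is injected (here a stand-in; in practice it could be `vaex.open`) so the pattern is self-contained:

```python
# Wrap an opener so the documented failure modes yield a uniform result.
def open_or_none(opener, path, **kwargs):
    try:
        return opener(path, **kwargs)
    except (FileNotFoundError, OSError, ValueError, KeyError) as exc:
        print(f"could not open {path}: {type(exc).__name__}")
        return None

# Stand-in opener that simulates a missing file.
def fake_opener(path):
    raise FileNotFoundError(path)

result = open_or_none(fake_opener, 'missing.hdf5')
print(result)  # None
```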