# HDF5 Dataset Reading

Memory-mapped reading of HDF5 files with support for multiple formats and zero-copy access patterns. The dataset readers provide efficient access to large datasets without loading entire files into memory.

## Capabilities

### Standard HDF5 Dataset Reading

The main class for reading HDF5 files in the vaex format, using memory mapping for optimal performance.

```python { .api }
class Hdf5MemoryMapped(DatasetMemoryMapped):
    """
    Implements the vaex hdf5 file format with memory mapping support.

    Provides zero-copy access to HDF5 datasets through memory mapping,
    supporting both read and write operations with automatic format detection.
    """
    def __init__(self, path, write=False, fs_options={}, fs=None, nommap=None, group=None, _fingerprint=None):
        """
        Initialize HDF5 memory-mapped dataset.

        Parameters:
        - path: Path to HDF5 file
        - write: Enable write mode (default: False)
        - fs_options: Filesystem options for remote storage
        - fs: Filesystem implementation (for remote storage)
        - nommap: Force disable memory mapping
        - group: HDF5 group path to read from
        - _fingerprint: Cached fingerprint for testing
        """
```

#### Class Methods

```python { .api }
@classmethod
def create(cls, path, N, column_names, dtypes=None, write=True):
    """
    Create a new empty HDF5 file with specified columns.

    Parameters:
    - path: Output file path
    - N: Number of rows to allocate
    - column_names: List of column names
    - dtypes: List of numpy dtypes (default: float64 for all)
    - write: Enable write mode

    Returns:
        Hdf5MemoryMapped instance of the created file

    Raises:
        ValueError: If N is 0 (cannot export an empty table)
    """

@classmethod
def quick_test(cls, path, fs_options={}, fs=None):
    """
    Quick test whether the file has an HDF5 extension.

    Parameters:
    - path: File path to test
    - fs_options: Filesystem options
    - fs: Filesystem implementation

    Returns:
        bool: True if path ends with .hdf5 or .h5
    """

@classmethod
def can_open(cls, path, fs_options={}, fs=None, group=None, **kwargs):
    """
    Check if the file can be opened as vaex HDF5 format.

    Parameters:
    - path: File path to check
    - fs_options: Filesystem options
    - fs: Filesystem implementation
    - group: Specific HDF5 group to check

    Returns:
        bool: True if the file can be opened
    """

@classmethod
def get_options(cls, path):
    """Get available options for opening the file."""

@classmethod
def option_to_args(cls, option):
    """Convert an option to constructor arguments."""
```
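As documented above, `quick_test` only inspects the file extension, making it a cheap pre-filter before the more expensive `can_open` check, which actually opens the file. The extension logic can be sketched in plain Python (an illustration of the documented behavior, not the library's internal code; the helper name `looks_like_hdf5` is hypothetical):

```python
def looks_like_hdf5(path):
    # Mirrors the documented quick_test behavior:
    # True only for the .hdf5 and .h5 extensions
    return path.endswith('.hdf5') or path.endswith('.h5')

print(looks_like_hdf5('data.hdf5'))  # True
print(looks_like_hdf5('data.h5'))    # True
print(looks_like_hdf5('data.csv'))   # False
```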

#### Instance Methods

```python { .api }
def write_meta(self):
    """
    Write metadata (units, descriptions, UCDs) as HDF5 attributes.

    UCDs, descriptions and units are written as attributes in the HDF5 file,
    instead of a separate file as in the default Dataset.write_meta().
    """

def close(self):
    """Close the HDF5 file and clean up resources."""
```

### AMUSE Format Support

Reader for HDF5 files created by the AMUSE astrophysics framework.

```python { .api }
class AmuseHdf5MemoryMapped(Hdf5MemoryMapped):
    """
    Implements reading AMUSE HDF5 files from amusecode.org.

    AMUSE (Astrophysical Multipurpose Software Environment) creates HDF5 files
    with a specific structure containing particle data and metadata.
    """
    def __init__(self, path, write=False, fs_options={}, fs=None):
        """
        Initialize AMUSE HDF5 dataset reader.

        Parameters:
        - path: Path to AMUSE HDF5 file
        - write: Enable write mode (default: False)
        - fs_options: Filesystem options
        - fs: Filesystem implementation
        """

    @classmethod
    def can_open(cls, path, *args, **kwargs):
        """
        Check if the file is in AMUSE HDF5 format.

        Parameters:
        - path: File path to check

        Returns:
            bool: True if the file contains a 'particles' group
        """
```

### Gadget2 Format Support

Reader for HDF5 files created by the Gadget2 N-body simulation code.

```python { .api }
class Hdf5MemoryMappedGadget(DatasetMemoryMapped):
    """
    Implements reading Gadget2 HDF5 files.

    Gadget2 is a cosmological N-body/SPH simulation code that outputs
    HDF5 files with particle data organized by particle type.
    """
    def __init__(self, path, particle_name=None, particle_type=None, fs_options={}, fs=None):
        """
        Initialize Gadget2 HDF5 dataset reader.

        Parameters:
        - path: Path to Gadget2 HDF5 file (can include #<particle_type>)
        - particle_name: Name of particle type ("gas", "halo", "disk", "bulge", "stars", "dm")
        - particle_type: Numeric particle type (0-5)
        - fs_options: Filesystem options
        - fs: Filesystem implementation

        Note: Either particle_name, particle_type, or #<type> in the path must be specified.
        """

    @classmethod
    def can_open(cls, path, fs_options={}, fs=None, *args, **kwargs):
        """
        Check if the file is Gadget2 HDF5 format with the specified particle type.

        Parameters:
        - path: File path (may include #<particle_type>)
        - fs_options: Filesystem options
        - fs: Filesystem implementation

        Returns:
            bool: True if the file contains the specified particle type data
        """

    @classmethod
    def get_options(cls, path):
        """Get available options for a Gadget2 file."""

    @classmethod
    def option_to_args(cls, option):
        """Convert an option to constructor arguments."""
```
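The `#<particle_type>` path suffix described above amounts to splitting the path on its final `#` and interpreting the fragment as the numeric type. A plain-Python sketch of that convention (the helper `split_particle_path` is hypothetical, not part of the library):

```python
def split_particle_path(path):
    """Split 'snapshot.hdf5#0' into ('snapshot.hdf5', 0).

    Returns (path, None) when no #<particle_type> fragment is present.
    """
    if '#' in path:
        base, _, fragment = path.rpartition('#')
        return base, int(fragment)
    return path, None

print(split_particle_path('snapshot_001.hdf5#5'))  # ('snapshot_001.hdf5', 5)
print(split_particle_path('snapshot_001.hdf5'))    # ('snapshot_001.hdf5', None)
```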

## Usage Examples

### Reading Standard HDF5 Files

```python
import vaex
from vaex.hdf5.dataset import Hdf5MemoryMapped

# Automatic detection via vaex.open
df = vaex.open('data.hdf5')

# Direct instantiation
dataset = Hdf5MemoryMapped('data.hdf5')
df = vaex.from_dataset(dataset)

# Reading a specific group
dataset = Hdf5MemoryMapped('data.hdf5', group='/table')

# Reading from remote storage
dataset = Hdf5MemoryMapped('s3://bucket/data.hdf5',
                           fs_options={'anon': True})
```

### Creating New HDF5 Files

```python
import numpy as np
import vaex
from vaex.hdf5.dataset import Hdf5MemoryMapped

# Create an empty file with the specified structure
dataset = Hdf5MemoryMapped.create(
    'new_data.hdf5',
    N=1000,
    column_names=['x', 'y', 'z', 'velocity'],
    dtypes=[np.float64, np.float64, np.float64, np.float32],
    write=True
)

# Populate by writing into the memory-mapped column arrays
df = vaex.from_dataset(dataset)
df.columns['x'][:] = np.random.random(1000)
df.columns['y'][:] = np.random.random(1000)
# ... continue with data population
```

### Reading AMUSE Files

```python
import vaex

# AMUSE files are auto-detected by vaex.open
df = vaex.open('amuse_simulation.hdf5')

# Direct instantiation
from vaex.hdf5.dataset import AmuseHdf5MemoryMapped
dataset = AmuseHdf5MemoryMapped('amuse_simulation.hdf5')
df = vaex.from_dataset(dataset)
```

### Reading Gadget2 Files

```python
import vaex

# Using a path with the particle type appended
df_gas = vaex.open('snapshot_001.hdf5#0')  # Gas particles
df_dm = vaex.open('snapshot_001.hdf5#5')   # Dark matter particles

# Using a particle name
from vaex.hdf5.dataset import Hdf5MemoryMappedGadget
dataset = Hdf5MemoryMappedGadget('snapshot_001.hdf5', particle_name='gas')
df = vaex.from_dataset(dataset)

# Using a numeric particle type
dataset = Hdf5MemoryMappedGadget('snapshot_001.hdf5', particle_type=0)
```

## Constants

```python { .api }
gadget_particle_names = ["gas", "halo", "disk", "bulge", "stars", "dm"]
```

List of Gadget2 particle type names; the index of each name in the list is its numeric particle type (0-5).
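Because the list index is the numeric type, name-to-type lookup is just `list.index`. A minimal sketch (the helper `particle_type_for` is hypothetical, shown only to illustrate the mapping):

```python
gadget_particle_names = ["gas", "halo", "disk", "bulge", "stars", "dm"]

def particle_type_for(name):
    """Return the numeric Gadget2 particle type for a given name."""
    return gadget_particle_names.index(name)

print(particle_type_for("gas"))  # 0
print(particle_type_for("dm"))   # 5
```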

## Error Handling

All dataset readers may raise:

- `FileNotFoundError`: If the specified file doesn't exist
- `OSError`: For file permission or I/O errors; h5py also reports HDF5 format errors as `OSError`
- `ValueError`: For invalid parameters or unsupported data formats
- `KeyError`: If specified groups or datasets don't exist in the file
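Since these are all standard Python exceptions, the usual try/except patterns apply. A minimal sketch using the builtin `open` for illustration (a missing path raises the same `FileNotFoundError` that the dataset readers document; the helper `try_open` is hypothetical):

```python
def try_open(path):
    """Return a label for the failure mode instead of raising."""
    try:
        with open(path, 'rb'):
            return 'ok'
    except FileNotFoundError:
        return 'missing'       # file doesn't exist
    except OSError:
        return 'io-error'      # permissions, I/O, or format errors

print(try_open('no_such_file.hdf5'))  # missing
```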