# HDF5 Dataset Reading

Memory-mapped reading of HDF5 files, with support for multiple formats and zero-copy access patterns. The dataset readers provide efficient access to large datasets without loading entire files into memory.

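As an illustration of the zero-copy idea (standard-library `mmap` only, not vaex's actual implementation), the sketch below shows how a memory-mapped file is read through the OS page cache without copying the whole file into memory:

```python
# Minimal stdlib sketch of memory mapping: the OS pages data in on
# demand, so reading a byte does not load the whole file.
import mmap
import os
import tempfile

# Create a small file to map (stand-in for a large HDF5 file).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 4096)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Slicing the map touches only the needed pages.
    first_byte = mm[0]
    mm.close()

os.remove(path)
print(first_byte)  # 0
```

vaex layers the same mechanism over HDF5 datasets so that columns are exposed as numpy arrays backed directly by the file.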
## Capabilities

### Standard HDF5 Dataset Reading

The main class for reading HDF5 files in the vaex format, using memory mapping for optimal performance.

```python { .api }
class Hdf5MemoryMapped(DatasetMemoryMapped):
    """
    Implements the vaex hdf5 file format with memory mapping support.

    Provides zero-copy access to HDF5 datasets through memory mapping,
    supporting both read and write operations with automatic format detection.
    """

    def __init__(self, path, write=False, fs_options={}, fs=None, nommap=None, group=None, _fingerprint=None):
        """
        Initialize an HDF5 memory-mapped dataset.

        Parameters:
        - path: Path to the HDF5 file
        - write: Enable write mode (default: False)
        - fs_options: Filesystem options for remote storage
        - fs: Filesystem implementation (for remote storage)
        - nommap: Force-disable memory mapping
        - group: HDF5 group path to read from
        - _fingerprint: Cached fingerprint for testing
        """
```

#### Class Methods

```python { .api }
@classmethod
def create(cls, path, N, column_names, dtypes=None, write=True):
    """
    Create a new empty HDF5 file with the specified columns.

    Parameters:
    - path: Output file path
    - N: Number of rows to allocate
    - column_names: List of column names
    - dtypes: List of numpy dtypes (default: float64 for all)
    - write: Enable write mode

    Returns:
    Hdf5MemoryMapped instance for the created file

    Raises:
    ValueError: If N is 0 (cannot export an empty table)
    """

@classmethod
def quick_test(cls, path, fs_options={}, fs=None):
    """
    Quickly test whether the file has an HDF5 extension.

    Parameters:
    - path: File path to test
    - fs_options: Filesystem options
    - fs: Filesystem implementation

    Returns:
    bool: True if path ends with .hdf5 or .h5
    """

@classmethod
def can_open(cls, path, fs_options={}, fs=None, group=None, **kwargs):
    """
    Check whether the file can be opened as vaex HDF5 format.

    Parameters:
    - path: File path to check
    - fs_options: Filesystem options
    - fs: Filesystem implementation
    - group: Specific HDF5 group to check

    Returns:
    bool: True if the file can be opened
    """

@classmethod
def get_options(cls, path):
    """Get available options for opening the file."""

@classmethod
def option_to_args(cls, option):
    """Convert an option to constructor arguments."""
```
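Since `quick_test` is documented as a pure extension check, its behavior can be sketched in plain Python (a hypothetical approximation for illustration, not the actual vaex source):

```python
# Sketch of the documented quick_test behavior: a cheap extension check
# used to short-circuit before the more expensive can_open().
def quick_test_sketch(path):
    return path.endswith('.hdf5') or path.endswith('.h5')

print(quick_test_sketch('data.hdf5'))     # True
print(quick_test_sketch('data.h5'))       # True
print(quick_test_sketch('data.parquet'))  # False
```

`can_open` then does the real work of inspecting the file's internal structure.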

#### Instance Methods

```python { .api }
def write_meta(self):
    """
    Write metadata (units, descriptions, UCDs) as HDF5 attributes.

    UCDs, descriptions, and units are written as attributes in the HDF5 file
    itself, instead of in a separate file as in the default Dataset.write_meta().
    """

def close(self):
    """Close the HDF5 file and clean up resources."""
```

### AMUSE Format Support

Reader for HDF5 files created by the AMUSE astrophysics framework.

```python { .api }
class AmuseHdf5MemoryMapped(Hdf5MemoryMapped):
    """
    Implements reading AMUSE HDF5 files from amusecode.org.

    AMUSE (Astrophysical Multipurpose Software Environment) creates HDF5 files
    with a specific structure containing particle data and metadata.
    """

    def __init__(self, path, write=False, fs_options={}, fs=None):
        """
        Initialize an AMUSE HDF5 dataset reader.

        Parameters:
        - path: Path to the AMUSE HDF5 file
        - write: Enable write mode (default: False)
        - fs_options: Filesystem options
        - fs: Filesystem implementation
        """

    @classmethod
    def can_open(cls, path, *args, **kwargs):
        """
        Check whether the file is in AMUSE HDF5 format.

        Parameters:
        - path: File path to check

        Returns:
        bool: True if the file contains a 'particles' group
        """
```

### Gadget2 Format Support

Reader for HDF5 files created by the Gadget2 N-body simulation code.

```python { .api }
class Hdf5MemoryMappedGadget(DatasetMemoryMapped):
    """
    Implements reading Gadget2 HDF5 files.

    Gadget2 is a cosmological N-body/SPH simulation code that outputs
    HDF5 files with particle data organized by particle type.
    """

    def __init__(self, path, particle_name=None, particle_type=None, fs_options={}, fs=None):
        """
        Initialize a Gadget2 HDF5 dataset reader.

        Parameters:
        - path: Path to the Gadget2 HDF5 file (can include #<particle_type>)
        - particle_name: Name of particle type ("gas", "halo", "disk", "bulge", "stars", "dm")
        - particle_type: Numeric particle type (0-5)
        - fs_options: Filesystem options
        - fs: Filesystem implementation

        Note: One of particle_name, particle_type, or #<type> in the path must be specified.
        """

    @classmethod
    def can_open(cls, path, fs_options={}, fs=None, *args, **kwargs):
        """
        Check whether the file is in Gadget2 HDF5 format with the specified particle type.

        Parameters:
        - path: File path (may include #<particle_type>)
        - fs_options: Filesystem options
        - fs: Filesystem implementation

        Returns:
        bool: True if the file contains data for the specified particle type
        """

    @classmethod
    def get_options(cls, path):
        """Get available options for a Gadget2 file."""

    @classmethod
    def option_to_args(cls, option):
        """Convert an option to constructor arguments."""
```
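The `#<particle_type>` path suffix described above can be illustrated with a hypothetical parsing helper (not part of vaex; shown only to make the path convention concrete):

```python
# Hypothetical helper showing how a 'snapshot_001.hdf5#0'-style path
# splits into a file path and a numeric particle type.
def split_gadget_path(path):
    base, sep, frag = path.partition('#')
    particle_type = int(frag) if sep else None
    return base, particle_type

print(split_gadget_path('snapshot_001.hdf5#0'))  # ('snapshot_001.hdf5', 0)
print(split_gadget_path('snapshot_001.hdf5'))    # ('snapshot_001.hdf5', None)
```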

## Usage Examples

### Reading Standard HDF5 Files

```python
import vaex
from vaex.hdf5.dataset import Hdf5MemoryMapped

# Automatic detection via vaex.open
df = vaex.open('data.hdf5')

# Direct instantiation
dataset = Hdf5MemoryMapped('data.hdf5')
df = vaex.from_dataset(dataset)

# Reading a specific group
dataset = Hdf5MemoryMapped('data.hdf5', group='/table')

# Reading from remote storage
dataset = Hdf5MemoryMapped('s3://bucket/data.hdf5',
                           fs_options={'anon': True})
```

### Creating New HDF5 Files

```python
import numpy as np
import vaex
from vaex.hdf5.dataset import Hdf5MemoryMapped

# Create an empty file with the specified structure
dataset = Hdf5MemoryMapped.create(
    'new_data.hdf5',
    N=1000,
    column_names=['x', 'y', 'z', 'velocity'],
    dtypes=[np.float64, np.float64, np.float64, np.float32],
    write=True
)

# Populate by writing into the memory-mapped column arrays
df = vaex.from_dataset(dataset)
df.columns['x'][:] = np.random.random(1000)
df.columns['y'][:] = np.random.random(1000)
# ... continue with data population
```

### Reading AMUSE Files

```python
import vaex
from vaex.hdf5.dataset import AmuseHdf5MemoryMapped

# AMUSE files are auto-detected by vaex.open
df = vaex.open('amuse_simulation.hdf5')

# Direct instantiation
dataset = AmuseHdf5MemoryMapped('amuse_simulation.hdf5')
df = vaex.from_dataset(dataset)
```

### Reading Gadget2 Files

```python
import vaex
from vaex.hdf5.dataset import Hdf5MemoryMappedGadget

# Using a path with the particle type appended
df_gas = vaex.open('snapshot_001.hdf5#0')  # Gas particles
df_dm = vaex.open('snapshot_001.hdf5#5')   # Dark matter particles

# Using the particle name
dataset = Hdf5MemoryMappedGadget('snapshot_001.hdf5', particle_name='gas')
df = vaex.from_dataset(dataset)

# Using the numeric particle type
dataset = Hdf5MemoryMappedGadget('snapshot_001.hdf5', particle_type=0)
```

## Constants

```python { .api }
gadget_particle_names = ["gas", "halo", "disk", "bulge", "stars", "dm"]
```

List of Gadget2 particle type names; each name's index in the list (0-5) is its numeric particle type.

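Because the numeric particle type is simply a name's position in this list, the two identifiers can be converted with a plain `list.index` lookup:

```python
gadget_particle_names = ["gas", "halo", "disk", "bulge", "stars", "dm"]

# Name -> numeric type: the index of the name in the list.
print(gadget_particle_names.index("gas"))  # 0
print(gadget_particle_names.index("dm"))   # 5

# Numeric type -> name: plain list indexing.
print(gadget_particle_names[4])  # stars
```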
## Error Handling

All dataset readers may raise:

- `FileNotFoundError`: if the specified file does not exist
- `OSError`: for file permission, I/O, or low-level HDF5 errors (h5py surfaces HDF5 failures as standard Python exceptions)
- `ValueError`: for invalid parameters or unsupported data formats
- `KeyError`: if a specified group or dataset does not exist in the file
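
A hedged sketch of defensive opening based on the exceptions listed above. The `opener` callable is injected (here a stand-in; in practice it could be `vaex.open`) so the pattern is self-contained:

```python
# Wrap an opener so the documented failure modes yield a uniform result.
def open_or_none(opener, path, **kwargs):
    try:
        return opener(path, **kwargs)
    except (FileNotFoundError, OSError, ValueError, KeyError) as exc:
        print(f"could not open {path}: {type(exc).__name__}")
        return None

# Stand-in opener that simulates a missing file.
def fake_opener(path):
    raise FileNotFoundError(path)

result = open_or_none(fake_opener, 'missing.hdf5')
print(result)  # None
```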