# vaex-hdf5

HDF5 file support for Vaex, the high-performance Python library for lazy, out-of-core DataFrame operations on large datasets. It provides memory-mapped HDF5 reading with zero-copy access patterns, supports specialized HDF5 formats including scientific data from Gadget simulations and the AMUSE astrophysics framework, and offers efficient export of DataFrames to HDF5.

## Package Information

- **Package Name**: vaex-hdf5
- **Language**: Python
- **Installation**: `pip install vaex-hdf5`
- **Dependencies**: `h5py>=2.9`, `vaex-core>=4.0.0,<5`

## Core Imports

```python
import vaex.hdf5.dataset
import vaex.hdf5.export
import vaex.hdf5.writer
import vaex.hdf5.utils
```

For direct dataset access:

```python
from vaex.hdf5.dataset import Hdf5MemoryMapped, AmuseHdf5MemoryMapped, Hdf5MemoryMappedGadget
```

## Basic Usage

```python
import vaex

# Reading HDF5 files (automatic detection via vaex.open)
df = vaex.open('data.hdf5')

# Reading specialized formats
df_amuse = vaex.open('simulation.hdf5')   # AMUSE format auto-detected
df_gadget = vaex.open('snapshot.hdf5#0')  # Gadget format with particle type

# Exporting to HDF5
df = vaex.from_csv('data.csv')
df.export('output.hdf5')

# Manual dataset creation
from vaex.hdf5.dataset import Hdf5MemoryMapped
dataset = Hdf5MemoryMapped.create('new_file.hdf5', N=1000,
                                  column_names=['x', 'y', 'z'])

# High-performance writing with Writer
from vaex.hdf5.writer import Writer
with Writer('output.hdf5') as writer:
    writer.layout(df)
    writer.write(df)
```

## Architecture

The vaex-hdf5 package is built around several key components:

- **Dataset Readers**: Memory-mapped HDF5 dataset classes that provide zero-copy access to data
- **Export Functions**: High-level functions for exporting vaex DataFrames to HDF5 format
- **Writer Classes**: Low-level writers for efficient streaming data export
- **Entry Points**: Automatic format detection and registration with vaex core

The package integrates with the broader Vaex ecosystem through entry points that register HDF5 dataset openers with vaex core, enabling automatic format detection. Lazy evaluation and memory mapping keep performance high even on billion-row datasets.
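
The opener registration and dispatch described above can be sketched with a toy registry. This is a schematic of the pattern only; `FakeHdf5Opener`, `openers`, and `open_any` are illustrative names, not the actual vaex API:

```python
# Minimal sketch of opener registration and dispatch, analogous to how
# vaex-hdf5 registers its HDF5 openers with vaex core via entry points.
# All names here are illustrative, not real vaex identifiers.
openers = []

class FakeHdf5Opener:
    @staticmethod
    def can_open(path):
        return path.endswith(('.hdf5', '.h5'))

    @staticmethod
    def open(path):
        return f"dataset from {path}"

openers.append(FakeHdf5Opener)

def open_any(path):
    # Ask each registered opener whether it recognizes the path.
    for opener in openers:
        if opener.can_open(path):
            return opener.open(path)
    raise IOError(f"no opener found for {path!r}")

print(open_any('data.hdf5'))  # dataset from data.hdf5
```

In the real package this dispatch happens inside `vaex.open`, which consults the openers registered through the package's entry points.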

## Capabilities

### HDF5 Dataset Reading

Memory-mapped reading of HDF5 files with support for the standard vaex format, the AMUSE scientific data format, and the Gadget2 simulation format. Provides zero-copy access patterns and automatic format detection.

```python { .api }
class Hdf5MemoryMapped:
    def __init__(self, path, write=False, fs_options={}, fs=None, nommap=None, group=None, _fingerprint=None): ...
    @classmethod
    def create(cls, path, N, column_names, dtypes=None, write=True): ...
    @classmethod
    def can_open(cls, path, fs_options={}, fs=None, group=None, **kwargs): ...
    def write_meta(self): ...
    def close(self): ...

class AmuseHdf5MemoryMapped(Hdf5MemoryMapped):
    def __init__(self, path, write=False, fs_options={}, fs=None): ...

class Hdf5MemoryMappedGadget(DatasetMemoryMapped):
    def __init__(self, path, particle_name=None, particle_type=None, fs_options={}, fs=None): ...
```

[HDF5 Dataset Reading](./dataset-reading.md)
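
The `#<n>` suffix used with Gadget files (as in `snapshot.hdf5#0` from Basic Usage) selects a particle type. A standard-library sketch of how such a path can be split; the helper name is illustrative and the real parsing lives inside vaex-hdf5:

```python
# Split a 'path#fragment' string into the file path and an optional
# integer particle type. split_gadget_path is an illustrative name.
def split_gadget_path(path):
    base, sep, fragment = path.partition('#')
    particle_type = int(fragment) if sep else None
    return base, particle_type

print(split_gadget_path('snapshot.hdf5#0'))  # ('snapshot.hdf5', 0)
print(split_gadget_path('data.hdf5'))        # ('data.hdf5', None)
```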

### Data Export

High-level functions for exporting vaex DataFrames to HDF5 format, with support for both version 1 and version 2 formats, compression options, and streaming export for large datasets.

```python { .api }
def export_hdf5(dataset, path, column_names=None, byteorder="=", shuffle=False,
                selection=False, progress=None, virtual=True, sort=None,
                ascending=True, parallel=True): ...

def export_hdf5_v1(dataset, path, column_names=None, byteorder="=", shuffle=False,
                   selection=False, progress=None, virtual=True): ...
```

[Data Export](./data-export.md)
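
The `byteorder` parameter uses the same convention as `struct` and numpy: `"="` native, `"<"` little-endian, `">"` big-endian. A quick standard-library illustration of what the choice affects on disk:

```python
import struct

# The same float64 value encoded in both byte orders: the two
# encodings are byte-reversed images of each other.
value = 1.0
little = struct.pack("<d", value)  # 8 bytes, little-endian
big = struct.pack(">d", value)     # 8 bytes, big-endian

print(little == big[::-1])                    # True
print(struct.unpack("<d", little)[0] == 1.0)  # True
```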

### High-Performance Writing

Low-level writer classes for streaming large datasets to HDF5 format with optimal memory usage, parallel writing support, and specialized column writers for different data types.

```python { .api }
class Writer:
    def __init__(self, path, group="/table", mode="w", byteorder="="): ...
    def layout(self, df, progress=None): ...
    def write(self, df, chunk_size=int(1e5), parallel=True, progress=None,
              column_count=1, export_threads=0): ...
    def close(self): ...
    def __enter__(self): ...
    def __exit__(self, *args): ...
```

[High-Performance Writing](./high-performance-writing.md)
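
The two-phase layout-then-write protocol of `Writer` can be sketched without any HDF5 dependency. This is a schematic of the pattern only, with illustrative names, not the real implementation:

```python
# Phase 1 (layout): reserve per-column storage for the full length.
def layout(columns):
    n = len(next(iter(columns.values())))
    return {name: [None] * n for name in columns}

# Phase 2 (write): fill the reserved storage chunk by chunk, so peak
# memory is bounded by the chunk size rather than the dataset size.
def write(columns, out, chunk_size=2):
    n = len(next(iter(columns.values())))
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        for name, data in columns.items():
            out[name][start:stop] = data[start:stop]
    return out

data = {'x': [1, 2, 3, 4, 5], 'y': [10, 20, 30, 40, 50]}
result = write(data, layout(data))
print(result == data)  # True
```

Separating layout from write is what lets the real `Writer` preallocate the HDF5 file up front and then stream chunks into it, optionally in parallel.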

### Memory Mapping Utilities

Low-level utilities for memory mapping HDF5 datasets and arrays, with support for masked arrays and different storage layouts.

```python { .api }
def mmap_array(mmap, file, offset, dtype, shape): ...
def h5mmap(mmap, file, data, mask=None): ...
```

[Memory Mapping Utilities](./memory-mapping-utils.md)
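
The zero-copy idea behind `mmap_array` can be illustrated with the standard library alone: map a file and reinterpret its bytes as an array of doubles without copying. This is a sketch of the concept, not the vaex implementation:

```python
import mmap
import os
import struct
import tempfile

# Write five float64 values (native byte order) to a scratch file.
values = [0.0, 1.5, 3.0, 4.5, 6.0]
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(struct.pack('5d', *values))

# Map the file and view its bytes as doubles without copying: this
# mirrors what mmap_array does given an offset, dtype, and shape.
with open(path, 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mm).cast('d')  # zero-copy reinterpretation
    data = list(view)
    view.release()
    mm.close()

os.unlink(path)
print(data)  # [0.0, 1.5, 3.0, 4.5, 6.0]
```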

## Types

```python { .api }
from pathlib import Path
from typing import Any, Callable, Dict, Literal, Union

# Common type aliases used throughout the API
PathLike = Union[str, Path]
FileSystemOptions = Dict[str, Any]
FileSystem = Any  # fsspec filesystem
ProgressCallback = Callable[[float], bool]
ByteOrder = Literal["=", "<", ">"]
```
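
As an illustration of the `ProgressCallback` alias: a callback receives the fraction of completed work and returns a boolean, where `False` requests cancellation. The 0.8 threshold below is purely illustrative:

```python
from typing import Callable

ProgressCallback = Callable[[float], bool]

# A progress callback receives a fraction in [0, 1] and returns True
# to continue or False to cancel; the 0.8 cutoff is illustrative.
def progress(fraction: float) -> bool:
    return fraction < 0.8

steps = [0.25, 0.5, 0.75, 1.0]
status = [progress(f) for f in steps]
print(status)  # [True, True, True, False]
```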