Hierarchical datasets for Python with HDF5 library for managing extremely large amounts of data
npx @tessl/cli install tessl/pypi-tables@3.10.00
# PyTables

A comprehensive Python library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data. PyTables is built on top of the HDF5 library and NumPy, combining an object-oriented interface with Cython-generated C extensions for performance-critical operations. It provides fast interactive data storage and retrieval with advanced compression, indexing, and querying features optimized for scientific computing and data analysis workflows.

## Package Information

- **Package Name**: tables
- **Language**: Python
- **Installation**: `pip install tables`

## Core Imports

```python
import tables
```

The conventional short alias:

```python
import tables as tb
```

For specific functionality:

```python
from tables import open_file, File, Group, Table, Array
from tables import StringCol, IntCol, FloatCol  # Column types
from tables import Filters  # Compression
```

## Basic Usage

```python
import tables as tb
import numpy as np

# Open/create an HDF5 file
h5file = tb.open_file("example.h5", mode="w", title="Example File")

# Create a group for organization
group = h5file.create_group("/", "detector", "Detector Information")

# Describe a table of structured records
class Particle(tb.IsDescription):
    name = tb.StringCol(16)       # 16-character string
    idnumber = tb.Int64Col()      # Signed 64-bit integer
    ADCcount = tb.UInt16Col()     # Unsigned 16-bit integer
    TDCcount = tb.UInt8Col()      # Unsigned 8-bit integer
    energy = tb.Float32Col()      # 32-bit floating point
    timestamp = tb.Time64Col()    # Timestamp

table = h5file.create_table(group, 'readout', Particle, "Readout example")

# Add data to the table through its Row accessor
particle = table.row
for i in range(10):
    particle['name'] = f'Particle: {i:6d}'
    particle['idnumber'] = i
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = np.random.randint(0, 65535)
    particle['energy'] = np.random.random()
    particle['timestamp'] = i * 1.0
    particle.append()
table.flush()

# Create arrays for homogeneous data
array_c = h5file.create_array(group, 'array_c', np.arange(100), "Array C")

# Query data; extract fields inside the loop, since each Row object
# is a moving cursor into the table's I/O buffer
results = [row['energy'] for row in table.where('TDCcount > 5')]

# Close the file
h5file.close()
```


## Architecture

PyTables implements a hierarchical tree structure similar to a filesystem:

- **File**: Top-level container managing the entire HDF5 file and providing undo/redo support
- **Groups**: Directory-like containers that organize nodes in a hierarchical namespace
- **Leaves**: Data-containing nodes, including Tables (structured data), Arrays (homogeneous data), and VLArrays (variable-length data)
- **Attributes**: Metadata attached to any node for storing small auxiliary information
- **Indexes**: B-tree-like indexing structures for fast data retrieval and querying

The design emphasizes memory efficiency, disk optimization, and seamless integration with NumPy arrays, and provides database-like undo/redo capabilities for tracking structural changes.

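A minimal sketch of this hierarchy in one place (file and node names are illustrative): it creates a Group, a Leaf, and an Attribute, then walks the tree.

```python
import tables as tb
import numpy as np

# Build a tiny file to show File -> Group -> Leaf -> Attribute together
with tb.open_file("tree_demo.h5", mode="w", title="Tree demo") as h5:
    grp = h5.create_group("/", "sensors", "Sensor data")          # a Group
    arr = h5.create_array(grp, "temps", np.array([20.5, 21.0]))   # a Leaf
    arr.attrs.units = "degC"                                      # an Attribute

    # walk_nodes traverses the tree, filesystem-style
    paths = [node._v_pathname for node in h5.walk_nodes("/")]

print(paths)
```
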
## Capabilities

### File Operations

Core file management including opening, creating, copying, and validating PyTables/HDF5 files with comprehensive mode control and optimization options.

```python { .api }
def open_file(filename, mode="r", title="", root_uep="/", filters=None, **kwargs): ...
def copy_file(srcfilename, dstfilename, overwrite=False, **kwargs): ...
def is_hdf5_file(filename): ...
def is_pytables_file(filename): ...
```

[File Operations](./file-operations.md)

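A short sketch of these functions together (filenames are illustrative):

```python
import tables as tb

# Create a small source file to operate on
with tb.open_file("src_demo.h5", mode="w", title="Source") as h5:
    h5.create_array("/", "x", [1, 2, 3], "Sample data")

print(tb.is_hdf5_file("src_demo.h5"))       # HDF5 signature check
print(tb.is_pytables_file("src_demo.h5"))   # PyTables format check

# copy_file duplicates the whole file; kwargs can retune the destination
tb.copy_file("src_demo.h5", "dst_demo.h5", overwrite=True)

with tb.open_file("dst_demo.h5") as h5:
    copied = h5.root.x.read()
print(copied)
```
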
### Hierarchical Organization

Group-based hierarchical organization for structuring datasets in tree-like namespaces with directory-style navigation and node management.

```python { .api }
class Group:
    def _f_walknodes(self, classname=None): ...
    def _f_list_nodes(self, classname=None): ...
    def __contains__(self, name): ...
    def __getitem__(self, name): ...
```

[Groups and Navigation](./groups-navigation.md)

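One way these navigation methods fit together (names illustrative); `createparents=True` builds intermediate groups like `mkdir -p`:

```python
import tables as tb

with tb.open_file("groups_demo.h5", mode="w") as h5:
    # creates /run1 and /run1/raw on the way to day1
    h5.create_group("/run1/raw", "day1", createparents=True)
    h5.create_array("/run1/raw/day1", "counts", [5, 7, 9])

    raw = h5.root.run1.raw                    # natural-naming navigation
    has_day1 = "day1" in raw                  # Group.__contains__
    names = [n._v_name for n in raw._f_list_nodes()]
    counts = raw["day1"]["counts"].read()     # Group.__getitem__

print(has_day1, names, counts)
```
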
### Structured Data Storage

Table-based structured data storage with column-oriented access, conditional querying, indexing, and modification capabilities for record-based datasets.

```python { .api }
class Table:
    def read(self, start=None, stop=None, step=None, field=None, out=None): ...
    def read_where(self, condition, condvars=None, **kwargs): ...
    def where(self, condition, condvars=None, start=None, stop=None, step=None): ...
    def append(self, rows): ...
    def modify_column(self, start=None, stop=None, step=None, column=None, colname=None): ...
```

[Tables and Structured Data](./tables-structured-data.md)

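A small sketch of the bulk-append and querying paths (schema and values are illustrative):

```python
import tables as tb

class Reading(tb.IsDescription):
    sensor = tb.StringCol(8)
    value = tb.Float64Col()

with tb.open_file("readings_demo.h5", mode="w") as h5:
    t = h5.create_table("/", "readings", Reading, "Sensor readings")
    # append accepts a sequence of row tuples in column order
    t.append([("a", 1.5), ("b", 3.0), ("a", 4.5)])
    t.flush()

    # read_where evaluates the condition in-kernel and returns
    # a NumPy structured array of matching rows
    hot = t.read_where("value > 2.0")

    # read(field=...) pulls out a single column
    vals = t.read(field="value").tolist()

print(len(hot), vals)
```
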
### Array Data Storage

Array-based homogeneous data storage including standard arrays, chunked arrays, enlargeable arrays, and variable-length arrays with NumPy integration.

```python { .api }
class Array:
    def read(self, start=None, stop=None, step=None, out=None): ...
    def __getitem__(self, key): ...
    def __setitem__(self, key, value): ...

class EArray:
    def append(self, sequence): ...
    def read(self, start=None, stop=None, step=None, out=None): ...
```

[Arrays and Homogeneous Data](./arrays-homogeneous-data.md)

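EArray growth can be sketched as follows (names illustrative); the 0 in `shape` marks the enlargeable dimension:

```python
import tables as tb
import numpy as np

with tb.open_file("earray_demo.h5", mode="w") as h5:
    # shape=(0, 3): rows can grow, each row has 3 columns
    ea = h5.create_earray("/", "samples", atom=tb.Float64Atom(),
                          shape=(0, 3), title="Growable samples")
    ea.append(np.zeros((2, 3)))   # add two rows
    ea.append(np.ones((1, 3)))    # add one more

    final_shape = ea.shape
    last_row = ea[-1].tolist()    # NumPy-style indexing on disk data

print(final_shape, last_row)
```
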
### Type System and Descriptions

Comprehensive type system with Atom types for individual data elements and Column types for table structure definitions, supporting all NumPy data types plus specialized types.

```python { .api }
class IsDescription: ...

# Atom types
class StringAtom: ...
class IntAtom: ...
class FloatAtom: ...
class TimeAtom: ...

# Column types
class StringCol: ...
class IntCol: ...
class FloatCol: ...
class TimeCol: ...
```

[Type System and Descriptions](./type-system-descriptions.md)

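A sketch of how Col types carry positions and defaults inside a description (the schema is illustrative):

```python
import tables as tb

class Event(tb.IsDescription):
    name = tb.StringCol(16, pos=0)      # fixed-width string, first column
    count = tb.Int32Col(dflt=0, pos=1)  # integer with a default value
    value = tb.Float64Col(pos=2)

with tb.open_file("types_demo.h5", mode="w") as h5:
    t = h5.create_table("/", "events", Event)
    r = t.row
    r["name"] = "start"
    r["value"] = 3.5        # count is omitted, so its dflt of 0 is stored
    r.append()
    t.flush()
    rec = t[0]

print(rec["name"], rec["count"], rec["value"])
```
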
### Compression and Filtering

Advanced compression and filtering system supporting multiple algorithms (zlib, blosc, blosc2, bzip2, lzo) with configurable parameters for optimal storage and I/O performance.

```python { .api }
class Filters:
    def __init__(self, complevel=0, complib="zlib", shuffle=True, bitshuffle=False, fletcher32=False): ...

def set_blosc_max_threads(nthreads): ...
def set_blosc2_max_threads(nthreads): ...
```

[Compression and Filtering](./compression-filtering.md)

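Filters attach at file, group, or leaf creation time; a sketch using zlib (the CArray below is chunked, so the filter pipeline applies per chunk; other `complib` values work only if the corresponding library is built in):

```python
import tables as tb
import numpy as np

# complevel=5 with byte shuffling; complib could be "blosc2", "lzo", etc.
filters = tb.Filters(complevel=5, complib="zlib", shuffle=True)

with tb.open_file("compressed_demo.h5", mode="w") as h5:
    # CArray: chunked, fixed-shape, compressed storage
    ca = h5.create_carray("/", "data", atom=tb.Int32Atom(),
                          shape=(1000,), filters=filters)
    ca[:] = np.arange(1000, dtype=np.int32)

    used = (ca.filters.complib, ca.filters.complevel)
    roundtrip = int(ca[999])

print(used, roundtrip)
```
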
### Querying and Indexing

Expression-based querying system with compiled expressions, B-tree-like indexing, and conditional iteration for efficient data retrieval from large datasets.

```python { .api }
class Expr:
    def eval(self): ...
    def append(self, expr): ...

# Indexing methods (create_index/remove_index live on Column; reindex on Table)
def create_index(self, **kwargs): ...
def remove_index(self): ...
def reindex(self): ...
```

[Querying and Indexing](./querying-indexing.md)

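A sketch combining a condvars-parameterized query with a column index (note that indexes are created on a Column, via `table.cols.<name>.create_index()`; schema and values are illustrative):

```python
import tables as tb

class Point(tb.IsDescription):
    x = tb.Float64Col()
    y = tb.Float64Col()

with tb.open_file("query_demo.h5", mode="w") as h5:
    t = h5.create_table("/", "points", Point)
    t.append([(float(i), float(i * i)) for i in range(100)])
    t.flush()

    # index the x column so conditions on x can use it
    t.cols.x.create_index()

    # condvars injects Python values into the compiled condition
    ys = [row["y"] for row in t.where("x > lim", condvars={"lim": 95.0})]

print(ys)
```
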
### Transaction System

Undo/redo transaction system with named marks and rollback capabilities for tracking and reverting structural changes to the object tree.

```python { .api }
class File:
    def enable_undo(self, filters=None): ...
    def disable_undo(self): ...
    def mark(self, name=None): ...
    def undo(self, mark=None): ...
    def redo(self, mark=None): ...
```

[Transactions and Undo/Redo](./transactions-undo-redo.md)

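The mark/undo/redo cycle can be sketched as follows (node names are illustrative); undo tracking must be enabled first and covers structural changes to the tree:

```python
import tables as tb

with tb.open_file("undo_demo.h5", mode="w") as h5:
    h5.enable_undo()

    h5.create_array("/", "a", [1, 2, 3])
    h5.mark("after_a")                  # a named point to return to
    h5.create_array("/", "b", [4, 5, 6])
    h5.mark("after_b")

    h5.undo("after_a")                  # roll back past creating /b
    gone = "/b" not in h5

    h5.redo("after_b")                  # replay forward again
    back = "/b" in h5

    h5.disable_undo()

print(gone, back)
```
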
## Types

```python { .api }
class File:
    """Main PyTables file interface."""
    def __init__(self, filename, mode="r", title="", root_uep="/", filters=None, **kwargs): ...
    def close(self): ...
    def flush(self): ...
    def create_group(self, where, name, title="", filters=None, createparents=False): ...
    def create_table(self, where, name, description=None, title="", filters=None, expectedrows=10000, chunkshape=None, byteorder=None, createparents=False, obj=None, **kwargs): ...
    def create_array(self, where, name, obj=None, title="", byteorder=None, createparents=False): ...

class Node:
    """Base class for all PyTables nodes."""
    def _f_close(self): ...
    def _f_copy(self, newparent=None, newname=None, overwrite=False, recursive=False, createparents=False, **kwargs): ...
    def _f_move(self, newparent=None, newname=None, overwrite=False, createparents=False): ...
    def _f_remove(self): ...
    def _f_rename(self, newname): ...

class IsDescription:
    """Base class for table descriptions."""
    pass

class UnImplemented(Leaf):
    """
    Represents datasets not supported by PyTables in generic HDF5 files.

    Used when PyTables encounters HDF5 datasets with unsupported datatype
    or dataspace combinations. Allows access to metadata and attributes
    but not the actual data.
    """

class Unknown(Leaf):
    """
    Represents unknown node types in HDF5 files.

    Used as a fallback for HDF5 nodes that cannot be classified
    into any supported PyTables category.
    """

FilterProperties = dict[str, Any]
"""Dictionary containing filter and compression properties."""

__version__: str
"""PyTables version string."""

hdf5_version: str
"""Underlying HDF5 library version string."""

class Enum:
    """
    Enumerated type for defining named value sets.

    Used to create enumerated types where variables can take one of a
    predefined set of named values. Each value has a name and a concrete value.
    """
    def __init__(self, enum_values):
        """
        Create an enumeration from a sequence or mapping.

        Parameters:
        - enum_values: Sequence of names, or mapping of names to concrete values
        """
```

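Enum maps names to concrete values in both directions; a small sketch (the enumerations shown are illustrative):

```python
import tables as tb

# From a sequence, concrete values are auto-assigned 0, 1, 2, ...
colors = tb.Enum(["red", "green", "blue"])

green_value = colors["green"]   # name -> concrete value
name_of_1 = colors(1)           # concrete value -> name

# From a mapping, the concrete values are explicit
status = tb.Enum({"ok": 0, "warn": 10, "fail": 20})

print(green_value, name_of_1, status["warn"])
```
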
## Exceptions

```python { .api }
# Core Exceptions
class HDF5ExtError(RuntimeError):
    """Errors from the HDF5 library."""

class ClosedNodeError(ValueError):
    """Operations on closed nodes."""

class ClosedFileError(ValueError):
    """Operations on closed files."""

class FileModeError(ValueError):
    """Invalid file mode operations."""

class NodeError(AttributeError):
    """General node-related errors."""

class NoSuchNodeError(NodeError):
    """Access to non-existent nodes."""

# Specialized Exceptions
class UndoRedoError(Exception):
    """Undo/redo system errors."""

class FlavorError(TypeError):
    """Data flavor conversion errors."""

class ChunkError(ValueError):
    """Chunking-related errors."""

class NotChunkedError(ChunkError):
    """Operations requiring a chunked layout."""

# Warning Classes
class NaturalNameWarning(UserWarning):
    """Natural naming convention warnings."""

class PerformanceWarning(UserWarning):
    """Performance-related warnings."""

class DataTypeWarning(UserWarning):
    """Data type compatibility warnings."""
```

## Utility Functions

```python { .api }
def test():
    """Run the PyTables test suite."""

def print_versions():
    """Print version information for PyTables and dependencies."""

def silence_hdf5_messages():
    """Suppress HDF5 diagnostic messages."""

def restrict_flavors(keep=None):
    """
    Restrict available NumPy data flavors.

    Parameters:
    - keep (list): List of flavors to keep available
    """

def get_pytables_version():
    """
    Get PyTables version string.

    Returns:
    str: PyTables version

    Note: Deprecated; use tables.__version__ instead.
    """

def get_hdf5_version():
    """
    Get HDF5 library version string.

    Returns:
    str: HDF5 version

    Note: Deprecated; use tables.hdf5_version instead.
    """
```

## Command-Line Tools

PyTables provides several command-line utilities for file management and inspection:

- **ptdump**: Dumps PyTables file contents in a human-readable format
- **ptrepack**: Repacks PyTables files, with optimization and format-conversion options
- **pt2to3**: Converts Python source code written against the PyTables 2.x API to the 3.x API
- **pttree**: Displays a PyTables file's tree structure

These tools are installed along with PyTables and can be run directly from the command line.
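Typical invocations (the flags shown are the common ones; `example.h5` is created first so the commands have something to operate on):

```shell
# make a small file to inspect
python -c "import tables as tb; f = tb.open_file('example.h5', 'w'); f.create_array('/', 'x', [1, 2, 3]); f.close()"

# dump the contents in human-readable form (-v adds more detail)
ptdump -v example.h5

# repack into a new, defragmented file with blosc compression
ptrepack --complib blosc --complevel 5 example.h5 packed.h5

# show the tree structure
pttree example.h5
```
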