# PyTables

A comprehensive Python library for managing hierarchical datasets, designed to cope efficiently with extremely large amounts of data. PyTables is built on top of the HDF5 library and NumPy, combining an object-oriented interface with Cython-generated C extensions for performance-critical operations. It provides fast interactive data storage and retrieval with advanced compression, indexing, and querying features optimized for scientific computing and data-analysis workflows.

## Package Information

- **Package Name**: tables
- **Language**: Python
- **Installation**: `pip install tables`

## Core Imports

```python
import tables
```

Most examples use the shorter alias:

```python
import tables as tb
```

For specific functionality:

```python
from tables import open_file, File, Group, Table, Array
from tables import StringCol, IntCol, FloatCol  # Column types
from tables import Filters                      # Compression
```

## Basic Usage

```python
import tables as tb
import numpy as np

# Open/create an HDF5 file
h5file = tb.open_file("example.h5", mode="w", title="Example File")

# Create a group for organization
group = h5file.create_group("/", "detector", "Detector Information")

# Describe a table with structured data
class Particle(tb.IsDescription):
    name = tb.StringCol(16)      # 16-character string
    idnumber = tb.Int64Col()     # Signed 64-bit integer
    ADCcount = tb.UInt16Col()    # Unsigned 16-bit integer
    TDCcount = tb.UInt8Col()     # Unsigned 8-bit integer
    energy = tb.Float32Col()     # 32-bit floating point
    timestamp = tb.Time64Col()   # Timestamp

table = h5file.create_table(group, 'readout', Particle, "Readout example")

# Add data to the table
particle = table.row
for i in range(10):
    particle['name'] = f'Particle: {i:6d}'
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = np.random.randint(0, 65535)
    particle['energy'] = np.random.random()
    particle['timestamp'] = i * 1.0
    particle.append()
table.flush()

# Create an array for homogeneous data
array_c = h5file.create_array(group, 'array_c', np.arange(100), "Array C")

# Query data (copy each match; the Row object is reused by the iterator)
results = [row.fetch_all_fields() for row in table.where('TDCcount > 5')]

# Close the file
h5file.close()
```

## Architecture

PyTables implements a hierarchical tree structure similar to a filesystem:

- **File**: Top-level container managing the entire HDF5 file and providing undo/redo support
- **Groups**: Directory-like containers that organize nodes in a hierarchical namespace
- **Leaves**: Data-containing nodes, including Tables (structured data), Arrays (homogeneous data), and VLArrays (variable-length data)
- **Attributes**: Metadata attached to any node for storing small auxiliary information
- **Indexes**: B-tree and other indexing structures for fast data retrieval and querying

The design emphasizes memory efficiency, disk optimization, and seamless integration with NumPy arrays, while the undo/redo mechanism provides transaction-like safeguards for changes to the object tree.
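
To make the tree model concrete, here is a minimal sketch of groups, leaves, attributes, and natural-naming navigation. The file path and node names (`run1`, `counts`, `units`) are invented for the example:

```python
import os
import tempfile

import numpy as np
import tables as tb

# A scratch file in a temporary directory (path is arbitrary)
path = os.path.join(tempfile.mkdtemp(), "tree.h5")
h5 = tb.open_file(path, mode="w", title="Tree demo")

run1 = h5.create_group("/", "run1", "First run")        # a Group node
counts = h5.create_array(run1, "counts", np.arange(5))  # a Leaf node
counts.attrs.units = "photons"                          # attribute metadata on a node

# Natural naming: the hierarchy is navigable via ordinary attribute access
values = h5.root.run1.counts.read().tolist()
units = h5.root.run1.counts.attrs.units
h5.close()
```

Natural naming keeps interactive exploration terse; `h5.get_node("/run1/counts")` is the string-path equivalent.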

## Capabilities

### File Operations

Core file management including opening, creating, copying, and validating PyTables/HDF5 files, with comprehensive mode control and optimization options.

```python { .api }
def open_file(filename, mode="r", title="", root_uep="/", filters=None, **kwargs): ...
def copy_file(srcfilename, dstfilename, overwrite=False, **kwargs): ...
def is_hdf5_file(filename): ...
def is_pytables_file(filename): ...
```

[File Operations](./file-operations.md)
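
A short sketch of how these helpers fit together; the file names are invented for the example:

```python
import os
import tempfile

import tables as tb

# Create an empty PyTables file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
h5 = tb.open_file(path, mode="w", title="Demo")
h5.close()

ok_hdf5 = tb.is_hdf5_file(path)    # True for any valid HDF5 file
ok_pt = tb.is_pytables_file(path)  # truthy (format version) for PyTables files

# copy_file rewrites the whole file, optionally changing filters along the way
copy_path = os.path.join(os.path.dirname(path), "copy.h5")
tb.copy_file(path, copy_path, overwrite=True)
```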

### Hierarchical Organization

Group-based hierarchical organization for structuring datasets in tree-like namespaces, with directory-style navigation and node management.

```python { .api }
class Group:
    def _f_walknodes(self, classname=None): ...
    def _f_list_nodes(self, classname=None): ...
    def __contains__(self, name): ...
    def __getitem__(self, name): ...
```

[Groups and Navigation](./groups-navigation.md)
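
A minimal navigation sketch, assuming invented node names (`data`, `a`, `b`); it uses `Group._f_list_nodes`, `Group.__contains__`, and the string-path lookup `File.get_node`:

```python
import os
import tempfile

import numpy as np
import tables as tb

path = os.path.join(tempfile.mkdtemp(), "nav.h5")
h5 = tb.open_file(path, mode="w")
g = h5.create_group("/", "data")
h5.create_array(g, "a", np.arange(3))
h5.create_array(g, "b", np.arange(4))

# List direct children of a group
names = sorted(n.name for n in h5.root.data._f_list_nodes())

has_a = "a" in h5.root.data                 # Group.__contains__
node_b_name = h5.get_node("/data/b").name   # path-based lookup
h5.close()
```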

### Structured Data Storage

Table-based structured data storage with column-oriented access, conditional querying, indexing, and modification capabilities for record-based datasets.

```python { .api }
class Table:
    def read(self, start=None, stop=None, step=None, field=None, out=None): ...
    def read_where(self, condition, condvars=None, **kwargs): ...
    def where(self, condition, condvars=None, start=None, stop=None): ...
    def append(self, rows): ...
    def modify_column(self, start=None, stop=None, step=None, column=None, value=None): ...
```

[Tables and Structured Data](./tables-structured-data.md)
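
A minimal round trip through `append` and `read_where`; the table name and columns (`sensor`, `value`) are invented for the example:

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "tbl.h5")
h5 = tb.open_file(path, mode="w")

class Reading(tb.IsDescription):
    sensor = tb.StringCol(8, pos=0)
    value = tb.Float64Col(pos=1)

t = h5.create_table("/", "readings", Reading)
t.append([("s1", 1.5), ("s2", 2.5), ("s1", 3.5)])  # append a sequence of rows
t.flush()

# read_where evaluates the condition string and returns the matching records
hot = t.read_where("value > 2.0")
n_hot = len(hot)
h5.close()
```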

### Array Data Storage

Array-based homogeneous data storage, including standard arrays, chunked arrays, enlargeable arrays, and variable-length arrays with NumPy integration.

```python { .api }
class Array:
    def read(self, start=None, stop=None, step=None, out=None): ...
    def __getitem__(self, key): ...
    def __setitem__(self, key, value): ...

class EArray:
    def append(self, sequence): ...
    def read(self, start=None, stop=None, step=None, out=None): ...
```

[Arrays and Homogeneous Data](./arrays-homogeneous-data.md)
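
A sketch of an enlargeable array: a `0` in the shape marks the dimension that `append` grows along. File and node names are invented:

```python
import os
import tempfile

import numpy as np
import tables as tb

path = os.path.join(tempfile.mkdtemp(), "arr.h5")
h5 = tb.open_file(path, mode="w")

# The first dimension (0) is enlargeable; rows of width 3 can be appended
ea = h5.create_earray("/", "samples", atom=tb.Float64Atom(), shape=(0, 3))
ea.append(np.zeros((2, 3)))
ea.append(np.ones((4, 3)))

shape = ea.shape            # grows to (6, 3)
first_row = ea[0].tolist()  # NumPy-style slicing via __getitem__
h5.close()
```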

### Type System and Descriptions

Comprehensive type system with Atom types for individual data elements and Column types for table structure definitions, supporting all NumPy data types plus specialized types.

```python { .api }
class IsDescription: ...

# Atom types
class StringAtom: ...
class IntAtom: ...
class FloatAtom: ...
class TimeAtom: ...

# Column types
class StringCol: ...
class IntCol: ...
class FloatCol: ...
class TimeCol: ...
```

[Type System and Descriptions](./type-system-descriptions.md)
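
A small sketch of the distinction: Atoms describe array elements, while Cols additionally carry table-description metadata such as a default (`dflt`) and an explicit column position (`pos`). The dict-based description shown is an alternative to subclassing `IsDescription`:

```python
import tables as tb

atom = tb.Float64Atom()                # element type for arrays
col = tb.Float64Col(dflt=0.0, pos=1)   # same type, but usable in a description

# A table description can also be a plain dict of name -> Col
desc = {
    "name": tb.StringCol(16, pos=0),
    "energy": tb.Float64Col(pos=1),
}

atom_kind = atom.kind    # type family, e.g. 'float'
col_size = col.itemsize  # bytes per element
```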

### Compression and Filtering

Advanced compression and filtering system supporting multiple algorithms (zlib, blosc, blosc2, bzip2, lzo) with configurable parameters for optimal storage and I/O performance.

```python { .api }
class Filters:
    def __init__(self, complevel=0, complib="zlib", shuffle=True, bitshuffle=False, fletcher32=False): ...

def set_blosc_max_threads(nthreads): ...
def set_blosc2_max_threads(nthreads): ...
```

[Compression and Filtering](./compression-filtering.md)
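
A compression sketch using a chunked array. It sticks to zlib, which is always available (blosc/lzo availability depends on the build); names are invented:

```python
import os
import tempfile

import numpy as np
import tables as tb

# complevel 0 disables compression; 1-9 trade speed for ratio
filters = tb.Filters(complevel=5, complib="zlib", shuffle=True)

path = os.path.join(tempfile.mkdtemp(), "comp.h5")
h5 = tb.open_file(path, mode="w")
ca = h5.create_carray("/", "zeros", atom=tb.Int64Atom(), shape=(1000,),
                      filters=filters)
ca[:] = np.zeros(1000, dtype=np.int64)

stored_complib = ca.filters.complib  # filters travel with the node
h5.close()
```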

### Querying and Indexing

Expression-based querying system with compiled expressions, B-tree indexing, and conditional iteration for efficient data retrieval from large datasets.

```python { .api }
class Expr:
    def eval(self): ...
    def append(self, expr): ...

# Index management (Column methods, accessed via table.cols.<name>)
def create_index(self, **kwargs): ...
def remove_index(self): ...
def reindex(self): ...
```

[Querying and Indexing](./querying-indexing.md)
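
A sketch of indexed querying: the index is built on a single column via `table.cols.<name>.create_index()`, and subsequent `where()` conditions on that column can use it. Table and column names are invented:

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "idx.h5")
h5 = tb.open_file(path, mode="w")

class Point(tb.IsDescription):
    x = tb.Int64Col()

t = h5.create_table("/", "points", Point)
t.append([(i,) for i in range(100)])
t.flush()

t.cols.x.create_index()  # build the index on column 'x'

# Conditional iteration; matching rows come back in row order
matches = [r["x"] for r in t.where("x > 95")]
h5.close()
```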

### Transaction System

Undo/redo system with named marks and rollback capabilities, providing database-like protection against mistaken operations on the object tree.

```python { .api }
class File:
    def enable_undo(self, filters=None): ...
    def disable_undo(self): ...
    def mark(self, name=None): ...
    def undo(self, mark=None): ...
    def redo(self, mark=None): ...
```

[Transactions and Undo/Redo](./transactions-undo-redo.md)
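
A sketch of marks and rollback. Undo/redo tracks operations on the object tree (creating, moving, renaming nodes) between marks; the mark names and node names here are invented:

```python
import os
import tempfile

import numpy as np
import tables as tb

path = os.path.join(tempfile.mkdtemp(), "undo.h5")
h5 = tb.open_file(path, mode="w")
h5.enable_undo()

h5.mark("empty")
h5.create_array("/", "scratch", np.arange(3))
h5.mark("created")
existed = "/scratch" in h5   # File.__contains__ checks node paths

h5.undo("empty")             # roll back to the named mark
gone = "/scratch" not in h5

h5.redo("created")           # replay forward to a later mark
back = "/scratch" in h5

h5.disable_undo()
h5.close()
```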

## Types

```python { .api }
class File:
    """Main PyTables file interface."""
    def __init__(self, filename, mode="r", title="", root_uep="/", filters=None, **kwargs): ...
    def close(self): ...
    def flush(self): ...
    def create_group(self, where, name, title="", filters=None, createparents=False): ...
    def create_table(self, where, name, description, title="", filters=None, expectedrows=10000, createparents=False, byteorder=None, **kwargs): ...
    def create_array(self, where, name, obj, title="", byteorder=None, createparents=False): ...

class Node:
    """Base class for all PyTables nodes."""
    def _f_close(self): ...
    def _f_copy(self, newparent=None, newname=None, overwrite=False, recursive=False, createparents=False, **kwargs): ...
    def _f_move(self, newparent=None, newname=None, overwrite=False, createparents=False): ...
    def _f_remove(self): ...
    def _f_rename(self, newname): ...

class IsDescription:
    """Base class for table descriptions."""

class UnImplemented(Leaf):
    """
    Represents datasets not supported by PyTables in generic HDF5 files.

    Used when PyTables encounters HDF5 datasets with unsupported datatype
    or dataspace combinations. Allows access to metadata and attributes
    but not the actual data.
    """

class Unknown(Leaf):
    """
    Represents unknown node types in HDF5 files.

    Used as a fallback for HDF5 nodes that cannot be classified
    into any supported PyTables category.
    """

FilterProperties = dict[str, Any]
"""Dictionary containing filter and compression properties."""

__version__: str
"""PyTables version string."""

hdf5_version: str
"""Underlying HDF5 library version string."""

class Enum:
    """
    Enumerated type for defining named value sets.

    Used to create enumerated types where variables can take one of a
    predefined set of named values. Each value has a name and a concrete value.
    """
    def __init__(self, enum_values):
        """
        Create an enumeration from a sequence or mapping.

        Parameters:
        - enum_values: Sequence of names or mapping of names to values
        """
```
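
A short `Enum` sketch: built from a sequence, names are auto-numbered from 0 in order; built from a mapping, the concrete values are explicit. Calling the enum performs the reverse (value-to-name) lookup:

```python
import tables as tb

colors = tb.Enum(["red", "green", "blue"])  # auto-numbered: red=0, green=1, blue=2
value = colors.red       # name -> concrete value
name = colors(value)     # concrete value -> name

state = tb.Enum({"off": 0, "on": 1})  # explicit mapping of names to values
on_value = state.on
```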

## Exceptions

```python { .api }
# Core exceptions
class HDF5ExtError(Exception):
    """Errors from the HDF5 library."""

class ClosedNodeError(ValueError):
    """Operations on closed nodes."""

class ClosedFileError(ValueError):
    """Operations on closed files."""

class FileModeError(ValueError):
    """Invalid file mode operations."""

class NodeError(AttributeError):
    """General node-related errors."""

class NoSuchNodeError(LookupError):
    """Access to non-existent nodes."""

# Specialized exceptions
class UndoRedoError(Exception):
    """Undo/redo system errors."""

class FlavorError(TypeError):
    """Data flavor conversion errors."""

class ChunkError(ValueError):
    """Chunking-related errors."""

class NotChunkedError(ChunkError):
    """Operations requiring a chunked layout."""

# Warning classes
class NaturalNameWarning(UserWarning):
    """Natural naming convention warnings."""

class PerformanceWarning(UserWarning):
    """Performance-related warnings."""

class DataTypeWarning(UserWarning):
    """Data type compatibility warnings."""
```
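
A sketch of two common failure modes, using invented paths: a missing node path raises `NoSuchNodeError`, and touching a file after `close()` raises `ClosedFileError`:

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "err.h5")
h5 = tb.open_file(path, mode="w")

missing = False
try:
    h5.get_node("/does/not/exist")   # no such path in the file
except tb.NoSuchNodeError:
    missing = True

h5.close()
closed = False
try:
    h5.create_group("/", "late")     # file is already closed
except tb.ClosedFileError:
    closed = True
```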

## Utility Functions

```python { .api }
def test():
    """Run the PyTables test suite."""

def print_versions():
    """Print version information for PyTables and its dependencies."""

def silence_hdf5_messages():
    """Suppress HDF5 diagnostic messages."""

def restrict_flavors(keep=None):
    """
    Restrict available NumPy data flavors.

    Parameters:
    - keep (list): List of flavors to keep available
    """

def get_pytables_version():
    """
    Get the PyTables version string.

    Returns:
        str: PyTables version

    Note: deprecated; use tables.__version__ instead.
    """

def get_hdf5_version():
    """
    Get the HDF5 library version string.

    Returns:
        str: HDF5 version

    Note: deprecated; use tables.hdf5_version instead.
    """
```

## Command-Line Tools

PyTables provides several command-line utilities for file management and inspection:

- **ptdump**: Dumps PyTables file contents in a human-readable format
- **ptrepack**: Repacks PyTables files, with optimization and format conversion
- **pt2to3**: Migrates Python code from the PyTables 2.x API names to the 3.x API
- **pttree**: Displays a PyTables file's tree structure

These tools are installed alongside PyTables and can be run directly from the command line.