Manipulating data formats of DeePMD-kit, VASP, QE, PWmat, and LAMMPS, etc.
npx @tessl/cli install tessl/pypi-dpdata@0.2.00
# DPData

A Python library for manipulating atomistic data formats used in computational chemistry, materials science, and machine learning. DPData provides a unified interface for converting between simulation and analysis software formats, covering quantum chemistry calculations, molecular dynamics simulations, and machine learning training data.

## Package Information

- **Package Name**: dpdata
- **Language**: Python
- **Installation**: `pip install dpdata` or `conda install -c conda-forge dpdata`
- **License**: LGPL-3.0
- **Requirements**: Python 3.8+

## Core Imports

```python
import dpdata
```

Primary classes and modules:

```python
from dpdata import System, LabeledSystem, MultiSystems, BondOrderSystem
from dpdata import lammps, md, vasp
```

## Basic Usage

```python
import dpdata

# Load a VASP OUTCAR file into a labeled system (structures plus energies/forces)
ls = dpdata.LabeledSystem('OUTCAR', fmt='vasp/outcar')

# Access basic properties
print(f"Number of frames: {ls.get_nframes()}")
print(f"Number of atoms: {ls.get_natoms()}")
print(f"Chemical formula: {ls.formula}")

# Convert to DeePMD format
ls.to('deepmd/npy', 'deepmd_data')

# Load unlabeled structure from POSCAR
# (named `system` to avoid shadowing the standard-library `sys` module)
system = dpdata.System('POSCAR', fmt='vasp/poscar')

# Replicate structure into a 2x2x2 supercell
replicated = system.replicate([2, 2, 2])

# Export to LAMMPS data format
replicated.to('lammps/lmp', 'structure.lmp')
```

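To make the effect of the `replicate` call above concrete, the sketch below tiles a single frame by hand using plain Python lists. It illustrates the general idea of supercell replication and is not dpdata's implementation; the helper `replicate_frame` and its arguments are hypothetical names introduced for this example.

```python
from itertools import product


def replicate_frame(cell, coords, ncopy):
    """Tile one frame ncopy[0] x ncopy[1] x ncopy[2] times (hypothetical helper).

    cell   -- three lattice vectors, each a list of 3 floats
    coords -- Cartesian atom positions, each a list of 3 floats
    """
    new_coords = []
    for i, j, k in product(range(ncopy[0]), range(ncopy[1]), range(ncopy[2])):
        # Translation for this periodic image: i*a + j*b + k*c
        shift = [i * cell[0][d] + j * cell[1][d] + k * cell[2][d] for d in range(3)]
        for atom in coords:
            new_coords.append([atom[d] + shift[d] for d in range(3)])
    # The supercell's lattice vectors are the originals scaled by ncopy
    new_cell = [[ncopy[v] * cell[v][d] for d in range(3)] for v in range(3)]
    return new_cell, new_coords


cell = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
coords = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
new_cell, new_coords = replicate_frame(cell, coords, [2, 2, 2])
print(len(new_coords))  # 2 atoms x 8 images = 16
```

A `[2, 2, 2]` replication therefore multiplies the atom count by eight and doubles each lattice vector.
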
## Architecture

DPData's architecture centers around a unified data model:

- **System Classes**: Core data containers (System, LabeledSystem, MultiSystems, BondOrderSystem)
- **Format System**: Plugin-based format conversion supporting 20+ software packages
- **Driver System**: Interface for ML model prediction and geometry optimization
- **Data Types**: Strongly typed data validation and axis management
- **Utilities**: Analysis tools, unit conversions, and manipulation functions

This design lets computational chemistry, molecular dynamics, and machine learning workflows exchange data through one consistent representation.

## Capabilities

### System Management

Core classes for managing atomistic data including unlabeled structures, energy/force labeled datasets, multi-composition systems, and molecular systems with bond information.

```python { .api }
class System:
    def __init__(self, file_name=None, fmt=None, type_map=None, begin=0, step=1, data=None, **kwargs): ...
    def get_nframes(self) -> int: ...
    def get_natoms(self) -> int: ...
    def to(self, fmt: str, *args, **kwargs): ...

class LabeledSystem(System):
    def has_forces(self) -> bool: ...
    def has_virial(self) -> bool: ...
    def correction(self, hl_sys): ...

class MultiSystems:
    def __init__(self, *systems, type_map=None): ...
    def append(self, *systems): ...
    def train_test_split(self, test_size=0.2, seed=None): ...

class BondOrderSystem(System):
    def get_nbonds(self) -> int: ...
    def get_charge(self) -> int: ...
    def get_mol(self): ...
```

[System Management](./system-management.md)

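The idea behind a seeded `train_test_split` can be sketched without dpdata: shuffle frame indices reproducibly, then hold out a fraction of them. This is a conceptual illustration only. `split_frames` is a hypothetical helper, and dpdata's actual return values and sampling strategy may differ.

```python
import random


def split_frames(nframes, test_size=0.2, seed=None):
    """Return (train_idx, test_idx) for a reproducible random split (hypothetical helper)."""
    indices = list(range(nframes))
    # A private RNG keeps the split reproducible without touching global random state
    random.Random(seed).shuffle(indices)
    ntest = int(round(nframes * test_size))
    return sorted(indices[ntest:]), sorted(indices[:ntest])


train_idx, test_idx = split_frames(10, test_size=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 8 2
```

Passing the same `seed` yields the same partition, which keeps a held-out test set stable across runs.
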
### Format Conversion

Format support for quantum chemistry (VASP, Gaussian, CP2K), molecular dynamics (LAMMPS, GROMACS), machine learning (DeePMD-kit), and general formats (XYZ, SDF), available through both a Python API and command-line tools.

```python { .api }
def load_format(fmt: str): ...

class Format:
    @classmethod
    def register(cls, key: str): ...
    @classmethod
    def get_formats(cls) -> dict: ...

# CLI function
def dpdata_cli(): ...
def convert(from_file: str, from_format: str, to_file: str, to_format: str, **kwargs): ...
```

[Format Conversion](./format-conversion.md)

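A decorator-based registry is one common way to build a plugin system of this kind. The sketch below is a generic illustration, not dpdata's source: the classes `FormatRegistry` and `XYZFormat` and the methods `load` and `to_text` are hypothetical names chosen for the example.

```python
class FormatRegistry:
    """Maps format keys such as 'xyz' to handler classes (hypothetical)."""

    _formats = {}

    @classmethod
    def register(cls, key):
        """Return a class decorator that stores the decorated class under `key`."""
        def decorator(handler):
            cls._formats[key] = handler
            return handler
        return decorator

    @classmethod
    def get_formats(cls):
        # Return a copy so callers cannot mutate the registry
        return dict(cls._formats)

    @classmethod
    def load(cls, key):
        """Instantiate the handler registered under `key`."""
        try:
            return cls._formats[key]()
        except KeyError:
            raise ValueError(f"unsupported format: {key!r}") from None


@FormatRegistry.register("xyz")
class XYZFormat:
    def to_text(self, symbols, coords):
        """Serialize one frame as XYZ text: atom count, comment line, one atom per line."""
        lines = [str(len(symbols)), ""]
        for sym, (x, y, z) in zip(symbols, coords):
            lines.append(f"{sym} {x:.6f} {y:.6f} {z:.6f}")
        return "\n".join(lines)


writer = FormatRegistry.load("xyz")
print(writer.to_text(["H", "H"], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)]))
```

With this pattern, adding a format means defining one decorated class; the lookup code never needs to change.
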
### Data Analysis

Statistical analysis tools, unit conversions, geometry utilities, and integration with ML prediction and optimization frameworks.

```python { .api }
# Statistical functions
def mae(errors): ...
def rmse(errors): ...

# Unit conversion classes
class EnergyConversion:
    def __init__(self, unitA: str, unitB: str): ...

class LengthConversion:
    def __init__(self, unitA: str, unitB: str): ...

# Utility functions
def elements_index_map(elements, standard=None, inverse=False): ...
def remove_pbc(system, protect_layer=0): ...
def add_atom_names(data, atom_names): ...
def sort_atom_names(data, type_map=None): ...
```

[Data Analysis](./data-analysis.md)

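For reference, the two error metrics named above follow standard definitions. The sketch below re-implements those definitions in plain Python so the difference between them is visible; the function names `mean_absolute_error` and `root_mean_squared_error` are chosen for this example and are not dpdata's functions.

```python
import math


def mean_absolute_error(errors):
    """Average of |e| over all errors."""
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_squared_error(errors):
    """Square root of the average of e**2 over all errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


predicted = [1.0, 2.0, 3.0, 4.0]
reference = [1.0, 2.0, 3.0, 2.0]
errors = [p - r for p, r in zip(predicted, reference)]  # [0.0, 0.0, 0.0, 2.0]

print(mean_absolute_error(errors))      # 0.5
print(root_mean_squared_error(errors))  # 1.0
```

A single outlier of 2.0 gives an MAE of 0.5 but an RMSE of 1.0, because squaring weights large errors more heavily.
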
## Types

### Core Data Types

```python { .api }
class DataType:
    def __init__(self, name: str, dtype: type, shape: tuple, required: bool = True, deepmd_name: str = None): ...
    def check(self, system): ...

class Axis:
    NFRAMES: str
    NATOMS: str
    NTYPES: str
    NBONDS: str

class DataError(Exception):
    """Exception raised for invalid data"""
```

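One way to read the `shape` and `Axis` entries above is as symbolic shapes that are resolved against a system's actual sizes before validation. The sketch below shows that idea with nested lists. It is an assumption about the general technique, not dpdata's code: this `Axis` enum is a two-member stand-in, and `resolve_shape`, `nested_shape`, and `check` are hypothetical helpers.

```python
from enum import Enum


class Axis(Enum):
    """Stand-in for symbolic axis names (hypothetical, two members only)."""
    NFRAMES = "nframes"
    NATOMS = "natoms"


def resolve_shape(shape, sizes):
    """Replace symbolic axes with concrete sizes; plain ints pass through."""
    return tuple(sizes[dim] if isinstance(dim, Axis) else dim for dim in shape)


def nested_shape(value):
    """Shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def check(name, value, shape, sizes):
    """Raise ValueError if `value` does not match the resolved shape."""
    expected = resolve_shape(shape, sizes)
    actual = nested_shape(value)
    if actual != expected:
        raise ValueError(f"{name}: expected shape {expected}, got {actual}")


sizes = {Axis.NFRAMES: 1, Axis.NATOMS: 2}
coords = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]]]  # 1 frame, 2 atoms, xyz
check("coords", coords, (Axis.NFRAMES, Axis.NATOMS, 3), sizes)  # passes silently
```

Declaring shapes symbolically lets one definition validate systems of any size.
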
### Element and Unit Constants

```python { .api }
# Periodic table
ELEMENTS: list[str]  # List of element symbols

# Physical constants
AVOGADRO: float
ELE_CHG: float
BOHR: float
HARTREE: float
RYDBERG: float

# Conversion factors
econvs: dict[str, float]  # Energy conversion factors
lconvs: dict[str, float]  # Length conversion factors
```

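A table of per-unit factors turns unit conversion into a single division. The sketch below illustrates this with a small energy table; the dictionary `EV_PER_UNIT`, its keys, and the helper `energy_factor` are hypothetical and do not reproduce the contents of `econvs`. The numeric values are the CODATA 2018 Hartree and Rydberg energies in eV.

```python
# Energy of one unit expressed in eV (CODATA 2018 values); illustrative table only
EV_PER_UNIT = {
    "eV": 1.0,
    "hartree": 27.211386245988,
    "rydberg": 13.605693122994,
}


def energy_factor(unit_a, unit_b):
    """Multiplier that converts a value in unit_a to unit_b (hypothetical helper)."""
    return EV_PER_UNIT[unit_a] / EV_PER_UNIT[unit_b]


factor = energy_factor("hartree", "eV")
# Hydrogen ground-state energy is -0.5 hartree, which is about -13.6 eV
print(-0.5 * factor)
```

Storing every unit relative to one base unit means N units need N factors rather than a full N x N table.
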
## Supported Formats

DPData supports formats from across the computational science ecosystem:

**Quantum Chemistry**: VASP (POSCAR/OUTCAR/xml), Gaussian (log/gjf), CP2K, ABACUS, Quantum ESPRESSO, FHI-aims, SIESTA, ORCA, PSI4, DFTB+

**Classical MD**: LAMMPS (data/dump), AMBER, GROMACS

**ML Frameworks**: DeePMD-kit (raw/npy/hdf5), ASE

**General**: XYZ, SDF/MOL, PyMatGen structures

## Command Line Interface

DPData provides command-line format conversion:

```bash
# Convert VASP to DeePMD format
dpdata OUTCAR -i vasp/outcar -o deepmd/npy -O output_dir

# Check version
dpdata --version
```