tessl/pypi-dpdata

Manipulating data formats of DeePMD-kit, VASP, QE, PWmat, and LAMMPS, etc.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/dpdata@0.2.x

To install, run

npx @tessl/cli install tessl/pypi-dpdata@0.2.0

# DPData

DPData is a Python library for manipulating the atomistic data formats used in computational chemistry, materials science, and machine learning. It provides a unified interface for converting between the file formats of simulation and analysis software, covering quantum chemistry calculations, molecular dynamics simulations, and machine learning training data.

## Package Information

- **Package Name**: dpdata
- **Language**: Python
- **Installation**: `pip install dpdata` or `conda install -c conda-forge dpdata`
- **License**: LGPL-3.0
- **Requirements**: Python 3.8+

## Core Imports

```python
import dpdata
```

Primary classes and modules:

```python
from dpdata import System, LabeledSystem, MultiSystems, BondOrderSystem
from dpdata import lammps, md, vasp
```

## Basic Usage

```python
import dpdata

# Load a VASP OUTCAR file into a labeled system
ls = dpdata.LabeledSystem('OUTCAR', fmt='vasp/outcar')

# Access basic properties
print(f"Number of frames: {ls.get_nframes()}")
print(f"Number of atoms: {ls.get_natoms()}")
print(f"Chemical formula: {ls.formula}")

# Convert to DeePMD format
ls.to('deepmd/npy', 'deepmd_data')

# Load unlabeled structure from POSCAR
sys = dpdata.System('POSCAR', fmt='vasp/poscar')

# Replicate structure
replicated = sys.replicate([2, 2, 2])

# Export to LAMMPS data format
replicated.to('lammps/lmp', 'structure.lmp')
```
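
To make the replicate step concrete, the following is a minimal plain-Python sketch of what tiling a structure along its lattice vectors involves. The function `replicate_sketch` and its argument layout are hypothetical and chosen for illustration only; dpdata's own `replicate` works on its internal arrays and may order the copied atoms differently.

```python
from itertools import product
from typing import List, Sequence, Tuple

Vector = List[float]


def replicate_sketch(
    cell: Sequence[Vector], coords: Sequence[Vector], ncopy: Sequence[int]
) -> Tuple[List[Vector], List[Vector]]:
    """Tile Cartesian `coords` ncopy[0] x ncopy[1] x ncopy[2] times along `cell`.

    Hypothetical helper for illustration; not part of dpdata.
    """
    new_coords: List[Vector] = []
    for i, j, k in product(range(ncopy[0]), range(ncopy[1]), range(ncopy[2])):
        # Translation for this image: i*a + j*b + k*c
        shift = [i * cell[0][d] + j * cell[1][d] + k * cell[2][d] for d in range(3)]
        for atom in coords:
            new_coords.append([atom[d] + shift[d] for d in range(3)])
    # Each lattice vector is stretched by its own copy count.
    new_cell = [[n * component for component in vec] for n, vec in zip(ncopy, cell)]
    return new_cell, new_coords


# A cubic 4 angstrom cell holding two atoms, tiled 2 x 2 x 2.
cell = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
coords = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
big_cell, big_coords = replicate_sketch(cell, coords, [2, 2, 2])
```

The atom count grows by the product of the copy counts, so the two-atom cell above becomes a 16-atom supercell with 8 angstrom edges.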

## Architecture

DPData's architecture centers around a unified data model:

- **System Classes**: Core data containers (System, LabeledSystem, MultiSystems, BondOrderSystem)
- **Format System**: Plugin-based format conversion supporting 20+ software packages
- **Driver System**: Interface for ML model prediction and geometry optimization
- **Data Types**: Strongly-typed data validation and axis management
- **Utilities**: Analysis tools, unit conversions, and manipulation functions

This design enables seamless interoperability between computational chemistry, molecular dynamics, and machine learning workflows while maintaining data consistency and providing extensive format support.

## Capabilities

### System Management

Core classes for managing atomistic data including unlabeled structures, energy/force labeled datasets, multi-composition systems, and molecular systems with bond information.

```python { .api }
class System:
    def __init__(self, file_name=None, fmt=None, type_map=None, begin=0, step=1, data=None, **kwargs): ...
    def get_nframes(self) -> int: ...
    def get_natoms(self) -> int: ...
    def to(self, fmt: str, *args, **kwargs): ...

class LabeledSystem(System):
    def has_forces(self) -> bool: ...
    def has_virial(self) -> bool: ...
    def correction(self, hl_sys): ...

class MultiSystems:
    def __init__(self, *systems, type_map=None): ...
    def append(self, *systems): ...
    def train_test_split(self, test_size=0.2, seed=None): ...

class BondOrderSystem(System):
    def get_nbonds(self) -> int: ...
    def get_charge(self) -> int: ...
    def get_mol(self): ...
```

[System Management](./system-management.md)

### Format Conversion

Comprehensive format support for quantum chemistry (VASP, Gaussian, CP2K), molecular dynamics (LAMMPS, GROMACS), machine learning (DeePMD-kit), and general formats (XYZ, SDF), with both Python API and command-line tools.

```python { .api }
def load_format(fmt: str): ...

class Format:
    @classmethod
    def register(cls, key: str): ...
    @classmethod
    def get_formats(cls) -> dict: ...

# CLI function
def dpdata_cli(): ...
def convert(from_file: str, from_format: str, to_file: str, to_format: str, **kwargs): ...
```

[Format Conversion](./format-conversion.md)

### Data Analysis

Statistical analysis tools, unit conversions, geometry utilities, and integration with ML prediction and optimization frameworks.

```python { .api }
# Statistical functions
def mae(errors): ...
def rmse(errors): ...

# Unit conversion classes
class EnergyConversion:
    def __init__(self, unitA: str, unitB: str): ...

class LengthConversion:
    def __init__(self, unitA: str, unitB: str): ...

# Utility functions
def elements_index_map(elements, standard=None, inverse=False): ...
def remove_pbc(system, protect_layer=0): ...
def add_atom_names(data, atom_names): ...
def sort_atom_names(data, type_map=None): ...
```

[Data Analysis](./data-analysis.md)

## Types

### Core Data Types

```python { .api }
class DataType:
    def __init__(self, name: str, dtype: type, shape: tuple, required: bool = True, deepmd_name: str = None): ...
    def check(self, system): ...

class Axis:
    NFRAMES: str
    NATOMS: str
    NTYPES: str
    NBONDS: str

class DataError(Exception):
    """Exception raised for invalid data"""
```
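
One way to read `DataType.shape` together with `Axis` is as a shape template whose symbolic axes are resolved against the system being checked. The sketch below illustrates that idea with nested lists. All names (`AxisSketch`, `DataErrorSketch`, `nested_shape`, `check_shape`) are hypothetical, and the real `check` method may validate more than shape.

```python
from enum import Enum
from typing import Any, Dict, Tuple, Union


class AxisSketch(Enum):
    """Hypothetical symbolic axes, resolved to sizes at check time."""
    NFRAMES = "nframes"
    NATOMS = "natoms"


class DataErrorSketch(Exception):
    """Raised when data does not match its declared shape."""


def nested_shape(value: Any) -> Tuple[int, ...]:
    """Shape of a regularly nested list, judged from the first element at each level."""
    shape = []
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


def check_shape(
    name: str,
    value: Any,
    template: Tuple[Union[AxisSketch, int], ...],
    sizes: Dict[AxisSketch, int],
) -> None:
    """Resolve symbolic axes in `template` and compare against the actual shape."""
    expected = tuple(sizes[d] if isinstance(d, AxisSketch) else d for d in template)
    actual = nested_shape(value)
    if actual != expected:
        raise DataErrorSketch(f"{name}: expected shape {expected}, got {actual}")


# One frame of a two-atom molecule: coords has shape (NFRAMES, NATOMS, 3).
sizes = {AxisSketch.NFRAMES: 1, AxisSketch.NATOMS: 2}
coords = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]]]
check_shape("coords", coords, (AxisSketch.NFRAMES, AxisSketch.NATOMS, 3), sizes)
```

Declaring shapes symbolically lets a single template cover systems of any size, since only the `sizes` mapping changes from one system to the next.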

### Element and Unit Constants

```python { .api }
# Periodic table
ELEMENTS: list[str]  # List of element symbols

# Physical constants
AVOGADRO: float
ELE_CHG: float
BOHR: float
HARTREE: float
RYDBERG: float

# Conversion factors
econvs: dict[str, float]  # Energy conversion factors
lconvs: dict[str, float]  # Length conversion factors
```
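
A factor table such as `econvs` or `lconvs` can drive unit conversion by expressing every unit relative to one base unit and dividing two entries. The sketch below assumes that design. The table contents, unit keys, class names, and the `value()` accessor are illustrative assumptions rather than dpdata's actual tables or API; the numeric factors are CODATA 2018 values.

```python
from typing import Dict

# Each unit expressed in the base unit (eV for energy, angstrom for length).
ENERGY_IN_EV: Dict[str, float] = {
    "eV": 1.0,
    "hartree": 27.211386245988,
    "rydberg": 13.605693122994,
}
LENGTH_IN_ANGSTROM: Dict[str, float] = {
    "angstrom": 1.0,
    "bohr": 0.529177210903,
    "nm": 10.0,
}


class ConversionSketch:
    """Hypothetical converter: factor = table[unit_a] / table[unit_b]."""

    def __init__(self, unit_a: str, unit_b: str, table: Dict[str, float]):
        for unit in (unit_a, unit_b):
            if unit not in table:
                raise ValueError(f"unknown unit: {unit!r}")
        self._factor = table[unit_a] / table[unit_b]

    def value(self) -> float:
        """Multiplier that turns a quantity in unit_a into unit_b."""
        return self._factor


class EnergyConversionSketch(ConversionSketch):
    def __init__(self, unit_a: str, unit_b: str):
        super().__init__(unit_a, unit_b, ENERGY_IN_EV)


class LengthConversionSketch(ConversionSketch):
    def __init__(self, unit_a: str, unit_b: str):
        super().__init__(unit_a, unit_b, LENGTH_IN_ANGSTROM)


hartree_to_ev = EnergyConversionSketch("hartree", "eV").value()
rydberg_to_hartree = EnergyConversionSketch("rydberg", "hartree").value()
bohr_to_angstrom = LengthConversionSketch("bohr", "angstrom").value()
```

Keeping one base unit per quantity means adding a unit takes a single table entry, not a factor for every pair of units.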

## Supported Formats

DPData supports formats from across the computational science ecosystem:

**Quantum Chemistry**: VASP (POSCAR/OUTCAR/xml), Gaussian (log/gjf), CP2K, ABACUS, Quantum ESPRESSO, FHI-aims, SIESTA, ORCA, PSI4, DFTB+

**Classical MD**: LAMMPS (data/dump), AMBER, GROMACS

**ML Frameworks**: DeePMD-kit (raw/npy/hdf5), ASE

**General**: XYZ, SDF/MOL, pymatgen structures

## Command Line Interface

DPData provides command-line format conversion:

```bash
# Convert VASP to DeePMD format
dpdata OUTCAR -i vasp/outcar -o deepmd/npy -O output_dir

# Check version
dpdata --version
```
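
To spell out how the flags in the conversion command map onto a conversion request, here is a small `argparse` sketch that parses the same argument list. It is inferred only from the example above: the parser name `dpdata-sketch`, the `dest` names, and the help strings are assumptions, and the real CLI may accept further options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the example command."""
    parser = argparse.ArgumentParser(prog="dpdata-sketch")
    parser.add_argument("from_file", help="input file to read")
    parser.add_argument("-i", dest="from_format", help="format of the input file")
    parser.add_argument("-o", dest="to_format", help="format to write")
    parser.add_argument("-O", dest="to_file", help="output file or directory")
    return parser


# Same arguments as: dpdata OUTCAR -i vasp/outcar -o deepmd/npy -O output_dir
args = build_parser().parse_args(
    ["OUTCAR", "-i", "vasp/outcar", "-o", "deepmd/npy", "-O", "output_dir"]
)
```

Read this way, the lowercase `-i` and `-o` select the input and output formats, while the uppercase `-O` names where the converted data is written.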