Manipulating data formats of DeePMD-kit, VASP, QE, PWmat, and LAMMPS, etc.
npx @tessl/cli install tessl/pypi-dpdata@0.2.0

A comprehensive Python library for manipulating atomistic data formats used in computational chemistry, materials science, and machine learning. DPData provides unified interfaces for converting between the file formats of different simulation and analysis packages, covering workflows from quantum chemistry calculations through molecular dynamics simulations to machine-learning training data.
pip install dpdata  # or: conda install -c conda-forge dpdata

import dpdata

Primary classes and modules:
from dpdata import System, LabeledSystem, MultiSystems, BondOrderSystem
from dpdata import lammps, md, vasp

import dpdata
# Load a VASP OUTCAR file into a labeled system
ls = dpdata.LabeledSystem('OUTCAR', fmt='vasp/outcar')
# Access basic properties
print(f"Number of frames: {ls.get_nframes()}")
print(f"Number of atoms: {ls.get_natoms()}")
print(f"Chemical formula: {ls.formula}")
# Convert to DeePMD format
ls.to('deepmd/npy', 'deepmd_data')
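# Load an unlabeled structure from POSCAR (named `system` to avoid shadowing the stdlib `sys` module)
system = dpdata.System('POSCAR', fmt='vasp/poscar')
# Build a 2x2x2 supercell
replicated = system.replicate([2, 2, 2])
# Export to LAMMPS data format
replicated.to('lammps/lmp', 'structure.lmp')

DPData's architecture centers on a unified data model: every supported format is read into, and written out from, the same System-based representation.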
This design enables seamless interoperability between computational chemistry, molecular dynamics, and machine learning workflows while maintaining data consistency and providing extensive format support.
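To make the shared representation concrete, here is a plain-Python sketch of the dictionary layout a labeled system carries (keys such as atom_names, atom_types, cells, coords, energies and forces), with every entry checked against the nframes/natoms/ntypes axes. The helper names (shape_of, check_shapes, EXPECTED) are hypothetical, nested lists stand in for the NumPy arrays dpdata actually stores, and ValueError stands in for dpdata's own DataError; it illustrates the layout, not dpdata's validation code.

```python
# Plain-Python mock of a labeled system's data dictionary (illustrative only).
NFRAMES, NATOMS, NTYPES = "nframes", "natoms", "ntypes"

# Expected shape of each key, written in terms of symbolic axes.
EXPECTED = {
    "atom_names": (NTYPES,),
    "atom_numbs": (NTYPES,),
    "atom_types": (NATOMS,),
    "orig": (3,),
    "cells": (NFRAMES, 3, 3),
    "coords": (NFRAMES, NATOMS, 3),
    "energies": (NFRAMES,),
    "forces": (NFRAMES, NATOMS, 3),
}


def shape_of(value):
    """Shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def check_shapes(data):
    """Resolve the axis sizes from the data, then verify every key that is present."""
    sizes = {
        NFRAMES: len(data["coords"]),
        NATOMS: len(data["atom_types"]),
        NTYPES: len(data["atom_names"]),
    }
    for key, axes in EXPECTED.items():
        if key not in data:
            continue  # unlabeled structures simply lack energies/forces
        expected = tuple(sizes.get(axis, axis) for axis in axes)
        actual = shape_of(data[key])
        if actual != expected:
            raise ValueError(f"{key}: expected shape {expected}, got {actual}")
    return sizes


# Two identical frames of one water molecule (O, H, H); label values are placeholders.
cell = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
frame = [[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0]]
no_forces = [[0.0, 0.0, 0.0] for _ in range(3)]
water = {
    "atom_names": ["O", "H"],
    "atom_numbs": [1, 2],
    "atom_types": [0, 1, 1],
    "orig": [0.0, 0.0, 0.0],
    "cells": [cell, cell],
    "coords": [frame, frame],
    "energies": [0.0, 0.0],
    "forces": [no_forces, no_forces],
}
print(check_shapes(water))  # {'nframes': 2, 'natoms': 3, 'ntypes': 2}
```

In this layout every per-frame quantity shares a leading nframes axis, which is what allows frame-wise operations such as slicing or appending to treat all keys uniformly.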
Core classes for managing atomistic data including unlabeled structures, energy/force labeled datasets, multi-composition systems, and molecular systems with bond information.
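The `correction(hl_sys)` method of LabeledSystem, listed below, is meant for Δ-learning-style data: it keeps a system's structures but replaces its labels with the difference from a higher-level calculation. The following plain-Python sketch shows that arithmetic on the dictionary layout, assuming the difference is taken as high-level minus low-level. The function name delta_labels is hypothetical, the numbers are made up, only energies and forces are handled, and it assumes, without checking, that both inputs describe the same atoms and coordinates.

```python
# Hypothetical sketch of a Δ-learning label correction on plain dictionaries.
def delta_labels(low_level, high_level):
    """Copy low_level, replacing its labels with high_level minus low_level.

    Assumes both systems share atoms, frames, and coordinates (not checked here).
    """
    corrected = dict(low_level)  # shallow copy: coords and atom_types are reused
    corrected["energies"] = [
        e_high - e_low
        for e_high, e_low in zip(high_level["energies"], low_level["energies"])
    ]
    corrected["forces"] = [
        [
            [f_high - f_low for f_high, f_low in zip(atom_high, atom_low)]
            for atom_high, atom_low in zip(frame_high, frame_low)
        ]
        for frame_high, frame_low in zip(high_level["forces"], low_level["forces"])
    ]
    return corrected


# One frame, two atoms; all label values are made up for illustration.
low = {
    "atom_types": [0, 1],
    "coords": [[[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]],
    "energies": [-10.0],
    "forces": [[[0.0, 0.0, 0.5], [0.0, 0.0, -0.5]]],
}
high = dict(low, energies=[-10.25], forces=[[[0.0, 0.0, 0.75], [0.0, 0.0, -0.75]]])

delta = delta_labels(low, high)
print(delta["energies"])  # [-0.25]
print(delta["forces"])    # [[[0.0, 0.0, 0.25], [0.0, 0.0, -0.25]]]
```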
class System:
    def __init__(self, file_name=None, fmt=None, type_map=None, begin=0, step=1, data=None, **kwargs): ...
    def get_nframes(self) -> int: ...
    def get_natoms(self) -> int: ...
    def to(self, fmt: str, *args, **kwargs): ...

class LabeledSystem(System):
    def has_forces(self) -> bool: ...
    def has_virial(self) -> bool: ...
    def correction(self, hl_sys): ...

class MultiSystems:
    def __init__(self, *systems, type_map=None): ...
    def append(self, *systems): ...
    def train_test_split(self, test_size=0.2, seed=None): ...

class BondOrderSystem(System):
    def get_nbonds(self) -> int: ...
    def get_charge(self) -> int: ...
    def get_mol(self): ...

Comprehensive format support for quantum chemistry (VASP, Gaussian, CP2K), molecular dynamics (LAMMPS, GROMACS), machine learning (DeePMD-kit), and general formats (XYZ, SDF), with both a Python API and command-line tools.
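The format layer is extensible: `Format.register(key)`, listed below, associates a format key such as 'vasp/outcar' with a plugin class. The stdlib-only sketch below imitates that decorator-based registry pattern so it runs without dpdata installed; FormatRegistry, FormatBase, ToyXYZFormat, write_as and the 'toy/xyz' key are all hypothetical names, and the toy writer returns a string instead of writing a file.

```python
from typing import Callable, Dict, Type


class FormatBase:
    """Hypothetical stand-in for a format plugin base class."""

    def write(self, data: dict) -> str:
        raise NotImplementedError


class FormatRegistry:
    """Maps format keys such as 'toy/xyz' to plugin classes (illustrative only)."""

    _formats: Dict[str, Type[FormatBase]] = {}

    @classmethod
    def register(cls, key: str) -> Callable[[Type[FormatBase]], Type[FormatBase]]:
        """Class decorator that records the decorated plugin under `key`."""
        def decorator(plugin: Type[FormatBase]) -> Type[FormatBase]:
            cls._formats[key] = plugin
            return plugin
        return decorator

    @classmethod
    def get_formats(cls) -> Dict[str, Type[FormatBase]]:
        """Return a copy of the key -> plugin mapping."""
        return dict(cls._formats)


@FormatRegistry.register("toy/xyz")
class ToyXYZFormat(FormatBase):
    def write(self, data: dict) -> str:
        # One XYZ block per frame: atom count, blank comment line, symbol + coordinates.
        symbols = [data["atom_names"][t] for t in data["atom_types"]]
        lines = []
        for frame in data["coords"]:
            lines.append(str(len(symbols)))
            lines.append("")
            for symbol, (x, y, z) in zip(symbols, frame):
                lines.append(f"{symbol} {x:.3f} {y:.3f} {z:.3f}")
        return "\n".join(lines)


def write_as(data: dict, fmt: str) -> str:
    """Dispatch on the format key, roughly what a unified to(fmt, ...) call has to do."""
    plugin_cls = FormatRegistry.get_formats()[fmt]
    return plugin_cls().write(data)


water = {
    "atom_names": ["O", "H"],
    "atom_types": [0, 1, 1],
    "coords": [[[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0]]],
}
print(write_as(water, "toy/xyz"))
```

In dpdata itself, a plugin subclasses `dpdata.format.Format`, is decorated with `@Format.register("key")`, and implements reader/writer hooks such as `from_system` and `to_system`; check the plugin documentation for the exact hook signatures before relying on them.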
def load_format(fmt: str): ...

class Format:
    @classmethod
    def register(cls, key: str): ...
    @classmethod
    def get_formats(cls) -> dict: ...

# CLI function
def dpdata_cli(): ...
def convert(from_file: str, from_format: str, to_file: str, to_format: str, **kwargs): ...

Statistical analysis tools, unit conversions, geometry utilities, and integration with ML prediction and optimization frameworks.
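The `mae` and `rmse` functions listed below should correspond to the usual definitions, MAE = mean(|e|) and RMSE = sqrt(mean(e²)). This sketch re-implements them with the standard library and applies them to energy errors first expressed in Hartree and then converted to eV. The Hartree-to-eV factor is the CODATA 2018 value, hard-coded here as an assumption; dpdata provides its own factors (`econvs`, `EnergyConversion`) whose last digits may differ, and the error values are made up.

```python
import math

# CODATA 2018 Hartree energy in eV; an assumption of this sketch, not dpdata's constant.
HARTREE_TO_EV = 27.211386245988


def mae(errors):
    """Mean absolute error: mean(|e|)."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root-mean-square error: sqrt(mean(e**2))."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up per-frame energy errors (prediction minus reference), in Hartree.
errors_hartree = [0.001, -0.002, 0.002]
errors_ev = [e * HARTREE_TO_EV for e in errors_hartree]

print(f"MAE  = {mae(errors_ev):.4f} eV")
print(f"RMSE = {rmse(errors_ev):.4f} eV")
```

Both metrics scale linearly with a unit factor, so converting the errors first or converting the final metric gives the same result. RMSE is never smaller than MAE and weights large errors more heavily.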
# Statistical functions
def mae(errors): ...
def rmse(errors): ...
# Unit conversion classes
class EnergyConversion:
    def __init__(self, unitA: str, unitB: str): ...

class LengthConversion:
    def __init__(self, unitA: str, unitB: str): ...
# Utility functions
def elements_index_map(elements, standard=None, inverse=False): ...
def remove_pbc(system, protect_layer=0): ...
def add_atom_names(data, atom_names): ...
def sort_atom_names(data, type_map=None): ...

class DataType:
    def __init__(self, name: str, dtype: type, shape: tuple, required: bool = True, deepmd_name: str = None): ...
    def check(self, system): ...

class Axis:
    NFRAMES: str
    NATOMS: str
    NTYPES: str
    NBONDS: str

class DataError(Exception):
    """Exception raised for invalid data"""

# Periodic table
ELEMENTS: list[str] # List of element symbols
# Physical constants
AVOGADRO: float
ELE_CHG: float
BOHR: float
HARTREE: float
RYDBERG: float
# Conversion factors
econvs: dict[str, float] # Energy conversion factors
lconvs: dict[str, float]  # Length conversion factors

DPData supports extensive format coverage across the computational science ecosystem:
Quantum Chemistry: VASP (POSCAR/OUTCAR/xml), Gaussian (log/gjf), CP2K, ABACUS, Quantum ESPRESSO, FHI-aims, SIESTA, ORCA, PSI4, DFTB+
Classical MD: LAMMPS (data/dump), AMBER, GROMACS
ML Frameworks: DeePMD-kit (raw/npy/hdf5), ASE
General: XYZ, SDF/MOL, pymatgen structures
DPData provides command-line format conversion:
# Convert VASP to DeePMD format
dpdata OUTCAR -i vasp/outcar -o deepmd/npy -O output_dir
# Check version
dpdata --version
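When many calculations need converting, the conversion command can be generated once per run directory. This sketch assumes a hypothetical layout with one OUTCAR per subdirectory of a runs folder and uses only the `-i`, `-o` and `-O` options shown in the conversion example; build_command and convert_all are illustrative names. With dry_run=True (the default) it only prints the commands.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def build_command(outcar: Path, out_root: Path) -> list:
    """One dpdata CLI call: OUTCAR (vasp/outcar) -> out_root/<run name> (deepmd/npy)."""
    out_dir = out_root / outcar.parent.name
    return ["dpdata", str(outcar), "-i", "vasp/outcar", "-o", "deepmd/npy", "-O", str(out_dir)]


def convert_all(run_root: Path, out_root: Path, dry_run: bool = True) -> list:
    """Build (and, unless dry_run, execute) one conversion per run_root/*/OUTCAR."""
    if not dry_run and shutil.which("dpdata") is None:
        raise RuntimeError("the dpdata command was not found on PATH")
    outcars = sorted(run_root.glob("*/OUTCAR")) if run_root.is_dir() else []
    commands = [build_command(path, out_root) for path in outcars]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    convert_all(Path("runs"), Path("deepmd_data"))
```

Calling the Python API instead, as in the quick start (`dpdata.LabeledSystem(path, fmt='vasp/outcar').to('deepmd/npy', out_dir)`), avoids starting one process per run; the CLI route is convenient when the conversion has to run from a job script or a separate environment.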