tessl/pypi-dpdata

Manipulating data formats of DeePMD-kit, VASP, QE, PWmat, and LAMMPS, etc.

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/dpdata@0.2.x

To install, run

npx @tessl/cli install tessl/pypi-dpdata@0.2.0


DPData

DPData is a Python library for manipulating atomistic data formats used in computational chemistry, materials science, and machine learning. It provides a unified interface for converting between the file formats of simulation and analysis packages, covering quantum chemistry calculations, molecular dynamics simulations, and machine learning training data.

Package Information

  • Package Name: dpdata
  • Language: Python
  • Installation: pip install dpdata or conda install -c conda-forge dpdata
  • License: LGPL-3.0
  • Requirements: Python 3.8+

Core Imports

import dpdata

Primary classes and modules:

from dpdata import System, LabeledSystem, MultiSystems, BondOrderSystem
from dpdata import lammps, md, vasp

Basic Usage

import dpdata

# Load a VASP OUTCAR file into a labeled system
ls = dpdata.LabeledSystem('OUTCAR', fmt='vasp/outcar')

# Access basic properties
print(f"Number of frames: {ls.get_nframes()}")
print(f"Number of atoms: {ls.get_natoms()}")
print(f"Chemical formula: {ls.formula}")

# Convert to DeePMD format
ls.to('deepmd/npy', 'deepmd_data')

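# Load unlabeled structure from POSCAR
# (named `system` rather than `sys` to avoid shadowing the standard-library module name)
system = dpdata.System('POSCAR', fmt='vasp/poscar')

# Replicate structure: 2 copies along each of the three cell directions
replicated = system.replicate([2, 2, 2])

# Export to LAMMPS data format
replicated.to('lammps/lmp', 'structure.lmp')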

Architecture

DPData's architecture centers around a unified data model:

  • System Classes: Core data containers (System, LabeledSystem, MultiSystems, BondOrderSystem)
  • Format System: Plugin-based format conversion supporting 20+ software packages
  • Driver System: Interface for ML model prediction and geometry optimization
  • Data Types: Strongly-typed data validation and axis management
  • Utilities: Analysis tools, unit conversions, and manipulation functions

Because every format is read into and written from the same data model, computational chemistry, molecular dynamics, and machine learning workflows can exchange data without format-specific glue code, and the data stays consistent across conversions.
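
To make the idea of a unified data model concrete, here is a minimal standalone sketch of a container that stores frames, atoms, and element types in one place. It is an illustration only: the class name `FrameSet` and its field layout are assumptions made for this example, not dpdata's actual implementation, although the method names mirror the `get_nframes`, `get_natoms`, and `formula` accessors used in Basic Usage.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FrameSet:
    """Hypothetical stand-in for a System-like container (not dpdata's class)."""

    atom_names: list[str]              # unique element symbols, e.g. ["O", "H"]
    atom_types: list[int]              # per-atom index into atom_names
    coords: list[list[list[float]]]    # indexed as [frame][atom][xyz]
    cells: list[list[list[float]]]     # indexed as [frame][3][3], lattice vectors

    def get_nframes(self) -> int:
        # One entry in coords per frame.
        return len(self.coords)

    def get_natoms(self) -> int:
        # One type index per atom, shared by all frames.
        return len(self.atom_types)

    @property
    def formula(self) -> str:
        # Count atoms of each type and join them as symbol + count.
        counts = [self.atom_types.count(i) for i in range(len(self.atom_names))]
        return "".join(f"{name}{n}" for name, n in zip(self.atom_names, counts))


# A single frame of one water molecule in a 10 Angstrom cubic cell.
water = FrameSet(
    atom_names=["O", "H"],
    atom_types=[0, 1, 1],
    coords=[[[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]],
    cells=[[[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]],
)
print(water.get_nframes(), water.get_natoms(), water.formula)
```

Keeping per-atom types separate from per-frame coordinates means a trajectory of many frames stores the composition only once, which is the kind of layout a format converter can read into and write out of.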

Capabilities

System Management

Core classes for managing atomistic data including unlabeled structures, energy/force labeled datasets, multi-composition systems, and molecular systems with bond information.

class System:
    def __init__(self, file_name=None, fmt=None, type_map=None, begin=0, step=1, data=None, **kwargs): ...
    def get_nframes(self) -> int: ...
    def get_natoms(self) -> int: ...
    def to(self, fmt: str, *args, **kwargs): ...

class LabeledSystem(System):
    def has_forces(self) -> bool: ...
    def has_virial(self) -> bool: ...
    def correction(self, hl_sys): ...

class MultiSystems:
    def __init__(self, *systems, type_map=None): ...
    def append(self, *systems): ...
    def train_test_split(self, test_size=0.2, seed=None): ...

class BondOrderSystem(System):
    def get_nbonds(self) -> int: ...
    def get_charge(self) -> int: ...
    def get_mol(self): ...


Format Conversion

Format support covers quantum chemistry (VASP, Gaussian, CP2K), molecular dynamics (LAMMPS, GROMACS), machine learning (DeePMD-kit), and general formats (XYZ, SDF), available through both a Python API and a command-line tool.

def load_format(fmt: str): ...

class Format:
    @classmethod
    def register(cls, key: str): ...
    @classmethod
    def get_formats(cls) -> dict: ...

# CLI function
def dpdata_cli(): ...
def convert(from_file: str, from_format: str, to_file: str, to_format: str, **kwargs): ...
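
The register-then-look-up pattern behind a plugin-based format system can be sketched in a few lines. The code below is a generic illustration under stated assumptions: `_REGISTRY`, `register`, `load`, and the `toy/xyz` format are hypothetical names invented for this example and are not dpdata's internals.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a format key to the class that handles it.
_REGISTRY: Dict[str, type] = {}


def register(key: str) -> Callable[[type], type]:
    """Class decorator that files a handler class under a format key."""

    def decorator(cls: type) -> type:
        _REGISTRY[key] = cls
        return cls

    return decorator


def load(fmt: str):
    """Instantiate the handler registered for fmt, or fail with a clear error."""
    try:
        return _REGISTRY[fmt.lower()]()
    except KeyError:
        known = sorted(_REGISTRY)
        raise NotImplementedError(f"Unsupported format {fmt!r}; known: {known}") from None


@register("toy/xyz")
class ToyXYZ:
    """Toy writer producing XYZ-style text for a single frame."""

    def to_text(self, symbols, coords) -> str:
        lines = [str(len(symbols)), "toy frame"]
        for symbol, (x, y, z) in zip(symbols, coords):
            lines.append(f"{symbol} {x:.3f} {y:.3f} {z:.3f}")
        return "\n".join(lines)


writer = load("toy/xyz")
text = writer.to_text(["H", "H"], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)])
print(text)
```

With this design, supporting a new file format means adding one decorated class; the lookup code and its callers stay unchanged.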


Data Analysis

Statistical analysis tools, unit conversions, geometry utilities, and integration with ML prediction and optimization frameworks.

# Statistical functions
def mae(errors): ...
def rmse(errors): ...

# Unit conversion classes
class EnergyConversion:
    def __init__(self, unitA: str, unitB: str): ...

class LengthConversion:
    def __init__(self, unitA: str, unitB: str): ...

# Utility functions
def elements_index_map(elements, standard=None, inverse=False): ...
def remove_pbc(system, protect_layer=0): ...
def add_atom_names(data, atom_names): ...
def sort_atom_names(data, type_map=None): ...
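
For readers unfamiliar with the two error metrics, the following standalone sketch spells out the textbook definitions of mean absolute error and root-mean-square error. The function names and sample energies are made up for illustration, and the code does not call dpdata; it only assumes that the library's `mae` and `rmse` follow these standard definitions.

```python
import math


def mean_abs_error(errors):
    """Mean absolute error: the average of |e| over all entries."""
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_sq_error(errors):
    """Root-mean-square error: the square root of the average of e**2."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical predicted and reference energies (same arbitrary unit).
predicted = [-10.02, -9.95, -10.10]
reference = [-10.00, -10.00, -10.00]
errors = [p - r for p, r in zip(predicted, reference)]

mae_value = mean_abs_error(errors)
rmse_value = root_mean_sq_error(errors)
print(f"MAE  = {mae_value:.4f}")
print(f"RMSE = {rmse_value:.4f}")
```

RMSE is never smaller than MAE for the same errors, and it grows faster when a few entries are far off, which is why the two are usually reported together.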


Types

Core Data Types

class DataType:
    def __init__(self, name: str, dtype: type, shape: tuple, required: bool = True, deepmd_name: str = None): ...
    def check(self, system): ...

class Axis:
    NFRAMES: str
    NATOMS: str
    NTYPES: str
    NBONDS: str

class DataError(Exception):
    """Exception raised for invalid data"""

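The pairing of a data type with symbolic axes can be read as a shape check: each axis name stands for a size that is known only once a system is loaded. The sketch below illustrates that idea with plain nested lists. The `Axis` enum here is a local stand-in defined for this example, and `nested_shape` and `check_shape` are hypothetical helpers, not part of dpdata.

```python
from enum import Enum


class Axis(Enum):
    """Local stand-in for symbolic axis names (defined here for illustration)."""

    NFRAMES = "nframes"
    NATOMS = "natoms"


def nested_shape(value):
    """Shape of a rectangular nested list; only the first element of each level is inspected."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def check_shape(name, value, spec, sizes):
    """Resolve symbolic axes in spec via sizes and compare against the actual shape."""
    expected = tuple(sizes[dim] if isinstance(dim, Axis) else dim for dim in spec)
    actual = nested_shape(value)
    if actual != expected:
        raise ValueError(f"{name}: expected shape {expected}, got {actual}")


# Two frames of three atoms each, with xyz coordinates.
sizes = {Axis.NFRAMES: 2, Axis.NATOMS: 3}
coords = [[[0.0, 0.0, 0.0] for _ in range(3)] for _ in range(2)]
check_shape("coords", coords, (Axis.NFRAMES, Axis.NATOMS, 3), sizes)  # passes silently

# Dropping one atom from every frame makes the check fail.
truncated = [frame[:2] for frame in coords]
try:
    check_shape("coords", truncated, (Axis.NFRAMES, Axis.NATOMS, 3), sizes)
    shape_error = None
except ValueError as exc:
    shape_error = str(exc)
print(shape_error)
```

Declaring shapes symbolically lets one specification validate systems of any size, since only the `sizes` mapping changes from system to system.
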
Element and Unit Constants

# Periodic table
ELEMENTS: list[str]  # List of element symbols

# Physical constants
AVOGADRO: float
ELE_CHG: float
BOHR: float
HARTREE: float
RYDBERG: float

# Conversion factors
econvs: dict[str, float]  # Energy conversion factors
lconvs: dict[str, float]  # Length conversion factors
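
A table of conversion factors is typically used by expressing every unit in terms of one reference unit and dividing. The sketch below shows that arithmetic with CODATA 2018 values for the Hartree energy and the Bohr radius. The table names, keys, and the `convert` helper are assumptions for this example; the actual contents and layout of `econvs` and `lconvs` in dpdata may differ.

```python
# CODATA 2018 values; the Rydberg is half a Hartree by definition.
HARTREE_IN_EV = 27.211386245988
RYDBERG_IN_EV = HARTREE_IN_EV / 2
BOHR_IN_ANGSTROM = 0.529177210903

# Hypothetical factor tables: how much of the reference unit one named unit equals.
ENERGY_IN_EV = {"ev": 1.0, "hartree": HARTREE_IN_EV, "rydberg": RYDBERG_IN_EV}
LENGTH_IN_ANGSTROM = {"angstrom": 1.0, "bohr": BOHR_IN_ANGSTROM}


def convert(value, from_unit, to_unit, table):
    """Convert via the table's reference unit: scale up, then divide down."""
    return value * table[from_unit] / table[to_unit]


energy_ev = convert(1.0, "hartree", "ev", ENERGY_IN_EV)
energy_hartree = convert(2.0, "rydberg", "hartree", ENERGY_IN_EV)
length_bohr = convert(1.0, "angstrom", "bohr", LENGTH_IN_ANGSTROM)
print(energy_ev, energy_hartree, length_bohr)
```

Routing every conversion through a single reference unit keeps the table linear in the number of units instead of needing one factor per pair.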

Supported Formats

DPData supports formats from across the computational science ecosystem:

Quantum Chemistry: VASP (POSCAR/OUTCAR/xml), Gaussian (log/gjf), CP2K, ABACUS, Quantum ESPRESSO, FHI-aims, SIESTA, ORCA, PSI4, DFTB+

Classical MD: LAMMPS (data/dump), AMBER, GROMACS

ML Frameworks: DeePMD-kit (raw/npy/hdf5)

General: XYZ, SDF/MOL, ASE and pymatgen structures

Command Line Interface

DPData provides command-line format conversion:

# Convert VASP to DeePMD format
dpdata OUTCAR -i vasp/outcar -o deepmd/npy -O output_dir

# Check version
dpdata --version