tessl/pypi-dscribe

A Python package for creating feature transformations in applications of machine learning to materials science.

Global Descriptors

Name: tessl/pypi-dscribe
Author: tessl

Global descriptors compute features for entire atomic structures, producing a single feature vector per structure that captures overall structural properties. These descriptors are ideal for comparing and classifying different crystal structures or molecular conformations.

Capabilities

MBTR (Many-Body Tensor Representation)

MBTR represents atomic structures through many-body interaction terms, capturing both local and global structural information. It uses geometry functions to describe k-body interactions (k1: atomic properties, k2: pair interactions, k3: three-body angles) and discretizes them into histograms.
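To make the k-body terms concrete, the sketch below computes the raw values that k1, k2 and k3 geometry functions would produce for a small molecule. It is a conceptual illustration in plain Python, not DScribe's implementation; the water-like coordinates and the helper `angle_deg` are assumptions chosen for the example.

```python
import math
from itertools import combinations

# Hypothetical water-like geometry in angstroms: (symbol, atomic number, position).
atoms = [
    ("O", 8, (0.0, 0.0, 0.0)),
    ("H", 1, (0.757, 0.586, 0.0)),
    ("H", 1, (-0.757, 0.586, 0.0)),
]


def angle_deg(a, center, b):
    """Return the angle a-center-b in degrees."""
    va = [x - c for x, c in zip(a, center)]
    vb = [x - c for x, c in zip(b, center)]
    cos_t = sum(x * y for x, y in zip(va, vb)) / (math.hypot(*va) * math.hypot(*vb))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


# k=1: one value per atom (here the atomic number).
k1_values = [z for _, z, _ in atoms]

# k=2: one value per atom pair (here the inverse distance, in 1/angstrom).
k2_values = [1.0 / math.dist(p, q) for (_, _, p), (_, _, q) in combinations(atoms, 2)]

# k=3: one value per triplet, taking each atom in turn as the angle's vertex.
k3_values = []
for i, (_, _, center) in enumerate(atoms):
    others = [pos for j, (_, _, pos) in enumerate(atoms) if j != i]
    for a, b in combinations(others, 2):
        k3_values.append(angle_deg(a, center, b))

print("k1:", k1_values)
print("k2:", [round(v, 3) for v in k2_values])
print("k3:", [round(v, 1) for v in k3_values])
```

MBTR then spreads values like these onto a grid and groups them by element combination, which is what turns a variable number of atoms into a fixed-length feature vector.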

class MBTR:
    def __init__(self, geometry=None, grid=None, weighting=None, normalize_gaussians=True,
                 normalization="none", species=None, periodic=False, sparse=False, dtype="float64"):
        """
        Initialize MBTR descriptor.
        
        Parameters:
        - geometry (dict): Geometry function configuration, e.g. {"function": "inverse_distance"}.
          The chosen function determines the body order k:
            - k=1: "atomic_number"
            - k=2: "distance", "inverse_distance"
            - k=3: "angle", "cosine"
        - grid (dict): Discretization grid for the geometry function:
            - min/max: range bounds for the grid
            - n: number of grid points
            - sigma: Gaussian broadening width
        - weighting (dict): Weighting function for contributions:
            - function: weighting scheme ("unity", "exp", "inverse_square" for k=2;
              "unity", "exp", "smooth_cutoff" for k=3)
            - scale or r_cut, threshold: parameters for "exp"
            - r_cut: parameter for "inverse_square" and "smooth_cutoff"
        - normalize_gaussians (bool): Whether each Gaussian is normalized to unit area
        - normalization (str): Normalization scheme ("none", "l2", "n_atoms", "valle_oganov")
        - species (list): List of atomic species to include
        - periodic (bool): Whether to consider periodic boundary conditions
        - sparse (bool): Whether to return sparse arrays
        - dtype (str): Data type for arrays
        """

    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False):
        """
        Create MBTR descriptor for given system(s).
        
        Parameters:
        - system: ASE Atoms object(s) or DScribe System object(s)
        - n_jobs (int): Number of parallel processes
        - only_physical_cores (bool): Whether to use only physical CPU cores
        - verbose (bool): Whether to print progress information
        
        Returns:
        numpy.ndarray or sparse.COO (when sparse=True): MBTR descriptors with shape
        (n_systems, n_features) for a list of systems; a single system gives (n_features,)
        """

    def derivatives(self, system, include=None, exclude=None, method="auto", 
                   return_descriptor=True, n_jobs=1, only_physical_cores=False, verbose=False):
        """
        Calculate derivatives of MBTR descriptor with respect to atomic positions.
        
        Parameters:
        - system: ASE Atoms object(s) or DScribe System object(s)
        - include (list): Atomic indices to include in derivative calculation
        - exclude (list): Atomic indices to exclude from derivative calculation
        - method (str): Derivative calculation method ("auto", "analytical", "numerical")
        - return_descriptor (bool): Whether to also return the descriptor values (default True)
        - n_jobs (int): Number of parallel processes
        - only_physical_cores (bool): Whether to use only physical CPU cores
        - verbose (bool): Whether to print progress information
        
        Returns:
        numpy.ndarray or tuple: Derivatives array, optionally with descriptor values
        """

    def get_number_of_features(self):
        """Get total number of features in MBTR descriptor."""
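The "numerical" option of `derivatives` refers to finite-difference differentiation with respect to atomic positions. The following is a minimal sketch of that idea using central differences on a toy one-feature descriptor. Both `toy_descriptor` and `numerical_derivatives` are hypothetical helpers written for this illustration, and DScribe's own scheme and array layout may differ.

```python
import math
from itertools import combinations


def toy_descriptor(positions):
    """Hypothetical one-feature descriptor: sum of inverse pair distances."""
    return [sum(1.0 / math.dist(p, q) for p, q in combinations(positions, 2))]


def numerical_derivatives(descriptor, positions, h=1e-4):
    """Central differences; the result is indexed as [atom][axis][feature]."""
    derivs = []
    for i in range(len(positions)):
        per_atom = []
        for axis in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][axis] += h   # displace atom i forwards along one axis
            minus[i][axis] -= h  # and backwards by the same amount
            f_plus, f_minus = descriptor(plus), descriptor(minus)
            per_atom.append([(a - b) / (2 * h) for a, b in zip(f_plus, f_minus)])
        derivs.append(per_atom)
    return derivs


# Two atoms one angstrom apart along x.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
d = numerical_derivatives(toy_descriptor, positions)
print("d(feature)/dx for atom 0:", round(d[0][0][0], 6))
print("d(feature)/dx for atom 1:", round(d[1][0][0], 6))
```

Restricting the loop over `i` to a subset of atoms is the role the `include` and `exclude` arguments play in the real method.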

Usage Example:

from dscribe.descriptors import MBTR
from ase.build import molecule

# Assumes the constructor signature shown above: each MBTR instance takes one
# geometry function, and that function fixes the body order k.
species = ["H", "C", "N", "O"]  # must cover every element in the systems below

# k2 term: inverse pair distances
mbtr_k2 = MBTR(
    species=species,
    geometry={"function": "inverse_distance"},
    grid={"min": 0.5, "max": 2.0, "n": 50, "sigma": 0.05},
    weighting={"function": "exp", "scale": 0.5, "threshold": 1e-3},
)

# k3 term: angles in degrees
mbtr_k3 = MBTR(
    species=species,
    geometry={"function": "angle"},
    grid={"min": 0, "max": 180, "n": 50, "sigma": 5},
    weighting={"function": "exp", "scale": 0.5, "threshold": 1e-3},
)

# Create descriptors for a water molecule
water = molecule("H2O")
k2_desc = mbtr_k2.create(water)  # one feature vector for the structure
k3_desc = mbtr_k3.create(water)

# Process multiple systems
molecules = [molecule("H2O"), molecule("NH3"), molecule("CH4")]
mbtr_descriptors = mbtr_k2.create(molecules)  # Shape: (3, n_features)

ValleOganov

The ValleOganov descriptor is a shortcut for building the Valle-Oganov fingerprint with MBTR: it applies the weighting and normalization settings that the fingerprint requires, so only the geometry function, grid resolution, broadening and cutoff need to be chosen. It is intended for periodic systems.

class ValleOganov:
    def __init__(self, species, function, n, sigma, r_cut, sparse=False, dtype="float64"):
        """
        Initialize Valle-Oganov descriptor.
        
        Parameters:
        - species (list): List of atomic species to include
        - function (str): Geometry function to use ("distance" for pairs, "angle" for triplets)
        - n (int): Number of grid points for discretization
        - sigma (float): Gaussian broadening width
        - r_cut (float): Cutoff radius for interactions
        - sparse (bool): Whether to return sparse arrays
        - dtype (str): Data type for arrays
        """

    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False):
        """
        Create Valle-Oganov descriptor for given system(s).
        
        Parameters:
        - system: ASE Atoms object(s) or DScribe System object(s)
        - n_jobs (int): Number of parallel processes
        - only_physical_cores (bool): Whether to use only physical CPU cores
        - verbose (bool): Whether to print progress information
        
        Returns:
        numpy.ndarray or sparse.COO (when sparse=True): Valle-Oganov descriptors
        """

    def get_number_of_features(self):
        """Get total number of features in Valle-Oganov descriptor."""

Usage Example:

from dscribe.descriptors import ValleOganov
from ase.build import molecule

# Setup Valle-Oganov descriptor
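vo = ValleOganov(
    species=["H", "O"],
    function="distance",
    n=100,
    sigma=0.05,
    r_cut=6.0
)

# The fingerprint targets periodic systems, so place the molecule in a unit cell
water = molecule("H2O")
water.set_cell([10.0, 10.0, 10.0])
water.center()
water.set_pbc(True)
vo_desc = vo.create(water)  # one feature vector for the structure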

MBTR Configuration Details

Geometry Functions

MBTR supports different k-body terms:

k1 terms (atomic): "atomic_number"
k2 terms (pairs): "distance", "inverse_distance"
k3 terms (triplets): "angle", "cosine"

Grid Configuration

The geometry function requires a grid specification:

grid = {
    "min": 0.5,    # Minimum value
    "max": 5.0,    # Maximum value
    "n": 50,       # Number of grid points
    "sigma": 0.1   # Gaussian broadening width
}
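The four grid parameters can be read as "sample a sum of Gaussians at `n` evenly spaced points between `min` and `max`". The sketch below shows that reading in plain Python. The function `broaden` is a hypothetical helper, and DScribe's internal discretization may differ in detail (for example in how it integrates each Gaussian over a bin).

```python
import math


def broaden(values, grid_min, grid_max, n, sigma):
    """Sum unit-area Gaussians centred on `values`, sampled at n grid points."""
    step = (grid_max - grid_min) / (n - 1)
    grid = [grid_min + i * step for i in range(n)]
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    spectrum = [
        sum(norm * math.exp(-0.5 * ((x - v) / sigma) ** 2) for v in values)
        for x in grid
    ]
    return grid, spectrum


# One geometry-function value at 2.0, using the grid settings shown above.
grid, spectrum = broaden([2.0], grid_min=0.5, grid_max=5.0, n=50, sigma=0.1)
step = grid[1] - grid[0]
peak_x = grid[spectrum.index(max(spectrum))]
print(f"peak near x = {peak_x:.2f}, area = {sum(spectrum) * step:.3f}")
```

A practical consequence is that `sigma` should not be much smaller than the grid spacing `(max - min) / (n - 1)`, otherwise a narrow peak can fall between grid points and be poorly resolved.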

Weighting Functions

Weighting functions control how different contributions are weighted:

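"unity": All contributions weighted equally
"exp": Exponential decay with distance
"inverse_square": Inverse-square decay with distance (k2 terms)
"smooth_cutoff": Smooth cutoff function (k3 terms)

weighting = {
    "function": "exp",
    "scale": 0.5,       # Decay rate of the exponential
    "threshold": 1e-3   # Contributions weighted below this value are ignored
}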
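As a rough illustration of exponential weighting, the sketch below assumes a weight of the form exp(-scale * r) and a threshold below which contributions are dropped. The names `exp_weight`, `cutoff_distance`, `scale` and `threshold` are used here for illustration only; check the DScribe documentation for the exact parameters of each weighting function.

```python
import math


def exp_weight(r, scale):
    """Weight of a contribution at distance r under exponential decay."""
    return math.exp(-scale * r)


def cutoff_distance(scale, threshold):
    """Distance at which exp_weight has decayed to `threshold`."""
    return -math.log(threshold) / scale


scale, threshold = 0.5, 1e-3  # illustrative values
r_max = cutoff_distance(scale, threshold)

for r in (1.0, 3.5, r_max):
    print(f"r = {r:6.3f}  weight = {exp_weight(r, scale):.4f}")
```

The threshold matters most for periodic systems: it converts the decay rate into a finite distance beyond which neighbouring atoms no longer need to be considered.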

Common Global Descriptor Features

Global descriptors share these characteristics:

Per-structure output: Each descriptor returns one feature vector per atomic structure
Structure-level properties: Capture overall structural characteristics and symmetries
Comparison capability: Enable direct comparison between different structures
Normalization options: Support different normalization schemes for consistent scaling
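The effect of a normalization scheme can be sketched on a plain feature vector. The helper below is hypothetical and only covers the "none", "l2" and "n_atoms" names listed for MBTR; it is meant to show why normalization makes structures of different size comparable, not how DScribe implements it.

```python
import math


def normalize(features, scheme="none", n_atoms=None):
    """Return a normalized copy of a feature vector (illustrative only)."""
    if scheme == "none":
        return list(features)
    if scheme == "l2":
        # Scale to unit Euclidean length; leave an all-zero vector unchanged.
        norm = math.sqrt(sum(x * x for x in features))
        return [x / norm for x in features] if norm > 0 else list(features)
    if scheme == "n_atoms":
        # Divide by the atom count so larger structures do not dominate.
        if not n_atoms:
            raise ValueError("n_atoms must be a positive integer")
        return [x / n_atoms for x in features]
    raise ValueError(f"unknown normalization scheme: {scheme!r}")


raw = [3.0, 4.0]
print(normalize(raw, "l2"))
print(normalize(raw, "n_atoms", n_atoms=2))
```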

Output Shapes

Global descriptors return arrays with shape:

Single system: (n_features,) in recent DScribe releases; check the returned array's shape if your code relies on it
Multiple systems: (n_systems, n_features)

This consistent output format makes global descriptors ideal for machine learning tasks that classify or compare entire structures, such as crystal structure prediction or molecular property prediction.
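Because every structure maps to a vector of the same length, structures can be compared with ordinary vector metrics. The sketch below uses cosine similarity on made-up feature vectors that stand in for descriptor output. The numbers and the labels "A", "B" and "C" are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical global descriptors for three structures (same n_features each).
features = {
    "A": [0.0, 1.0, 2.0, 0.5],
    "B": [0.0, 1.1, 1.9, 0.4],  # close to A
    "C": [2.0, 0.1, 0.0, 0.0],  # very different from A
}

for name in ("B", "C"):
    sim = cosine_similarity(features["A"], features[name])
    print(f"similarity(A, {name}) = {sim:.3f}")
```

The same stacked vectors can be passed directly to a classifier or regressor as a feature matrix with one row per structure.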
