tessl/pypi-dscribe

A Python package for creating feature transformations in applications of machine learning to materials science.


Matrix Descriptors

Name: tessl/pypi-dscribe
Author: tessl

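Matrix descriptors represent an atomic structure as a matrix of pairwise interactions between its atoms and then turn that matrix into a fixed-size feature vector: the matrix is made invariant to the order of the atoms, zero-padded to n_atoms_max, and flattened. The Coulomb Matrix targets finite molecules, while the Sine Matrix and Ewald Sum Matrix extend the idea to periodic crystals. All three give an intuitive, physically motivated picture of atomic interactions.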

Capabilities

CoulombMatrix

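The Coulomb Matrix represents an atomic structure through the Coulomb interactions between its atoms. For atoms with atomic numbers Z_i at positions R_i, the off-diagonal elements are the Coulomb repulsion between the nuclear charges, M_ij = Z_i Z_j / |R_i - R_j|, and the diagonal elements are M_ii = 0.5 Z_i^2.4, a power-law fit to the energy of the isolated atom.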
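To make the two kinds of element concrete, here is a minimal NumPy sketch that builds the unsorted, unpadded matrix from atomic numbers and Cartesian positions. The function name `coulomb_matrix` is hypothetical and not part of DScribe, and distances are used in whatever length unit the positions are given in.

```python
import numpy as np

def coulomb_matrix(atomic_numbers, positions):
    """Hypothetical helper: Coulomb matrix of a finite system (no sorting, no padding)."""
    z = np.asarray(atomic_numbers, dtype=float)
    r = np.asarray(positions, dtype=float)
    # Pairwise interatomic distances |R_i - R_j|, shape (n_atoms, n_atoms)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    # Off-diagonal: Z_i * Z_j / |R_i - R_j|; the zero-distance diagonal is overwritten below
    with np.errstate(divide="ignore"):
        matrix = np.outer(z, z) / dist
    # Diagonal: 0.5 * Z_i ** 2.4
    np.fill_diagonal(matrix, 0.5 * z**2.4)
    return matrix

# An O-H pair 1.0 length units apart
print(coulomb_matrix([8, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]))
```

The result is symmetric, so any strategy that reorders rows must reorder the columns the same way.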

class CoulombMatrix:
    def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False):
        """
        Initialize Coulomb Matrix descriptor.
        
        Parameters:
        - n_atoms_max (int): Maximum number of atoms in structures to be processed
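        - permutation (str): Permutation strategy for handling atom ordering:
            - "none": Keep the atom order of the input structure
            - "sorted_l2": Sort rows/columns in descending order of their L2 norm (default)
            - "eigenspectrum": Return only the eigenvalues, sorted by descending absolute value
            - "random": Sort rows/columns by L2 norm after adding Gaussian noise to the norms
        - sigma (float): Standard deviation of the Gaussian noise added to the row norms (only for "random" permutation)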
        - seed (int): Random seed for reproducible random permutations
        - sparse (bool): Whether to return sparse arrays
        """

    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False):
        """
        Create Coulomb Matrix descriptor for given system(s).
        
        Parameters:
        - system: ASE Atoms object(s) or DScribe System object(s)
        - n_jobs (int): Number of parallel processes
        - only_physical_cores (bool): Whether to use only physical CPU cores
        - verbose (bool): Whether to print progress information
        
        Returns:
        numpy.ndarray: Coulomb Matrix descriptors with shape (n_features,) for a single system, or (n_systems, n_features) for a list of systems
        """

    def get_matrix(self, system):
        """
        Get the Coulomb matrix for a single atomic system.
        
        Parameters:
        - system: ASE Atoms object or DScribe System object
        
        Returns:
        numpy.ndarray: 2D Coulomb matrix with shape (n_atoms, n_atoms), before sorting and zero-padding
        """

    def get_number_of_features(self):
        """Get total number of features in flattened Coulomb Matrix descriptor."""

    def unflatten(self, features, n_systems=None):
        """
        Unflatten descriptor back to 2D matrix form.
        
        Parameters:
        - features: Flattened descriptor array
        - n_systems (int): Number of systems (for multiple systems)
        
        Returns:
        numpy.ndarray: Unflattened matrix descriptors
        """

Usage Example:

from dscribe.descriptors import CoulombMatrix
from ase.build import molecule

# Setup Coulomb Matrix descriptor
cm = CoulombMatrix(
    n_atoms_max=10,
    permutation="sorted_l2"
)

# Create descriptor for water molecule
water = molecule("H2O")
cm_desc = cm.create(water)  # Shape: (n_features,) = (100,), since n_atoms_max**2 = 100

# Get the actual matrix (before flattening)
cm_matrix = cm.get_matrix(water)  # Shape: (3, 3) for H2O

# Process multiple molecules
molecules = [molecule("H2O"), molecule("NH3"), molecule("CH4")]
cm_descriptors = cm.create(molecules)  # Shape: (3, n_features) = (3, 100)

SineMatrix

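The Sine Matrix is designed for periodic systems. It replaces the 1/|R_i - R_j| term of the Coulomb Matrix with a function built from sin² of the interatomic displacement expressed in lattice coordinates, so every element is unchanged when an atom is shifted by a lattice vector. It is not an exact electrostatic interaction energy (compare the Ewald Sum Matrix), but it is cheap to compute.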
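A NumPy sketch of the sine-matrix definition, under the assumption that it follows the commonly published form: for i ≠ j, M_ij = Z_i Z_j / |B · Σ_k ê_k sin²(π ê_k · B⁻¹ (R_i - R_j))|, where B holds the lattice vectors, with the same 0.5 Z_i^2.4 diagonal as the Coulomb Matrix. The helper `sine_matrix` is hypothetical, and DScribe's internal implementation may differ in detail.

```python
import numpy as np

def sine_matrix(atomic_numbers, positions, cell):
    """Hypothetical helper: sine matrix of a periodic system (no sorting, no padding)."""
    z = np.asarray(atomic_numbers, dtype=float)
    r = np.asarray(positions, dtype=float)
    cell = np.asarray(cell, dtype=float)  # rows are lattice vectors, as in ASE
    # Displacements R_i - R_j in fractional (lattice) coordinates, shape (n, n, 3)
    frac = (r[:, None, :] - r[None, :, :]) @ np.linalg.inv(cell)
    # sin^2(pi * f) is periodic in f, so lattice translations leave it unchanged;
    # map back to Cartesian space and take the length
    phi = np.linalg.norm(np.sin(np.pi * frac) ** 2 @ cell, axis=-1)
    with np.errstate(divide="ignore"):
        matrix = np.outer(z, z) / phi  # zero diagonal of phi is overwritten below
    np.fill_diagonal(matrix, 0.5 * z**2.4)
    return matrix

cell = 5.64 * np.eye(3)
positions = np.array([[0.0, 0.0, 0.0], [2.82, 0.0, 0.0]])
print(sine_matrix([11, 17], positions, cell))
```

Shifting the second atom by a full lattice vector leaves the printed matrix unchanged, which is the property the plain Coulomb Matrix lacks.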

class SineMatrix:
    def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False):
        """
        Initialize Sine Matrix descriptor.
        
        Parameters:
        - n_atoms_max (int): Maximum number of atoms in structures to be processed
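        - permutation (str): Permutation strategy for handling atom ordering:
            - "none": Keep the atom order of the input structure
            - "sorted_l2": Sort rows/columns in descending order of their L2 norm (default)
            - "eigenspectrum": Return only the eigenvalues, sorted by descending absolute value
            - "random": Sort rows/columns by L2 norm after adding Gaussian noise to the norms
        - sigma (float): Standard deviation of the Gaussian noise added to the row norms (only for "random" permutation)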
        - seed (int): Random seed for reproducible random permutations
        - sparse (bool): Whether to return sparse arrays
        """

    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False):
        """
        Create Sine Matrix descriptor for given system(s).
        
        Parameters:
        - system: ASE Atoms object(s) or DScribe System object(s)
        - n_jobs (int): Number of parallel processes
        - only_physical_cores (bool): Whether to use only physical CPU cores
        - verbose (bool): Whether to print progress information
        
        Returns:
        numpy.ndarray: Sine Matrix descriptors with shape (n_features,) for a single system, or (n_systems, n_features) for a list of systems
        """

    def get_matrix(self, system):
        """
        Get the sine matrix for a single atomic system.
        
        Parameters:
        - system: ASE Atoms object or DScribe System object
        
        Returns:
        numpy.ndarray: 2D sine matrix with shape (n_atoms, n_atoms), before sorting and zero-padding
        """

    def get_number_of_features(self):
        """Get total number of features in flattened Sine Matrix descriptor."""

Usage Example:

from dscribe.descriptors import SineMatrix
from ase.build import bulk

# Setup Sine Matrix descriptor for periodic systems
sm = SineMatrix(
    n_atoms_max=8,
    permutation="sorted_l2"
)

# Create descriptor for periodic system
nacl = bulk("NaCl", "rocksalt", a=5.64)
sm_desc = sm.create(nacl)  # Shape: (n_features,) = (64,)

EwaldSumMatrix

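The Ewald Sum Matrix uses Ewald summation to compute the electrostatic interaction between each pair of atoms together with all of their periodic images. Unlike the Coulomb Matrix, which ignores periodicity, and the Sine Matrix, which only mimics it, this treats the long-range Coulomb interactions of a periodic system explicitly, at a higher computational cost.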
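Ewald summation splits the slowly converging lattice sum of 1/r into a short-ranged real-space part, a smooth reciprocal-space part, and a self-interaction correction. The sketch below applies that technique to the total electrostatic energy of a neutral periodic set of point charges (Gaussian units, Coulomb constant 1). It is not DScribe's per-pair matrix: the function name, `alpha` (a screening parameter in the role the `a` argument appears to play), and the image ranges `n_real`/`n_recip` are our own assumptions.

```python
import itertools
import math
import numpy as np

def ewald_energy(positions, charges, cell, alpha, n_real=2, n_recip=6):
    """Hypothetical helper: Ewald energy of a charge-neutral periodic point-charge system."""
    positions = np.asarray(positions, dtype=float)
    charges = np.asarray(charges, dtype=float)
    cell = np.asarray(cell, dtype=float)  # rows are lattice vectors
    volume = abs(np.linalg.det(cell))
    n = len(charges)

    # Real space: erfc-screened pair terms over a cube of periodic images
    shifts = [np.dot(t, cell) for t in itertools.product(range(-n_real, n_real + 1), repeat=3)]
    e_real = 0.0
    for i, j in itertools.product(range(n), repeat=2):
        for shift in shifts:
            r = np.linalg.norm(positions[j] - positions[i] + shift)
            if r < 1e-12:  # skip an atom's interaction with itself
                continue
            e_real += 0.5 * charges[i] * charges[j] * math.erfc(alpha * r) / r

    # Reciprocal space: rows of recip are b_j with a_i . b_j = 2*pi*delta_ij
    recip = 2 * np.pi * np.linalg.inv(cell).T
    e_recip = 0.0
    for m in itertools.product(range(-n_recip, n_recip + 1), repeat=3):
        if m == (0, 0, 0):
            continue
        k = np.dot(m, recip)
        k2 = np.dot(k, k)
        structure_factor = np.sum(charges * np.exp(1j * (positions @ k)))
        e_recip += 4 * np.pi / k2 * math.exp(-k2 / (4 * alpha**2)) * abs(structure_factor) ** 2
    e_recip /= 2 * volume

    # Self-energy: remove each screening Gaussian's interaction with its own charge
    e_self = -alpha / math.sqrt(math.pi) * np.sum(charges**2)
    return e_real + e_recip + e_self

# Conventional rocksalt cell with nearest-neighbour distance 1 (4 Na+ and 4 Cl-)
na = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
cl = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
energy = ewald_energy(na + cl, [1] * 4 + [-1] * 4, 2.0 * np.eye(3), alpha=2.0)
print(energy / 4)  # close to minus the NaCl Madelung constant, about -1.7476
```

The converged energy does not depend on `alpha`; changing it only moves work between the two sums, which is the kind of trade-off the documented `w` weight is described as steering.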

class EwaldSumMatrix:
    def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False):
        """
        Initialize Ewald Sum Matrix descriptor.
        
        Parameters:
        - n_atoms_max (int): Maximum number of atoms in structures to be processed
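        - permutation (str): Permutation strategy for handling atom ordering:
            - "none": Keep the atom order of the input structure
            - "sorted_l2": Sort rows/columns in descending order of their L2 norm (default)
            - "eigenspectrum": Return only the eigenvalues, sorted by descending absolute value
            - "random": Sort rows/columns by L2 norm after adding Gaussian noise to the norms
        - sigma (float): Standard deviation of the Gaussian noise added to the row norms (only for "random" permutation)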
        - seed (int): Random seed for reproducible random permutations
        - sparse (bool): Whether to return sparse arrays
        """

    def create(self, system, accuracy=1e-5, w=1, r_cut=None, g_cut=None, a=None,
               n_jobs=1, only_physical_cores=False, verbose=False):
        """
        Create Ewald Sum Matrix descriptor for given system(s).
        
        Parameters:
        - system: ASE Atoms object(s) or DScribe System object(s)
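        - accuracy (float): Target accuracy of the Ewald sum; used to choose r_cut, g_cut and a when they are not given
        - w (float): Weight for the relative computational cost of real- versus reciprocal-space terms; only used when r_cut, g_cut and a are determined automatically
        - r_cut (float): Real-space cutoff radius (auto-determined if None)
        - g_cut (float): Reciprocal-space cutoff radius (auto-determined if None)
        - a (float): Screening parameter that sets the width of the Gaussian charge distributions (auto-determined if None)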
        - n_jobs (int): Number of parallel processes
        - only_physical_cores (bool): Whether to use only physical CPU cores
        - verbose (bool): Whether to print progress information
        
        Returns:
        numpy.ndarray: Ewald Sum Matrix descriptors with shape (n_features,) for a single system, or (n_systems, n_features) for a list of systems
        """

    def get_matrix(self, system, accuracy=1e-5, w=1, r_cut=None, g_cut=None, a=None):
        """
        Get the Ewald sum matrix for a single atomic system.
        
        Parameters:
        - system: ASE Atoms object or DScribe System object
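        - accuracy (float): Target accuracy of the Ewald sum; used to choose r_cut, g_cut and a when they are not given
        - w (float): Weight for the relative computational cost of real- versus reciprocal-space terms; only used when r_cut, g_cut and a are determined automatically
        - r_cut (float): Real-space cutoff radius
        - g_cut (float): Reciprocal-space cutoff radius
        - a (float): Screening parameter that sets the width of the Gaussian charge distributions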
        
        Returns:
        numpy.ndarray: 2D Ewald sum matrix with shape (n_atoms, n_atoms)
        """

    def get_number_of_features(self):
        """Get total number of features in flattened Ewald Sum Matrix descriptor."""

Usage Example:

from dscribe.descriptors import EwaldSumMatrix
from ase.build import bulk

# Setup Ewald Sum Matrix descriptor
esm = EwaldSumMatrix(
    n_atoms_max=8,
    permutation="sorted_l2"
)

# Create descriptor for periodic system with custom Ewald parameters
nacl = bulk("NaCl", "rocksalt", a=5.64)
esm_desc = esm.create(nacl, accuracy=1e-6, w=0.5)  # Shape: (n_features,) = (64,)

# Get the actual Ewald matrix
esm_matrix = esm.get_matrix(nacl, accuracy=1e-6)  # Shape: (n_atoms, n_atoms)

Matrix Descriptor Base Methods

All matrix descriptors inherit from DescriptorMatrix and share these methods:

def sort(self, matrix):
    """
    Reorder matrix rows and columns in descending order of row L2 norm.
    
    Parameters:
    - matrix: 2D matrix to sort
    
    Returns:
    numpy.ndarray: Sorted matrix
    """

def get_eigenspectrum(self, matrix):
    """
    Get the eigenvalues of the matrix, sorted by descending absolute value.
    
    Parameters:
    - matrix: 2D matrix
    
    Returns:
    numpy.ndarray: Sorted eigenvalues
    """

def zero_pad(self, array):
    """
    Zero-pad a matrix to shape (n_atoms_max, n_atoms_max), or a 1D eigenvalue array to length n_atoms_max.
    
    Parameters:
    - array: Matrix or eigenvalue array to pad
    
    Returns:
    numpy.ndarray: Zero-padded matrix
    """
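The three base operations can be sketched as free NumPy functions. These are hypothetical stand-ins: DScribe's methods take `self` and read `n_atoms_max` from the descriptor, whereas here it is an explicit argument. Both sorting by row norm and taking the eigenspectrum give the same result for any relabelling of the atoms, provided the row norms are distinct.

```python
import numpy as np

def sort_l2(matrix):
    """Reorder rows and columns by descending row L2 norm."""
    order = np.argsort(np.linalg.norm(matrix, axis=1))[::-1]
    return matrix[order][:, order]

def eigenspectrum(matrix):
    """Eigenvalues of a symmetric matrix, sorted by descending absolute value."""
    eigenvalues = np.linalg.eigvalsh(matrix)
    return eigenvalues[np.argsort(np.abs(eigenvalues))[::-1]]

def zero_pad(array, n_atoms_max):
    """Pad a 1D spectrum to length n_atoms_max, or a 2D matrix to (n_atoms_max, n_atoms_max)."""
    n = array.shape[0]
    if array.ndim == 1:
        padded = np.zeros(n_atoms_max)
        padded[:n] = array
    else:
        padded = np.zeros((n_atoms_max, n_atoms_max))
        padded[:n, :n] = array
    return padded

m = np.array([[1.0, 2.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]])
relabelled = m[[2, 0, 1]][:, [2, 0, 1]]  # same structure, atoms listed in another order
print(np.allclose(sort_l2(m), sort_l2(relabelled)))  # True
print(zero_pad(sort_l2(m), 5).flatten().shape)       # (25,)
```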

Permutation Strategies

Matrix descriptors handle different atom orderings through permutation strategies:

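"none": Keep the matrix in the atom order of the input structure (not permutation invariant)
"sorted_l2": Sort matrix rows/columns in descending order of their L2 norms (the default)
"eigenspectrum": Use only the eigenvalues, sorted by descending absolute value, which shortens the feature vector from n_atoms_max² to n_atoms_max entries
"random": Sort rows/columns by L2 norm after adding Gaussian noise with standard deviation sigma to the norms; useful for data augmentation (requires the sigma parameter)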
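A sketch of the "random" strategy as described above: perturb the row norms with Gaussian noise, then sort as in "sorted_l2". Atoms with similar norms can swap places between draws, so repeated calls yield slightly different orderings of the same matrix. The function name and the use of `np.random.default_rng` for seeding are our assumptions, not DScribe's internals.

```python
import numpy as np

def sort_l2_randomized(matrix, sigma, rng):
    """Hypothetical helper: sort rows/columns by L2 norm after adding Gaussian noise to the norms."""
    norms = np.linalg.norm(matrix, axis=1)
    noisy_norms = norms + rng.normal(scale=sigma, size=norms.shape)
    order = np.argsort(noisy_norms)[::-1]  # descending noisy norm
    # Apply the same permutation to rows and columns so the matrix stays symmetric
    return matrix[order][:, order]

rng = np.random.default_rng(seed=0)
m = np.array([[1.0, 2.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]])
augmented = [sort_l2_randomized(m, sigma=2.0, rng=rng) for _ in range(3)]
```

With `sigma=0` the noise vanishes and the result reduces to the deterministic "sorted_l2" ordering.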

Usage Considerations

System Size Requirements

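All matrix descriptors require n_atoms_max, which should be:

At least the number of atoms in the largest system you will process
No larger than necessary, because each feature vector has n_atoms_max² entries (n_atoms_max with "eigenspectrum"), so memory use and computation time grow quadratically with it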
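The sizing trade-off is simple arithmetic. The sketch below picks the smallest n_atoms_max that fits a dataset and estimates the memory of a dense float64 output array; we assume the feature count matches what `get_number_of_features` reports for each permutation strategy. The helper names and the atom counts are hypothetical.

```python
def n_features(n_atoms_max, permutation="sorted_l2"):
    """Length of one flattened matrix descriptor."""
    # "eigenspectrum" keeps n_atoms_max eigenvalues; the other strategies keep the padded matrix
    return n_atoms_max if permutation == "eigenspectrum" else n_atoms_max**2

def dense_bytes(n_systems, n_atoms_max, permutation="sorted_l2", itemsize=8):
    """Memory of a dense (n_systems, n_features) array with 8-byte (float64) entries."""
    return n_systems * n_features(n_atoms_max, permutation) * itemsize

sizes = [3, 4, 5, 12]      # hypothetical atom counts in a dataset
n_atoms_max = max(sizes)   # smallest value that fits every system
print(n_atoms_max, n_features(n_atoms_max))    # 12 144
print(dense_bytes(10_000, 100) / 1e6, "MB")    # 800.0 MB
```

Doubling n_atoms_max quadruples the memory of a sorted or unsorted matrix descriptor, while "eigenspectrum" only doubles it.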

Periodic vs Non-Periodic Systems

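CoulombMatrix: Intended for finite, non-periodic systems such as molecules
SineMatrix: Designed for periodic systems; cheap to compute, but not an exact electrostatic interaction
EwaldSumMatrix: Periodic systems where long-range electrostatics matter; the most physically faithful of the three, at a higher computational cost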