A Python package for creating feature transformations in applications of machine learning to materials science.
Matrix descriptors represent atomic structures as matrices based on pairwise interactions between atoms, then transform these matrices into fixed-size feature vectors. These descriptors are particularly useful for molecular systems and provide intuitive representations of atomic interactions.
The Coulomb Matrix represents atomic structures through Coulomb interactions between atoms. Matrix elements are the Coulomb repulsion between atoms (off-diagonal) or a polynomial of the atomic charge (diagonal).
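As a rough illustration of the definition above, the following NumPy sketch builds a Coulomb matrix by hand. It assumes the commonly used form with diagonal elements 0.5 * Z_i**2.4 and off-diagonal elements Z_i * Z_j / |R_i - R_j|. The helper name coulomb_matrix and the approximate water geometry are illustrative and not part of DScribe, whose units and conventions may differ.

```python
import numpy as np

def coulomb_matrix(numbers, positions):
    """Hand-rolled Coulomb matrix (illustrative helper, not the DScribe API)."""
    Z = np.asarray(numbers, dtype=float)
    R = np.asarray(positions, dtype=float)
    # Pairwise distances |R_i - R_j| via broadcasting
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder to avoid dividing by zero
    M = np.outer(Z, Z) / dist    # off-diagonal: Coulomb repulsion
    np.fill_diagonal(M, 0.5 * Z ** 2.4)  # diagonal: polynomial of the charge
    return M

# Approximate water geometry in angstroms (O, H, H); values are illustrative
numbers = [8, 1, 1]
positions = [[0.0, 0.0, 0.119], [0.0, 0.763, -0.477], [0.0, -0.763, -0.477]]
cm = coulomb_matrix(numbers, positions)
print(cm.shape)
print(cm)
```

The matrix is symmetric, and its size depends on the number of atoms, which is why the descriptor classes below pad it to a fixed size.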
class CoulombMatrix:
    def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False):
        """
        Initialize Coulomb Matrix descriptor.
        Parameters:
        - n_atoms_max (int): Maximum number of atoms in structures to be processed
        - permutation (str): Permutation strategy for handling atom ordering:
            - "sorted_l2": Sort rows/columns by L2 norm
            - "eigenspectrum": Use eigenvalue spectrum
            - "random": Random permutation with noise
        - sigma (float): Standard deviation for random noise (only for "random" permutation)
        - seed (int): Random seed for reproducible random permutations
        - sparse (bool): Whether to return sparse arrays
        """

    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False):
        """
        Create Coulomb Matrix descriptor for given system(s).
        Parameters:
        - system: ASE Atoms object(s) or DScribe System object(s)
        - n_jobs (int): Number of parallel processes
        - only_physical_cores (bool): Whether to use only physical CPU cores
        - verbose (bool): Whether to print progress information
        Returns:
        numpy.ndarray: Coulomb Matrix descriptors with shape (n_systems, n_features)
        """

    def get_matrix(self, system):
        """
        Get the Coulomb matrix for a single atomic system.
        Parameters:
        - system: ASE Atoms object or DScribe System object
        Returns:
        numpy.ndarray: 2D Coulomb matrix with shape (n_atoms, n_atoms)
        """

    def get_number_of_features(self):
        """Get total number of features in flattened Coulomb Matrix descriptor."""

    def unflatten(self, features, n_systems=None):
        """
        Unflatten descriptor back to 2D matrix form.
        Parameters:
        - features: Flattened descriptor array
        - n_systems (int): Number of systems (for multiple systems)
        Returns:
        numpy.ndarray: Unflattened matrix descriptors
        """

Usage Example:
from dscribe.descriptors import CoulombMatrix
from ase.build import molecule
# Setup Coulomb Matrix descriptor
cm = CoulombMatrix(
    n_atoms_max=10,
    permutation="sorted_l2"
)
# Create descriptor for water molecule
water = molecule("H2O")
cm_desc = cm.create(water)  # Shape: (n_features,) for a single system
# Get the actual matrix (before flattening)
cm_matrix = cm.get_matrix(water) # Shape: (3, 3) for H2O
# Process multiple molecules
molecules = [molecule("H2O"), molecule("NH3"), molecule("CH4")]
cm_descriptors = cm.create(molecules)  # Shape: (3, n_features)

The Sine Matrix is designed for periodic systems. It replaces the Coulomb interaction with a sine-based function of the interatomic displacement so that periodic boundary conditions are taken into account.
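To make the role of the sine function concrete, here is a minimal NumPy sketch of a sine matrix. It assumes the form usually given in the literature: the off-diagonal element is Z_i * Z_j divided by the norm of B applied to sin²(π B⁻¹ (R_i - R_j)), where B holds the lattice vectors, and the diagonal matches the Coulomb matrix. The helper sine_matrix and the two-atom cubic toy cell are hypothetical and do not reproduce DScribe's implementation.

```python
import numpy as np

def sine_matrix(numbers, positions, cell):
    """Hand-rolled sine matrix (illustrative helper, not the DScribe API)."""
    Z = np.asarray(numbers, dtype=float)
    R = np.asarray(positions, dtype=float)
    B = np.asarray(cell, dtype=float)      # rows are the lattice vectors
    B_inv = np.linalg.inv(B)
    diff = R[:, None, :] - R[None, :, :]   # Cartesian displacements, shape (n, n, 3)
    frac = diff @ B_inv                    # same displacements in fractional coordinates
    # sin^2 has period 1 in fractional coordinates, so periodic images coincide
    phi = np.linalg.norm((np.sin(np.pi * frac) ** 2) @ B, axis=-1)
    np.fill_diagonal(phi, 1.0)             # placeholder to avoid dividing by zero
    M = np.outer(Z, Z) / phi
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M

# Toy cubic cell with two atoms (not the primitive rocksalt cell)
a = 5.64
cell = a * np.eye(3)
numbers = [11, 17]
positions = np.array([[0.0, 0.0, 0.0], [a / 2, a / 2, a / 2]])
sm = sine_matrix(numbers, positions, cell)

# Moving an atom by a full lattice vector should leave the matrix unchanged
shifted = positions.copy()
shifted[1] += cell[0]
sm_shifted = sine_matrix(numbers, shifted, cell)
print(np.allclose(sm, sm_shifted))
```

The translation check at the end is the point of the construction: a plain Coulomb matrix would change when an atom is replaced by one of its periodic images, while the sine form does not.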
class SineMatrix:
    def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False):
        """
        Initialize Sine Matrix descriptor.
        Parameters:
        - n_atoms_max (int): Maximum number of atoms in structures to be processed
        - permutation (str): Permutation strategy for handling atom ordering:
            - "sorted_l2": Sort rows/columns by L2 norm
            - "eigenspectrum": Use eigenvalue spectrum
            - "random": Random permutation with noise
        - sigma (float): Standard deviation for random noise (only for "random" permutation)
        - seed (int): Random seed for reproducible random permutations
        - sparse (bool): Whether to return sparse arrays
        """

    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False):
        """
        Create Sine Matrix descriptor for given system(s).
        Parameters:
        - system: ASE Atoms object(s) or DScribe System object(s)
        - n_jobs (int): Number of parallel processes
        - only_physical_cores (bool): Whether to use only physical CPU cores
        - verbose (bool): Whether to print progress information
        Returns:
        numpy.ndarray: Sine Matrix descriptors with shape (n_systems, n_features)
        """

    def get_matrix(self, system):
        """
        Get the sine matrix for a single atomic system.
        Parameters:
        - system: ASE Atoms object or DScribe System object
        Returns:
        numpy.ndarray: 2D sine matrix with shape (n_atoms, n_atoms)
        """

    def get_number_of_features(self):
        """Get total number of features in flattened Sine Matrix descriptor."""

Usage Example:
from dscribe.descriptors import SineMatrix
from ase.build import bulk
# Setup Sine Matrix descriptor for periodic systems
sm = SineMatrix(
    n_atoms_max=8,
    permutation="sorted_l2"
)
# Create descriptor for periodic system
nacl = bulk("NaCl", "rocksalt", a=5.64)
sm_desc = sm.create(nacl)  # Shape: (n_features,) for a single system

The Ewald Sum Matrix uses Ewald summation to compute electrostatic interactions in periodic systems, providing a more accurate treatment of long-range Coulomb interactions than the basic Coulomb Matrix.
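The idea behind Ewald summation can be shown without a full lattice sum. The sketch below only demonstrates the standard splitting of 1/r into a rapidly decaying short-range part (summed in real space) and a smooth long-range part (summed in reciprocal space). The function split_coulomb and the numerical values are hypothetical, and it is an assumption that the a and r_cut parameters documented below play these roles in DScribe.

```python
import math

def split_coulomb(r, a):
    """Split 1/r into short- and long-range parts using screening parameter a."""
    short_range = math.erfc(a * r) / r   # decays quickly, suited to a real-space sum
    long_range = math.erf(a * r) / r     # smooth, suited to a reciprocal-space sum
    return short_range, long_range

a = 0.5        # hypothetical screening parameter
r_cut = 10.0   # hypothetical real-space cutoff

for r in (1.0, 2.0, 5.0, r_cut):
    short, long_ = split_coulomb(r, a)
    # The two parts always add back up to the bare 1/r interaction
    print(f"r={r:5.1f}  short={short:.3e}  long={long_:.3e}  sum={short + long_:.6f}")
```

Because the short-range part is negligible beyond the cutoff, the real-space sum can be truncated, and the requested accuracy determines how aggressively both sums are cut.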
class EwaldSumMatrix:
    def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False):
        """
        Initialize Ewald Sum Matrix descriptor.
        Parameters:
        - n_atoms_max (int): Maximum number of atoms in structures to be processed
        - permutation (str): Permutation strategy for handling atom ordering:
            - "sorted_l2": Sort rows/columns by L2 norm
            - "eigenspectrum": Use eigenvalue spectrum
            - "random": Random permutation with noise
        - sigma (float): Standard deviation for random noise (only for "random" permutation)
        - seed (int): Random seed for reproducible random permutations
        - sparse (bool): Whether to return sparse arrays
        """

    def create(self, system, accuracy=1e-5, w=1, r_cut=None, g_cut=None, a=None,
               n_jobs=1, only_physical_cores=False, verbose=False):
        """
        Create Ewald Sum Matrix descriptor for given system(s).
        Parameters:
        - system: ASE Atoms object(s) or DScribe System object(s)
        - accuracy (float): Accuracy of Ewald summation
        - w (float): Scaling parameter for electrostatic interactions
        - r_cut (float): Real-space cutoff radius (auto-determined if None)
        - g_cut (float): Reciprocal-space cutoff (auto-determined if None)
        - a (float): Ewald parameter (auto-determined if None)
        - n_jobs (int): Number of parallel processes
        - only_physical_cores (bool): Whether to use only physical CPU cores
        - verbose (bool): Whether to print progress information
        Returns:
        numpy.ndarray: Ewald Sum Matrix descriptors with shape (n_systems, n_features)
        """

    def get_matrix(self, system, accuracy=1e-5, w=1, r_cut=None, g_cut=None, a=None):
        """
        Get the Ewald sum matrix for a single atomic system.
        Parameters:
        - system: ASE Atoms object or DScribe System object
        - accuracy (float): Accuracy of Ewald summation
        - w (float): Scaling parameter for electrostatic interactions
        - r_cut (float): Real-space cutoff radius
        - g_cut (float): Reciprocal-space cutoff
        - a (float): Ewald parameter
        Returns:
        numpy.ndarray: 2D Ewald sum matrix with shape (n_atoms, n_atoms)
        """

    def get_number_of_features(self):
        """Get total number of features in flattened Ewald Sum Matrix descriptor."""

Usage Example:
from dscribe.descriptors import EwaldSumMatrix
from ase.build import bulk
# Setup Ewald Sum Matrix descriptor
esm = EwaldSumMatrix(
    n_atoms_max=8,
    permutation="sorted_l2"
)
# Create descriptor for periodic system with custom Ewald parameters
nacl = bulk("NaCl", "rocksalt", a=5.64)
esm_desc = esm.create(nacl, accuracy=1e-6, w=0.5)  # Shape: (n_features,) for a single system
# Get the actual Ewald matrix
esm_matrix = esm.get_matrix(nacl, accuracy=1e-6)  # Shape: (n_atoms, n_atoms)

All matrix descriptors inherit from DescriptorMatrix and share these methods:
def sort(self, matrix):
    """
    Sort matrix rows and columns by L2 norm.
    Parameters:
    - matrix: 2D matrix to sort
    Returns:
    numpy.ndarray: Sorted matrix
    """

def get_eigenspectrum(self, matrix):
    """
    Get eigenvalue spectrum of matrix sorted by absolute value.
    Parameters:
    - matrix: 2D matrix
    Returns:
    numpy.ndarray: Sorted eigenvalues
    """

def zero_pad(self, array):
    """
    Zero-pad matrix to n_atoms_max size.
    Parameters:
    - array: Matrix to pad
    Returns:
    numpy.ndarray: Zero-padded matrix
    """

Matrix descriptors handle different atom orderings through permutation strategies:
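- "sorted_l2": sort rows and columns by L2 norm
- "eigenspectrum": use the eigenvalue spectrum of the matrix
- "random": random permutation with noise (controlled by the sigma parameter)

All matrix descriptors require specifying n_atoms_max, which should be at least the number of atoms in the largest structure to be processed.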
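The permutation strategies can be sketched in a few lines of NumPy to show why they make the descriptor independent of atom ordering. The functions below are illustrative stand-ins rather than DScribe's implementation. The descending sort order and the way noise is added to the row norms for the random strategy are assumptions, and the toy matrix is made up.

```python
import numpy as np

def sort_by_l2(matrix):
    """Reorder rows and columns by decreasing L2 norm of the rows."""
    order = np.argsort(-np.linalg.norm(matrix, axis=1))
    return matrix[order][:, order]

def eigenspectrum(matrix):
    """Eigenvalues of a symmetric matrix, ordered by decreasing absolute value."""
    eigenvalues = np.linalg.eigvalsh(matrix)
    return eigenvalues[np.argsort(-np.abs(eigenvalues))]

def random_sort(matrix, sigma, rng):
    """Sort by row norms perturbed with Gaussian noise of standard deviation sigma."""
    norms = np.linalg.norm(matrix, axis=1)
    noisy_norms = norms + rng.normal(scale=sigma, size=len(matrix))
    order = np.argsort(-noisy_norms)
    return matrix[order][:, order]

# Symmetric toy matrix standing in for a 3-atom Coulomb matrix
M = np.array([[36.9, 5.5, 2.0],
              [5.5, 0.5, 0.4],
              [2.0, 0.4, 73.5]])

# The same system with its atoms listed in a different order
perm = [2, 0, 1]
M_perm = M[perm][:, perm]

print(np.allclose(sort_by_l2(M), sort_by_l2(M_perm)))
print(np.allclose(eigenspectrum(M), eigenspectrum(M_perm)))
print(random_sort(M, sigma=1.0, rng=np.random.default_rng(0)))
```

Sorting keeps the full matrix, while the eigenspectrum keeps only one value per atom. The random strategy trades determinism for a set of slightly different orderings, which can be used as data augmentation.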
Matrix descriptors flatten the 2D matrices into 1D feature vectors:
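- Matrix for a single system: (n_atoms, n_atoms)
- Flattened descriptor for a single system: (n_atoms_max * n_atoms_max,)
- Flattened descriptors for multiple systems: (n_systems, n_atoms_max * n_atoms_max)

Use the unflatten() method to recover the original 2D matrix format when needed.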
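The shapes above follow from zero-padding each matrix to n_atoms_max and then flattening it. A minimal sketch of that bookkeeping, using plain NumPy and made-up matrices, is shown here. The zero_pad helper and the final reshape are illustrative stand-ins for the library's zero_pad() and unflatten() methods, not their actual implementations.

```python
import numpy as np

def zero_pad(matrix, n_atoms_max):
    """Embed an (n_atoms, n_atoms) matrix in the top-left corner of a zero matrix."""
    n_atoms = matrix.shape[0]
    if n_atoms > n_atoms_max:
        raise ValueError("structure has more atoms than n_atoms_max")
    padded = np.zeros((n_atoms_max, n_atoms_max))
    padded[:n_atoms, :n_atoms] = matrix
    return padded

n_atoms_max = 5
# Toy stand-ins for the matrices of a 3-atom and a 4-atom molecule
matrices = [np.ones((3, 3)), 2 * np.ones((4, 4))]

# One fixed-length feature row per system
features = np.stack([zero_pad(m, n_atoms_max).ravel() for m in matrices])
print(features.shape)  # (2, 25)

# Recover the padded 2D matrices from the flat feature rows
restored = features.reshape(len(matrices), n_atoms_max, n_atoms_max)
print(restored[0][:3, :3])
```

Padding is what lets molecules of different sizes share one feature matrix, at the cost of a feature count that grows quadratically with n_atoms_max.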