0
# Matrix Descriptors
1
2
Matrix descriptors represent atomic structures as matrices based on pairwise interactions between atoms, then transform these matrices into fixed-size feature vectors. These descriptors are particularly useful for molecular systems and provide intuitive representations of atomic interactions.
3
4
## Capabilities
5
6
### CoulombMatrix
7
8
The Coulomb Matrix represents atomic structures through Coulomb interactions between atoms. Matrix elements are the Coulomb repulsion between atoms (off-diagonal) or a polynomial of the atomic charge (diagonal).
9
10
```python { .api }
11
class CoulombMatrix:
12
def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False):
13
"""
14
Initialize Coulomb Matrix descriptor.
15
16
Parameters:
17
- n_atoms_max (int): Maximum number of atoms in structures to be processed
18
- permutation (str): Permutation strategy for handling atom ordering:
19
- "sorted_l2": Sort rows/columns by L2 norm
20
- "eigenspectrum": Use eigenvalue spectrum
21
- "random": Random permutation with noise
22
- sigma (float): Standard deviation for random noise (only for "random" permutation)
23
- seed (int): Random seed for reproducible random permutations
24
- sparse (bool): Whether to return sparse arrays
25
"""
26
27
def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False):
28
"""
29
Create Coulomb Matrix descriptor for given system(s).
30
31
Parameters:
32
- system: ASE Atoms object(s) or DScribe System object(s)
33
- n_jobs (int): Number of parallel processes
34
- only_physical_cores (bool): Whether to use only physical CPU cores
35
- verbose (bool): Whether to print progress information
36
37
Returns:
38
numpy.ndarray: Coulomb Matrix descriptors with shape (n_systems, n_features)
39
"""
40
41
def get_matrix(self, system):
42
"""
43
Get the Coulomb matrix for a single atomic system.
44
45
Parameters:
46
- system: ASE Atoms object or DScribe System object
47
48
Returns:
49
numpy.ndarray: 2D Coulomb matrix with shape (n_atoms, n_atoms)
50
"""
51
52
def get_number_of_features(self):
53
"""Get total number of features in flattened Coulomb Matrix descriptor."""
54
55
def unflatten(self, features, n_systems=None):
56
"""
57
Unflatten descriptor back to 2D matrix form.
58
59
Parameters:
60
- features: Flattened descriptor array
61
- n_systems (int): Number of systems (for multiple systems)
62
63
Returns:
64
numpy.ndarray: Unflattened matrix descriptors
65
"""
66
```
67
68
**Usage Example:**
69
70
```python
71
from dscribe.descriptors import CoulombMatrix
72
from ase.build import molecule
73
74
# Setup Coulomb Matrix descriptor
75
cm = CoulombMatrix(
76
n_atoms_max=10,
77
permutation="sorted_l2"
78
)
79
80
# Create descriptor for water molecule
81
water = molecule("H2O")
82
cm_desc = cm.create(water) # Shape: (1, n_features)
83
84
# Get the actual matrix (before flattening)
85
cm_matrix = cm.get_matrix(water) # Shape: (3, 3) for H2O
86
87
# Process multiple molecules
88
molecules = [molecule("H2O"), molecule("NH3"), molecule("CH4")]
89
cm_descriptors = cm.create(molecules) # Shape: (3, n_features)
90
```
91
92
### SineMatrix
93
94
The Sine Matrix is designed for periodic systems, using sine functions instead of Coulomb interactions to handle periodic boundary conditions more effectively.
95
96
```python { .api }
97
class SineMatrix:
98
def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False):
99
"""
100
Initialize Sine Matrix descriptor.
101
102
Parameters:
103
- n_atoms_max (int): Maximum number of atoms in structures to be processed
104
- permutation (str): Permutation strategy for handling atom ordering:
105
- "sorted_l2": Sort rows/columns by L2 norm
106
- "eigenspectrum": Use eigenvalue spectrum
107
- "random": Random permutation with noise
108
- sigma (float): Standard deviation for random noise (only for "random" permutation)
109
- seed (int): Random seed for reproducible random permutations
110
- sparse (bool): Whether to return sparse arrays
111
"""
112
113
def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False):
114
"""
115
Create Sine Matrix descriptor for given system(s).
116
117
Parameters:
118
- system: ASE Atoms object(s) or DScribe System object(s)
119
- n_jobs (int): Number of parallel processes
120
- only_physical_cores (bool): Whether to use only physical CPU cores
121
- verbose (bool): Whether to print progress information
122
123
Returns:
124
numpy.ndarray: Sine Matrix descriptors with shape (n_systems, n_features)
125
"""
126
127
def get_matrix(self, system):
128
"""
129
Get the sine matrix for a single atomic system.
130
131
Parameters:
132
- system: ASE Atoms object or DScribe System object
133
134
Returns:
135
numpy.ndarray: 2D sine matrix with shape (n_atoms, n_atoms)
136
"""
137
138
def get_number_of_features(self):
139
"""Get total number of features in flattened Sine Matrix descriptor."""
140
```
141
142
**Usage Example:**
143
144
```python
145
from dscribe.descriptors import SineMatrix
146
from ase.build import bulk
147
148
# Setup Sine Matrix descriptor for periodic systems
149
sm = SineMatrix(
150
n_atoms_max=8,
151
permutation="sorted_l2"
152
)
153
154
# Create descriptor for periodic system
155
nacl = bulk("NaCl", "rocksalt", a=5.64)
156
sm_desc = sm.create(nacl) # Shape: (1, n_features)
157
```
158
159
### EwaldSumMatrix
160
161
The Ewald Sum Matrix uses Ewald summation to compute electrostatic interactions in periodic systems, providing a more accurate treatment of long-range Coulomb interactions than the basic Coulomb Matrix.
162
163
```python { .api }
164
class EwaldSumMatrix:
165
def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False):
166
"""
167
Initialize Ewald Sum Matrix descriptor.
168
169
Parameters:
170
- n_atoms_max (int): Maximum number of atoms in structures to be processed
171
- permutation (str): Permutation strategy for handling atom ordering:
172
- "sorted_l2": Sort rows/columns by L2 norm
173
- "eigenspectrum": Use eigenvalue spectrum
174
- "random": Random permutation with noise
175
- sigma (float): Standard deviation for random noise (only for "random" permutation)
176
- seed (int): Random seed for reproducible random permutations
177
- sparse (bool): Whether to return sparse arrays
178
"""
179
180
def create(self, system, accuracy=1e-5, w=1, r_cut=None, g_cut=None, a=None,
181
n_jobs=1, only_physical_cores=False, verbose=False):
182
"""
183
Create Ewald Sum Matrix descriptor for given system(s).
184
185
Parameters:
186
- system: ASE Atoms object(s) or DScribe System object(s)
187
- accuracy (float): Accuracy of Ewald summation
188
- w (float): Scaling parameter for electrostatic interactions
189
- r_cut (float): Real-space cutoff radius (auto-determined if None)
190
- g_cut (float): Reciprocal-space cutoff (auto-determined if None)
191
- a (float): Ewald parameter (auto-determined if None)
192
- n_jobs (int): Number of parallel processes
193
- only_physical_cores (bool): Whether to use only physical CPU cores
194
- verbose (bool): Whether to print progress information
195
196
Returns:
197
numpy.ndarray: Ewald Sum Matrix descriptors with shape (n_systems, n_features)
198
"""
199
200
def get_matrix(self, system, accuracy=1e-5, w=1, r_cut=None, g_cut=None, a=None):
201
"""
202
Get the Ewald sum matrix for a single atomic system.
203
204
Parameters:
205
- system: ASE Atoms object or DScribe System object
206
- accuracy (float): Accuracy of Ewald summation
207
- w (float): Scaling parameter for electrostatic interactions
208
- r_cut (float): Real-space cutoff radius
209
- g_cut (float): Reciprocal-space cutoff
210
- a (float): Ewald parameter
211
212
Returns:
213
numpy.ndarray: 2D Ewald sum matrix with shape (n_atoms, n_atoms)
214
"""
215
216
def get_number_of_features(self):
217
"""Get total number of features in flattened Ewald Sum Matrix descriptor."""
218
```
219
220
**Usage Example:**
221
222
```python
223
from dscribe.descriptors import EwaldSumMatrix
224
from ase.build import bulk
225
226
# Setup Ewald Sum Matrix descriptor
227
esm = EwaldSumMatrix(
228
n_atoms_max=8,
229
permutation="sorted_l2"
230
)
231
232
# Create descriptor for periodic system with custom Ewald parameters
233
nacl = bulk("NaCl", "rocksalt", a=5.64)
234
esm_desc = esm.create(nacl, accuracy=1e-6, w=0.5) # Shape: (1, n_features)
235
236
# Get the actual Ewald matrix
237
esm_matrix = esm.get_matrix(nacl, accuracy=1e-6) # Shape: (n_atoms, n_atoms)
238
```
239
240
## Matrix Descriptor Base Methods
241
242
All matrix descriptors inherit from `DescriptorMatrix` and share these methods:
243
244
```python { .api }
245
def sort(self, matrix):
246
"""
247
Sort matrix rows and columns by L2 norm.
248
249
Parameters:
250
- matrix: 2D matrix to sort
251
252
Returns:
253
numpy.ndarray: Sorted matrix
254
"""
255
256
def get_eigenspectrum(self, matrix):
257
"""
258
Get eigenvalue spectrum of matrix sorted by absolute value.
259
260
Parameters:
261
- matrix: 2D matrix
262
263
Returns:
264
numpy.ndarray: Sorted eigenvalues
265
"""
266
267
def zero_pad(self, array):
268
"""
269
Zero-pad matrix to n_atoms_max size.
270
271
Parameters:
272
- array: Matrix to pad
273
274
Returns:
275
numpy.ndarray: Zero-padded matrix
276
"""
277
```
278
279
## Permutation Strategies
280
281
Matrix descriptors handle different atom orderings through permutation strategies:
282
283
- **"sorted_l2"**: Sort matrix rows/columns by their L2 norms (most common)
284
- **"eigenspectrum"**: Use eigenvalue spectrum instead of full matrix
285
- **"random"**: Add random noise for data augmentation (requires `sigma` parameter)
286
287
## Usage Considerations
288
289
### System Size Requirements
290
291
All matrix descriptors require specifying `n_atoms_max`, which should be:
292
- At least as large as the biggest system you'll process
293
- Not too large to avoid unnecessary memory usage and computation time
294
295
### Periodic vs Non-Periodic Systems
296
297
- **CoulombMatrix**: Best for molecular systems (non-periodic)
298
- **SineMatrix**: Designed for periodic systems
299
- **EwaldSumMatrix**: Most accurate for periodic systems with long-range interactions
300
301
### Output Format
302
303
Matrix descriptors flatten the 2D matrices into 1D feature vectors:
304
- Original matrix shape: `(n_atoms, n_atoms)`
305
- Flattened output shape: `(n_atoms_max * n_atoms_max,)`
306
- For multiple systems: `(n_systems, n_atoms_max * n_atoms_max)`
307
308
Use the `unflatten()` method to recover the original 2D matrix format when needed.