# DScribe

DScribe is a Python library for transforming atomic structures into fixed-size numerical fingerprints (descriptors) for machine learning in materials science. It implements the Coulomb Matrix, Sine Matrix, Ewald Sum Matrix, Atom-centered Symmetry Functions (ACSF), Smooth Overlap of Atomic Positions (SOAP), Many-body Tensor Representation (MBTR), Local Many-body Tensor Representation (LMBTR), and Valle-Oganov descriptors. Every descriptor is computed through a common `create()` method. Many descriptors also provide derivatives with respect to atomic positions through `derivatives()`.

## Package Information

- **Package Name**: dscribe
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install dscribe` or `conda install -c conda-forge dscribe`

## Core Imports

```python
import dscribe
from dscribe import System
```

For descriptors:

```python
from dscribe.descriptors import SOAP, ACSF, MBTR, CoulombMatrix, SineMatrix, EwaldSumMatrix, LMBTR, ValleOganov
```

For core classes:

```python
from dscribe.core import System, Lattice
```

For kernels:

```python
from dscribe.kernels import AverageKernel, REMatchKernel
```

For utilities:

```python
from dscribe.utils.geometry import get_adjacency_matrix, get_extended_system
from dscribe.utils.species import symbols_to_numbers, get_atomic_numbers
from dscribe.utils.stats import system_stats
from dscribe.utils.dimensionality import is1d, is2d
```

## Basic Usage

```python
import numpy as np
from ase.build import molecule
from dscribe.descriptors import SOAP, CoulombMatrix
from dscribe import System

# Define atomic structures using ASE
samples = [molecule("H2O"), molecule("NO2"), molecule("CO2")]

# Or create DScribe System objects (System extends ASE Atoms with caching)
water_system = System.from_atoms(molecule("H2O"))

# Set up descriptors
cm_desc = CoulombMatrix(n_atoms_max=3, permutation="sorted_l2")
soap_desc = SOAP(species=["C", "H", "O", "N"], r_cut=5.0, n_max=8, l_max=6)

# Create descriptors as numpy arrays
water = samples[0]
coulomb_matrix = cm_desc.create(water)
soap = soap_desc.create(water, centers=[0])  # SOAP for the atom at index 0

# Process multiple systems with optional parallelization
coulomb_matrices = cm_desc.create(samples, n_jobs=3)
oxygen_indices = [np.where(x.get_atomic_numbers() == 8)[0] for x in samples]
oxygen_soap = soap_desc.create(samples, centers=oxygen_indices, n_jobs=3)

# Calculate derivatives with respect to atomic positions
derivatives, descriptors = soap_desc.derivatives(water, return_descriptor=True)
```

## Architecture

DScribe is organized into the following components:

- **Core Classes**: `System` (extended ASE Atoms with caching) and `Lattice` (unit cell representation)
- **Descriptor Base Classes**: Abstract base classes defining the descriptor interface
  - `Descriptor`: Base class for all descriptors
  - `DescriptorLocal`: Base for per-atom descriptors (SOAP, ACSF, LMBTR)
  - `DescriptorGlobal`: Base for per-structure descriptors (MBTR, ValleOganov)
  - `DescriptorMatrix`: Base for matrix descriptors (CoulombMatrix, SineMatrix, EwaldSumMatrix)
- **Kernels**: Similarity measures between structures, built from comparisons of their local atomic environments
- **Utilities**: Helper functions for geometry, species handling, and statistics

This design gives every descriptor type a consistent interface. It supports both local (per-atom) and global (per-structure) feature representations, as well as parallel processing and derivative calculations.
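
A small, self-contained sketch can illustrate how this works. A shared `create()` entry point handles one structure or a batch, global descriptors return one vector per structure, and local descriptors return one vector per center. Every name in the sketch (`Structure`, `create_single`, `create_center`, `SpeciesCount`, `NeighborCount`) is hypothetical and chosen only for illustration. It is not DScribe's source code, and it omits parallelism, sparse output, and derivatives.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple, Union

Vec3 = Tuple[float, float, float]


@dataclass
class Structure:
    """Hypothetical stand-in for an ASE Atoms or DScribe System object."""
    symbols: List[str]
    positions: List[Vec3]


class Descriptor(ABC):
    """Shared interface: create() accepts one structure or a sequence of them."""

    def __init__(self, species: Sequence[str]):
        self.species = list(species)

    @abstractmethod
    def get_number_of_features(self) -> int: ...

    @abstractmethod
    def create_single(self, system: Structure, **kwargs): ...

    def create(self, system: Union[Structure, Sequence[Structure]], **kwargs):
        # A single structure gives a single result; a batch gives a list of results.
        if isinstance(system, Structure):
            return self.create_single(system, **kwargs)
        return [self.create_single(s, **kwargs) for s in system]


class DescriptorGlobal(Descriptor):
    """Per-structure output: one vector of length get_number_of_features()."""


class DescriptorLocal(Descriptor):
    """Per-center output: one vector per requested center (all atoms by default)."""

    @abstractmethod
    def create_center(self, system: Structure, index: int) -> List[float]: ...

    def create_single(self, system: Structure, centers: Optional[Sequence[int]] = None):
        indices = range(len(system.symbols)) if centers is None else centers
        return [self.create_center(system, i) for i in indices]


class SpeciesCount(DescriptorGlobal):
    """Toy global descriptor: number of atoms of each species."""

    def get_number_of_features(self) -> int:
        return len(self.species)

    def create_single(self, system: Structure, **kwargs):
        return [float(system.symbols.count(s)) for s in self.species]


class NeighborCount(DescriptorLocal):
    """Toy local descriptor: neighbors of each species within r_cut of a center."""

    def __init__(self, species: Sequence[str], r_cut: float):
        super().__init__(species)
        self.r_cut = r_cut

    def get_number_of_features(self) -> int:
        return len(self.species)

    def create_center(self, system: Structure, index: int) -> List[float]:
        counts = [0.0] * len(self.species)
        for j, (sym, pos) in enumerate(zip(system.symbols, system.positions)):
            # Assumes every symbol in the structure appears in self.species.
            if j != index and math.dist(system.positions[index], pos) <= self.r_cut:
                counts[self.species.index(sym)] += 1.0
        return counts


water = Structure(["O", "H", "H"], [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)])
print(SpeciesCount(["H", "O"]).create(water))                           # [2.0, 1.0]
print(NeighborCount(["H", "O"], r_cut=1.2).create(water, centers=[0]))  # [[2.0, 0.0]]
```

With the single-versus-batch dispatch kept in the base class, each concrete descriptor only has to describe one structure (global) or one center (local).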

## Capabilities

### Local Descriptors

Local descriptors compute features for individual atoms or local atomic environments, producing per-atom feature vectors that can be averaged or processed separately.

```python { .api }
class SOAP:
    def __init__(self, r_cut, n_max, l_max, sigma=1.0, rbf="gto",
                 weighting=None, average="off", compression={"mode": "off", "species_weighting": None},
                 species=None, periodic=False, sparse=False, dtype="float64"): ...
    def create(self, system, centers=None, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def derivatives(self, system, centers=None, include=None, exclude=None, method="auto", return_descriptor=False, n_jobs=1, only_physical_cores=False): ...

class ACSF:
    def __init__(self, r_cut, g2_params=None, g3_params=None, g4_params=None, g5_params=None,
                 species=None, periodic=False, sparse=False, dtype="float64"): ...
    def create(self, system, centers=None, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def derivatives(self, system, centers=None, include=None, exclude=None, method="auto", return_descriptor=False, n_jobs=1, only_physical_cores=False): ...

class LMBTR:
    def __init__(self, geometry=None, grid=None, weighting=None, normalize_gaussians=True,
                 normalization="none", species=None, periodic=False, sparse=False, dtype="float64"): ...
    def create(self, system, centers=None, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def derivatives(self, system, centers=None, include=None, exclude=None, method="auto", return_descriptor=False, n_jobs=1, only_physical_cores=False): ...
```

[Local Descriptors](./local-descriptors.md)
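
The sketch below makes the per-atom idea concrete. It computes the radial G2 symmetry function from the ACSF family by hand, then averages the per-atom rows into one structure-level vector. This is a simplified illustration, not DScribe's implementation: it ignores neighbor species (ACSF keeps separate terms per neighbor species), and the helper names are hypothetical. The `(eta, r_s)` pairs play the role of the pairs passed in `g2_params`.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cosine_cutoff(r: float, r_cut: float) -> float:
    """Smooth cutoff f_c(r) = 0.5 * (cos(pi * r / r_cut) + 1) inside r_cut, 0 outside."""
    if r > r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def g2(positions: Sequence[Vec3], center: int, r_cut: float, eta: float, r_s: float) -> float:
    """Radial term G2_i = sum over j != i of exp(-eta * (r_ij - r_s)**2) * f_c(r_ij)."""
    total = 0.0
    for j, pos in enumerate(positions):
        if j == center:
            continue
        r = math.dist(positions[center], pos)
        total += math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_cut)
    return total


def g2_features(positions: Sequence[Vec3], r_cut: float,
                g2_params: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """One row per atom, one column per (eta, r_s) pair."""
    return [[g2(positions, i, r_cut, eta, r_s) for eta, r_s in g2_params]
            for i in range(len(positions))]


def average_over_atoms(per_atom: List[List[float]]) -> List[float]:
    """Collapse per-atom rows into one structure-level vector by averaging each column."""
    return [sum(column) / len(per_atom) for column in zip(*per_atom)]


dimer = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
local = g2_features(dimer, r_cut=2.0, g2_params=[(1.0, 1.0), (1.0, 0.0)])
print(local)
print(average_over_atoms(local))
```

Because the cutoff function goes smoothly to zero at `r_cut`, an atom crossing the cutoff sphere does not cause a jump in the features.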

### Global Descriptors

Global descriptors compute features for entire atomic structures, producing a single feature vector per structure that captures overall structural properties.

```python { .api }
class MBTR:
    def __init__(self, geometry=None, grid=None, weighting=None, normalize_gaussians=True,
                 normalization="none", species=None, periodic=False, sparse=False, dtype="float64"): ...
    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def derivatives(self, system, include=None, exclude=None, method="auto", return_descriptor=False, n_jobs=1, only_physical_cores=False): ...

class ValleOganov:
    def __init__(self, species, function, n, sigma, r_cut, sparse=False, dtype="float64"): ...
    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False): ...
```

[Global Descriptors](./global-descriptors.md)
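
The pairwise part of MBTR builds a structure-level vector in a specific way. It broadens a geometric quantity (here the inverse distance) with Gaussians and samples the result on a fixed grid. The sketch below is a single-channel simplification: real MBTR keeps a separate channel per species pair and applies `weighting` and `normalization`, and none of that is modeled here. Its `grid_min`, `grid_max`, `n`, and `sigma` arguments are meant to mirror the min/max/n/sigma settings of MBTR's `grid` parameter. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normalized Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def inverse_distance_spectrum(
    positions: Sequence[Vec3], grid_min: float, grid_max: float, n: int, sigma: float
) -> Tuple[List[float], List[float]]:
    """Place one Gaussian at 1/r_ij for every atom pair i < j and sample the sum on a grid."""
    step = (grid_max - grid_min) / (n - 1)
    grid = [grid_min + k * step for k in range(n)]
    spectrum = [0.0] * n
    for p, q in combinations(positions, 2):
        inv_r = 1.0 / math.dist(p, q)
        for k, x in enumerate(grid):
            spectrum[k] += normal_pdf(x, inv_r, sigma)
    return grid, spectrum


dimer = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
trimer = dimer + [(0.0, 2.0, 0.0)]
_, s2 = inverse_distance_spectrum(dimer, 0.0, 1.0, 11, 0.05)
_, s3 = inverse_distance_spectrum(trimer, 0.0, 1.0, 11, 0.05)
print(len(s2), len(s3))  # 11 11
```

Adding a third atom changes the values but not the length of the output. That fixed length is what lets the vector serve as model input for structures of any size.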

### Matrix Descriptors

Matrix descriptors represent atomic structures as matrices based on pairwise interactions, then flatten or transform these matrices into fixed-size feature vectors.

```python { .api }
class CoulombMatrix:
    def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False, dtype="float64"): ...
    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def get_matrix(self, system): ...

class SineMatrix:
    def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False, dtype="float64"): ...
    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def get_matrix(self, system): ...

class EwaldSumMatrix:
    def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False, dtype="float64"): ...
    def create(self, system, accuracy=1e-5, w=1, r_cut=None, g_cut=None, a=None, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def get_matrix(self, system, accuracy=1e-5, w=1, r_cut=None, g_cut=None, a=None): ...
```

[Matrix Descriptors](./matrix-descriptors.md)
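
Here is a minimal pure-Python sketch of the Coulomb matrix pipeline with `permutation="sorted_l2"`. It builds the matrix from the standard definition, reorders rows and columns together by descending row L2 norm, and then zero-pads to `n_atoms_max` and flattens. The helper names are hypothetical, and details of DScribe's own implementation (for example, tie-breaking between rows with equal norms) may differ.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix = List[List[float]]


def coulomb_matrix(numbers: Sequence[int], positions: Sequence[Vec3]) -> Matrix:
    """M_ii = 0.5 * Z_i**2.4 and M_ij = Z_i * Z_j / |R_i - R_j| for i != j."""
    n = len(numbers)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * numbers[i] ** 2.4
            else:
                m[i][j] = numbers[i] * numbers[j] / math.dist(positions[i], positions[j])
    return m


def sort_by_row_norm(m: Matrix) -> Matrix:
    """Reorder rows and columns with the same permutation so row L2 norms are descending."""
    order = sorted(range(len(m)), key=lambda i: math.hypot(*m[i]), reverse=True)
    return [[m[i][j] for j in order] for i in order]


def pad_and_flatten(m: Matrix, n_atoms_max: int) -> List[float]:
    """Zero-pad to n_atoms_max x n_atoms_max and flatten row by row."""
    n = len(m)
    if n > n_atoms_max:
        raise ValueError(f"{n} atoms exceeds n_atoms_max={n_atoms_max}")
    return [m[i][j] if i < n and j < n else 0.0
            for i in range(n_atoms_max) for j in range(n_atoms_max)]


# Water with an H atom listed first, so sorting has to move the O row to the top.
numbers = [1, 8, 1]
positions = [(0.96, 0.0, 0.0), (0.0, 0.0, 0.0), (-0.24, 0.93, 0.0)]
matrix = coulomb_matrix(numbers, positions)
features = pad_and_flatten(sort_by_row_norm(matrix), n_atoms_max=5)
print(len(features))  # 25
```

Sorting removes the dependence on the order in which atoms are listed (ties aside). Padding gives every structure with up to `n_atoms_max` atoms the same feature length, `n_atoms_max**2`.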

### Core Classes

Core classes represent atomic systems and lattices. They extend the standard ASE functionality with features such as cached distance calculations and conversions between Cartesian and fractional coordinates.

```python { .api }
class System:
    def __init__(self, symbols=None, positions=None, numbers=None, cell=None, pbc=None, **kwargs): ...
    @staticmethod
    def from_atoms(atoms): ...
    def get_distance_matrix(self): ...
    def get_distance_matrix_within_radius(self, radius, pos=None, output_type="coo_matrix"): ...
    def to_scaled(self, positions, wrap=False): ...
    def to_cartesian(self, scaled_positions, wrap=False): ...

class Lattice:
    def __init__(self, matrix): ...
    @property
    def matrix(self): ...
    @property
    def lengths(self): ...
    @property
    def abc(self): ...
    def get_cartesian_coords(self, fractional_coords): ...
    def get_fractional_coords(self, cart_coords): ...
```

[Core Classes](./core-classes.md)
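
The next sketch shows the fractional/Cartesian conversion that methods such as `to_scaled`, `to_cartesian`, `get_fractional_coords`, and `get_cartesian_coords` are about. It assumes ASE's convention that the rows of the cell matrix are the lattice vectors. It is written in pure Python with a hand-written 3x3 inverse to stay dependency-free; the function names are hypothetical and this is not DScribe's code. The `wrap=True` behavior (folding coordinates into [0, 1)) is likewise an assumption made for illustration.

```python
import math
from typing import List, Sequence

Matrix3 = List[List[float]]


def det3(m: Matrix3) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m: Matrix3) -> Matrix3:
    """Inverse via the adjugate (transposed cofactor matrix) divided by the determinant."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("cell matrix is singular")
    return [
        [(m[1][1] * m[2][2] - m[1][2] * m[2][1]) / d,
         (m[0][2] * m[2][1] - m[0][1] * m[2][2]) / d,
         (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / d],
        [(m[1][2] * m[2][0] - m[1][0] * m[2][2]) / d,
         (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / d,
         (m[0][2] * m[1][0] - m[0][0] * m[1][2]) / d],
        [(m[1][0] * m[2][1] - m[1][1] * m[2][0]) / d,
         (m[0][1] * m[2][0] - m[0][0] * m[2][1]) / d,
         (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / d],
    ]


def row_times_matrix(v: Sequence[float], m: Matrix3) -> List[float]:
    """Row vector times matrix: result_k = sum_i v_i * m[i][k]."""
    return [sum(v[i] * m[i][k] for i in range(3)) for k in range(3)]


def to_cartesian(frac: Sequence[float], cell: Matrix3) -> List[float]:
    """Cartesian = fractional @ cell, with lattice vectors stored as cell rows."""
    return row_times_matrix(frac, cell)


def to_fractional(cart: Sequence[float], cell: Matrix3, wrap: bool = False) -> List[float]:
    """Fractional = cartesian @ inv(cell); wrap=True folds each coordinate into [0, 1)."""
    frac = row_times_matrix(cart, inv3(cell))
    return [f % 1.0 for f in frac] if wrap else frac


def lattice_lengths(cell: Matrix3) -> List[float]:
    """Lengths |a|, |b|, |c| of the three lattice vectors."""
    return [math.sqrt(sum(x * x for x in row)) for row in cell]


cell = [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 4.0]]  # non-orthogonal cell
print(to_cartesian([0.5, 0.5, 0.25], cell))  # [1.5, 1.5, 1.0]
```

A non-orthogonal cell is used on purpose. With a diagonal cell, mixing up the row-vector and column-vector conventions would go unnoticed.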

### Kernels

Kernels measure the similarity between atomic structures by comparing their local atomic environments, using a configurable similarity metric.

```python { .api }
class AverageKernel:
    def __init__(self, metric, gamma=None, degree=3, coef0=1,
                 kernel_params=None, normalize_kernel=True): ...
    def create(self, x, y=None): ...

class REMatchKernel:
    def __init__(self, alpha=0.1, threshold=1e-6, metric="linear", gamma=None,
                 degree=3, coef0=1, kernel_params=None, normalize_kernel=True): ...
    def create(self, x, y=None): ...
```

[Kernels](./kernels.md)
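
Below is a pure-Python sketch of the two kernel ideas. It uses cosine similarity as the local metric; that is an assumption, since the library classes take a configurable `metric`. The average kernel is the mean of all pairwise environment similarities, normalized by the self-similarities. The REMatch sketch uses Sinkhorn iterations to find an entropy-regularized matching with uniform marginals, with `alpha` as the regularization strength and `threshold` as the convergence tolerance. The function names are hypothetical, and numerical details may differ from the library.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def local_similarity(xs: Sequence[Vector], ys: Sequence[Vector]) -> List[List[float]]:
    """C_ij = cosine similarity between environment i of A and environment j of B."""
    a = [_normalize(x) for x in xs]
    b = [_normalize(y) for y in ys]
    return [[sum(p * q for p, q in zip(x, y)) for y in b] for x in a]


def average_kernel(xs: Sequence[Vector], ys: Sequence[Vector], normalize: bool = True) -> float:
    """Mean of all C_ij, optionally divided by sqrt(K(A, A) * K(B, B))."""
    def raw(u: Sequence[Vector], v: Sequence[Vector]) -> float:
        c = local_similarity(u, v)
        return sum(map(sum, c)) / (len(u) * len(v))
    k = raw(xs, ys)
    return k / math.sqrt(raw(xs, xs) * raw(ys, ys)) if normalize else k


def rematch_kernel(xs: Sequence[Vector], ys: Sequence[Vector], alpha: float = 0.1,
                   threshold: float = 1e-6, max_iter: int = 10_000) -> float:
    """Sinkhorn scaling finds a plan P with uniform marginals; returns sum_ij P_ij * C_ij."""
    c = local_similarity(xs, ys)
    n, m = len(c), len(c[0])
    k = [[math.exp(-(1.0 - c[i][j]) / alpha) for j in range(m)] for i in range(n)]
    u = [1.0 / n] * n
    v = [1.0 / m] * m
    for _ in range(max_iter):
        u_prev = u
        v = [(1.0 / m) / sum(k[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [(1.0 / n) / sum(k[i][j] * v[j] for j in range(m)) for i in range(n)]
        change = sum((x - y) ** 2 for x, y in zip(u, u_prev)) / sum(x * x for x in u)
        if change < threshold:
            break
    # Transport plan P_ij = u_i * K_ij * v_j, weighted by the local similarities.
    return sum(u[i] * k[i][j] * v[j] * c[i][j] for i in range(n) for j in range(m))


env_a = [[1.0, 0.0], [0.0, 1.0]]  # two distinct environments
env_b = [[1.0, 0.0]]              # one environment, shared with env_a
print(average_kernel(env_a, env_a), rematch_kernel(env_a, env_a))
print(average_kernel(env_a, env_b), rematch_kernel(env_a, env_b))
```

The REMatch `alpha` sets a trade-off. A large value spreads the plan toward uniform weights, so the result approaches the unnormalized average kernel. A small value approaches the best one-to-one matching of environments.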

### Utilities

Utility functions cover atomic species conversion, geometry calculations, statistics over sets of systems, and array dimensionality checks commonly needed in materials science workflows.

```python { .api }
# Species utilities (from dscribe.utils.species)
def symbols_to_numbers(symbols): ...
def get_atomic_numbers(species): ...

# Geometry utilities (from dscribe.utils.geometry)
def get_adjacency_matrix(radius, pos1, pos2=None, output_type="coo_matrix"): ...
def get_adjacency_list(adjacency_matrix): ...
def get_extended_system(system, radial_cutoff, centers=None, return_cell_indices=False): ...

# Statistics utilities (from dscribe.utils.stats)
def system_stats(system_iterator): ...

# Dimensionality utilities (from dscribe.utils.dimensionality)
def is1d(array, dtype=None): ...
def is2d(array, dtype=None): ...
```

[Utilities](./utilities.md)
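
`get_adjacency_matrix` and `get_adjacency_list` are built around finding atom pairs within a radius. The brute-force O(N^2) sketch below shows that idea, with a dictionary of `(i, j) -> distance` entries standing in for a sparse matrix. The function names are hypothetical, self-pairs are excluded by choice, and the library's actual neighbor search and output format may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def adjacency_within_radius(positions: Sequence[Vec3], radius: float) -> Dict[Tuple[int, int], float]:
    """Map (i, j) -> distance for every pair i != j closer than or equal to radius."""
    pairs: Dict[Tuple[int, int], float] = {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = math.dist(positions[i], positions[j])
            if d <= radius:
                # Store both directions, like a symmetric sparse matrix.
                pairs[(i, j)] = d
                pairs[(j, i)] = d
    return pairs


def adjacency_list(pairs: Dict[Tuple[int, int], float], n_atoms: int) -> List[List[int]]:
    """Neighbor indices per atom, sorted in ascending order."""
    neighbors: List[List[int]] = [[] for _ in range(n_atoms)]
    for i, j in sorted(pairs):
        neighbors[i].append(j)
    return neighbors


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
print(adjacency_list(adjacency_within_radius(pts, 1.2), len(pts)))  # [[1], [0], []]
```

For large systems, a spatial tree or cell list would replace the double loop, so the number of distance checks grows roughly linearly with the number of atoms.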

## Common Descriptor Interface

All descriptor classes implement `create()` and `get_number_of_features()`. `derivatives()` is available only where supported:

- `create(system, ...)` - Creates the descriptor for one or more systems; returns a numpy array, or a sparse array when `sparse=True`
- `get_number_of_features()` - Returns the total number of features in the descriptor output
- `derivatives(...)` - Calculates derivatives with respect to atomic positions (where supported)

## Common Parameters

Most descriptors accept the following parameters. `species`, `periodic`, `sparse`, and `dtype` are set in the constructor, while `system`, `n_jobs`, and `verbose` are passed to `create()`:

- `system` - ASE Atoms object(s) or DScribe System object(s) to process
- `species` - List of atomic species to include in the descriptor
- `periodic` - Whether to consider periodic boundary conditions
- `sparse` - Whether to return sparse arrays for memory efficiency
- `dtype` - Data type for arrays ("float64", "float32")
- `n_jobs` - Number of parallel processes for computation
- `verbose` - Whether to print progress information during computation
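
`n_jobs` spreads the systems passed to `create()` across workers. The sketch below shows the general pattern: split a batch into contiguous chunks, process each chunk in a worker, and concatenate the results in input order. The helper names are hypothetical. A thread pool is used only to keep the example self-contained; for CPU-bound descriptor work a process pool would be the usual choice, and DScribe's own parallelization may be organized differently.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_contiguous(items: Sequence[T], n_jobs: int) -> List[Sequence[T]]:
    """Split items into at most n_jobs contiguous chunks of near-equal size."""
    size = math.ceil(len(items) / n_jobs)
    return [items[k:k + size] for k in range(0, len(items), size)]


def create_parallel(func: Callable[[T], R], systems: Sequence[T], n_jobs: int = 1) -> List[R]:
    """Apply func to every system across n_jobs workers; output order matches input order."""
    systems = list(systems)
    if n_jobs <= 1 or len(systems) <= 1:
        return [func(s) for s in systems]
    chunks = split_contiguous(systems, n_jobs)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # pool.map yields chunk results in submission order, so concatenation keeps the order.
        chunk_results = list(pool.map(lambda chunk: [func(s) for s in chunk], chunks))
    return [result for chunk in chunk_results for result in chunk]


print(create_parallel(lambda x: x * x, range(7), n_jobs=3))  # [0, 1, 4, 9, 16, 25, 36]
```

With contiguous chunks, each worker's output is a slice of the final result, so reassembly is a simple concatenation.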