# Utilities

DScribe provides utility functions for common operations in materials science applications, including species handling, geometry calculations, statistical analysis, and array operations. These utilities support the core descriptor functionality and provide helpful tools for preprocessing and analysis.

## Capabilities

### Species Utilities

Functions for working with atomic species, converting between symbols and atomic numbers, and handling species lists.

```python { .api }
def symbols_to_numbers(symbols):
    """
    Convert chemical symbols to atomic numbers.

    Parameters:
    - symbols: Chemical symbols as strings, list, or array

    Returns:
    numpy.ndarray: Array of atomic numbers

    Examples:
    symbols_to_numbers("H") -> [1]
    symbols_to_numbers(["H", "He", "Li"]) -> [1, 2, 3]
    """

def get_atomic_numbers(species):
    """
    Get ordered atomic numbers from species list.

    Parameters:
    - species: List of atomic species (symbols or numbers)

    Returns:
    numpy.ndarray: Sorted array of unique atomic numbers

    Examples:
    get_atomic_numbers(["H", "O", "H"]) -> [1, 8]
    get_atomic_numbers([1, 8, 1]) -> [1, 8]
    """
```

**Usage Example:**

```python
from dscribe.utils.species import symbols_to_numbers, get_atomic_numbers

# Convert symbols to atomic numbers
symbols = ["H", "H", "O"]
numbers = symbols_to_numbers(symbols)  # [1, 1, 8]

# Get unique atomic numbers from species list
species_list = ["C", "H", "O", "N", "H", "C"]
unique_numbers = get_atomic_numbers(species_list)  # [1, 6, 7, 8]

# Atomic numbers are accepted as well. Keep each list homogeneous
# (all symbols or all numbers); mixed lists may be rejected.
numeric_species = [4, 1, 3, 2, 1]
sorted_numbers = get_atomic_numbers(numeric_species)  # [1, 2, 3, 4]
```
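
When species arrive as a mix of symbols and atomic numbers, one option is to normalize them before handing them to a descriptor. The sketch below is not DScribe's implementation: `normalize_species` is a hypothetical helper, and `SYMBOL_TO_NUMBER` is a small hand-written table covering only the first ten elements. It illustrates the mapping from arbitrary species entries to sorted unique atomic numbers.

```python
# Illustrative lookup table covering only a few light elements (assumption).
SYMBOL_TO_NUMBER = {"H": 1, "He": 2, "Li": 3, "Be": 4, "B": 5,
                    "C": 6, "N": 7, "O": 8, "F": 9, "Ne": 10}


def normalize_species(species):
    """Hypothetical helper: map symbols and/or atomic numbers to a
    sorted list of unique atomic numbers."""
    numbers = set()
    for item in species:
        if isinstance(item, str):
            if item not in SYMBOL_TO_NUMBER:
                raise ValueError(f"Unknown chemical symbol: {item!r}")
            numbers.add(SYMBOL_TO_NUMBER[item])
        elif isinstance(item, int) and not isinstance(item, bool) and item > 0:
            numbers.add(item)
        else:
            raise ValueError(f"Invalid species entry: {item!r}")
    return sorted(numbers)


print(normalize_species([1, "He", 3, "Be"]))  # [1, 2, 3, 4]
print(normalize_species(["C", "H", "O", "N", "H", "C"]))  # [1, 6, 7, 8]
```

In real code the lookup table would typically come from an existing source such as `ase.data.atomic_numbers` rather than being written by hand.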

### Geometry Utilities

Functions for geometric operations including neighbor finding, adjacency matrices, and system extensions for periodic boundary conditions.

```python { .api }
def get_adjacency_matrix(radius, pos1, pos2=None, output_type="coo_matrix"):
    """
    Get sparse adjacency matrix for atoms within cutoff radius.

    Parameters:
    - radius (float): Cutoff radius in angstroms
    - pos1: First set of atomic positions
    - pos2: Second set of positions (optional, defaults to pos1)
    - output_type (str): Output format ("coo_matrix", "dense")

    Returns:
    scipy.sparse matrix or numpy.ndarray: Adjacency matrix indicating connections
    """

def get_adjacency_list(adjacency_matrix):
    """
    Convert adjacency matrix to list format.

    Parameters:
    - adjacency_matrix: Sparse or dense adjacency matrix

    Returns:
    list: List of neighbor lists for each atom
    """

def get_extended_system(system, radial_cutoff, centers=None, return_cell_indices=False):
    """
    Extend periodic system with neighboring unit cells to ensure complete
    local environments within the cutoff radius.

    Parameters:
    - system: ASE Atoms or DScribe System object
    - radial_cutoff (float): Cutoff radius for local environments
    - centers: Atom indices to center the extension around (optional)
    - return_cell_indices (bool): Whether to return cell indices for extended atoms

    Returns:
    System or tuple: Extended system, optionally with cell indices
    """
```

**Usage Example:**

```python
from dscribe.utils.geometry import get_adjacency_matrix, get_extended_system
from ase.build import bulk
import numpy as np

# Create a periodic system
nacl = bulk("NaCl", "rocksalt", a=5.64)

# Get adjacency matrix for neighbors within 3 Å
positions = nacl.get_positions()
adjacency = get_adjacency_matrix(3.0, positions)
print(f"Adjacency matrix shape: {adjacency.shape}")
print(f"Number of connections: {adjacency.nnz}")

# Extend system for descriptor calculations
extended_system = get_extended_system(nacl, radial_cutoff=6.0)
print(f"Original atoms: {len(nacl)}")
print(f"Extended atoms: {len(extended_system)}")

# Get cell indices for tracking extended atoms
extended_system, cell_indices = get_extended_system(
    nacl, radial_cutoff=6.0, return_cell_indices=True
)
```
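
To make the idea of a cutoff-based adjacency concrete, here is a minimal brute-force sketch in plain Python. It is an illustration only, not how DScribe computes neighbors: `adjacency_within_cutoff` is a hypothetical helper, it compares every pair of points (quadratic cost), and it ignores periodic boundary conditions.

```python
import math


def adjacency_within_cutoff(positions, radius):
    """Hypothetical helper: return {i: [j, ...]} for all pairs of distinct
    points whose Euclidean distance is at most `radius`."""
    neighbors = {i: [] for i in range(len(positions))}
    for i, p in enumerate(positions):
        for j in range(i + 1, len(positions)):
            if math.dist(p, positions[j]) <= radius:
                # Record the pair symmetrically
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


# Three collinear points spaced 2 Å apart
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(adjacency_within_cutoff(points, 3.0))  # {0: [1], 1: [0, 2], 2: [1]}
```

The dictionary of neighbor lists plays the same role as an adjacency list derived from an adjacency matrix: the middle point sees both ends, while the two end points, 4 Å apart, do not see each other.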

### Statistics Utilities

Functions for gathering statistics from collections of atomic systems, useful for descriptor setup and data analysis.

```python { .api }
def system_stats(system_iterator):
    """
    Gather statistics from multiple atomic systems.

    Parameters:
    - system_iterator: Iterable of ASE Atoms or DScribe System objects

    Returns:
    dict: Statistics dictionary containing:
        - n_atoms_max: Maximum number of atoms in any system
        - max_atomic_number: Highest atomic number present
        - min_atomic_number: Lowest atomic number present
        - atomic_numbers: Set of all atomic numbers present
        - element_symbols: Set of all element symbols present
        - min_distance: Minimum interatomic distance found
    """
```

**Usage Example:**

```python
from ase.build import molecule, bulk
from dscribe.descriptors import CoulombMatrix
from dscribe.utils.stats import system_stats

# Collect various atomic systems
systems = [
    molecule("H2O"),
    molecule("NH3"),
    molecule("CH4"),
    bulk("NaCl", "rocksalt", a=5.64),
]

# Gather statistics
stats = system_stats(systems)

print(f"Maximum atoms in any system: {stats['n_atoms_max']}")
print(f"Atomic numbers present: {stats['atomic_numbers']}")
print(f"Element symbols: {stats['element_symbols']}")
print(f"Atomic number range: {stats['min_atomic_number']}-{stats['max_atomic_number']}")
print(f"Minimum distance: {stats['min_distance']:.3f} Å")

# Use statistics for descriptor setup
cm = CoulombMatrix(n_atoms_max=stats['n_atoms_max'])
```
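
The kind of bookkeeping behind such a statistics dictionary can be sketched in plain Python. This is an illustration under simplifying assumptions, not DScribe's code: `collect_stats` is a hypothetical helper, each system is represented as a list of `(atomic_number, (x, y, z))` tuples, periodic images are ignored, and at least one system with two or more atoms is assumed.

```python
import math
from itertools import combinations


def collect_stats(systems):
    """Hypothetical helper: aggregate simple statistics over systems given
    as lists of (atomic_number, (x, y, z)) tuples."""
    n_atoms_max = 0
    atomic_numbers = set()
    min_distance = math.inf
    for atoms in systems:
        n_atoms_max = max(n_atoms_max, len(atoms))
        atomic_numbers.update(z for z, _ in atoms)
        # Smallest pairwise distance within this system
        for (_, p), (_, q) in combinations(atoms, 2):
            min_distance = min(min_distance, math.dist(p, q))
    return {
        "n_atoms_max": n_atoms_max,
        "atomic_numbers": atomic_numbers,
        "min_atomic_number": min(atomic_numbers),
        "max_atomic_number": max(atomic_numbers),
        "min_distance": min_distance,
    }


h2 = [(1, (0.0, 0.0, 0.0)), (1, (0.74, 0.0, 0.0))]
co = [(6, (0.0, 0.0, 0.0)), (8, (0.0, 0.0, 1.13))]
stats = collect_stats([h2, co])
print(stats["n_atoms_max"], sorted(stats["atomic_numbers"]))  # 2 [1, 6, 8]
```

Values such as the largest atom count are exactly what fixed-size descriptors need up front, which is why collecting them in a single pass over the dataset is convenient.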

### Dimensionality Utilities

Functions for checking array dimensions and data types, useful for input validation and preprocessing.

```python { .api }
def is1d(array, dtype=None):
    """
    Check if array is 1D with optional dtype verification.

    Parameters:
    - array: Array to check
    - dtype: Expected data type (e.g., np.integer, np.floating)

    Returns:
    bool: True if array is 1D and matches dtype (if specified)
    """

def is2d(array, dtype=None):
    """
    Check if array is 2D with optional dtype verification.

    Parameters:
    - array: Array to check
    - dtype: Expected data type (e.g., np.integer, np.floating)

    Returns:
    bool: True if array is 2D and matches dtype (if specified)
    """
```

**Usage Example:**

```python
from dscribe.utils.dimensionality import is1d, is2d
import numpy as np

# Test arrays
positions = np.random.rand(10, 3)  # 2D float array
distances = np.random.rand(10)     # 1D float array
indices = np.array([1, 2, 3, 4])   # 1D integer array

# Check dimensions; passing dtype explicitly keeps each check
# independent of the function's default dtype
print(f"Positions is 2D: {is2d(positions, np.floating)}")
print(f"Distances is 1D: {is1d(distances, np.floating)}")
print(f"Indices is 1D integer: {is1d(indices, np.integer)}")
print(f"Positions is 1D: {is1d(positions, np.floating)}")  # False

# Use for input validation
def validate_positions(pos):
    if not is2d(pos, np.floating):
        raise ValueError("Positions must be a 2D floating-point array")
    if pos.shape[1] != 3:
        raise ValueError("Positions must have 3 columns (x, y, z)")
    return True

# Validation example
try:
    validate_positions(positions)  # Passes
    validate_positions(distances)  # Raises ValueError: not 2D
except ValueError as e:
    print(f"Validation failed: {e}")
```

## Common Utility Patterns

### Preprocessing Workflows

```python
from ase.build import molecule
from dscribe.descriptors import SOAP, CoulombMatrix
from dscribe.utils.species import get_atomic_numbers
from dscribe.utils.stats import system_stats

# Step 1: Gather system statistics
systems = [molecule("H2O"), molecule("NH3"), molecule("CH4")]
stats = system_stats(systems)

# Step 2: Set up species list (sorted for a deterministic order)
species = sorted(stats['element_symbols'])
atomic_nums = get_atomic_numbers(species)

# Step 3: Configure descriptors with statistics
soap = SOAP(
    species=species,
    r_cut=6.0,
    n_max=8,
    l_max=6,
)

cm = CoulombMatrix(n_atoms_max=stats['n_atoms_max'])
```

### Periodic System Handling

```python
from dscribe.utils.geometry import get_extended_system
from dscribe.descriptors import SOAP
from ase.build import bulk

# Create periodic system
crystal = bulk("Si", "diamond", a=5.43)

# Extend for local environment calculations
r_cut = 6.0
extended = get_extended_system(crystal, r_cut)

# Use extended system with descriptors
soap = SOAP(species=["Si"], r_cut=r_cut, n_max=8, l_max=6)
descriptors = soap.create(extended)
```
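
How far a cell has to be extended depends on the ratio of the cutoff to the cell size. As a rough sketch for orthorhombic cells only, the number of extra cell layers needed on each side of an axis is the cutoff divided by the cell length, rounded up. `replicas_per_axis` is a hypothetical helper written for this illustration; general (non-orthogonal) cells require the perpendicular cell heights instead of the edge lengths, and DScribe's own extension logic may differ.

```python
import math


def replicas_per_axis(cell_lengths, r_cut):
    """Hypothetical helper: number of extra cell layers needed on each side
    of an orthorhombic cell so that every atom sees a full sphere of
    radius `r_cut`."""
    return [math.ceil(r_cut / length) for length in cell_lengths]


# A cubic cell with 5.43 Å edges and a 6 Å cutoff
layers = replicas_per_axis((5.43, 5.43, 5.43), r_cut=6.0)
print(layers)  # [2, 2, 2]

# Total number of cells in the extended block, including the original one
n_cells = math.prod(2 * n + 1 for n in layers)
print(n_cells)  # 125
```

The cubic growth in cell count is the reason to keep the cutoff as small as the descriptor allows when working with small unit cells.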

### Data Validation

```python
from ase import Atoms
from ase.build import molecule
from dscribe.utils.dimensionality import is1d
import numpy as np

def validate_descriptor_input(systems, centers=None):
    """Validate input for descriptor calculations."""

    # A single Atoms object is itself iterable (over its atoms), so
    # check the type explicitly instead of relying on iter()
    if isinstance(systems, Atoms):
        systems = [systems]

    # Validate centers if provided
    if centers is not None:
        if not is1d(centers, np.integer):
            raise ValueError("Centers must be a 1D integer array")

        # np.asarray makes the comparison work for plain lists too
        if np.any(np.asarray(centers) < 0):
            raise ValueError("Center indices must be non-negative")

    return systems, centers

# Usage in descriptor code
systems, centers = validate_descriptor_input(molecule("H2O"), [0, 1])
```

## Integration with Descriptors

These utilities are used internally by DScribe descriptors but are also available for user applications:

- **Species utilities**: Used by all descriptors for species validation and processing
- **Geometry utilities**: Used by local descriptors for neighbor finding and system extension
- **Statistics utilities**: Helpful for setting up descriptor parameters across datasets
- **Dimensionality utilities**: Used for input validation throughout the package

The utilities provide building blocks for custom analysis workflows and help ensure consistent data handling across different parts of the DScribe ecosystem.