Tessl Tile for pypi/dscribe@2.1.0

# Utilities

DScribe provides utility functions for common operations in materials science applications, including species handling, geometry calculations, statistical analysis, and array operations. These utilities support the core descriptor functionality and provide helpful tools for preprocessing and analysis.

## Capabilities

### Species Utilities

Functions for working with atomic species, converting between symbols and atomic numbers, and handling species lists.

```python { .api }
def symbols_to_numbers(symbols):
    """
    Convert chemical symbols to atomic numbers.

    Parameters:
    - symbols: Chemical symbols as strings, list, or array

    Returns:
    numpy.ndarray: Array of atomic numbers

    Examples:
    symbols_to_numbers("H") -> [1]
    symbols_to_numbers(["H", "He", "Li"]) -> [1, 2, 3]
    """

def get_atomic_numbers(species):
    """
    Get ordered atomic numbers from species list.

    Parameters:
    - species: List of atomic species (symbols or numbers)

    Returns:
    numpy.ndarray: Sorted array of unique atomic numbers

    Examples:
    get_atomic_numbers(["H", "O", "H"]) -> [1, 8]
    get_atomic_numbers([1, 8, 1]) -> [1, 8]
    """
```

**Usage Example:**

```python
from dscribe.utils.species import symbols_to_numbers, get_atomic_numbers

# Convert symbols to atomic numbers
symbols = ["H", "H", "O"]
numbers = symbols_to_numbers(symbols)  # [1, 1, 8]

# Get unique atomic numbers from species list
species_list = ["C", "H", "O", "N", "H", "C"]
unique_numbers = get_atomic_numbers(species_list)  # [1, 6, 7, 8]

# Atomic numbers are accepted as well. Keep each list homogeneous (all symbols
# or all numbers): mixed lists such as [1, "He", 3, "Be"] are likely rejected.
numeric_species = [3, 1, 4, 2]
sorted_numbers = get_atomic_numbers(numeric_species)  # [1, 2, 3, 4]
```
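To make the semantics of the two species helpers concrete, the following is a minimal pure-Python sketch of the same two steps: symbol lookup, then de-duplication and sorting. It is not DScribe's implementation. The table `SYMBOL_TO_Z` and the functions `to_numbers` and `unique_sorted_numbers` are hypothetical names introduced here, and rejecting mixed lists is an assumption of this sketch.

```python
# Hypothetical lookup table covering only a few elements, for illustration.
SYMBOL_TO_Z = {"H": 1, "He": 2, "Li": 3, "Be": 4, "C": 6, "N": 7, "O": 8}

def to_numbers(symbols):
    """Map each chemical symbol to its atomic number, preserving order."""
    try:
        return [SYMBOL_TO_Z[s] for s in symbols]
    except KeyError as err:
        raise ValueError(f"Unknown chemical symbol: {err.args[0]!r}") from None

def unique_sorted_numbers(species):
    """Return sorted, de-duplicated atomic numbers for a homogeneous list of
    symbols or of atomic numbers."""
    if all(isinstance(s, str) for s in species):
        numbers = to_numbers(species)
    elif all(isinstance(s, int) for s in species):
        numbers = list(species)
    else:
        # Assumption of this sketch: mixed lists are treated as an error
        raise ValueError("Use either symbols or atomic numbers, not both")
    return sorted(set(numbers))

print(to_numbers(["H", "H", "O"]))                            # [1, 1, 8]
print(unique_sorted_numbers(["C", "H", "O", "N", "H", "C"]))  # [1, 6, 7, 8]
```

Keeping the order-preserving conversion separate from the de-duplicating one mirrors the split between `symbols_to_numbers` and `get_atomic_numbers` described above.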

### Geometry Utilities

Functions for geometric operations including neighbor finding, adjacency matrices, and system extensions for periodic boundary conditions.

```python { .api }
def get_adjacency_matrix(radius, pos1, pos2=None, output_type="coo_matrix"):
    """
    Get sparse adjacency matrix for atoms within cutoff radius.

    Parameters:
    - radius (float): Cutoff radius in angstroms
    - pos1: First set of atomic positions
    - pos2: Second set of positions (optional, defaults to pos1)
    - output_type (str): Output format ("coo_matrix", "dense")

    Returns:
    scipy.sparse matrix or numpy.ndarray: Adjacency matrix indicating connections
    """

def get_adjacency_list(adjacency_matrix):
    """
    Convert adjacency matrix to list format.

    Parameters:
    - adjacency_matrix: Sparse or dense adjacency matrix

    Returns:
    list: List of neighbor lists for each atom
    """

def get_extended_system(system, radial_cutoff, centers=None, return_cell_indices=False):
    """
    Extend periodic system with neighboring unit cells to ensure complete
    local environments within the cutoff radius.

    Parameters:
    - system: ASE Atoms or DScribe System object
    - radial_cutoff (float): Cutoff radius for local environments
    - centers: Atom indices to center the extension around (optional)
    - return_cell_indices (bool): Whether to return cell indices for extended atoms

    Returns:
    System or tuple: Extended system, optionally with cell indices
    """
```

**Usage Example:**

```python
from dscribe.utils.geometry import get_adjacency_matrix, get_extended_system
from ase.build import bulk

# Create a periodic system
nacl = bulk("NaCl", "rocksalt", a=5.64)

# Get adjacency matrix for neighbors within 3 Å
positions = nacl.get_positions()
adjacency = get_adjacency_matrix(3.0, positions)
print(f"Adjacency matrix shape: {adjacency.shape}")
print(f"Number of connections: {adjacency.nnz}")

# Extend system for descriptor calculations
extended_system = get_extended_system(nacl, radial_cutoff=6.0)
print(f"Original atoms: {len(nacl)}")
print(f"Extended atoms: {len(extended_system)}")

# Get cell indices for tracking extended atoms
extended_system, cell_indices = get_extended_system(
    nacl, radial_cutoff=6.0, return_cell_indices=True
)
```
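For intuition about what a radius-based adjacency encodes, here is a small brute-force sketch using only NumPy. It is an illustration under stated assumptions, not the algorithm DScribe uses: `brute_force_adjacency` is a hypothetical helper, it ignores periodic images, and it returns a dense boolean matrix rather than a sparse one.

```python
import numpy as np

def brute_force_adjacency(positions, radius):
    """Return a boolean matrix whose (i, j) entry is True when atoms i and j
    are distinct and at most `radius` apart (no periodic images)."""
    positions = np.asarray(positions, dtype=float)
    # Pairwise displacement vectors via broadcasting: shape (n, n, 3)
    deltas = positions[:, None, :] - positions[None, :, :]
    distances = np.linalg.norm(deltas, axis=-1)
    adjacency = distances <= radius
    np.fill_diagonal(adjacency, False)  # an atom is not its own neighbor
    return adjacency

# Three collinear points spaced 1.0 apart (made-up coordinates)
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
adjacency = brute_force_adjacency(points, radius=1.5)

# Convert each matrix row into a neighbor list
neighbor_lists = [np.flatnonzero(row).tolist() for row in adjacency]
print(neighbor_lists)  # [[1], [0, 2], [1]]
```

The brute-force version scales quadratically with the number of atoms, which is why sparse, tree-based neighbor searches are preferable for large systems.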

### Statistics Utilities

Functions for gathering statistics from collections of atomic systems, useful for descriptor setup and data analysis.

```python { .api }
def system_stats(system_iterator):
    """
    Gather statistics from multiple atomic systems.

    Parameters:
    - system_iterator: Iterable of ASE Atoms or DScribe System objects

    Returns:
    dict: Statistics dictionary containing:
        - n_atoms_max: Maximum number of atoms in any system
        - max_atomic_number: Highest atomic number present
        - min_atomic_number: Lowest atomic number present
        - atomic_numbers: Set of all atomic numbers present
        - element_symbols: Set of all element symbols present
        - min_distance: Minimum interatomic distance found
    """
```

**Usage Example:**

```python
from dscribe.utils.stats import system_stats
from dscribe.descriptors import CoulombMatrix
from ase.build import molecule, bulk

# Collect various atomic systems
systems = [
    molecule("H2O"),
    molecule("NH3"),
    molecule("CH4"),
    bulk("NaCl", "rocksalt", a=5.64)
]

# Gather statistics
stats = system_stats(systems)

print(f"Maximum atoms in any system: {stats['n_atoms_max']}")
print(f"Atomic numbers present: {stats['atomic_numbers']}")
print(f"Element symbols: {stats['element_symbols']}")
print(f"Atomic number range: {stats['min_atomic_number']}-{stats['max_atomic_number']}")
print(f"Minimum distance: {stats['min_distance']:.3f} Å")

# Use statistics for descriptor setup
cm = CoulombMatrix(n_atoms_max=stats['n_atoms_max'])
```
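The kind of aggregation described above can be sketched in plain Python. This is only an illustration of the idea and not DScribe's code: `toy_system_stats` is a hypothetical function, each system is represented as a made-up `(atomic_numbers, positions)` pair instead of an ASE object, and periodic images are ignored when computing distances.

```python
from itertools import combinations
from math import dist

def toy_system_stats(systems):
    """Aggregate simple statistics over systems given as
    (atomic_numbers, positions) pairs."""
    n_atoms_max = 0
    atomic_numbers = set()
    min_distance = float("inf")
    for numbers, positions in systems:
        n_atoms_max = max(n_atoms_max, len(numbers))
        atomic_numbers.update(numbers)
        # Smallest pairwise distance within this system
        for a, b in combinations(positions, 2):
            min_distance = min(min_distance, dist(a, b))
    return {
        "n_atoms_max": n_atoms_max,
        "atomic_numbers": sorted(atomic_numbers),
        "min_atomic_number": min(atomic_numbers),
        "max_atomic_number": max(atomic_numbers),
        "min_distance": min_distance,
    }

# Two toy systems with made-up coordinates
toy_systems = [
    ([1, 1], [(0.0, 0.0, 0.0), (0.74, 0.0, 0.0)]),
    ([8, 1, 1], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]),
]
toy_stats = toy_system_stats(toy_systems)
print(toy_stats["n_atoms_max"])     # 3
print(toy_stats["atomic_numbers"])  # [1, 8]
```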

### Dimensionality Utilities

Functions for checking array dimensions and data types, useful for input validation and preprocessing.

```python { .api }
def is1d(array, dtype=None):
    """
    Check if array is 1D with optional dtype verification.

    Parameters:
    - array: Array to check
    - dtype: Expected data type (e.g., np.integer, np.floating)

    Returns:
    bool: True if array is 1D and matches dtype (if specified)
    """

def is2d(array, dtype=None):
    """
    Check if array is 2D with optional dtype verification.

    Parameters:
    - array: Array to check
    - dtype: Expected data type (e.g., np.integer, np.floating)

    Returns:
    bool: True if array is 2D and matches dtype (if specified)
    """
```

**Usage Example:**

```python
from dscribe.utils.dimensionality import is1d, is2d
import numpy as np

# Test arrays
positions = np.random.rand(10, 3)  # 2D float array
distances = np.random.rand(10)     # 1D float array
indices = np.array([1, 2, 3, 4])   # 1D integer array

# Check dimensions; pass dtype explicitly so the result does not depend on
# the default dtype of the installed version
print(f"Positions is 2D float: {is2d(positions, np.floating)}")
print(f"Distances is 1D float: {is1d(distances, np.floating)}")
print(f"Indices is 1D integer: {is1d(indices, np.integer)}")
print(f"Positions is 1D float: {is1d(positions, np.floating)}")  # False

# Use for input validation
def validate_positions(pos):
    if not is2d(pos, np.floating):
        raise ValueError("Positions must be a 2D floating-point array")
    if pos.shape[1] != 3:
        raise ValueError("Positions must have 3 columns (x, y, z)")
    return True

# Validation example
try:
    validate_positions(positions)  # Pass
    validate_positions(distances)  # Fail - not 2D
except ValueError as e:
    print(f"Validation failed: {e}")
```
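One way a check of this kind can be written is to iterate over the elements and test each element's type with `np.issubdtype`. The sketch below is an assumption about the general approach, not a copy of DScribe's source; `looks_1d` is a hypothetical name, and its default `dtype` is chosen here only for the example.

```python
import numpy as np

def looks_1d(array, dtype=np.integer):
    """Return True if `array` is a flat iterable whose elements all have a
    type compatible with `dtype`."""
    try:
        # Nested sequences fail here because their element type (for example
        # list or ndarray) is not a subtype of a numeric dtype
        return all(np.issubdtype(type(item), dtype) for item in array)
    except TypeError:
        # Not iterable at all, for example a bare scalar
        return False

print(looks_1d([1, 2, 3]))                          # True
print(looks_1d(np.array([0.5, 1.5]), np.floating))  # True
print(looks_1d([[1, 2], [3, 4]]))                   # False
print(looks_1d(5))                                  # False
```

Because the check is element-wise, it works on plain Python lists as well as NumPy arrays, at the cost of a Python-level loop.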

## Common Utility Patterns

### Preprocessing Workflows

```python
from dscribe.utils.stats import system_stats
from dscribe.utils.species import get_atomic_numbers
from dscribe.descriptors import SOAP, CoulombMatrix
from ase.build import molecule

# Step 1: Gather system statistics
systems = [molecule("H2O"), molecule("NH3"), molecule("CH4")]
stats = system_stats(systems)

# Step 2: Set up species list
species = list(stats['element_symbols'])
atomic_nums = get_atomic_numbers(species)

# Step 3: Configure descriptors with statistics
soap = SOAP(
    species=species,
    r_cut=6.0,
    n_max=8,
    l_max=6
)

cm = CoulombMatrix(n_atoms_max=stats['n_atoms_max'])
```

### Periodic System Handling

```python
from dscribe.utils.geometry import get_extended_system
from dscribe.descriptors import SOAP
from ase.build import bulk

# Create periodic system
crystal = bulk("Si", "diamond", a=5.43)

# Extend for local environment calculations
r_cut = 6.0
extended = get_extended_system(crystal, r_cut)

# Use extended system with descriptors. Without a `centers` argument,
# create() computes one output row per atom of the extended system,
# periodic copies included.
soap = SOAP(species=["Si"], r_cut=r_cut, n_max=8, l_max=6)
descriptors = soap.create(extended)
```
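To see why the extended system grows with the cutoff, it helps to estimate how many extra cell copies are needed along each lattice direction. The sketch below uses a standard geometric argument (cutoff divided by the perpendicular spacing between opposite cell faces, rounded up). It is an assumption-based illustration rather than DScribe's internal procedure, and `repetitions_to_cover` is a hypothetical helper.

```python
import numpy as np

def repetitions_to_cover(cell, r_cut):
    """For each lattice vector, return how many extra cell copies are needed
    on each side so that a sphere of radius `r_cut` around any point in the
    original cell is fully covered."""
    cell = np.asarray(cell, dtype=float)
    volume = abs(np.linalg.det(cell))
    repeats = []
    for i in range(3):
        # Normal of the face spanned by the other two lattice vectors;
        # its length is the area of that face
        face_normal = np.cross(cell[(i + 1) % 3], cell[(i + 2) % 3])
        # Perpendicular distance between the two faces bounding direction i
        height = volume / np.linalg.norm(face_normal)
        repeats.append(int(np.ceil(r_cut / height)))
    return repeats

# Cubic cell with a 5 Å edge (rows are lattice vectors)
cubic_cell = 5.0 * np.eye(3)
print(repetitions_to_cover(cubic_cell, r_cut=6.0))  # [2, 2, 2]
```

For a skewed cell the perpendicular spacing is smaller than the lattice vector length, so more copies are needed than the vector lengths alone would suggest.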

### Data Validation

```python
from ase import Atoms
from ase.build import molecule
from dscribe.utils.dimensionality import is1d
import numpy as np

def validate_descriptor_input(systems, centers=None):
    """Validate input for descriptor calculations."""

    # Wrap a single system in a list. ase.Atoms is itself iterable (over its
    # atoms), so test the type instead of relying on iter().
    if isinstance(systems, Atoms):
        systems = [systems]

    # Validate centers if provided
    if centers is not None:
        if not is1d(centers, np.integer):
            raise ValueError("Centers must be a 1D integer array")

        # Convert so the comparison also works for plain Python lists
        if np.any(np.asarray(centers) < 0):
            raise ValueError("Center indices must be non-negative")

    return systems, centers

# Usage in descriptor code (illustrative inputs)
my_systems = [molecule("H2O"), molecule("NH3")]
my_centers = [0, 1]
systems, centers = validate_descriptor_input(my_systems, my_centers)
```

## Integration with Descriptors

These utilities are used internally by DScribe descriptors but are also available for user applications:

- **Species utilities**: Used by all descriptors for species validation and processing
- **Geometry utilities**: Used by local descriptors for neighbor finding and system extension
- **Statistics utilities**: Helpful for setting up descriptor parameters across datasets
- **Dimensionality utilities**: Used for input validation throughout the package

The utilities provide building blocks for custom analysis workflows and help ensure consistent data handling across different parts of the DScribe ecosystem.
