
# Utilities

DScribe provides utility functions for common operations in materials science applications, including species handling, geometry calculations, dataset statistics, and array dimensionality checks. These utilities support the core descriptor functionality and are also useful on their own for preprocessing and analysis.

## Capabilities

### Species Utilities

Functions for working with atomic species, converting between symbols and atomic numbers, and handling species lists.

```python { .api }
def symbols_to_numbers(symbols):
    """
    Convert chemical symbols to atomic numbers.

    Parameters:
    - symbols: Chemical symbols as strings, list, or array

    Returns:
    numpy.ndarray: Array of atomic numbers

    Examples:
    symbols_to_numbers("H") -> [1]
    symbols_to_numbers(["H", "He", "Li"]) -> [1, 2, 3]
    """

def get_atomic_numbers(species):
    """
    Get ordered atomic numbers from species list.

    Parameters:
    - species: List of atomic species (symbols or numbers)

    Returns:
    numpy.ndarray: Sorted array of unique atomic numbers

    Examples:
    get_atomic_numbers(["H", "O", "H"]) -> [1, 8]
    get_atomic_numbers([1, 8, 1]) -> [1, 8]
    """
```

**Usage Example:**

```python
from dscribe.utils.species import symbols_to_numbers, get_atomic_numbers

# Convert symbols to atomic numbers
symbols = ["H", "H", "O"]
numbers = symbols_to_numbers(symbols)  # [1, 1, 8]

# Get unique atomic numbers from species list
species_list = ["C", "H", "O", "N", "H", "C"]
unique_numbers = get_atomic_numbers(species_list)  # [1, 6, 7, 8]

# Atomic numbers are accepted as well. Keep each list to one form
# (all symbols or all numbers); a list mixing the two is expected
# to be rejected with a ValueError.
numeric_species = [4, 1, 3, 2]
sorted_numbers = get_atomic_numbers(numeric_species)  # [1, 2, 3, 4]
```
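The de-duplication and ordering described for `get_atomic_numbers` can be illustrated without DScribe installed. The following is a minimal sketch, not DScribe's implementation: `ATOMIC_NUMBERS`, `to_numbers`, and `unique_sorted_numbers` are hypothetical names, the lookup table covers only a handful of elements, and rejecting mixed lists is a choice made for this sketch.

```python
# Hypothetical, deliberately tiny lookup table (not DScribe's data source).
ATOMIC_NUMBERS = {"H": 1, "He": 2, "Li": 3, "Be": 4, "C": 6, "N": 7, "O": 8}


def to_numbers(symbols):
    """Map each symbol to its atomic number, failing loudly on unknown symbols."""
    numbers = []
    for symbol in symbols:
        if symbol not in ATOMIC_NUMBERS:
            raise ValueError(f"Unknown chemical symbol: {symbol}")
        numbers.append(ATOMIC_NUMBERS[symbol])
    return numbers


def unique_sorted_numbers(species):
    """Accept symbols or atomic numbers (one form per list); return sorted unique numbers."""
    species = list(species)
    if all(isinstance(s, int) for s in species):
        numbers = species
    elif all(isinstance(s, str) for s in species):
        numbers = to_numbers(species)
    else:
        raise ValueError("Use either symbols or atomic numbers, not both")
    # A set removes duplicates; sorting gives a stable, ascending order.
    return sorted(set(numbers))


print(unique_sorted_numbers(["C", "H", "O", "N", "H", "C"]))  # [1, 6, 7, 8]
print(unique_sorted_numbers([8, 1, 8]))  # [1, 8]
```

A fixed ascending order matters because descriptor features are typically laid out per species, so the same species set should always map to the same layout.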

### Geometry Utilities

Functions for geometric operations including neighbor finding, adjacency matrices, and system extensions for periodic boundary conditions.

```python { .api }
def get_adjacency_matrix(radius, pos1, pos2=None, output_type="coo_matrix"):
    """
    Get sparse adjacency matrix for atoms within cutoff radius.

    Parameters:
    - radius (float): Cutoff radius in angstroms
    - pos1: First set of atomic positions
    - pos2: Second set of positions (optional, defaults to pos1)
    - output_type (str): Output container ("dok_matrix", "coo_matrix", "dict", or "ndarray")

    Returns:
    scipy.sparse matrix, dict, or numpy.ndarray: Pairwise distances for atom pairs within the cutoff
    """

def get_adjacency_list(adjacency_matrix):
    """
    Convert adjacency matrix to list format.

    Parameters:
    - adjacency_matrix: Sparse or dense adjacency matrix

    Returns:
    list: List of neighbor lists for each atom
    """

def get_extended_system(system, radial_cutoff, centers=None, return_cell_indices=False):
    """
    Extend periodic system with neighboring unit cells to ensure complete
    local environments within the cutoff radius.

    Parameters:
    - system: ASE Atoms or DScribe System object
    - radial_cutoff (float): Cutoff radius for local environments
    - centers: Atom indices to center the extension around (optional)
    - return_cell_indices (bool): Whether to return cell indices for extended atoms

    Returns:
    System or tuple: Extended system, optionally with cell indices
    """
```

**Usage Example:**

```python
from dscribe.utils.geometry import get_adjacency_matrix, get_extended_system
from ase.build import bulk
import numpy as np

# Create a periodic system
nacl = bulk("NaCl", "rocksalt", a=5.64)

# Get adjacency matrix for neighbors within 3 Å
positions = nacl.get_positions()
adjacency = get_adjacency_matrix(3.0, positions)
print(f"Adjacency matrix shape: {adjacency.shape}")
print(f"Number of connections: {adjacency.nnz}")

# Extend system for descriptor calculations
extended_system = get_extended_system(nacl, radial_cutoff=6.0)
print(f"Original atoms: {len(nacl)}")
print(f"Extended atoms: {len(extended_system)}")

# Get cell indices for tracking extended atoms
extended_system, cell_indices = get_extended_system(
    nacl, radial_cutoff=6.0, return_cell_indices=True
)
```
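To make the relationship between a cutoff-based adjacency matrix and an adjacency list concrete, here is a minimal brute-force sketch in plain Python. It is an illustration only, not how DScribe computes neighbors: `pairs_within_cutoff` and `to_adjacency_list` are hypothetical names, the search is O(N²), and periodic images are ignored.

```python
import math


def pairs_within_cutoff(positions, radius):
    """Return {(i, j): distance} for every ordered pair i != j within the radius."""
    pairs = {}
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i == j:
                continue
            distance = math.dist(p, q)
            if distance <= radius:
                pairs[(i, j)] = distance
    return pairs


def to_adjacency_list(pairs, n_atoms):
    """Group the column indices of the sparse entries by row index."""
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in sorted(pairs):
        neighbours[i].append(j)
    return neighbours


# Three atoms on a line, 2.82 Å apart (made-up coordinates).
positions = [(0.0, 0.0, 0.0), (2.82, 0.0, 0.0), (5.64, 0.0, 0.0)]
pairs = pairs_within_cutoff(positions, radius=3.0)
print(to_adjacency_list(pairs, len(positions)))  # [[1], [0, 2], [1]]
```

The dictionary of pairs plays the role of the sparse matrix (only entries within the cutoff are stored), while the list form gives direct access to the neighbors of one atom.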

### Statistics Utilities

Functions for gathering statistics from collections of atomic systems, useful for descriptor setup and data analysis.

```python { .api }
def system_stats(system_iterator):
    """
    Gather statistics from multiple atomic systems.

    Parameters:
    - system_iterator: Iterable of ASE Atoms or DScribe System objects

    Returns:
    dict: Statistics dictionary containing:
        - n_atoms_max: Maximum number of atoms in any system
        - max_atomic_number: Highest atomic number present
        - min_atomic_number: Lowest atomic number present
        - atomic_numbers: Set of all atomic numbers present
        - element_symbols: Set of all element symbols present
        - min_distance: Minimum interatomic distance found
    """
```

**Usage Example:**

```python
from dscribe.utils.stats import system_stats
from ase.build import molecule, bulk

# Collect various atomic systems
systems = [
    molecule("H2O"),
    molecule("NH3"),
    molecule("CH4"),
    bulk("NaCl", "rocksalt", a=5.64)
]

# Gather statistics
stats = system_stats(systems)

print(f"Maximum atoms in any system: {stats['n_atoms_max']}")
print(f"Atomic numbers present: {stats['atomic_numbers']}")
print(f"Element symbols: {stats['element_symbols']}")
print(f"Atomic number range: {stats['min_atomic_number']}-{stats['max_atomic_number']}")
print(f"Minimum distance: {stats['min_distance']:.3f} Å")

# Use statistics for descriptor setup
from dscribe.descriptors import CoulombMatrix
cm = CoulombMatrix(n_atoms_max=stats['n_atoms_max'])
```
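The kind of single-pass aggregation that such statistics involve can be sketched in plain Python. This is a simplified stand-in, not DScribe's code: `gather_stats` is a hypothetical helper, each system is represented only by its list of atomic numbers, and distance statistics are left out.

```python
def gather_stats(systems):
    """Aggregate size and species statistics over an iterable of systems.

    Each system is a list of atomic numbers (a stand-in for a real structure).
    """
    n_atoms_max = 0
    atomic_numbers = set()
    for numbers in systems:
        n_atoms_max = max(n_atoms_max, len(numbers))
        atomic_numbers.update(numbers)
    if not atomic_numbers:
        raise ValueError("No atoms found in the given systems")
    return {
        "n_atoms_max": n_atoms_max,
        "min_atomic_number": min(atomic_numbers),
        "max_atomic_number": max(atomic_numbers),
        "atomic_numbers": sorted(atomic_numbers),
    }


# H2O, NH3 and CH4 written as atomic-number lists.
dataset = [[8, 1, 1], [7, 1, 1, 1], [6, 1, 1, 1, 1]]
stats = gather_stats(dataset)
print(stats["n_atoms_max"])     # 5
print(stats["atomic_numbers"])  # [1, 6, 7, 8]
```

Because the input is consumed once, the same pattern works for a generator that reads structures lazily from disk.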

### Dimensionality Utilities

Functions for checking array dimensions and data types, useful for input validation and preprocessing.

```python { .api }
def is1d(array, dtype=None):
    """
    Check if array is 1D with optional dtype verification.

    Parameters:
    - array: Array to check
    - dtype: Expected data type (e.g., np.integer, np.floating)

    Returns:
    bool: True if array is 1D and matches dtype (if specified)
    """

def is2d(array, dtype=None):
    """
    Check if array is 2D with optional dtype verification.

    Parameters:
    - array: Array to check
    - dtype: Expected data type (e.g., np.integer, np.floating)

    Returns:
    bool: True if array is 2D and matches dtype (if specified)
    """
```

**Usage Example:**

```python
from dscribe.utils.dimensionality import is1d, is2d
import numpy as np

# Test arrays
positions = np.random.rand(10, 3)  # 2D float array
distances = np.random.rand(10)  # 1D float array
indices = np.array([1, 2, 3, 4])  # 1D integer array

# Check dimensions. The dtype is passed explicitly so that each check
# states the element type it expects and does not rely on a default.
print(f"Positions is 2D: {is2d(positions, np.floating)}")
print(f"Distances is 1D: {is1d(distances, np.floating)}")
print(f"Indices is 1D integer: {is1d(indices, np.integer)}")
print(f"Positions is 1D: {is1d(positions, np.floating)}")  # False

# Use for input validation
def validate_positions(pos):
    if not is2d(pos, np.floating):
        raise ValueError("Positions must be a 2D floating-point array")
    if pos.shape[1] != 3:
        raise ValueError("Positions must have 3 columns (x, y, z)")
    return True

# Validation example
try:
    validate_positions(positions)  # Pass
    validate_positions(distances)  # Fail - not 2D
except ValueError as e:
    print(f"Validation failed: {e}")
```
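One way such shape-and-type checks can be written is to iterate over the input and test each element, treating anything that cannot be iterated to the required depth as a failure. The sketch below shows that idea in plain Python; `looks_1d` and `looks_2d` are hypothetical names, and this is not presented as DScribe's implementation.

```python
from numbers import Integral, Real


def looks_1d(obj, kind=Integral):
    """True if obj is iterable and every element is a scalar of the given kind."""
    try:
        return all(isinstance(x, kind) for x in obj)
    except TypeError:  # obj is not iterable
        return False


def looks_2d(obj, kind=Integral):
    """True if obj is an iterable of iterables whose elements are all of the given kind."""
    try:
        return all(isinstance(x, kind) for row in obj for x in row)
    except TypeError:  # obj or one of its rows is not iterable
        return False


print(looks_1d([0, 1, 2]))                 # True
print(looks_1d([[0, 1], [2, 3]]))          # False: elements are lists, not integers
print(looks_2d([[0.0, 1.5], [2.0, 3.5]], kind=Real))  # True
print(looks_2d([0, 1, 2]))                 # False: rows are not iterable
```

Checking elements rather than a declared shape lets the same function accept nested lists, tuples, and array-like objects alike.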

## Common Utility Patterns

### Preprocessing Workflows

```python
from dscribe.utils.stats import system_stats
from dscribe.utils.species import get_atomic_numbers
from ase.build import molecule

# Step 1: Gather system statistics
systems = [molecule("H2O"), molecule("NH3"), molecule("CH4")]
stats = system_stats(systems)

# Step 2: Set up species list
species = list(stats['element_symbols'])
atomic_nums = get_atomic_numbers(species)

# Step 3: Configure descriptors with statistics
from dscribe.descriptors import SOAP, CoulombMatrix

soap = SOAP(
    species=species,
    r_cut=6.0,
    n_max=8,
    l_max=6
)

cm = CoulombMatrix(n_atoms_max=stats['n_atoms_max'])
```

### Periodic System Handling

```python
from dscribe.utils.geometry import get_extended_system
from dscribe.descriptors import SOAP
from ase.build import bulk

# Create periodic system
crystal = bulk("Si", "diamond", a=5.43)

# Extend for local environment calculations
r_cut = 6.0
extended = get_extended_system(crystal, r_cut)

# Use extended system with descriptors
soap = SOAP(species=["Si"], r_cut=r_cut, n_max=8, l_max=6)
descriptors = soap.create(extended)
```
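The idea behind extending a periodic system can be sketched for the simplest case. The example below assumes an orthorhombic cell described by its three edge lengths and copies the atoms into enough neighboring cells to cover the cutoff in every direction. `extend_orthorhombic` is a hypothetical helper written for illustration, the coordinates are made up, and general (non-orthogonal) cells are not handled.

```python
import math
from itertools import product


def extend_orthorhombic(positions, cell_lengths, r_cut):
    """Replicate atoms into neighbouring cells so that every atom of the
    original cell has all periodic images within r_cut available.

    Returns the extended positions and, for each one, the integer cell
    offset (i, j, k) it was copied into; (0, 0, 0) marks the original cell.
    """
    # ceil(r_cut / L) copies per side are enough along an edge of length L.
    n_copies = [math.ceil(r_cut / length) for length in cell_lengths]
    ranges = [range(-n, n + 1) for n in n_copies]

    extended, cell_indices = [], []
    for shift in product(*ranges):
        for pos in positions:
            extended.append(
                tuple(x + s * length for x, s, length in zip(pos, shift, cell_lengths))
            )
            cell_indices.append(shift)
    return extended, cell_indices


# Two atoms in a hypothetical cubic cell with a 5.43 Å edge.
atoms = [(0.0, 0.0, 0.0), (1.3575, 1.3575, 1.3575)]
extended, cell_indices = extend_orthorhombic(atoms, (5.43, 5.43, 5.43), r_cut=6.0)
print(len(extended))  # 250: 5 x 5 x 5 cells with 2 atoms each
```

The returned cell offsets serve the same purpose as the cell indices in the text above: they let each copied atom be traced back to the unit cell it came from.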

### Data Validation

```python
from dscribe.utils.dimensionality import is1d
from ase import Atoms
from ase.build import molecule
import numpy as np

def validate_descriptor_input(systems, centers=None):
    """Validate input for descriptor calculations."""

    # Wrap a single system in a list. An ASE Atoms object is itself
    # iterable (over its atoms), so test the type instead of calling iter().
    if isinstance(systems, Atoms):
        systems = [systems]

    # Validate centers if provided
    if centers is not None:
        if not is1d(centers, np.integer):
            raise ValueError("Centers must be 1D integer array")

        # Convert so the comparison also works for plain Python lists
        centers = np.asarray(centers)
        if np.any(centers < 0):
            raise ValueError("Centers indices must be non-negative")

    return systems, centers

# Usage in descriptor code (example inputs)
my_systems = molecule("H2O")
my_centers = np.array([0, 1])
systems, centers = validate_descriptor_input(my_systems, my_centers)
```

## Integration with Descriptors

These utilities are used internally by DScribe descriptors but are also available for user applications:

- **Species utilities**: Used by all descriptors for species validation and processing
- **Geometry utilities**: Used by local descriptors for neighbor finding and system extension
- **Statistics utilities**: Helpful for setting up descriptor parameters across datasets
- **Dimensionality utilities**: Used for input validation throughout the package

The utilities provide building blocks for custom analysis workflows and help ensure consistent data handling across different parts of the DScribe ecosystem.