tessl/pypi-dscribe

A Python package for creating feature transformations in applications of machine learning to materials science.

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/dscribe@2.1.x

To install, run

npx @tessl/cli install tessl/pypi-dscribe@2.1.0

# DScribe

DScribe is a Python library for transforming atomic structures into fixed-size numerical fingerprints (descriptors) used in machine learning applications for materials science. The package implements the Coulomb Matrix, Sine Matrix, Ewald Sum Matrix, Atom-centered Symmetry Functions (ACSF), Smooth Overlap of Atomic Positions (SOAP), Many-body Tensor Representation (MBTR), Local Many-body Tensor Representation (LMBTR), and the Valle-Oganov descriptor. All descriptors support both spectrum generation and derivative calculations with respect to atomic positions.

## Package Information

- **Package Name**: dscribe
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install dscribe` or `conda install -c conda-forge dscribe`

## Core Imports

```python
import dscribe
from dscribe import System
```

For descriptors:

```python
from dscribe.descriptors import SOAP, ACSF, MBTR, CoulombMatrix, SineMatrix, EwaldSumMatrix, LMBTR, ValleOganov
```

For core classes:

```python
from dscribe.core import System, Lattice
```

For kernels:

```python
from dscribe.kernels import AverageKernel, REMatchKernel
```

For utilities:

```python
from dscribe.utils.geometry import get_adjacency_matrix, get_extended_system
from dscribe.utils.species import symbols_to_numbers, get_atomic_numbers
from dscribe.utils.stats import system_stats
from dscribe.utils.dimensionality import is1d, is2d
```

## Basic Usage

```python
import numpy as np
from ase.build import molecule
from dscribe.descriptors import SOAP, CoulombMatrix
from dscribe import System

# Define atomic structures using ASE
samples = [molecule("H2O"), molecule("NO2"), molecule("CO2")]

# Or create DScribe System objects (extends ASE Atoms with caching)
water_system = System.from_atoms(molecule("H2O"))

# Setup descriptors
cm_desc = CoulombMatrix(n_atoms_max=3, permutation="sorted_l2")
soap_desc = SOAP(species=["C", "H", "O", "N"], r_cut=5.0, n_max=8, l_max=6)

# Create descriptors as numpy arrays
water = samples[0]
coulomb_matrix = cm_desc.create(water)
soap = soap_desc.create(water, centers=[0])  # SOAP for atom at index 0

# Process multiple systems with optional parallelization
coulomb_matrices = cm_desc.create(samples, n_jobs=3)
oxygen_indices = [np.where(x.get_atomic_numbers() == 8)[0] for x in samples]
oxygen_soap = soap_desc.create(samples, centers=oxygen_indices, n_jobs=3)

# Calculate derivatives with respect to atomic positions
derivatives, descriptors = soap_desc.derivatives(water, return_descriptor=True)
```

## Architecture

DScribe uses a hierarchical descriptor architecture:

- **Core Classes**: `System` (extended ASE Atoms with caching) and `Lattice` (unit cell representation)
- **Descriptor Base Classes**: Abstract base classes defining the descriptor interface
  - `Descriptor`: Base class for all descriptors
  - `DescriptorLocal`: Base for per-atom descriptors (SOAP, ACSF, LMBTR)
  - `DescriptorGlobal`: Base for per-structure descriptors (MBTR, ValleOganov)
  - `DescriptorMatrix`: Base for matrix descriptors (CoulombMatrix, SineMatrix, EwaldSumMatrix)
- **Kernels**: Similarity measures using local environment comparisons
- **Utilities**: Helper functions for geometry, species handling, and statistics

This design enables consistent interfaces across different descriptor types while supporting both local (per-atom) and global (per-structure) feature representations, parallel processing, and derivative calculations for machine learning applications in materials science.
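
The layering described above can be sketched with plain abstract base classes. This is an illustrative toy and not DScribe's source: the class names `ToyDescriptor`, `ToyLocal`, `ToyGlobal`, `OneHotSpecies`, and `SpeciesCount` are hypothetical, a "system" is reduced to a list of atomic numbers, and only the method names `create` and `get_number_of_features` are taken from the interface listed in this document.

```python
from abc import ABC, abstractmethod


class ToyDescriptor(ABC):
    """Hypothetical base: every descriptor has a fixed feature count."""

    @abstractmethod
    def get_number_of_features(self):
        ...

    @abstractmethod
    def create(self, numbers):
        ...


class ToyLocal(ToyDescriptor):
    """Per-atom descriptors return one feature vector per atom."""

    def create(self, numbers):
        return [self.create_single(z) for z in numbers]

    @abstractmethod
    def create_single(self, z):
        ...


class ToyGlobal(ToyDescriptor):
    """Per-structure descriptors return one feature vector per structure."""


class OneHotSpecies(ToyLocal):
    def __init__(self, species):
        self.species = sorted(species)

    def get_number_of_features(self):
        return len(self.species)

    def create_single(self, z):
        # One slot per known species, set to 1.0 for the atom's own species
        return [1.0 if z == s else 0.0 for s in self.species]


class SpeciesCount(ToyGlobal):
    def __init__(self, species):
        self.species = sorted(species)

    def get_number_of_features(self):
        return len(self.species)

    def create(self, numbers):
        # One slot per known species, holding how many such atoms occur
        return [float(numbers.count(s)) for s in self.species]


water = [8, 1, 1]  # atomic numbers of O, H, H
per_atom = OneHotSpecies([1, 8]).create(water)
per_structure = SpeciesCount([1, 8]).create(water)
```

Both toy descriptors expose the same two methods, which is the point of the shared base class: calling code does not need to know whether it is handling a local or a global descriptor until it inspects the shape of the output.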

## Capabilities

### Local Descriptors

Local descriptors compute features for individual atoms or local atomic environments, producing per-atom feature vectors that can be averaged or processed separately.

```python { .api }
class SOAP:
    def __init__(self, r_cut, n_max, l_max, sigma=1.0, rbf="gto",
                 weighting=None, average="off", compression={"mode": "off", "species_weighting": None},
                 species=None, periodic=False, sparse=False, dtype="float64"): ...
    def create(self, system, centers=None, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def derivatives(self, system, centers=None, include=None, exclude=None, method="auto", return_descriptor=False, n_jobs=1, only_physical_cores=False): ...

class ACSF:
    def __init__(self, r_cut, g2_params=None, g3_params=None, g4_params=None, g5_params=None,
                 species=None, periodic=False, sparse=False, dtype="float64"): ...
    def create(self, system, centers=None, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def derivatives(self, system, centers=None, include=None, exclude=None, method="auto", return_descriptor=False, n_jobs=1, only_physical_cores=False): ...

class LMBTR:
    def __init__(self, geometry=None, grid=None, weighting=None, normalize_gaussians=True,
                 normalization="none", species=None, periodic=False, sparse=False, dtype="float64"): ...
    def create(self, system, centers=None, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def derivatives(self, system, centers=None, include=None, exclude=None, method="auto", return_descriptor=False, n_jobs=1, only_physical_cores=False): ...
```

[Local Descriptors](./local-descriptors.md)
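
To make the idea of a per-atom feature vector concrete, here is a minimal sketch of a toy local descriptor: a histogram of neighbour distances inside a cutoff, computed once per centre and then averaged. It is far simpler than SOAP, ACSF, or LMBTR and does not reproduce any of them; the function names and the `n_bins` parameter are hypothetical, and `r_cut` is borrowed from the signatures above only as a familiar name for the cutoff radius.

```python
import math


def local_radial_histogram(positions, center, r_cut, n_bins):
    """Count the neighbours of one atom in equal-width distance bins up to r_cut."""
    hist = [0.0] * n_bins
    for j, pos in enumerate(positions):
        if j == center:
            continue
        d = math.dist(positions[center], pos)
        if d < r_cut:
            # Clamp to the last bin to guard against floating-point round-up
            hist[min(int(d / r_cut * n_bins), n_bins - 1)] += 1.0
    return hist


def average_over_centers(vectors):
    """Collapse per-atom vectors into a single vector by averaging each feature."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Three atoms on a line; distances are 1.2, 2.2, and 3.4 (the last is beyond r_cut)
positions = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (3.4, 0.0, 0.0)]
per_atom = [
    local_radial_histogram(positions, i, r_cut=2.5, n_bins=5)
    for i in range(len(positions))
]
averaged = average_over_centers(per_atom)
```

Each centre yields a vector of the same length regardless of how many neighbours it has, so the per-atom vectors can be stacked, compared, or averaged as done in the last line.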

### Global Descriptors

Global descriptors compute features for entire atomic structures, producing a single feature vector per structure that captures overall structural properties.

```python { .api }
class MBTR:
    def __init__(self, geometry=None, grid=None, weighting=None, normalize_gaussians=True,
                 normalization="none", species=None, periodic=False, sparse=False, dtype="float64"): ...
    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def derivatives(self, system, include=None, exclude=None, method="auto", return_descriptor=False, n_jobs=1, only_physical_cores=False): ...

class ValleOganov:
    def __init__(self, species, function, n, sigma, r_cut, sparse=False, dtype="float64"): ...
    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False): ...
```

[Global Descriptors](./global-descriptors.md)
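
As a rough illustration of how a whole structure can map to one fixed-length vector, the sketch below places a Gaussian on every interatomic distance and samples the sum on a fixed grid. It is only loosely inspired by distance-based terms in descriptors such as MBTR and makes no claim to match DScribe's output; the function name and the parameters `r_min`, `r_max`, `n`, and `sigma` are assumptions chosen for this example.

```python
import math
from itertools import combinations


def smeared_pair_distances(positions, r_min, r_max, n, sigma):
    """Sum of Gaussians centred on each pair distance, sampled on an n-point grid."""
    step = (r_max - r_min) / (n - 1)
    grid = [r_min + k * step for k in range(n)]
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    features = [0.0] * n
    for a, b in combinations(positions, 2):
        d = math.dist(a, b)
        for k, r in enumerate(grid):
            features[k] += norm * math.exp(-0.5 * ((r - d) / sigma) ** 2)
    return features


dimer = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
trimer = dimer + [(0.0, 1.5, 0.0)]

f_dimer = smeared_pair_distances(dimer, r_min=0.0, r_max=5.0, n=51, sigma=0.1)
f_trimer = smeared_pair_distances(trimer, r_min=0.0, r_max=5.0, n=51, sigma=0.1)
```

The dimer and the trimer produce vectors of identical length because the grid, not the atom count, fixes the output size. That property is what lets structures of different sizes feed the same machine learning model.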

### Matrix Descriptors

Matrix descriptors represent atomic structures as matrices based on pairwise interactions, then flatten or transform these matrices into fixed-size feature vectors.

```python { .api }
class CoulombMatrix:
    def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False, dtype="float64"): ...
    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def get_matrix(self, system): ...

class SineMatrix:
    def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False, dtype="float64"): ...
    def create(self, system, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def get_matrix(self, system): ...

class EwaldSumMatrix:
    def __init__(self, n_atoms_max, permutation="sorted_l2", sigma=None, seed=None, sparse=False, dtype="float64"): ...
    def create(self, system, accuracy=1e-5, w=1, r_cut=None, g_cut=None, a=None, n_jobs=1, only_physical_cores=False, verbose=False): ...
    def get_matrix(self, system, accuracy=1e-5, w=1, r_cut=None, g_cut=None, a=None): ...
```

[Matrix Descriptors](./matrix-descriptors.md)
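
The pipeline of building a pairwise matrix, fixing its ordering, padding, and flattening can be sketched in plain Python using the textbook Coulomb matrix definition (diagonal `0.5 * Z**2.4`, off-diagonal `Z_i * Z_j / |R_i - R_j|`). This is an independent sketch, not DScribe's implementation: the helper names are hypothetical, the water coordinates are approximate, units are not handled, and the sorting step is only assumed to resemble what `permutation="sorted_l2"` does.

```python
import math


def coulomb_matrix(numbers, positions, n_atoms_max):
    """Textbook Coulomb matrix, zero-padded to n_atoms_max x n_atoms_max."""
    n = len(numbers)
    if n > n_atoms_max:
        raise ValueError("structure has more atoms than n_atoms_max")
    m = [[0.0] * n_atoms_max for _ in range(n_atoms_max)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * numbers[i] ** 2.4
            else:
                m[i][j] = numbers[i] * numbers[j] / math.dist(positions[i], positions[j])
    return m


def sort_by_l2_norm(matrix):
    """Reorder rows and columns together by descending row L2 norm."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: -norms[i])
    return [[matrix[i][j] for j in order] for i in order]


def flatten(matrix):
    return [v for row in matrix for v in row]


# Approximate water geometry: O, H, H
numbers = [8, 1, 1]
positions = [(0.0, 0.0, 0.119), (0.0, 0.763, -0.477), (0.0, -0.763, -0.477)]

sorted_m = sort_by_l2_norm(coulomb_matrix(numbers, positions, n_atoms_max=4))
features = flatten(sorted_m)
```

Sorting removes the dependence on the arbitrary order in which atoms are listed, and the padding to `n_atoms_max` is what gives molecules of different sizes a feature vector of the same length.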

### Core Classes

Core classes provide the foundation for representing atomic systems and lattices with enhanced functionality beyond the standard ASE library.

```python { .api }
class System:
    def __init__(self, symbols=None, positions=None, numbers=None, cell=None, pbc=None, **kwargs): ...
    @staticmethod
    def from_atoms(atoms): ...
    def get_distance_matrix(self): ...
    def get_distance_matrix_within_radius(self, radius, pos=None, output_type="coo_matrix"): ...
    def to_scaled(self, positions, wrap=False): ...
    def to_cartesian(self, scaled_positions, wrap=False): ...

class Lattice:
    def __init__(self, matrix): ...
    @property
    def matrix(self): ...
    @property
    def lengths(self): ...
    @property
    def abc(self): ...
    def get_cartesian_coords(self, fractional_coords): ...
    def get_fractional_coords(self, cart_coords): ...
```

[Core Classes](./core-classes.md)
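
The conversion between fractional (scaled) and Cartesian coordinates that a lattice class provides can be sketched as follows. `ToyLattice` is a hypothetical stand-in written for this example, not DScribe's `Lattice`; it assumes the rows of the input matrix are the three lattice vectors and borrows the method names from the listing above purely for readability.

```python
import math


def cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class ToyLattice:
    """Hypothetical lattice; rows of `matrix` are the lattice vectors a, b, c."""

    def __init__(self, matrix):
        self.a, self.b, self.c = (tuple(row) for row in matrix)
        volume = dot(self.a, cross(self.b, self.c))
        # Reciprocal vectors (without the 2*pi factor) invert the lattice matrix
        self._ra = tuple(x / volume for x in cross(self.b, self.c))
        self._rb = tuple(x / volume for x in cross(self.c, self.a))
        self._rc = tuple(x / volume for x in cross(self.a, self.b))

    @property
    def lengths(self):
        return tuple(math.sqrt(dot(v, v)) for v in (self.a, self.b, self.c))

    def get_cartesian_coords(self, frac):
        # r = f1 * a + f2 * b + f3 * c
        return tuple(
            frac[0] * self.a[i] + frac[1] * self.b[i] + frac[2] * self.c[i]
            for i in range(3)
        )

    def get_fractional_coords(self, cart):
        # Projecting on the reciprocal vectors recovers (f1, f2, f3)
        return (dot(cart, self._ra), dot(cart, self._rb), dot(cart, self._rc))


lattice = ToyLattice([[4.0, 0.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 5.0]])
cart = lattice.get_cartesian_coords((0.5, 0.5, 0.2))
frac = lattice.get_fractional_coords(cart)
```

The example uses a non-orthogonal cell on purpose: for such cells the fractional coordinates cannot be obtained by dividing by the cell lengths, which is why the reciprocal vectors are needed.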

### Kernels

Kernel methods for measuring similarity between atomic structures based on local atomic environment comparisons using various similarity metrics.

```python { .api }
class AverageKernel:
    def __init__(self, metric, gamma=None, degree=3, coef0=1,
                 kernel_params=None, normalize_kernel=True): ...
    def create(self, x, y=None): ...

class REMatchKernel:
    def __init__(self, alpha=0.1, threshold=1e-6, metric="linear", gamma=None,
                 degree=3, coef0=1, kernel_params=None, normalize_kernel=True): ...
    def create(self, x, y=None): ...
```

[Kernels](./kernels.md)
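
The idea behind an average kernel, comparing two structures by averaging the similarity over all pairs of their local environments, can be sketched in a few lines. This is a simplified stand-alone version with a linear metric, not the `AverageKernel` class itself; the function names are hypothetical and the normalization shown (dividing by the geometric mean of the self-similarities) is an assumption about what a normalized kernel typically means.

```python
import math


def linear(u, v):
    """Linear similarity between two local feature vectors (a dot product)."""
    return sum(a * b for a, b in zip(u, v))


def average_similarity(x, y, metric=linear):
    """Mean of the pairwise local similarities between structures x and y."""
    total = sum(metric(u, v) for u in x for v in y)
    return total / (len(x) * len(y))


def normalized_average_kernel(x, y, metric=linear):
    """Scale the average similarity so that every structure has self-similarity 1."""
    k_xy = average_similarity(x, y, metric)
    k_xx = average_similarity(x, x, metric)
    k_yy = average_similarity(y, y, metric)
    return k_xy / math.sqrt(k_xx * k_yy)


# Each structure is a list of per-atom feature vectors (toy values)
a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

k_aa = normalized_average_kernel(a, a)
k_ab = normalized_average_kernel(a, b)
k_ba = normalized_average_kernel(b, a)
```

Because the kernel averages over environments, the two structures may contain different numbers of atoms, as `a` and `b` do here.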

### Utilities

Utility functions for working with atomic species, geometry calculations, statistics, and array operations commonly needed in materials science applications.

```python { .api }
# Species utilities (from dscribe.utils.species)
def symbols_to_numbers(symbols): ...
def get_atomic_numbers(species): ...

# Geometry utilities (from dscribe.utils.geometry)
def get_adjacency_matrix(radius, pos1, pos2=None, output_type="coo_matrix"): ...
def get_adjacency_list(adjacency_matrix): ...
def get_extended_system(system, radial_cutoff, centers=None, return_cell_indices=False): ...

# Statistics utilities (from dscribe.utils.stats)
def system_stats(system_iterator): ...

# Dimensionality utilities (from dscribe.utils.dimensionality)
def is1d(array, dtype=None): ...
def is2d(array, dtype=None): ...
```

[Utilities](./utilities.md)
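
For readers unfamiliar with radius-limited adjacency, the sketch below shows the general idea in plain Python: keep only the pairs of points closer than a cutoff and store them as coordinate-format triplets (row, column, distance). The function `adjacency_within_radius` is hypothetical and only mirrors the argument order of `get_adjacency_matrix` for readability; the behaviour of the real utility, including how it treats self-pairs, should be checked against the library documentation.

```python
import math


def adjacency_within_radius(radius, pos1, pos2=None):
    """Return (rows, cols, distances) for all pairs no further apart than radius."""
    same_set = pos2 is None
    if same_set:
        pos2 = pos1
    rows, cols, data = [], [], []
    for i, p in enumerate(pos1):
        for j, q in enumerate(pos2):
            if same_set and i == j:
                continue  # this sketch skips self-pairs
            d = math.dist(p, q)
            if d <= radius:
                rows.append(i)
                cols.append(j)
                data.append(d)
    return rows, cols, data


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
rows, cols, data = adjacency_within_radius(1.5, points)
```

Storing only the pairs inside the cutoff is what makes a sparse format worthwhile: for large systems most pairs are far apart and never need to be materialized.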

## Common Descriptor Interface

All descriptor classes implement these standard methods:

- `create(system, ...)` - Create descriptor for given system(s), returns numpy array or sparse matrix
- `get_number_of_features()` - Get total number of features in the descriptor output
- `derivatives(...)` - Calculate derivatives with respect to atomic positions (where supported)

## Common Parameters

Most descriptors accept these parameters, either in the constructor (`species`, `periodic`, `sparse`, `dtype`) or in `create()` (`system`, `n_jobs`, `verbose`):

- `system` - ASE Atoms object(s) or DScribe System object(s) to process
- `species` - List of atomic species to include in the descriptor
- `periodic` - Whether to consider periodic boundary conditions
- `sparse` - Whether to return sparse arrays for memory efficiency
- `dtype` - Data type for arrays ("float64", "float32")
- `n_jobs` - Number of parallel processes for computation
- `verbose` - Whether to print progress information during computation