# Kernels

DScribe's kernels measure the similarity between atomic structures by comparing their local atomic environments. They are useful in machine learning applications that need a structural similarity measure or a kernel-based model.

## Capabilities

### AverageKernel

The AverageKernel computes the global similarity between two structures as the average of the similarities between their local atomic environments.

```python { .api }
class AverageKernel:
    def __init__(self, metric, gamma=None, degree=3, coef0=1,
                 kernel_params=None, normalize_kernel=True):
        """
        Initialize Average Kernel.

        Parameters:
        - metric (str): Pairwise kernel used for local similarities:
            - "linear": Linear kernel (dot product)
            - "polynomial": Polynomial kernel
            - "rbf": Radial basis function (Gaussian) kernel
            - "laplacian": Laplacian kernel
            - "sigmoid": Sigmoid kernel
        - gamma (float): Kernel coefficient for rbf, laplacian, polynomial, and sigmoid kernels
        - degree (int): Degree for polynomial kernel
        - coef0 (float): Independent term for polynomial and sigmoid kernels
        - kernel_params (dict): Additional parameters for specific kernels
        - normalize_kernel (bool): Whether to normalize the kernel matrix
        """

    def create(self, x, y=None):
        """
        Create kernel matrix from local descriptors.

        Parameters:
        - x: Local descriptors for first set of structures (list of arrays)
        - y: Local descriptors for second set of structures (optional, defaults to x)

        Returns:
        numpy.ndarray: Kernel matrix with shape (n_structures_x, n_structures_y)
        """

    def get_global_similarity(self, localkernel):
        """
        Compute global similarity from local similarity matrix.

        Parameters:
        - localkernel: Local kernel matrix between environments

        Returns:
        float: Global similarity value (average of local similarities)
        """
```

**Usage Example:**

```python
from dscribe.kernels import AverageKernel
from dscribe.descriptors import SOAP
from ase.build import molecule

# Setup SOAP descriptor for local environments.
# species must cover every element in every structure used below.
soap = SOAP(species=["H", "C", "N", "O"], r_cut=5.0, n_max=8, l_max=6)

# Create local descriptors for molecules
molecules = [molecule("H2O"), molecule("H2O2")]
soap_descriptors = [soap.create(mol) for mol in molecules]

# Setup Average Kernel with RBF similarity metric
kernel = AverageKernel(metric="rbf", gamma=1.0)

# Compute kernel matrix
K = kernel.create(soap_descriptors)  # Shape: (2, 2)
print(f"Self-similarity: {K[0, 0]}")
print(f"Cross-similarity: {K[0, 1]}")

# Compare against different molecules
other_molecules = [molecule("NH3"), molecule("CH4")]
other_descriptors = [soap.create(mol) for mol in other_molecules]
K_cross = kernel.create(soap_descriptors, other_descriptors)  # Shape: (2, 2)
```
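
With `normalize_kernel=True`, the self-similarity of each structure is scaled to 1. The sketch below reproduces that kind of scaling with plain NumPy on a made-up, unnormalized kernel matrix. It assumes the common normalization that divides each entry by the square root of the product of the two self-similarities; `normalize` is a hypothetical helper for illustration, not a DScribe function.

```python
import numpy as np

def normalize(K):
    """Divide K[i, j] by sqrt(K[i, i] * K[j, j]) (hypothetical helper)."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

# Made-up unnormalized similarities between three structures
K_raw = np.array([
    [4.0, 2.0, 1.0],
    [2.0, 9.0, 3.0],
    [1.0, 3.0, 16.0],
])

K_norm = normalize(K_raw)
print(np.diag(K_norm))  # every self-similarity becomes 1
print(K_norm[0, 1])     # 2 / sqrt(4 * 9) = 1/3
```

After this scaling, off-diagonal values can be read relative to a perfect self-match of 1.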

### REMatchKernel

The REMatchKernel (Regularized-Entropy Match Kernel) uses optimal transport to find the best matching between the local environments of two structures. The resulting similarity gives more weight to well-matched pairs of environments than a plain average does.

```python { .api }
class REMatchKernel:
    def __init__(self, alpha=0.1, threshold=1e-6, metric="linear", gamma=None,
                 degree=3, coef0=1, kernel_params=None, normalize_kernel=True):
        """
        Initialize REMatch Kernel.

        Parameters:
        - alpha (float): Entropy regularization parameter. Values close to zero
          approach the best-match solution; large values approach the average kernel
        - threshold (float): Convergence threshold for Sinkhorn algorithm
        - metric (str): Pairwise kernel used for local similarities:
            - "linear": Linear kernel (dot product)
            - "polynomial": Polynomial kernel
            - "rbf": Radial basis function (Gaussian) kernel
            - "laplacian": Laplacian kernel
            - "sigmoid": Sigmoid kernel
        - gamma (float): Kernel coefficient for rbf, laplacian, polynomial, and sigmoid kernels
        - degree (int): Degree for polynomial kernel
        - coef0 (float): Independent term for polynomial and sigmoid kernels
        - kernel_params (dict): Additional parameters for specific kernels
        - normalize_kernel (bool): Whether to normalize the kernel matrix
        """

    def create(self, x, y=None):
        """
        Create REMatch kernel matrix from local descriptors.

        Parameters:
        - x: Local descriptors for first set of structures (list of arrays)
        - y: Local descriptors for second set of structures (optional, defaults to x)

        Returns:
        numpy.ndarray: REMatch kernel matrix with shape (n_structures_x, n_structures_y)
        """

    def get_global_similarity(self, localkernel):
        """
        Compute global similarity using optimal transport matching.

        Parameters:
        - localkernel: Local kernel matrix between environments

        Returns:
        float: Global similarity value from optimal transport solution
        """
```

**Usage Example:**

```python
from dscribe.kernels import REMatchKernel
from dscribe.descriptors import SOAP
from ase.build import molecule

# Setup SOAP descriptor
soap = SOAP(species=["H", "O"], r_cut=5.0, n_max=8, l_max=6)

# Create local descriptors
molecules = [molecule("H2O"), molecule("H2O2")]
soap_descriptors = [soap.create(mol) for mol in molecules]

# Setup REMatch Kernel with custom parameters
rematch = REMatchKernel(
    metric="rbf",
    gamma=1.0,
    alpha=0.1,       # Lower alpha = less entropy regularization (closer to best match)
    threshold=1e-8,  # Tighter Sinkhorn convergence threshold
)

# Compute REMatch kernel matrix
K_rematch = rematch.create(soap_descriptors)  # Shape: (2, 2)
print(f"REMatch similarity: {K_rematch[0, 1]}")

# Compare with different alpha values
rematch_low_reg = REMatchKernel(metric="rbf", gamma=1.0, alpha=0.01)
rematch_high_reg = REMatchKernel(metric="rbf", gamma=1.0, alpha=1.0)

K_low = rematch_low_reg.create(soap_descriptors)
K_high = rematch_high_reg.create(soap_descriptors)
```

## Kernel Theory and Applications

### Local Similarity Foundation

Both kernels build on the similarity between local atomic environments:

1. **Local descriptors** are computed for each atomic environment in each structure.
2. A **local kernel matrix** is computed between all pairs of environments from two structures using the specified metric.
3. A **global similarity** is derived from the local similarities; the two kernels differ only in how they aggregate them.
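
The three steps can be sketched with plain NumPy. This is an illustration under stated assumptions, not DScribe's implementation: random arrays stand in for real local descriptors, `rbf_local_kernel` is a hypothetical helper, and the aggregation shown is the plain mean used by the average kernel.

```python
import numpy as np

def rbf_local_kernel(a, b, gamma=1.0):
    """Step 2: RBF similarity between every environment in a and every one in b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

# Step 1 (stand-in): one row per atomic environment. Real rows would come
# from a local descriptor such as SOAP; random numbers are used here.
rng = np.random.default_rng(0)
struct_a = rng.random((3, 5))  # 3 environments, 5 features each
struct_b = rng.random((4, 5))  # 4 environments, 5 features each

local = rbf_local_kernel(struct_a, struct_b)  # shape (3, 4)

# Step 3: aggregate the local similarities; here a plain mean
global_similarity = local.mean()
print(local.shape, global_similarity)
```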

### AverageKernel vs REMatchKernel

**AverageKernel**:

- Simple average of all local environment similarities
- Computationally efficient
- Good for structures with similar local environment counts
- Formula: K(A,B) = (1/NM) * Σᵢⱼ Cᵢⱼ(A,B), where N and M are the numbers of local environments in A and B, and Cᵢⱼ(A,B) is the local similarity between environment i of A and environment j of B

**REMatchKernel**:

- Uses optimal transport to find the best matching between environments
- More sophisticated but computationally more intensive
- Better for structures with different sizes or environment distributions
- Uses the Sinkhorn algorithm to solve the entropy-regularized optimal transport problem
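
To make the Sinkhorn step concrete, here is a minimal NumPy sketch of an entropy-regularized match between two sets of environments. It follows the textbook Sinkhorn scaling with uniform weights per environment and treats `1 - C` as the transport cost; `rematch_similarity` and `max_iter` are hypothetical names chosen for this example, and the details are assumptions rather than a copy of DScribe's code.

```python
import numpy as np

def rematch_similarity(local_kernel, alpha=0.1, threshold=1e-6, max_iter=10_000):
    """Entropy-regularized match between two sets of environments (sketch)."""
    n, m = local_kernel.shape
    # Treat 1 - C as a transport cost and apply the entropic scaling
    K = np.exp(-(1.0 - local_kernel) / alpha)
    row_target = np.full(n, 1.0 / n)  # each environment carries equal weight
    col_target = np.full(m, 1.0 / m)
    u = row_target.copy()
    v = col_target.copy()
    for _ in range(max_iter):
        u_prev, v_prev = u, v
        v = col_target / (K.T @ u)
        u = row_target / (K @ v)
        change = (np.sum((u - u_prev) ** 2) / np.sum(u ** 2)
                  + np.sum((v - v_prev) ** 2) / np.sum(v ** 2))
        if change < threshold:
            break
    plan = u[:, None] * K * v[None, :]  # transport plan P_ij = u_i * K_ij * v_j
    return float(np.sum(plan * local_kernel))

# Two structures whose environments match one-to-one
C = np.eye(2)
sim_sharp = rematch_similarity(C, alpha=0.01)    # close to the best match, 1
sim_smooth = rematch_similarity(C, alpha=100.0)  # close to the plain average, 0.5
print(sim_sharp, sim_smooth, C.mean())
```

The two calls show the role of `alpha`: a small value concentrates the transport plan on the best-matching pairs, while a large value spreads it out until the result approaches the average of all local similarities.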

### Kernel Metrics

Both kernels accept the same local similarity metrics:

```python
from dscribe.kernels import AverageKernel

# Linear kernel (fastest)
kernel = AverageKernel(metric="linear")

# RBF (Gaussian) kernel - most common
kernel = AverageKernel(metric="rbf", gamma=1.0)

# Polynomial kernel
kernel = AverageKernel(metric="polynomial", degree=3, gamma=1.0, coef0=1.0)

# Laplacian kernel
kernel = AverageKernel(metric="laplacian", gamma=1.0)

# Sigmoid kernel
kernel = AverageKernel(metric="sigmoid", gamma=1.0, coef0=1.0)
```
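
For reference, the sketch below spells out what each metric computes for a single pair of descriptor vectors. It assumes the metric names follow the standard definitions used by scikit-learn's pairwise kernels; `local_similarity` is a hypothetical helper written for this illustration.

```python
import numpy as np

def local_similarity(x, y, metric, gamma=1.0, degree=3, coef0=1.0):
    """Similarity between two descriptor vectors (illustrative helper)."""
    dot = float(np.dot(x, y))
    if metric == "linear":
        return dot
    if metric == "polynomial":
        return (gamma * dot + coef0) ** degree
    if metric == "rbf":
        return float(np.exp(-gamma * np.sum((x - y) ** 2)))
    if metric == "laplacian":
        return float(np.exp(-gamma * np.sum(np.abs(x - y))))
    if metric == "sigmoid":
        return float(np.tanh(gamma * dot + coef0))
    raise ValueError(f"unknown metric: {metric}")

# Two orthogonal unit vectors as toy descriptors
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
for name in ["linear", "polynomial", "rbf", "laplacian", "sigmoid"]:
    print(name, local_similarity(x, y, name))
```

Note that `gamma` enters every metric except the linear one, while `degree` and `coef0` only matter for the polynomial and sigmoid metrics.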

## Usage with Machine Learning

### Kernel Matrices for Classification

```python
from sklearn.svm import SVC
from dscribe.kernels import AverageKernel
from dscribe.descriptors import SOAP

# Prepare data
soap = SOAP(species=["C", "H", "O"], r_cut=5.0, n_max=8, l_max=6)
structures = [...]       # List of ASE Atoms objects (training set)
labels = [...]           # Target labels, one per training structure
test_structures = [...]  # List of ASE Atoms objects to classify

# Compute local descriptors
local_descriptors = [soap.create(struct) for struct in structures]

# Compute kernel matrix, shape (n_train, n_train)
kernel = AverageKernel(metric="rbf", gamma=1.0)
K_train = kernel.create(local_descriptors)

# Use precomputed kernel in scikit-learn
svm = SVC(kernel="precomputed")
svm.fit(K_train, labels)

# Predict on new data: rows are test structures, columns are training structures
new_descriptors = [soap.create(new_struct) for new_struct in test_structures]
K_test = kernel.create(new_descriptors, local_descriptors)  # Shape: (n_test, n_train)
predictions = svm.predict(K_test)
```

### Similarity Analysis

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Compute pairwise similarities
# (reuses `kernel` and `local_descriptors` from the classification example)
similarities = kernel.create(local_descriptors)

# Find the 10 most similar pairs of distinct structures.
# Only the upper triangle (i < j) is searched, so self-similarities on the
# diagonal and mirrored duplicates (j, i) are left out.
rows, cols = np.triu_indices_from(similarities, k=1)
top = np.argsort(similarities[rows, cols])[-10:][::-1]
most_similar_pairs = list(zip(rows[top], cols[top]))

# Cluster structures based on kernel similarities
clustering = SpectralClustering(
    n_clusters=3,
    affinity="precomputed",
    random_state=42,
)
cluster_labels = clustering.fit_predict(similarities)
```

## Parameter Selection Guidelines

### AverageKernel Parameters

- **metric="rbf", gamma=1.0**: A reasonable starting point; tune gamma to the scale of your descriptors
- **Higher gamma**: More sensitive to local differences
- **Lower gamma**: More tolerant of local differences
- **normalize_kernel=True**: Usually recommended for consistent scaling
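
The effect of gamma on the RBF metric can be checked with a few lines of arithmetic. The sketch below uses a made-up squared distance between two local descriptors and the standard RBF form exp(-gamma * d²); the numbers are illustrative only.

```python
import math

# Squared Euclidean distance between two hypothetical local descriptors
squared_distance = 0.5

# RBF similarity for the same pair of environments at three gamma values
rbf_by_gamma = {
    gamma: math.exp(-gamma * squared_distance)
    for gamma in (0.1, 1.0, 10.0)
}

for gamma, similarity in rbf_by_gamma.items():
    print(f"gamma={gamma:>4}: rbf similarity = {similarity:.3f}")
```

The same pair of environments looks nearly identical at a low gamma and almost unrelated at a high one, which is why gamma should be tuned to the typical distances in the data.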

### REMatchKernel Parameters

- **alpha=0.1**: Balanced regularization (good default)
- **Lower alpha (0.01-0.05)**: Less entropy regularization; the transport plan sharpens toward the best-match assignment
- **Higher alpha (0.5-1.0)**: More entropy regularization; the transport plan smooths out and the result approaches the average kernel
- **threshold=1e-6**: Sufficient precision for most applications

### Computational Considerations

- **AverageKernel**: Fast; for each pair of structures the cost grows with the number of environment pairs compared
- **REMatchKernel**: Slower; adds an iterative Sinkhorn optimization for each pair of structures
- **Local descriptor size**: Affects both kernel computation time and memory usage
- **Number of structures**: Kernel matrix size scales as O(N²)
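
A quick estimate shows how fast the O(N²) kernel matrix grows. The sketch assumes a dense matrix of 8-byte float64 values; `kernel_matrix_bytes` is a hypothetical helper for this estimate.

```python
def kernel_matrix_bytes(n_structures, bytes_per_value=8):
    """Memory for a dense n x n kernel matrix (float64 by default)."""
    return n_structures ** 2 * bytes_per_value

for n in (1_000, 10_000, 100_000):
    gib = kernel_matrix_bytes(n) / 1024 ** 3
    print(f"{n:>7} structures -> {gib:8.2f} GiB")
```

Doubling the number of structures quadruples the memory, so for large datasets it can help to compute the kernel in blocks with `create(x, y)` rather than all at once.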

Choose AverageKernel for large datasets or when computational efficiency is critical. Use REMatchKernel when the matching between individual environments matters for accuracy and the extra computational cost is acceptable.