Tessl Tile for pypi/dscribe@2.1.0

# Kernels

DScribe's kernel methods provide similarity measures between atomic structures based on local atomic environment comparisons. These kernels are particularly useful for machine learning applications where you need to measure structural similarity or build kernel-based models.

## Capabilities

### AverageKernel

The AverageKernel computes global structural similarity as the average of local environment similarities. It provides a simple and intuitive way to measure how similar two structures are based on their local atomic environments.

```python { .api }
class AverageKernel:
    def __init__(self, metric, gamma=None, degree=3, coef0=1,
                 kernel_params=None, normalize_kernel=True):
        """
        Initialize Average Kernel.

        Parameters:
        - metric (str or callable): Pairwise metric for local similarities, e.g.:
            - "linear": Linear kernel (dot product)
            - "polynomial": Polynomial kernel
            - "rbf": Radial basis function (Gaussian) kernel
            - "laplacian": Laplacian kernel
            - "sigmoid": Sigmoid kernel
        - gamma (float): Kernel coefficient for rbf, laplacian, polynomial, and sigmoid kernels
        - degree (int): Degree for polynomial kernel
        - coef0 (float): Independent term for polynomial and sigmoid kernels
        - kernel_params (dict): Keyword arguments passed to a callable metric
        - normalize_kernel (bool): Whether to normalize the kernel matrix
        """

    def create(self, x, y=None):
        """
        Create kernel matrix from local descriptors.

        Parameters:
        - x: Local descriptors for first set of structures
             (list of arrays, one (n_environments, n_features) array per structure)
        - y: Local descriptors for second set of structures (optional, defaults to x)

        Returns:
        numpy.ndarray: Kernel matrix with shape (n_structures_x, n_structures_y)
        """

    def get_global_similarity(self, localkernel):
        """
        Compute global similarity from local similarity matrix.

        Parameters:
        - localkernel: Local kernel matrix between the environments of two structures

        Returns:
        float: Global similarity value (average of local similarities)
        """
```

**Usage Example:**

```python
from dscribe.kernels import AverageKernel
from dscribe.descriptors import SOAP
from ase.build import molecule

# Setup SOAP descriptor for local environments.
# species must cover every element in every structure passed to create(),
# including the N and C atoms in the comparison molecules below.
soap = SOAP(species=["H", "C", "N", "O"], r_cut=5.0, n_max=8, l_max=6)

# Create local descriptors for molecules (one array per molecule)
molecules = [molecule("H2O"), molecule("H2O2")]
soap_descriptors = [soap.create(mol) for mol in molecules]

# Setup Average Kernel with RBF similarity metric
kernel = AverageKernel(metric="rbf", gamma=1.0)

# Compute kernel matrix
K = kernel.create(soap_descriptors)  # Shape: (2, 2)
print(f"Self-similarity: {K[0,0]}")
print(f"Cross-similarity: {K[0,1]}")

# Compare against different molecules
other_molecules = [molecule("NH3"), molecule("CH4")]
other_descriptors = [soap.create(mol) for mol in other_molecules]
K_cross = kernel.create(soap_descriptors, other_descriptors)  # Shape: (2, 2)
```

### REMatchKernel

The REMatchKernel (Regularized-Entropy Match Kernel) uses optimal transport theory to find the best matching between local environments of two structures. This provides a more sophisticated similarity measure that accounts for the optimal assignment of local environments.

```python { .api }
class REMatchKernel:
    def __init__(self, alpha=0.1, threshold=1e-6, metric="linear", gamma=None,
                 degree=3, coef0=1, kernel_params=None, normalize_kernel=True):
        """
        Initialize REMatch Kernel.

        Parameters:
        - alpha (float): Strength of the entropic penalty. Values close to zero
            approach the best-match solution; large values approach the average kernel
        - threshold (float): Convergence threshold for the Sinkhorn algorithm
        - metric (str or callable): Pairwise metric for local similarities, e.g.:
            - "linear": Linear kernel (dot product)
            - "polynomial": Polynomial kernel
            - "rbf": Radial basis function (Gaussian) kernel
            - "laplacian": Laplacian kernel
            - "sigmoid": Sigmoid kernel
        - gamma (float): Kernel coefficient for rbf, laplacian, polynomial, and sigmoid kernels
        - degree (int): Degree for polynomial kernel
        - coef0 (float): Independent term for polynomial and sigmoid kernels
        - kernel_params (dict): Keyword arguments passed to a callable metric
        - normalize_kernel (bool): Whether to normalize the kernel matrix
        """

    def create(self, x, y=None):
        """
        Create REMatch kernel matrix from local descriptors.

        Parameters:
        - x: Local descriptors for first set of structures
             (list of arrays, one (n_environments, n_features) array per structure)
        - y: Local descriptors for second set of structures (optional, defaults to x)

        Returns:
        numpy.ndarray: REMatch kernel matrix with shape (n_structures_x, n_structures_y)
        """

    def get_global_similarity(self, localkernel):
        """
        Compute global similarity using optimal transport matching.

        Parameters:
        - localkernel: Local kernel matrix between the environments of two structures

        Returns:
        float: Global similarity value from optimal transport solution
        """
```

**Usage Example:**

```python
from dscribe.kernels import REMatchKernel
from dscribe.descriptors import SOAP
from ase.build import molecule

# Setup SOAP descriptor
soap = SOAP(species=["H", "O"], r_cut=5.0, n_max=8, l_max=6)

# Create local descriptors
molecules = [molecule("H2O"), molecule("H2O2")]
soap_descriptors = [soap.create(mol) for mol in molecules]

# Setup REMatch Kernel with custom parameters
rematch = REMatchKernel(
    metric="rbf",
    gamma=1.0,
    alpha=0.1,      # Smaller alpha = weaker entropic penalty (closer to best match)
    threshold=1e-8  # Tighter Sinkhorn convergence than the 1e-6 default
)

# Compute REMatch kernel matrix
K_rematch = rematch.create(soap_descriptors)  # Shape: (2, 2)
print(f"REMatch similarity: {K_rematch[0,1]}")

# Compare with different alpha values
rematch_low_reg = REMatchKernel(metric="rbf", gamma=1.0, alpha=0.01)  # near best match
rematch_high_reg = REMatchKernel(metric="rbf", gamma=1.0, alpha=1.0)  # nearer the average

K_low = rematch_low_reg.create(soap_descriptors)
K_high = rematch_high_reg.create(soap_descriptors)
```

## Kernel Theory and Applications

### Local Similarity Foundation

Both kernels build on the concept of local atomic environment similarity:

1. **Local descriptors** are computed for each atomic environment in each structure
2. **Local kernel matrix** is computed between all environment pairs using the specified metric
3. **Global similarity** is derived from the local similarities using different aggregation methods

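The three steps can be sketched in plain Python for two toy structures. This is an illustration only, not DScribe's implementation: the helper names `rbf`, `local_kernel`, and `average_similarity` and the two-dimensional toy "descriptors" are assumptions made for the example, and the aggregation shown is the simple average.

```python
import math

def rbf(a, b, gamma=1.0):
    """RBF similarity between two descriptor vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def local_kernel(envs_a, envs_b, gamma=1.0):
    """Step 2: similarity of every environment in A to every environment in B."""
    return [[rbf(a, b, gamma) for b in envs_b] for a in envs_a]

def average_similarity(c):
    """Step 3 (average aggregation): mean over all entries of the local kernel."""
    return sum(sum(row) for row in c) / (len(c) * len(c[0]))

# Step 1: one toy descriptor vector per atomic environment (hypothetical values).
structure_a = [[0.0, 0.0], [1.0, 0.0]]
structure_b = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

c_ab = local_kernel(structure_a, structure_b)  # 2 x 3 local kernel matrix
k_ab = average_similarity(c_ab)                # single global similarity value
print(k_ab)
```

The two structures have different numbers of environments, which is why the local kernel matrix is rectangular while the global similarity is still a single number.
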
### AverageKernel vs REMatchKernel

**AverageKernel**:
- Simple average of all local environment similarities
- Computationally efficient
- Good for structures with similar local environment counts
- Formula: K(A,B) = (1/NM) * Σᵢⱼ Cᵢⱼ(A,B), where N and M are the numbers of local environments in A and B and C(A,B) is the local kernel matrix

**REMatchKernel**:
- Uses optimal transport to find best environment matching
- More sophisticated but computationally intensive
- Better for structures with different sizes or environment distributions
- Uses Sinkhorn algorithm to solve regularized optimal transport

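The Sinkhorn step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not DScribe's code: the function name `rematch_similarity`, the `max_iter` safeguard, and the toy local kernel matrix are hypothetical. It assumes uniform weights over environments and a transport kernel of the form exp(-(1 - C)/alpha), as in the usual regularized-entropy match formulation.

```python
import math

def rematch_similarity(local_kernel, alpha=0.1, threshold=1e-6, max_iter=10000):
    """Entropy-regularized match similarity from a local kernel matrix C (n x m)."""
    n, m = len(local_kernel), len(local_kernel[0])
    # Transport kernel: small alpha strongly penalizes poorly matching pairs.
    k = [[math.exp(-(1.0 - c) / alpha) for c in row] for row in local_kernel]
    u = [1.0 / n] * n
    v = [1.0 / m] * m
    for _ in range(max_iter):
        u_prev, v_prev = u, v
        # Sinkhorn updates: rescale so the plan has uniform row and column sums.
        v = [(1.0 / m) / sum(k[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [(1.0 / n) / sum(k[i][j] * v[j] for j in range(m)) for i in range(n)]
        err = (sum((a - b) ** 2 for a, b in zip(u, u_prev)) / sum(a * a for a in u)
               + sum((a - b) ** 2 for a, b in zip(v, v_prev)) / sum(a * a for a in v))
        if err < threshold:
            break
    # Transport plan P_ij = u_i * K_ij * v_j; the similarity is sum_ij P_ij * C_ij.
    return sum(u[i] * k[i][j] * v[j] * local_kernel[i][j]
               for i in range(n) for j in range(m))

# Toy local kernel: two environments that each match exactly one partner.
c = [[1.0, 0.0], [0.0, 1.0]]
sim_small_alpha = rematch_similarity(c, alpha=0.01)   # close to the best match
sim_large_alpha = rematch_similarity(c, alpha=100.0)  # close to the plain average
print(sim_small_alpha, sim_large_alpha)
```

For this toy matrix the plain average of the entries is 0.5, while a perfect one-to-one matching gives 1. The sketch moves between those two limits as `alpha` changes.
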
### Kernel Metrics

Both kernels accept the same `metric` options; AverageKernel is used here for illustration:

```python
from dscribe.kernels import AverageKernel

# Linear kernel (fastest)
kernel = AverageKernel(metric="linear")

# RBF (Gaussian) kernel - most common
kernel = AverageKernel(metric="rbf", gamma=1.0)

# Polynomial kernel
kernel = AverageKernel(metric="polynomial", degree=3, gamma=1.0, coef0=1.0)

# Laplacian kernel
kernel = AverageKernel(metric="laplacian", gamma=1.0)

# Sigmoid kernel
kernel = AverageKernel(metric="sigmoid", gamma=1.0, coef0=1.0)
```

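As a rough guide to what each metric computes for two descriptor vectors, the sketch below writes out the standard textbook definitions in plain Python. The helper name `pairwise_kernel` is hypothetical, and the assumption that these names correspond to scikit-learn's pairwise kernel definitions is based on the matching parameter names (`gamma`, `degree`, `coef0`); the library itself should be consulted for defaults such as `gamma=None`.

```python
import math

def pairwise_kernel(x, y, metric, gamma=1.0, degree=3, coef0=1.0):
    """Similarity of two vectors under the named kernel (standard definitions)."""
    dot = sum(a * b for a, b in zip(x, y))
    if metric == "linear":
        return dot
    if metric == "polynomial":
        return (gamma * dot + coef0) ** degree
    if metric == "rbf":
        # Squared Euclidean distance in the exponent.
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    if metric == "laplacian":
        # Manhattan (L1) distance in the exponent.
        return math.exp(-gamma * sum(abs(a - b) for a, b in zip(x, y)))
    if metric == "sigmoid":
        return math.tanh(gamma * dot + coef0)
    raise ValueError(f"unknown metric: {metric}")

# Two orthogonal unit vectors as toy descriptors (hypothetical values).
x, y = [1.0, 0.0], [0.0, 1.0]
metrics = ("linear", "polynomial", "rbf", "laplacian", "sigmoid")
values = {name: pairwise_kernel(x, y, name) for name in metrics}
print(values)
```

Only the rbf and laplacian kernels are guaranteed to return 1 for identical vectors, which is one reason kernel normalization matters for the other metrics.
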
## Usage with Machine Learning

### Kernel Matrices for Classification

```python
from sklearn.svm import SVC
from dscribe.kernels import AverageKernel
from dscribe.descriptors import SOAP

# Prepare data
soap = SOAP(species=["C", "H", "O"], r_cut=5.0, n_max=8, l_max=6)
structures = [...] # List of ASE Atoms objects
labels = [...] # Target labels
test_structures = [...] # List of ASE Atoms objects to classify

# Compute local descriptors
local_descriptors = [soap.create(struct) for struct in structures]

# Compute kernel matrix
kernel = AverageKernel(metric="rbf", gamma=1.0)
K_train = kernel.create(local_descriptors)

# Use precomputed kernel in scikit-learn
svm = SVC(kernel="precomputed")
svm.fit(K_train, labels)

# Predict on new data: rows are test structures, columns are training structures
new_descriptors = [soap.create(new_struct) for new_struct in test_structures]
K_test = kernel.create(new_descriptors, local_descriptors)
predictions = svm.predict(K_test)
```

### Similarity Analysis

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Compute pairwise similarities (kernel and local_descriptors as defined above)
similarities = kernel.create(local_descriptors)

# Find the 10 most similar pairs of distinct structures.
# Only the upper triangle (k=1) is ranked, so self-similarities on the
# diagonal and mirrored duplicates are excluded.
rows, cols = np.triu_indices_from(similarities, k=1)
top = np.argsort(similarities[rows, cols])[-10:][::-1]
most_similar_pairs = list(zip(rows[top], cols[top]))

# Cluster structures based on kernel similarities
clustering = SpectralClustering(
    n_clusters=3,
    affinity="precomputed",
    random_state=42
)
cluster_labels = clustering.fit_predict(similarities)
```

## Parameter Selection Guidelines

### AverageKernel Parameters

- **metric="rbf", gamma=1.0**: A common starting point; the best gamma depends on the scale of the descriptors
- **Higher gamma**: More sensitive to local differences
- **Lower gamma**: More tolerant of local differences
- **normalize_kernel=True**: Usually recommended; scales the kernel so that each structure's self-similarity is 1

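The effect of kernel normalization can be illustrated with a small sketch. It assumes the common convention of dividing each element K_ij by sqrt(K_ii * K_jj); the helper name `normalize_kernel_matrix` and the toy values are hypothetical, and only the square case (a set of structures compared with itself) is covered.

```python
import math

def normalize_kernel_matrix(k):
    """Divide each K[i][j] by sqrt(K[i][i] * K[j][j]) for a square kernel matrix."""
    n = len(k)
    diag = [k[i][i] for i in range(n)]
    return [[k[i][j] / math.sqrt(diag[i] * diag[j]) for j in range(n)]
            for i in range(n)]

# Toy unnormalized kernel: the first structure has a larger self-similarity.
k_raw = [[4.0, 1.0],
         [1.0, 1.0]]
k_norm = normalize_kernel_matrix(k_raw)
print(k_norm)
```

After normalization every diagonal entry is 1, so off-diagonal values can be compared across structures of different size.
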
### REMatchKernel Parameters

- **alpha=0.1**: The default; a moderate entropic penalty
- **Lower alpha (0.01-0.05)**: Less regularization; the transport plan sharpens toward a best-match assignment
- **Higher alpha (0.5-1.0)**: More regularization; the transport plan smooths out and the result approaches the average kernel
- **threshold=1e-6**: The default convergence threshold; sufficient precision for most applications

### Computational Considerations

- **AverageKernel**: Fast; once the local kernel matrix is computed, the global similarity is a single mean over its N×M entries
- **REMatchKernel**: Slower; in addition to the local kernel matrix, each structure pair requires an iterative Sinkhorn solve
- **Local descriptor size**: Affects both kernel computation time and memory usage
- **Number of structures**: The kernel matrix for S structures has S² entries, so time and memory scale as O(S²)

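To make the quadratic growth of the kernel matrix concrete, the following back-of-the-envelope sketch estimates its memory footprint. It assumes 8-byte floating-point entries, which is an assumption about the array dtype, and the helper name `kernel_matrix_bytes` is hypothetical.

```python
def kernel_matrix_bytes(n_rows, n_cols=None, bytes_per_entry=8):
    """Memory needed for a dense kernel matrix (square if n_cols is omitted)."""
    n_cols = n_rows if n_cols is None else n_cols
    return n_rows * n_cols * bytes_per_entry

# Ten times more structures means one hundred times more memory.
for n_structures in (1_000, 10_000, 100_000):
    gigabytes = kernel_matrix_bytes(n_structures) / 1e9
    print(f"{n_structures} structures: {gigabytes:.3f} GB")
```
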
Choose AverageKernel for large datasets or when computational efficiency is critical. Use REMatchKernel when the matching between individual environments matters for your similarity measure and the extra cost of the iterative solve is acceptable.
