Tessl Tile for pypi/metric-learn@0.7.0

# Weakly-Supervised Algorithms

Weakly-supervised metric learning algorithms learn from constraints (pairs, triplets, quadruplets, or chunklets) rather than explicit class labels. They are useful when you know which points are similar or dissimilar but do not have a class label for every point.

## Capabilities

### Information Theoretic Metric Learning (ITML)

Learns a Mahalanobis distance metric by minimizing the LogDet divergence to a prior metric, subject to linear constraints on pairs of points. The learned metric stays close to the prior while satisfying the similarity/dissimilarity constraints.

```python { .api }
class ITML(MahalanobisMixin, TransformerMixin):
    def __init__(self, gamma=1.0, max_iter=1000, tol=1e-3, prior='identity',
                 verbose=False, preprocessor=None, random_state=None):
        """
        Parameters:
        - gamma: float, regularization parameter controlling the trade-off between
                 satisfying the constraints and staying close to the prior
        - max_iter: int, maximum number of iterations
        - tol: float, convergence tolerance for optimization
        - prior: str or array-like, prior metric ('identity', 'covariance', 'random', or matrix)
        - verbose: bool, whether to print progress messages
        - preprocessor: array-like or callable, preprocessor for input data
        - random_state: int, random state for reproducibility
        """

    def fit(self, pairs, y):
        """
        Fit the ITML metric learner.

        Parameters:
        - pairs: array-like, shape=(n_constraints, 2, n_features) or (n_constraints, 2),
                3D array of pairs or 2D array of indices
        - y: array-like, shape=(n_constraints,), constraint labels (+1 for similar, -1 for dissimilar)

        Returns:
        - self: returns the instance itself
        """
```

Usage example:

```python
from metric_learn import ITML
import numpy as np

# Create sample pairs and constraints
pairs = np.random.randn(100, 2, 4)  # 100 pairs of 4-dimensional points
y = np.random.choice([-1, 1], 100)  # Random similarity/dissimilarity labels

itml = ITML(gamma=1.0, max_iter=100)
itml.fit(pairs, y)
```
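
To make the LogDet objective more concrete, the NumPy sketch below evaluates the divergence between a candidate metric `M` and a prior `M0`. It only illustrates the quantity ITML keeps small; it is not metric-learn's implementation, and the helper name `logdet_divergence` and the example matrices are assumptions chosen for illustration.

```python
import numpy as np

def logdet_divergence(M, M0):
    """LogDet divergence: tr(M M0^-1) - log det(M M0^-1) - n (hypothetical helper)."""
    n = M.shape[0]
    A = M @ np.linalg.inv(M0)
    _, logdet = np.linalg.slogdet(A)
    return np.trace(A) - logdet - n

M0 = np.eye(2)                           # identity prior, as with prior='identity'
M = np.array([[2.0, 0.0], [0.0, 0.5]])   # illustrative candidate metric

div_same = logdet_divergence(M0, M0)     # zero when the metric equals the prior
div_moved = logdet_divergence(M, M0)     # grows as the metric moves away from the prior
print(div_same, div_moved)
```

The divergence is zero only when the two matrices coincide, which is why minimizing it keeps the learned metric near the prior.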

### Least Squares Metric Learning (LSML)

Learns a Mahalanobis metric from relative comparisons given as quadruplets: in each quadruplet, the first two points should be closer to each other than the last two. LSML minimizes a convex objective built from the sum of squared hinge losses over violated constraints.

```python { .api }
class LSML(MahalanobisMixin, TransformerMixin):
    def __init__(self, tol=1e-3, max_iter=1000, prior='identity', verbose=False,
                 preprocessor=None, random_state=None):
        """
        Parameters:
        - tol: float, convergence tolerance
        - max_iter: int, maximum number of iterations
        - prior: str or array-like, prior metric ('identity', 'covariance', 'random', or matrix)
        - verbose: bool, whether to print progress messages
        - preprocessor: array-like or callable, preprocessor for input data
        - random_state: int, random state for reproducibility
        """

    def fit(self, quadruplets, weights=None):
        """
        Fit the LSML metric learner.

        Parameters:
        - quadruplets: array-like, shape=(n_constraints, 4, n_features) or (n_constraints, 4),
                3D array of quadruplets or 2D array of indices; in each quadruplet the
                first two points should be more similar than the last two
        - weights: array-like, shape=(n_constraints,), optional weight for each constraint

        Returns:
        - self: returns the instance itself
        """
```
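
The squared hinge loss mentioned above can be illustrated on relative comparisons of four points (quadruplets), where the first two points should be closer than the last two. The NumPy sketch below is a simplified illustration, not metric-learn's code: it omits any regularization toward a prior, and the helper names and example data are assumptions.

```python
import numpy as np

def mahalanobis(x, y, M):
    """Mahalanobis distance between x and y under the PSD matrix M."""
    d = x - y
    return np.sqrt(d @ M @ d)

def squared_hinge_loss(quadruplets, M):
    """Sum of squared violations of d(a, b) < d(c, d) (hypothetical helper)."""
    total = 0.0
    for a, b, c, d in quadruplets:
        violation = mahalanobis(a, b, M) - mahalanobis(c, d, M)
        total += max(0.0, violation) ** 2
    return total

quadruplets = np.array([
    [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 3.0]],  # satisfied under identity: 1 < 3
    [[0.0, 0.0], [4.0, 0.0], [0.0, 0.0], [0.0, 2.0]],  # violated under identity: 4 > 2
])

loss_identity = squared_hinge_loss(quadruplets, np.eye(2))
# Down-weighting the first feature makes both comparisons hold
loss_rescaled = squared_hinge_loss(quadruplets, np.diag([0.01, 1.0]))
print(loss_identity, loss_rescaled)
```

Only violated comparisons contribute to the loss, so a metric that satisfies every quadruplet has zero loss.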

### Sparse Determinant Metric Learning (SDML)

Learns a sparse Mahalanobis distance metric by trading off three goals: satisfying the pair constraints, staying close to a prior metric, and keeping the Mahalanobis matrix sparse. Useful in high-dimensional settings where you want to identify the features that matter for the distance metric.

```python { .api }
class SDML(MahalanobisMixin, TransformerMixin):
    def __init__(self, balance_param=0.5, sparsity_param=0.01, prior='identity',
                 verbose=False, preprocessor=None, random_state=None):
        """
        Parameters:
        - balance_param: float, trade-off between sparsity and the prior metric
        - sparsity_param: float, sparsity regularization parameter
        - prior: str or array-like, prior metric ('identity', 'covariance', 'random', or matrix)
        - verbose: bool, whether to print progress messages
        - preprocessor: array-like or callable, preprocessor for input data
        - random_state: int, random state for reproducibility
        """

    def fit(self, pairs, y):
        """
        Fit the SDML metric learner.

        Parameters:
        - pairs: array-like, shape=(n_constraints, 2, n_features) or (n_constraints, 2),
                3D array of pairs or 2D array of indices
        - y: array-like, shape=(n_constraints,), constraint labels (+1 for similar, -1 for dissimilar)

        Returns:
        - self: returns the instance itself
        """
```

Usage example:

```python
from metric_learn import SDML
from sklearn.datasets import make_blobs
import numpy as np

# Generate sample data
X, _ = make_blobs(n_samples=100, centers=3, n_features=5, random_state=42)

# Create pairs (indices into X) and label them by Euclidean distance
pairs_idx = [(i, j) for i in range(20) for j in range(i+1, 30)]
y = [1 if np.linalg.norm(X[i] - X[j]) < 2.0 else -1 for i, j in pairs_idx]

# Index pairs require the dataset to be passed as the preprocessor
sdml = SDML(sparsity_param=0.1, balance_param=0.5, preprocessor=X)
sdml.fit(pairs_idx, y)
```

### Relative Components Analysis (RCA)

Learns a full rank Mahalanobis distance metric based on a weighted sum of in-chunklets covariance matrices. It applies a global linear transformation to assign large weights to relevant dimensions. Those relevant dimensions are estimated using "chunklets": subsets of points that are known to belong to the same class.

```python { .api }
class RCA(MahalanobisMixin, TransformerMixin):
    def __init__(self, n_components=None, preprocessor=None):
        """
        Parameters:
        - n_components: int or None, dimensionality of reduced space (if None, defaults to dimension of X)
        - preprocessor: array-like or callable, preprocessor for input data
        """

    def fit(self, X, chunks):
        """
        Learn the RCA model.

        Parameters:
        - X: array-like, shape=(n_samples, n_features), data matrix where each row is a single instance
        - chunks: array-like, shape=(n_samples,), array of ints where chunks[i] == j means point i
                 belongs to chunklet j, and chunks[i] == -1 means point i doesn't belong to any chunklet

        Returns:
        - self: returns the instance itself
        """
```

Usage example:

```python
from metric_learn import RCA
import numpy as np

# Sample data
X = np.array([[-0.05, 3.0], [0.05, -3.0], [0.1, -3.55], [-0.1, 3.55],
              [-0.95, -0.05], [0.95, 0.05], [0.4, 0.05], [-0.4, -0.05]])

# Chunklet labels: points 0,1 are in chunk 0; points 2,3 in chunk 1, etc.
chunks = [0, 0, 1, 1, 2, 2, 3, 3]

rca = RCA(n_components=2)
rca.fit(X, chunks)
```
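
The idea behind the in-chunklet covariance can be sketched directly in NumPy: center each chunklet on its own mean, pool the centered points into one covariance matrix, and whiten with its inverse square root. This is a simplified illustration under stated assumptions (no dimensionality reduction, a non-singular covariance, and a hypothetical helper name `rca_sketch`), not the library's implementation.

```python
import numpy as np

def rca_sketch(X, chunks):
    """Return the pooled in-chunklet covariance and a whitening transform for it."""
    X = np.asarray(X, dtype=float)
    chunks = np.asarray(chunks)
    centered = []
    for j in np.unique(chunks[chunks >= 0]):   # skip points marked -1
        pts = X[chunks == j]
        centered.append(pts - pts.mean(axis=0))
    centered = np.vstack(centered)
    inner_cov = centered.T @ centered / len(centered)
    # Inverse square root via eigendecomposition (assumes inner_cov is positive definite)
    vals, vecs = np.linalg.eigh(inner_cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return inner_cov, W

X = np.array([[-0.05, 3.0], [0.05, -3.0], [0.1, -3.55], [-0.1, 3.55],
              [-0.95, -0.05], [0.95, 0.05], [0.4, 0.05], [-0.4, -0.05]])
chunks = [0, 0, 1, 1, 2, 2, 3, 3]

inner_cov, W = rca_sketch(X, chunks)
whitened_cov = W @ inner_cov @ W.T   # close to the identity matrix
print(np.round(whitened_cov, 6))
```

Directions with large within-chunklet variance are shrunk by the transform, which is how dimensions that vary inside a class end up with small weights.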

### Sparse Compositional Metric Learning (SCML)

Learns a squared Mahalanobis distance from triplet constraints by optimizing sparse positive weights assigned to a set of rank-one PSD bases. Uses a stochastic composite optimization scheme to handle high-dimensional sparse metrics.

```python { .api }
class SCML(MahalanobisMixin, TransformerMixin):
    def __init__(self, beta=1e-5, basis='triplet_diffs', n_basis=None, gamma=5e-3,
                 max_iter=10000, output_iter=500, batch_size=10, verbose=False,
                 preprocessor=None, random_state=None):
        """
        Parameters:
        - beta: float, L1 regularization parameter
        - basis: str or array-like, set of bases to construct the metric ('triplet_diffs' or custom array)
        - n_basis: int or None, number of bases to use (if None, determined automatically)
        - gamma: float, learning rate parameter
        - max_iter: int, maximum number of iterations
        - output_iter: int, number of iterations between progress output
        - batch_size: int, size of mini-batches for stochastic optimization
        - verbose: bool, whether to print progress messages
        - preprocessor: array-like or callable, preprocessor for input data
        - random_state: int, random state for reproducibility
        """

    def fit(self, triplets):
        """
        Fit the SCML metric learner.

        Parameters:
        - triplets: array-like, shape=(n_constraints, 3, n_features) or (n_constraints, 3),
                   3D array of triplets (anchor, positive, negative) or 2D array of indices

        Returns:
        - self: returns the instance itself
        """
```

Usage example:

```python
from metric_learn import SCML
import numpy as np

# Assuming you have data X
rng = np.random.RandomState(42)
X = rng.randn(20, 5)

# Triplet constraints as indices into X: [anchor, positive, negative].
# Random triplets for illustration; use at least as many triplets as
# features, since the default 'triplet_diffs' basis is built from them.
triplets_idx = np.array([rng.choice(20, size=3, replace=False) for _ in range(50)])

scml = SCML(beta=1e-4, max_iter=1000, preprocessor=X, random_state=42)
scml.fit(triplets_idx)
```
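
The "sparse positive weights on rank-one PSD bases" idea can be sketched in a few lines of NumPy: the metric is a weighted sum of outer products, and a triplet contributes a hinge loss when the negative is not sufficiently farther from the anchor than the positive. The bases, weights, and margin of 1 below are illustrative assumptions, not values produced by metric-learn.

```python
import numpy as np

# Three hypothetical basis vectors (rows) and sparse non-negative weights
basis = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([2.0, 0.0, 0.5])          # the second basis is switched off

# M = sum_k w_k * b_k b_k^T, PSD because every weight is non-negative
M = (basis.T * w) @ basis

def sq_dist(x, y, M):
    """Squared Mahalanobis distance under M."""
    d = x - y
    return float(d @ M @ d)

anchor = np.array([0.0, 0.0])
positive = np.array([0.0, 1.0])
negative = np.array([1.0, 0.0])

# Hinge loss with margin 1: zero when the negative is far enough away
triplet_loss = max(0.0, 1.0 + sq_dist(anchor, positive, M) - sq_dist(anchor, negative, M))
print(M, triplet_loss)
```

Because only the weights are learned, the number of parameters equals the number of bases rather than the square of the feature dimension.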

## Constraint Formats

Pair, triplet, and quadruplet learners accept constraints in two formats: a 3D array holding the points themselves, or a 2D array of indices into a dataset supplied as the preprocessor.

### Pair Constraints

```python
import numpy as np

# 3D array format: pairs contain actual data points
pairs_3d = np.array([
    [[1.0, 2.0], [1.1, 2.1]],  # Similar pair
    [[1.0, 2.0], [5.0, 6.0]]   # Dissimilar pair
])
y = [1, -1]  # 1 for similar, -1 for dissimilar

# 2D array format: pairs contain indices (requires preprocessor)
pairs_2d = np.array([[0, 1], [0, 5]])  # Indices into dataset
y = [1, -1]
```

### Working with Preprocessors

When using index-based constraints, set up a preprocessor:

```python
from metric_learn import ITML
import numpy as np

# Your dataset
X = np.random.randn(100, 5)

# Index-based constraints
pairs_idx = [(0, 1), (2, 10), (5, 20)]
y = [1, -1, 1]

# Fit with preprocessor
itml = ITML(preprocessor=X)
itml.fit(pairs_idx, y)
```
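
Conceptually, an array preprocessor turns each index into the corresponding row of the dataset, so index pairs describe the same constraints as a 3D array of points. The NumPy sketch below shows that equivalence with fancy indexing; it illustrates the idea only and is not the library's internal code.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 5)

pairs_idx = np.array([(0, 1), (2, 10), (5, 20)])

# Indexing X with an (n_constraints, 2) integer array yields
# an (n_constraints, 2, n_features) array of points
pairs_points = X[pairs_idx]
print(pairs_points.shape)
```

Index-based constraints avoid duplicating points that appear in many pairs, which matters when the dataset is large.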

## Common Usage Pattern

```python
from metric_learn import ITML, LSML, SDML
from metric_learn import Constraints
from sklearn.datasets import load_digits
import numpy as np

# Load data
X, y_true = load_digits(return_X_y=True)

# Generate index constraints from true labels (for demonstration).
# positive_negative_pairs returns four index arrays:
# (a[i], b[i]) is a similar pair, (c[i], d[i]) is a dissimilar pair.
constraints = Constraints(y_true)
a, b, c, d = constraints.positive_negative_pairs(
    n_constraints=500, same_length=True, random_state=42)
pairs = np.vstack([np.column_stack([a, b]), np.column_stack([c, d])])
y_constraints = np.hstack([np.ones(len(a)), -np.ones(len(c))])

# LSML learns from quadruplets: (a[i], b[i]) should be closer than (c[i], d[i])
quadruplets = np.column_stack([a, b, c, d])

# Train different weakly-supervised learners, each with its own fit arguments
algorithms = {
    'ITML': (ITML(preprocessor=X), (pairs, y_constraints)),
    'SDML': (SDML(preprocessor=X), (pairs, y_constraints)),
    'LSML': (LSML(preprocessor=X), (quadruplets,)),
}

for name, (algorithm, fit_args) in algorithms.items():
    algorithm.fit(*fit_args)
    print(f"{name} fitted successfully")

    # Get learned transformation matrix
    L = algorithm.components_
    print(f"{name} learned transformation shape: {L.shape}")
```
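
The learned transformation matrix relates to the Mahalanobis metric through `M = L.T @ L`: the Euclidean distance between transformed points equals the Mahalanobis distance between the original points. The NumPy sketch below checks this with a small hand-picked matrix standing in for a fitted `components_`; the values are assumptions for illustration only.

```python
import numpy as np

# Hypothetical stand-in for a learned components_ matrix
L = np.array([[2.0, 0.0],
              [1.0, 1.0]])
M = L.T @ L                      # corresponding Mahalanobis matrix

x = np.array([1.0, 2.0])
y = np.array([0.0, 0.0])

# Euclidean distance after applying the linear transformation
d_embedded = np.linalg.norm(L @ x - L @ y)
# Mahalanobis distance in the original space
d_mahalanobis = np.sqrt((x - y) @ M @ (x - y))
print(d_embedded, d_mahalanobis)
```

This is why a learned metric can be used with any Euclidean-distance algorithm by transforming the data first.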
