# Weakly-Supervised Algorithms

Weakly-supervised metric learning algorithms learn from constraints (pairs, triplets, quadruplets) rather than from explicit class labels. They are useful when you have similarity/dissimilarity information but not necessarily class labels.

## Capabilities

### Information Theoretic Metric Learning (ITML)

Learns a Mahalanobis distance metric by minimizing the LogDet divergence to a prior metric, subject to linear constraints on pairs of points. The learned metric stays close to the prior while satisfying the similarity/dissimilarity constraints.

```python { .api }
class ITML(MahalanobisMixin, TransformerMixin):
    def __init__(self, gamma=1.0, max_iter=1000, tol=1e-3, prior='identity',
                 verbose=False, preprocessor=None, random_state=None):
        """
        Parameters:
        - gamma: float, regularization parameter controlling the trade-off
          between satisfying the constraints and staying close to the prior
        - max_iter: int, maximum number of iterations
        - tol: float, convergence tolerance for optimization
        - prior: str or array-like, prior metric ('identity', 'covariance', 'random', or matrix)
        - verbose: bool, whether to print progress messages
        - preprocessor: array-like or callable, preprocessor for input data
        - random_state: int, random state for reproducibility
        """

    def fit(self, pairs, y):
        """
        Fit the ITML metric learner.

        Parameters:
        - pairs: array-like, shape=(n_constraints, 2, n_features) or (n_constraints, 2),
          3D array of pairs or 2D array of indices
        - y: array-like, shape=(n_constraints,), constraint labels (+1 for similar, -1 for dissimilar)

        Returns:
        - self: returns the instance itself
        """
```

Usage example:

```python
from metric_learn import ITML
import numpy as np

# Create sample pairs and constraints
pairs = np.random.randn(100, 2, 4)  # 100 pairs of 4-dimensional points
y = np.random.choice([-1, 1], 100)  # Random similarity/dissimilarity labels

itml = ITML(gamma=1.0, max_iter=100)
itml.fit(pairs, y)
```
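
The LogDet divergence that ITML minimizes can be illustrated with a small NumPy sketch. This is not the library's implementation; the helper name `logdet_divergence` is hypothetical and only evaluates the standard definition for two positive definite matrices.

```python
import numpy as np

def logdet_divergence(A, A0):
    """D_ld(A, A0) = tr(A A0^-1) - log det(A A0^-1) - n (hypothetical helper)."""
    n = A.shape[0]
    # P = A @ inv(A0), computed with a linear solve instead of an explicit inverse
    P = np.linalg.solve(A0.T, A.T).T
    _, logdet = np.linalg.slogdet(P)
    return np.trace(P) - logdet - n

A0 = np.eye(3)               # prior metric, analogous to prior='identity'
A_same = np.eye(3)           # identical to the prior
A_scaled = 2.0 * np.eye(3)   # a metric that has moved away from the prior

d_same = logdet_divergence(A_same, A0)      # zero when A equals the prior
d_scaled = logdet_divergence(A_scaled, A0)  # grows as A departs from the prior
```

The divergence is zero only when the two matrices coincide, which is why minimizing it keeps the learned metric near the prior.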

### Least Squared-residual Metric Learning (LSML)

Learns a metric by minimizing a sum of squared hinge losses over relative distance constraints. Each constraint is a quadruplet (a, b, c, d) stating that a and b should be closer to each other than c and d, so LSML is fitted on quadruplets rather than on labeled pairs.

```python { .api }
class LSML(MahalanobisMixin, TransformerMixin):
    def __init__(self, tol=1e-3, max_iter=1000, prior='identity', verbose=False,
                 preprocessor=None, random_state=None):
        """
        Parameters:
        - tol: float, convergence tolerance
        - max_iter: int, maximum number of iterations
        - prior: str or array-like, prior metric ('identity', 'covariance', 'random', or matrix)
        - verbose: bool, whether to print progress messages
        - preprocessor: array-like or callable, preprocessor for input data
        - random_state: int, random state for reproducibility
        """

    def fit(self, quadruplets, weights=None):
        """
        Fit the LSML metric learner.

        Parameters:
        - quadruplets: array-like, shape=(n_constraints, 4, n_features) or (n_constraints, 4),
          3D array of quadruplets or 2D array of indices; in each quadruplet the
          first two points should be closer than the last two
        - weights: array-like, shape=(n_constraints,), optional constraint weights

        Returns:
        - self: returns the instance itself
        """
```
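
The squared hinge loss mentioned above can be sketched in NumPy. The sketch assumes the constraints are quadruplets (a, b, c, d) asking that a and b be closer than c and d, and it omits any regularization term; the function names are hypothetical and this is not the library's code.

```python
import numpy as np

def mahalanobis(M, x, y):
    """Mahalanobis distance between x and y under the PSD matrix M."""
    diff = x - y
    return np.sqrt(diff @ M @ diff)

def squared_hinge_loss(M, quadruplets):
    """Sum of squared hinge losses over quadruplets (a, b, c, d).

    A quadruplet contributes only when d_M(a, b) exceeds d_M(c, d).
    """
    total = 0.0
    for a, b, c, d in quadruplets:
        violation = mahalanobis(M, a, b) - mahalanobis(M, c, d)
        total += max(0.0, violation) ** 2
    return total

quadruplets = np.array([
    [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [3.0, 0.0]],  # satisfied: 1 < 3
    [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0], [1.0, 0.0]],  # violated: 2 > 1
])
loss = squared_hinge_loss(np.eye(2), quadruplets)  # only the second quadruplet contributes
```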

### Sparse High-Dimensional Metric Learning (SDML)

Learns a sparse Mahalanobis distance metric by optimizing a trade-off between satisfying distance constraints and sparsity. Useful when you want to identify relevant features for the distance metric.

```python { .api }
class SDML(MahalanobisMixin, TransformerMixin):
    def __init__(self, balance_param=0.5, sparsity_param=0.01, prior='identity',
                 verbose=False, preprocessor=None, random_state=None):
        """
        Parameters:
        - balance_param: float, trade-off between sparsity and the prior metric
        - sparsity_param: float, sparsity regularization parameter
        - prior: str or array-like, prior metric ('identity', 'covariance', 'random', or matrix);
          replaces the `use_cov` flag of older releases
        - verbose: bool, whether to print progress messages
        - preprocessor: array-like or callable, preprocessor for input data
        - random_state: int, random state for reproducibility
        """

    def fit(self, pairs, y):
        """
        Fit the SDML metric learner.

        Parameters:
        - pairs: array-like, shape=(n_constraints, 2, n_features) or (n_constraints, 2),
          3D array of pairs or 2D array of indices
        - y: array-like, shape=(n_constraints,), constraint labels (+1 for similar, -1 for dissimilar)

        Returns:
        - self: returns the instance itself
        """
```

Usage example:

```python
from metric_learn import SDML
from sklearn.datasets import make_blobs
import numpy as np

# Generate sample data
X, _ = make_blobs(n_samples=100, centers=3, n_features=5, random_state=42)

# Create pairs (indices into X) and label them by Euclidean distance
pairs_idx = [(i, j) for i in range(20) for j in range(i+1, 30)]
y = [1 if np.linalg.norm(X[i] - X[j]) < 2.0 else -1 for i, j in pairs_idx]

# Index-based pairs need the dataset as preprocessor
sdml = SDML(sparsity_param=0.1, balance_param=0.5, preprocessor=X)
sdml.fit(pairs_idx, y)
```

### Relevant Components Analysis (RCA)

Learns a full rank Mahalanobis distance metric based on a weighted sum of within-chunklet covariance matrices. It applies a global linear transformation that assigns large weights to relevant dimensions. The relevant dimensions are estimated using "chunklets": subsets of points that are known to belong to the same class.

```python { .api }
class RCA(MahalanobisMixin, TransformerMixin):
    def __init__(self, n_components=None, preprocessor=None):
        """
        Parameters:
        - n_components: int or None, dimensionality of reduced space (if None, defaults to dimension of X)
        - preprocessor: array-like or callable, preprocessor for input data
        """

    def fit(self, X, chunks):
        """
        Learn the RCA model.

        Parameters:
        - X: array-like, shape=(n_samples, n_features), data matrix where each row is a single instance
        - chunks: array-like, shape=(n_samples,), array of ints where chunks[i] == j means point i belongs to chunklet j,
          and chunks[i] == -1 means point i doesn't belong to any chunklet

        Returns:
        - self: returns the instance itself
        """
```

Usage example:

```python
from metric_learn import RCA
import numpy as np

# Sample data
X = np.array([[-0.05, 3.0], [0.05, -3.0], [0.1, -3.55], [-0.1, 3.55],
              [-0.95, -0.05], [0.95, 0.05], [0.4, 0.05], [-0.4, -0.05]])

# Chunklet labels: points 0,1 are in chunk 0; points 2,3 in chunk 1, etc.
chunks = [0, 0, 1, 1, 2, 2, 3, 3]

rca = RCA(n_components=2)
rca.fit(X, chunks)
```
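
The idea behind RCA can be sketched directly in NumPy: pool the covariance of each chunklet around its own mean, then whiten with the inverse square root of that matrix. This is a simplified illustration under the assumption that no dimensionality reduction is applied; `within_chunklet_covariance` is a hypothetical helper, not part of metric-learn.

```python
import numpy as np

def within_chunklet_covariance(X, chunks):
    """Average covariance of points around their own chunklet mean."""
    X = np.asarray(X, dtype=float)
    chunks = np.asarray(chunks)
    cov = np.zeros((X.shape[1], X.shape[1]))
    n_used = 0
    # Points labelled -1 belong to no chunklet and are skipped
    for label in np.unique(chunks[chunks >= 0]):
        members = X[chunks == label]
        centered = members - members.mean(axis=0)
        cov += centered.T @ centered
        n_used += len(members)
    return cov / n_used

X = np.array([[-0.05, 3.0], [0.05, -3.0], [0.1, -3.55], [-0.1, 3.55],
              [-0.95, -0.05], [0.95, 0.05], [0.4, 0.05], [-0.4, -0.05]])
chunks = [0, 0, 1, 1, 2, 2, 3, 3]

C = within_chunklet_covariance(X, chunks)

# Whitening transform W = C^(-1/2), built from the eigendecomposition of C
eigvals, eigvecs = np.linalg.eigh(C)
W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# Directions with large within-chunklet variance are shrunk
X_rca = X @ W.T
```

After the transform, the within-chunklet covariance of `X_rca` is the identity, so variation that occurs inside chunklets no longer dominates distances.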

### Sparse Compositional Metric Learning (SCML)

Learns a squared Mahalanobis distance from triplet constraints by optimizing sparse positive weights assigned to a set of rank-one PSD bases. Uses a stochastic composite optimization scheme to handle high-dimensional sparse metrics.

```python { .api }
class SCML(MahalanobisMixin, TransformerMixin):
    def __init__(self, beta=1e-5, basis='triplet_diffs', n_basis=None, gamma=5e-3,
                 max_iter=10000, output_iter=500, batch_size=10, verbose=False,
                 preprocessor=None, random_state=None):
        """
        Parameters:
        - beta: float, L1 regularization parameter
        - basis: str or array-like, set of bases to construct the metric ('triplet_diffs' or custom array)
        - n_basis: int or None, number of bases to use (if None, determined automatically)
        - gamma: float, learning rate parameter
        - max_iter: int, maximum number of iterations
        - output_iter: int, number of iterations between progress output
        - batch_size: int, size of mini-batches for stochastic optimization
        - verbose: bool, whether to print progress messages
        - preprocessor: array-like or callable, preprocessor for input data
        - random_state: int, random state for reproducibility
        """

    def fit(self, triplets):
        """
        Fit the SCML metric learner.

        Parameters:
        - triplets: array-like, shape=(n_constraints, 3, n_features) or (n_constraints, 3),
          3D array of triplets (anchor, positive, negative) or 2D array of indices

        Returns:
        - self: returns the instance itself
        """
```

Usage example:

```python
from metric_learn import SCML
import numpy as np

# Assuming you have data X
X = np.random.randn(20, 5)

# Triplet constraints as indices into X: (anchor, positive, negative).
# Only three are listed to show the format; a real fit typically needs
# far more triplets than features.
triplets_idx = [(0, 1, 5), (2, 3, 7), (4, 6, 9)]

scml = SCML(beta=1e-4, max_iter=1000, preprocessor=X)
scml.fit(triplets_idx)
```
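
The compositional form of the metric, a non-negative sparse combination of rank-one PSD bases, can be sketched as follows. The bases and weights here are made-up values for illustration, not output of SCML, and `sq_dist` is a hypothetical helper.

```python
import numpy as np

rng = np.random.RandomState(0)
n_basis, n_features = 6, 3

# Each row is a basis vector b_i; b_i b_i^T is a rank-one PSD matrix
bases = rng.randn(n_basis, n_features)
# Sparse, non-negative weights: most bases are switched off
weights = np.array([0.5, 0.0, 0.0, 1.2, 0.0, 0.3])

# M = sum_i w_i * b_i b_i^T, computed without an explicit loop
M = (bases.T * weights) @ bases

def sq_dist(M, x, y):
    """Squared Mahalanobis distance under M."""
    diff = x - y
    return diff @ M @ diff

# A triplet (anchor, positive, negative) is satisfied when the anchor is
# closer to the positive than to the negative under M
anchor, positive, negative = rng.randn(3, n_features)
satisfied = sq_dist(M, anchor, positive) < sq_dist(M, anchor, negative)
```

Because every weight is non-negative and every basis is PSD, `M` is PSD by construction, so no projection onto the PSD cone is needed.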

## Constraint Formats

Constraints can be passed either as the data points themselves or as indices into a dataset:

### Pair Constraints

```python
import numpy as np

# 3D array format: pairs contain actual data points
pairs_3d = np.array([
    [[1.0, 2.0], [1.1, 2.1]],  # Similar pair
    [[1.0, 2.0], [5.0, 6.0]]   # Dissimilar pair
])
y = [1, -1]  # 1 for similar, -1 for dissimilar

# 2D array format: pairs contain indices (requires preprocessor)
pairs_2d = np.array([[0, 1], [0, 5]])  # Indices into dataset
y = [1, -1]
```
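
The two formats carry the same information: indexing a dataset with the 2D index array yields the 3D array of points. The sketch below shows this with plain NumPy indexing, which is roughly what supplying an array as `preprocessor` achieves; the small dataset `X` is made up for illustration.

```python
import numpy as np

# Hypothetical dataset with 6 points of 2 features each
X = np.arange(12, dtype=float).reshape(6, 2)

# Index-based pairs: each row holds the positions of two points in X
pairs_2d = np.array([[0, 1], [0, 5]])

# Integer-array indexing replaces every index by the corresponding row of X
pairs_3d = X[pairs_2d]  # shape (n_constraints, 2, n_features)
```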

### Working with Preprocessors

When using index-based constraints, set up a preprocessor:

```python
from metric_learn import ITML
import numpy as np

# Your dataset
X = np.random.randn(100, 5)

# Index-based constraints
pairs_idx = [(0, 1), (2, 10), (5, 20)]
y = [1, -1, 1]

# Fit with preprocessor
itml = ITML(preprocessor=X)
itml.fit(pairs_idx, y)
```

## Common Usage Pattern

```python
from metric_learn import ITML, LSML, SDML
from metric_learn import Constraints
from sklearn.datasets import load_digits
import numpy as np

# Load data
X, y_true = load_digits(return_X_y=True)

# Generate constraints from true labels (for demonstration).
# positive_negative_pairs returns four index arrays: (a, b) are similar
# pairs and (c, d) are dissimilar pairs.
constraints = Constraints(y_true)
a, b, c, d = constraints.positive_negative_pairs(n_constraints=500, same_length=True)
pairs = np.vstack([np.column_stack([a, b]), np.column_stack([c, d])])
y_constraints = np.hstack([np.ones(len(a)), -np.ones(len(c))])

# LSML takes quadruplets: a and b should be closer than c and d
quadruplets = np.column_stack([a, b, c, d])

# Train different weakly-supervised learners, each with its own fit arguments
algorithms = {
    'ITML': (ITML(preprocessor=X), (pairs, y_constraints)),
    'LSML': (LSML(preprocessor=X), (quadruplets,)),
    'SDML': (SDML(preprocessor=X), (pairs, y_constraints))
}

for name, (algorithm, fit_args) in algorithms.items():
    algorithm.fit(*fit_args)
    print(f"{name} fitted successfully")

    # Get learned transformation matrix
    L = algorithm.components_
    print(f"{name} learned transformation shape: {L.shape}")
```
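
The learned transformation matrix relates to the Mahalanobis metric in a simple way: distances under M = LᵀL equal Euclidean distances after mapping points through L. The sketch below checks this with a random stand-in for `components_`, assumed to have shape (n_components, n_features); no fitted model is involved.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for a learned transformation (n_components=2, n_features=4)
L = rng.randn(2, 4)
# Corresponding Mahalanobis matrix
M = L.T @ L

x, y = rng.randn(2, 4)

# Euclidean distance in the transformed space
d_embed = np.linalg.norm(L @ x - L @ y)
# Mahalanobis distance in the original space
d_mahal = np.sqrt((x - y) @ M @ (x - y))
```

Both values agree up to floating-point error, which is why a fitted learner can be used either as a distance function or as a feature transformation ahead of a Euclidean method such as k-nearest neighbours.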