
# Ensemble Models

Combination methods that leverage multiple base detectors to improve detection performance through diversity and aggregation strategies. Ensemble methods often provide more robust and reliable outlier detection than individual models.

## Capabilities

### Feature Bagging

Combines multiple base detectors trained on different feature subsets. This approach increases diversity and reduces the impact of irrelevant features on outlier detection.

```python { .api }
class FeatureBagging:
    def __init__(self, base_estimator=None, n_estimators=10, max_features=1.0,
                 bootstrap_features=False, check_detector=True, check_estimator=False,
                 n_jobs=1, random_state=None, combination='average',
                 verbose=0, estimator_params=None, contamination=0.1):
        """
        Parameters:
        - base_estimator: Base detector (default: LOF)
        - n_estimators (int): Number of estimators in the ensemble
        - max_features (int or float): Number/fraction of features per estimator
        - bootstrap_features (bool): Whether to use bootstrap sampling for features
        - n_jobs (int): Number of parallel jobs
        - combination (str): Method to combine scores ('average', 'max')
        - contamination (float): Proportion of outliers in the dataset
        - estimator_params (dict): Parameters for the base estimator
        """
```

Usage example:

```python
from pyod.models.feature_bagging import FeatureBagging
from pyod.models.lof import LOF
from pyod.utils.data import generate_data

X_train, X_test, y_train, y_test = generate_data(
    n_train=500, n_test=200, n_features=10, contamination=0.1, random_state=42
)

# Use LOF as the base estimator
clf = FeatureBagging(
    base_estimator=LOF(),
    n_estimators=10,
    max_features=0.7,
    contamination=0.1,
    n_jobs=2
)
clf.fit(X_train)
y_pred = clf.predict(X_test)
```

### Locally Selective Combination in Parallel (LSCP)

Combines multiple detectors by selecting the most competent detector for each data point based on local performance. This adaptive approach exploits detector diversity more effectively than static combination methods such as averaging.

```python { .api }
class LSCP:
    def __init__(self, detector_list, local_region_size=30, local_max_features=1.0,
                 n_bins=10, random_state=None, contamination=0.1):
        """
        Parameters:
        - detector_list (list): List of detectors to combine
        - local_region_size (int): Size of the local region for competence estimation
        - local_max_features (float): Maximum fraction of features for local region construction
        - n_bins (int): Number of bins for histogram-based selection
        - contamination (float): Proportion of outliers in the dataset
        """
```

Usage example:

```python
from pyod.models.lscp import LSCP
from pyod.models.lof import LOF
from pyod.models.iforest import IForest
from pyod.models.ocsvm import OCSVM

# X_train and X_test as in the previous example

# Train base detectors
lof = LOF()
iforest = IForest()
ocsvm = OCSVM()

lof.fit(X_train)
iforest.fit(X_train)
ocsvm.fit(X_train)

# Combine with LSCP
clf = LSCP(
    detector_list=[lof, iforest, ocsvm],
    local_region_size=30,
    contamination=0.1
)
clf.fit(X_train)
y_pred = clf.predict(X_test)
```

### Model Combination Functions

PyOD provides several functions for combining outlier scores from multiple detectors:

```python { .api }
def average(scores):
    """
    Simple average combination of multiple outlier score matrices.

    Parameters:
    - scores (array): Score matrix of shape (n_samples, n_detectors)

    Returns:
    - combined_scores (array): Combined outlier scores
    """

def maximization(scores):
    """
    Maximization combination: take the maximum score across detectors.

    Parameters:
    - scores (array): Score matrix of shape (n_samples, n_detectors)

    Returns:
    - combined_scores (array): Combined outlier scores
    """

def aom(scores, n_buckets=5, method='static'):
    """
    Average of Maximum: divide detectors into buckets and average the maximum scores.

    Parameters:
    - scores (array): Score matrix of shape (n_samples, n_detectors)
    - n_buckets (int): Number of buckets to divide detectors into
    - method (str): Bucketing method ('static', 'dynamic')

    Returns:
    - combined_scores (array): Combined outlier scores
    """

def moa(scores, n_buckets=5, method='static'):
    """
    Maximum of Average: take the maximum of averaged scores from each bucket.

    Parameters:
    - scores (array): Score matrix of shape (n_samples, n_detectors)
    - n_buckets (int): Number of buckets to divide detectors into
    - method (str): Bucketing method ('static', 'dynamic')

    Returns:
    - combined_scores (array): Combined outlier scores
    """

def median(scores):
    """
    Median combination of multiple outlier score matrices.

    Parameters:
    - scores (array): Score matrix of shape (n_samples, n_detectors)

    Returns:
    - combined_scores (array): Combined outlier scores
    """
```

## Usage Patterns

### Creating Custom Ensembles

```python
from pyod.models.combination import average, maximization, aom, moa
from pyod.models.lof import LOF
from pyod.models.iforest import IForest
from pyod.models.ocsvm import OCSVM
import numpy as np

# X_train and X_test as in the earlier examples

# Train multiple detectors
detectors = [LOF(), IForest(), OCSVM()]
for detector in detectors:
    detector.fit(X_train)

# Collect scores from all detectors
train_scores = np.zeros((len(X_train), len(detectors)))
test_scores = np.zeros((len(X_test), len(detectors)))

for i, detector in enumerate(detectors):
    train_scores[:, i] = detector.decision_scores_
    test_scores[:, i] = detector.decision_function(X_test)

# Combine scores using different methods
combined_avg = average(test_scores)
combined_max = maximization(test_scores)
combined_aom = aom(test_scores, n_buckets=3)
combined_moa = moa(test_scores, n_buckets=3)
```

### Dynamic Detector Selection

```python
from pyod.models.lscp import LSCP
from pyod.models.lof import LOF
from pyod.models.iforest import IForest
from pyod.models.knn import KNN
from pyod.models.ecod import ECOD

# X_train and X_test as in the earlier examples

# Create a diverse set of detectors
detectors = [
    LOF(n_neighbors=20),
    LOF(n_neighbors=40),  # Same algorithm, different parameters
    IForest(n_estimators=100),
    KNN(n_neighbors=5, method='mean'),
    ECOD()
]

# Fit detectors
for detector in detectors:
    detector.fit(X_train)

# Use LSCP for adaptive combination
clf = LSCP(
    detector_list=detectors,
    local_region_size=40,
    contamination=0.1
)
clf.fit(X_train)
y_pred = clf.predict(X_test)
```

### Advanced Ensemble Strategies

```python
import numpy as np

from pyod.models.combination import average
from pyod.models.feature_bagging import FeatureBagging
from pyod.models.lof import LOF
from pyod.models.iforest import IForest

# X_train and X_test as in the earlier examples

# Create ensembles of different base detectors
lof_ensemble = FeatureBagging(
    base_estimator=LOF(n_neighbors=20),
    n_estimators=10,
    max_features=0.8,
    contamination=0.1
)

iforest_ensemble = FeatureBagging(
    base_estimator=IForest(n_estimators=50),
    n_estimators=5,
    max_features=0.9,
    contamination=0.1
)

# Fit ensembles
lof_ensemble.fit(X_train)
iforest_ensemble.fit(X_train)

# Combine ensemble scores
lof_scores = lof_ensemble.decision_function(X_test)
iforest_scores = iforest_ensemble.decision_function(X_test)

ensemble_scores = np.column_stack([lof_scores, iforest_scores])
final_scores = average(ensemble_scores)
```

## Ensemble Design Principles

### Diversity Strategies

1. **Algorithm Diversity**: Use different types of detectors (distance-based, density-based, tree-based)
2. **Parameter Diversity**: Same algorithm with different hyperparameters
3. **Feature Diversity**: Train detectors on different feature subsets
4. **Sample Diversity**: Use bootstrap sampling or different training subsets
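Of these, sample diversity is the only strategy not demonstrated elsewhere on this page; the resampling step can be sketched in plain NumPy (with placeholder data, not a PyOD API):

```python
import numpy as np

rng = np.random.default_rng(42)
X_train = rng.normal(size=(100, 5))  # placeholder training data

# Draw one bootstrap resample (rows sampled with replacement) per ensemble
# member; each base detector would then be fit on its own resample
n_estimators = 10
subsets = [
    X_train[rng.integers(0, len(X_train), size=len(X_train))]
    for _ in range(n_estimators)
]
```

Fitting a base detector on each `subsets[i]` and combining their scores with `average` or `maximization` completes the ensemble.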

### Combination Strategies

1. **Simple Average**: Good baseline; works well when detectors have similar performance
2. **Weighted Average**: Weight detectors by their individual performance
3. **Dynamic Selection**: Choose different detectors for different regions (LSCP)
4. **Rank-based**: Combine based on rank orders rather than raw scores
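The weighted-average and rank-based strategies have no ready-made helpers in `pyod.models.combination`; assuming higher scores mean more outlying, both can be sketched directly in NumPy:

```python
import numpy as np

# Hypothetical score matrix: four samples scored by three detectors
scores = np.array([
    [0.1, 0.3, 0.2],
    [0.9, 0.8, 0.7],
    [0.2, 0.1, 0.4],
    [0.5, 0.6, 0.5],
])

# Weighted average: weights would typically come from validation performance
weights = np.array([0.5, 0.3, 0.2])  # assumed weights summing to 1
weighted = scores @ weights

# Rank-based: replace each detector's scores by their within-column ranks,
# then average; this ignores differing score scales entirely
ranks = scores.argsort(axis=0).argsort(axis=0)
rank_combined = ranks.mean(axis=1)
```

Both methods flag sample 1 as the strongest outlier, since it receives the top score from every detector.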

## Model Selection Guidelines

### FeatureBagging

- **Best for**: High-dimensional data, when features have varying importance
- **Base detector**: Works well with LOF, KNN, or other distance-based methods
- **Performance**: Usually improves over a single detector, especially in the presence of irrelevant features

### LSCP

- **Best for**: Heterogeneous data, when detector performance varies by region
- **Detector mix**: Combine diverse algorithms (LOF, IForest, OCSVM, etc.)
- **Performance**: Often achieves the best results but requires more computation

### Manual Combination

- **Best for**: When you want full control over the combination strategy
- **Flexibility**: Can implement custom weighting and selection schemes
- **Performance**: Depends on the combination method and detector diversity

## Best Practices

1. **Detector Diversity**: Use complementary algorithms rather than similar ones
2. **Parameter Tuning**: Tune individual detectors before combining
3. **Validation**: Use a validation set to select the best combination method
4. **Computational Cost**: Balance ensemble size against available computational resources
5. **Score Normalization**: Normalize scores before combination, since different detectors produce scores on different scales
6. **Performance Monitoring**: Track individual detector contributions to the ensemble
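For score normalization, a minimal column-wise z-scoring in NumPy (PyOD also ships `pyod.utils.utility.standardizer` for this purpose) can look like:

```python
import numpy as np

# Hypothetical raw scores from two detectors on the same five samples;
# detector 1 produces scores on a much larger scale than detector 0
scores = np.array([
    [0.10, 120.0],
    [0.20,  90.0],
    [0.15, 300.0],
    [0.90, 110.0],
    [0.12,  95.0],
])

# Z-score each column so both detectors contribute on a comparable scale
norm = (scores - scores.mean(axis=0)) / scores.std(axis=0)

# Averaging the normalized columns no longer lets detector 1 dominate
combined = norm.mean(axis=1)
```

Without this step, a simple `average` of the raw columns would be driven almost entirely by the detector with the larger score range.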