# Modern Detection Models

State-of-the-art outlier detection algorithms that often provide better performance and scalability compared to classical approaches. These methods incorporate recent advances in machine learning and statistical theory.

## Capabilities

### Empirical Cumulative Distribution-based Outlier Detection (ECOD)

A parameter-free, highly interpretable outlier detection algorithm based on empirical cumulative distribution functions. ECOD is efficient, robust, and provides excellent performance across various datasets.

```python { .api }
class ECOD:
    def __init__(self, contamination=0.1, n_jobs=1):
        """
        Parameters:
        - contamination (float): Proportion of outliers in dataset
        - n_jobs (int): Number of parallel jobs for computation
        """
```

Usage example:

```python
from pyod.models.ecod import ECOD
from pyod.utils.data import generate_data

X_train, X_test, y_train, y_test = generate_data(contamination=0.1, random_state=42)

clf = ECOD(contamination=0.1, n_jobs=2)
clf.fit(X_train)
y_pred = clf.predict(X_test)
scores = clf.decision_function(X_test)
```
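
To make the ECDF idea concrete, the scoring can be sketched in a few lines of plain NumPy. This is a simplified illustration, not the library implementation — the real ECOD combines the tails more carefully (including a skewness-based correction):

```python
import numpy as np

def ecod_like_scores(X):
    """Simplified ECOD-style scores: per-feature ECDF tail probabilities,
    aggregated as a sum of negative log-probabilities. Illustrative only."""
    n, d = X.shape
    # Rank of each point within each feature -> empirical CDF (left tail)
    left = (np.argsort(np.argsort(X, axis=0), axis=0) + 1) / n
    right = 1.0 - left + 1.0 / n        # right-tail counterpart
    tail = np.minimum(left, right)       # probability mass of the rarer tail
    return -np.log(tail).sum(axis=1)     # rare tails in many features -> high score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[0] = [8.0, 8.0, 8.0]                   # planted, obvious outlier
scores = ecod_like_scores(X)             # point 0 receives the highest score
```

Because everything is rank-based, the sketch inherits ECOD's appealing properties: no parameters to tune and O(n·d) scoring after sorting.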

### Copula-Based Outlier Detection (COPOD)

Uses copula functions to model the dependence structure among features, providing a robust approach to outlier detection that captures complex relationships between variables.

```python { .api }
class COPOD:
    def __init__(self, contamination=0.1, n_jobs=1):
        """
        Parameters:
        - contamination (float): Proportion of outliers in dataset
        - n_jobs (int): Number of parallel jobs for computation
        """
```

### Scalable Unsupervised Outlier Detection (SUOD)

A framework for accelerating outlier detection by using multiple base estimators with approximate methods. Provides significant speedup while maintaining detection quality.

```python { .api }
class SUOD:
    def __init__(self, base_estimators=None, n_jobs=1, rp_clf_list=None,
                 rp_ng_clf_list=None, rp_flag_global=True, jl_method='basic',
                 jl_proj_nums=None, cost_forecast_loc_fit=None,
                 cost_forecast_loc_pred=None, approx_flag_global=False,
                 approx_clf_list=None, approx_ng_clf_list=None,
                 contamination=0.1, combination='average', verbose=False,
                 random_state=None):
        """
        Parameters:
        - base_estimators (list): List of base detectors
        - n_jobs (int): Number of parallel jobs
        - rp_clf_list (list): List of detectors for random projection
        - jl_method (str): Johnson-Lindenstrauss method ('basic', 'discrete', 'circulant')
        - contamination (float): Proportion of outliers in dataset
        - combination (str): Combination method for scores ('average', 'maximization')
        - verbose (bool): Whether to print progress information
        """
```

### Learning with Uncertainty for Regression (LUNAR)

A novel approach that combines regression techniques with uncertainty quantification for robust outlier detection, particularly effective for datasets with complex patterns.

```python { .api }
class LUNAR:
    def __init__(self, model_type='regressor', n_neighbours=5,
                 negative_sampling=1, val_size=0.1, scaler='MinMaxScaler',
                 contamination=0.1):
        """
        Parameters:
        - model_type (str): Type of base model ('regressor', 'classifier')
        - n_neighbours (int): Number of neighbors for local modeling
        - negative_sampling (int): Negative sampling ratio
        - val_size (float): Validation set size fraction
        - scaler (str): Scaler type for preprocessing
        - contamination (float): Proportion of outliers in dataset
        """
```

### Deviation-based Outlier Detection (LMDD)

Detects outliers based on the relative deviation of data points from their local neighborhoods, providing good performance on datasets with complex structures.

```python { .api }
class LMDD:
    def __init__(self, contamination=0.1, n_iter=50, dis_measure='aad',
                 random_state=None):
        """
        Parameters:
        - contamination (float): Proportion of outliers in dataset
        - n_iter (int): Number of iterations for optimization
        - dis_measure (str): Distance measure ('aad', 'var', 'iqr')
        - random_state (int): Random number generator seed
        """
```

### Lightweight On-line Detector of Anomalies (LODA)

A fast, online outlier detection algorithm that uses sparse random projections. Effective for high-dimensional data and streaming applications.

```python { .api }
class LODA:
    def __init__(self, contamination=0.1, n_bins=10, n_random_cuts=100):
        """
        Parameters:
        - contamination (float): Proportion of outliers in dataset
        - n_bins (int): Number of bins for histogram
        - n_random_cuts (int): Number of random projections
        """
```
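
The mechanism behind LODA can be sketched in plain NumPy — an illustrative simplification, not the library implementation: project onto sparse random directions, estimate each one-dimensional density with a histogram, and average the negative log-densities across projections.

```python
import numpy as np

def loda_like_scores(X, n_cuts=100, n_bins=10, seed=0):
    """Simplified LODA: mean negative log-density over sparse random
    1-D projections, each estimated with a histogram. Illustrative only."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    nll = np.zeros(n)
    k = max(1, int(np.sqrt(d)))              # sparsity: ~sqrt(d) nonzero weights
    for _ in range(n_cuts):
        w = np.zeros(d)
        idx = rng.choice(d, size=k, replace=False)
        w[idx] = rng.normal(size=k)          # sparse random projection vector
        z = X @ w
        hist, edges = np.histogram(z, bins=n_bins, density=True)
        b = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, n_bins - 1)
        nll += -np.log(hist[b] + 1e-12)      # low density -> high anomaly score
    return nll / n_cuts

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X[0] = 6.0                                   # planted outlier in every feature
scores = loda_like_scores(X)                 # point 0 scores highest on average
```

Each projection and histogram costs O(n), so the whole pass is linear in the number of samples — the property that makes LODA suitable for streams.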

### Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles (INNE)

Combines the benefits of isolation-based methods with nearest neighbor approaches, providing robust detection across various data distributions.

```python { .api }
class INNE:
    def __init__(self, n_estimators=200, max_samples=256, contamination=0.1,
                 random_state=None):
        """
        Parameters:
        - n_estimators (int): Number of estimators in ensemble
        - max_samples (int): Maximum number of samples per estimator
        - contamination (float): Proportion of outliers in dataset
        - random_state (int): Random number generator seed
        """
```

### Subspace Outlier Detection (SOD)

Detects outliers in relevant subspaces rather than the full feature space, making it effective for high-dimensional data where outliers may only be visible in certain dimensions.

```python { .api }
class SOD:
    def __init__(self, n_neighbors=20, ref_set=10, alpha=0.8, contamination=0.1):
        """
        Parameters:
        - n_neighbors (int): Number of neighbors to consider
        - ref_set (int): Size of reference set
        - alpha (float): Weight parameter for subspace selection
        - contamination (float): Proportion of outliers in dataset
        """
```

### Stochastic Outlier Selection (SOS)

Uses stochastic methods to compute outlier probabilities, providing uncertainty estimates along with outlier scores.

```python { .api }
class SOS:
    def __init__(self, perplexity=4.5, metric='euclidean', eps=1e-5,
                 contamination=0.1):
        """
        Parameters:
        - perplexity (float): Perplexity parameter for probability computation
        - metric (str): Distance metric to use
        - eps (float): Numerical stability parameter
        - contamination (float): Proportion of outliers in dataset
        """
```

### Rotation-based Outlier Detection (ROD)

Generates diverse feature representations through random rotations and combines multiple detectors for improved robustness.

```python { .api }
class ROD:
    def __init__(self, base_estimator=None, n_estimators=100,
                 max_features=1.0, contamination=0.1, random_state=None):
        """
        Parameters:
        - base_estimator: Base detector to use
        - n_estimators (int): Number of estimators
        - max_features (float): Fraction of features to use
        - contamination (float): Proportion of outliers in dataset
        - random_state (int): Random number generator seed
        """
```

### Additional Modern Models

```python { .api }
class LOCI:
    """Local Correlation Integral"""
    def __init__(self, contamination=0.1, alpha=0.5, k=3): ...

class CD:
    """Cook's Distance"""
    def __init__(self, contamination=0.1, whitening=True): ...

class QMCD:
    """Quasi-Monte Carlo Discrepancy"""
    def __init__(self, contamination=0.1, ref_set=10): ...

class Sampling:
    """Sampling-based outlier detection"""
    def __init__(self, contamination=0.1, subset_size=20, metric='euclidean'): ...
```

## Usage Patterns

Modern models follow the same interface as classical models:

```python
# Example with ECOD
from pyod.models.ecod import ECOD
from pyod.utils.data import generate_data

# Generate data
X_train, X_test, y_train, y_test = generate_data(
    n_train=500, n_test=200, contamination=0.1, random_state=42
)

# Initialize and fit
clf = ECOD(contamination=0.1, n_jobs=2)
clf.fit(X_train)

# Get results
train_scores = clf.decision_scores_
test_scores = clf.decision_function(X_test)
test_labels = clf.predict(X_test)
```

## Performance Characteristics

### ECOD

- **Strengths**: Parameter-free, highly interpretable, excellent empirical performance
- **Best for**: General-purpose outlier detection, when interpretability is important
- **Time complexity**: O(n*d) where n=samples, d=features

### COPOD

- **Strengths**: Captures feature dependencies, robust to different data distributions
- **Best for**: Datasets with complex feature relationships
- **Time complexity**: O(n*d²)

### SUOD

- **Strengths**: Significant speedup for ensemble methods, maintains quality
- **Best for**: Large datasets requiring fast ensemble-based detection
- **Time complexity**: Sublinear speedup over base estimators

### LUNAR

- **Strengths**: Uncertainty quantification, works well with regression patterns
- **Best for**: Datasets where uncertainty information is valuable
- **Time complexity**: O(n²) for local neighborhood construction

## Model Selection Guidelines

- **ECOD**: First choice for most applications due to parameter-free nature and strong performance
- **COPOD**: When feature dependencies are important for outlier detection
- **SUOD**: When you need ensemble methods but have performance constraints
- **LUNAR**: When uncertainty quantification is important
- **LODA**: For high-dimensional data or streaming applications
- **SOD**: For high-dimensional data where outliers exist in subspaces
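
These guidelines can be encoded as a small triage helper. This is an illustrative sketch only — the flag names are invented here, and real model selection should also weigh dataset size and validation results:

```python
def pick_detector(streaming=False, high_dimensional=False,
                  subspace_outliers=False, need_ensemble=False,
                  need_uncertainty=False, feature_dependencies=False):
    """Rough encoding of the guidelines above; ECOD is the default choice."""
    if streaming:
        return "LODA"
    if high_dimensional:
        # SOD when outliers hide in subspaces, otherwise LODA
        return "SOD" if subspace_outliers else "LODA"
    if need_ensemble:
        return "SUOD"
    if need_uncertainty:
        return "LUNAR"
    if feature_dependencies:
        return "COPOD"
    return "ECOD"
```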