tessl/pypi-pyod

A comprehensive Python library for detecting anomalous/outlying objects in multivariate data with 45+ algorithms.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pkg:pypi/pyod@2.0.x

To install, run

npx @tessl/cli install tessl/pypi-pyod@2.0.0

# PyOD

A comprehensive Python library for detecting anomalous/outlying objects in multivariate data. PyOD provides 45+ detection algorithms ranging from classical methods like Local Outlier Factor (LOF) to cutting-edge approaches like ECOD and deep learning models, all with a unified scikit-learn-compatible interface.

## Package Information

- **Package Name**: pyod
- **Language**: Python
- **Installation**: `pip install pyod==2.0.5`
- **Documentation**: https://pyod.readthedocs.io/
- **License**: BSD 2-Clause

## Core Imports

```python
import pyod
```

Each model is imported from its own module:

```python
from pyod.models.lof import LOF
from pyod.models.iforest import IForest
from pyod.models.ecod import ECOD
```

Utilities:

```python
from pyod.utils.data import generate_data, evaluate_print
from pyod.utils.utility import standardizer, score_to_label
```

## Basic Usage

```python
from pyod.models.lof import LOF
from pyod.utils.data import generate_data, evaluate_print

# Generate sample data
X_train, X_test, y_train, y_test = generate_data(
    n_train=200, n_test=100, n_features=2,
    contamination=0.1, random_state=42
)

# Initialize and fit detector
clf = LOF(contamination=0.1)
clf.fit(X_train)

# Access fitted results
y_train_pred = clf.labels_  # Training labels (0: inlier, 1: outlier)
y_train_scores = clf.decision_scores_  # Training anomaly scores
threshold = clf.threshold_  # Decision threshold

# Predict on new data
y_test_pred = clf.predict(X_test)
y_test_scores = clf.decision_function(X_test)
y_test_proba = clf.predict_proba(X_test)

# Evaluate results
evaluate_print('LOF', y_test, y_test_scores)
```

## Architecture

PyOD follows a consistent architecture based on the BaseDetector abstract class:

- **BaseDetector**: Abstract base class providing a unified interface for all detectors
- **Fit/Predict Pattern**: scikit-learn-compatible interface with `fit()`, `predict()`, and `decision_function()`
- **Model Categories**: Classical, modern, deep learning, and ensemble methods
- **Utility Functions**: Data generation, evaluation metrics, preprocessing, and visualization tools

All detectors inherit from BaseDetector and implement the same core methods, ensuring consistent behavior across different algorithms. This design enables easy model comparison, ensemble creation, and integration into machine learning pipelines.
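
The shared contract described above can be illustrated with a small standard-library sketch. `ToyMeanDistanceDetector` is a hypothetical class, not part of PyOD, and its thresholding rule (flag the top `contamination` share of training scores) is a simplification assumed for illustration rather than PyOD's actual implementation.

```python
import statistics


class ToyMeanDistanceDetector:
    """Hypothetical detector (not in PyOD): scores 1-D points by distance from the training mean."""

    def __init__(self, contamination=0.1):
        if not 0.0 < contamination <= 0.5:
            raise ValueError("contamination must be in (0, 0.5]")
        self.contamination = contamination

    def fit(self, X, y=None):
        self.mean_ = statistics.fmean(X)
        self.decision_scores_ = [abs(x - self.mean_) for x in X]
        # Assumed rule: the threshold is the smallest score among the
        # top `contamination` share of training points.
        ranked = sorted(self.decision_scores_)
        n_outliers = max(1, round(self.contamination * len(X)))
        self.threshold_ = ranked[-n_outliers]
        self.labels_ = [int(s >= self.threshold_) for s in self.decision_scores_]
        return self

    def decision_function(self, X):
        # Higher score means more anomalous, matching the convention above.
        return [abs(x - self.mean_) for x in X]

    def predict(self, X):
        return [int(s >= self.threshold_) for s in self.decision_function(X)]


train = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.05, 0.95, 1.1, 9.0]
detector = ToyMeanDistanceDetector(contamination=0.1).fit(train)
train_labels = detector.labels_
test_labels = detector.predict([1.0, 12.0])
```

Because the toy class exposes the same methods and fitted attributes, code written against `fit()`, `decision_function()`, and `predict()` would not need to know which detector it is handling.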

## Capabilities

### Classical Detection Models

Traditional outlier detection algorithms including Local Outlier Factor, Isolation Forest, One-Class SVM, k-Nearest Neighbors, and statistical methods. These algorithms form the foundation of anomaly detection with proven effectiveness across various domains.

```python { .api }
class LOF:
    def __init__(self, n_neighbors=20, algorithm='auto', leaf_size=30,
                 metric='minkowski', p=2, metric_params=None,
                 contamination=0.1, n_jobs=1, novelty=True, **kwargs): ...

class IForest:
    def __init__(self, n_estimators=100, max_samples='auto', contamination=0.1, **kwargs): ...

class OCSVM:
    def __init__(self, kernel='rbf', degree=3, gamma='scale', contamination=0.1, **kwargs): ...

class KNN:
    def __init__(self, contamination=0.1, n_neighbors=5, method='largest', **kwargs): ...
```

[Classical Models](./classical-models.md)
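
To make the distance-based idea behind a detector such as `KNN` with `method='largest'` concrete, the sketch below scores each point by the distance to its k-th nearest neighbor. It is a brute-force, standard-library illustration; the helper name `kth_neighbor_scores` is hypothetical, and the code is not PyOD's implementation.

```python
import math


def kth_neighbor_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(distances[k - 1])
    return scores


# Four points on a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = kth_neighbor_scores(points, k=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

Points in the dense cluster have a nearby second neighbor and receive low scores, while the isolated point receives a much larger one.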

### Modern Detection Models

State-of-the-art outlier detection algorithms including ECOD, COPOD, SUOD, and other recent advances. These methods often provide better performance and scalability compared to classical approaches.

```python { .api }
class ECOD:
    def __init__(self, contamination=0.1, n_jobs=1): ...

class COPOD:
    def __init__(self, contamination=0.1, n_jobs=1): ...

class SUOD:
    def __init__(self, base_estimators=None, n_jobs=1, contamination=0.1, **kwargs): ...
```

[Modern Models](./modern-models.md)

### Deep Learning Models

Neural network-based outlier detection including autoencoders, variational autoencoders, Deep SVDD, and generative adversarial models. These models excel with high-dimensional data and complex patterns.

```python { .api }
class AutoEncoder:
    def __init__(self, hidden_neurons=[64, 32, 32, 64], contamination=0.1, **kwargs): ...

class VAE:
    def __init__(self, encoder_neurons=[32, 16], decoder_neurons=[16, 32], contamination=0.1, **kwargs): ...

class DeepSVDD:
    def __init__(self, hidden_neurons=[64, 32], contamination=0.1, **kwargs): ...
```

[Deep Learning Models](./deep-learning-models.md)

### Ensemble Models

Combination methods that leverage multiple base detectors to improve detection performance through diversity and aggregation strategies.

```python { .api }
class FeatureBagging:
    def __init__(self, base_estimator=None, n_estimators=10, contamination=0.1, **kwargs): ...

class LSCP:
    def __init__(self, detector_list, local_region_size=30, contamination=0.1, **kwargs): ...
```

[Ensemble Models](./ensemble-models.md)
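
One simple aggregation strategy is to standardize each detector's scores and average them, so that detectors with different score scales contribute equally. The sketch below shows that idea with the standard library only; `zscore` and `average_scores` are hypothetical helpers and the score values are made up for illustration. `FeatureBagging` and `LSCP` use their own, more elaborate schemes.

```python
import statistics


def zscore(scores):
    """Standardize scores to zero mean and unit variance."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    return [(s - mean) / std if std > 0 else 0.0 for s in scores]


def average_scores(score_lists):
    """Average per-sample scores across detectors after standardization."""
    standardized = [zscore(scores) for scores in score_lists]
    return [statistics.fmean(column) for column in zip(*standardized)]


# Illustrative scores from two detectors with very different ranges.
detector_a = [0.1, 0.2, 0.15, 0.9]
detector_b = [10.0, 12.0, 11.0, 40.0]
combined = average_scores([detector_a, detector_b])
top = max(range(len(combined)), key=combined.__getitem__)
```

Without the standardization step, the detector with the larger numeric range would dominate the average.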

### Data Utilities

Comprehensive utilities for data generation, preprocessing, evaluation, and visualization to support the complete outlier detection workflow.

```python { .api }
def generate_data(n_train=200, n_test=100, n_features=2, contamination=0.1, **kwargs):
    """Generate synthetic datasets for outlier detection"""

def evaluate_print(clf_name, y, y_scores):
    """Print ROC AUC and precision @ rank n for the given scores"""

def standardizer(X, X_t=None, method='minmax', keep_scalar=False):
    """Standardize datasets using various methods"""
```

[Data Utilities](./data-utilities.md)
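
Precision at rank n, one of the metrics reported by `evaluate_print`, asks how many of the n highest-scored samples are true outliers, where n is the number of true outliers. The sketch below is a simplified standard-library version of that idea; `precision_at_rank_n` is a hypothetical helper and may handle ties differently from PyOD.

```python
def precision_at_rank_n(y_true, y_scores):
    """Fraction of true outliers among the n top-scored samples, n = number of outliers."""
    n_outliers = sum(y_true)
    # Sample indices ordered from highest to lowest anomaly score.
    ranked = sorted(range(len(y_scores)), key=lambda i: y_scores[i], reverse=True)
    top_n = ranked[:n_outliers]
    return sum(y_true[i] for i in top_n) / n_outliers


y_true = [0, 0, 1, 0, 1, 0]
y_scores = [0.1, 0.7, 0.9, 0.2, 0.4, 0.3]
p_at_n = precision_at_rank_n(y_true, y_scores)
```

Here the two highest scores belong to one true outlier and one inlier, so the metric is one half.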

## Core Types

```python { .api }
class BaseDetector:
    """Abstract base class for all outlier detection algorithms."""

    def __init__(self, contamination=0.1):
        """
        Parameters:
        - contamination (float): Proportion of outliers in dataset (0 < contamination <= 0.5)
        """

    def fit(self, X, y=None):
        """
        Fit detector on training data.

        Parameters:
        - X (array-like): Training data of shape (n_samples, n_features)
        - y: Ignored (present for API consistency)

        Returns:
        - self: Fitted estimator
        """

    def predict(self, X, return_confidence=False):
        """
        Binary prediction on test data.

        Parameters:
        - X (array-like): Test data of shape (n_samples, n_features)
        - return_confidence (bool): Whether to return confidence scores

        Returns:
        - y_pred (array): Binary labels (0: inlier, 1: outlier)
        """

    def decision_function(self, X):
        """
        Raw anomaly scores on test data.

        Parameters:
        - X (array-like): Test data of shape (n_samples, n_features)

        Returns:
        - scores (array): Anomaly scores (higher = more anomalous)
        """

    def predict_proba(self, X, method='linear', return_confidence=False):
        """
        Probability of being an outlier.

        Parameters:
        - X (array-like): Test data of shape (n_samples, n_features)
        - method (str): Probability conversion method ('linear' or 'unify')
        - return_confidence (bool): If True, also return confidence scores

        Returns:
        - proba (array): Probability matrix of shape (n_samples, 2)
        """

    # Fitted attributes (available after calling fit())
    decision_scores_: array  # Outlier scores of training data
    labels_: array  # Binary labels of training data
    threshold_: float  # Decision threshold
```
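
As a rough illustration of how raw scores can become the two-column probability matrix returned by `predict_proba`, the sketch below assumes that the `'linear'` method means min-max scaling of new scores against the range of the training scores, clipped to [0, 1]. That reading is an assumption, and `linear_outlier_proba` is a hypothetical helper, not PyOD's code.

```python
def linear_outlier_proba(train_scores, test_scores):
    """Return (P(inlier), P(outlier)) pairs via min-max scaling on the training range."""
    lo, hi = min(train_scores), max(train_scores)
    span = hi - lo
    proba = []
    for s in test_scores:
        p_outlier = (s - lo) / span if span > 0 else 0.0
        # Scores outside the training range are clipped to [0, 1].
        p_outlier = min(1.0, max(0.0, p_outlier))
        proba.append((1.0 - p_outlier, p_outlier))
    return proba


train_scores = [0.0, 1.0, 2.0, 4.0]
proba = linear_outlier_proba(train_scores, [1.0, 4.0, 10.0])
```

Each row sums to one, with the second column playing the role of the outlier probability.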