# PyOD

PyOD is a comprehensive Python library for detecting anomalous (outlying) objects in multivariate data. It provides 45+ detection algorithms, ranging from classical methods such as Local Outlier Factor (LOF) to more recent approaches such as ECOD and deep learning models, all behind a unified scikit-learn-compatible interface.

## Package Information

- **Package Name**: pyod
- **Language**: Python
- **Installation**: `pip install pyod==2.0.5`
- **Documentation**: https://pyod.readthedocs.io/
- **License**: BSD 2-Clause

## Core Imports

```python
import pyod
```

Models are imported directly from their individual modules:

```python
from pyod.models.lof import LOF
from pyod.models.iforest import IForest
from pyod.models.ecod import ECOD
```

Utilities:

```python
from pyod.utils.data import generate_data, evaluate_print
from pyod.utils.utility import standardizer, score_to_label
```

## Basic Usage

```python
from pyod.models.lof import LOF
from pyod.utils.data import generate_data, evaluate_print

# Generate sample data
X_train, X_test, y_train, y_test = generate_data(
    n_train=200, n_test=100, n_features=2,
    contamination=0.1, random_state=42
)

# Initialize and fit detector
clf = LOF(contamination=0.1)
clf.fit(X_train)

# Access fitted results
y_train_pred = clf.labels_  # Training labels (0: inlier, 1: outlier)
y_train_scores = clf.decision_scores_  # Training anomaly scores
threshold = clf.threshold_  # Decision threshold

# Predict on new data
y_test_pred = clf.predict(X_test)
y_test_scores = clf.decision_function(X_test)
y_test_proba = clf.predict_proba(X_test)

# Evaluate results
evaluate_print('LOF', y_test, y_test_scores)
```

## Architecture

PyOD follows a consistent architecture built around the `BaseDetector` abstract class:

- **BaseDetector**: Abstract base class that provides a unified interface for all detectors
- **Fit/Predict Pattern**: scikit-learn-compatible interface with `fit()`, `predict()`, and `decision_function()`
- **Model Categories**: Classical, modern, deep learning, and ensemble methods
- **Utility Functions**: Data generation, evaluation metrics, preprocessing, and visualization tools

All detectors inherit from `BaseDetector` and implement the same core methods, so they behave consistently across algorithms. This design makes it easy to compare models, build ensembles, and integrate detectors into machine learning pipelines.

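To make the shared fit/predict pattern concrete, here is a minimal sketch of a detector that exposes the same method and attribute names. `ToyZScoreDetector` is a hypothetical class written only for illustration and is not part of PyOD; it scores 1-D values by absolute z-score and uses a simplified thresholding rule, so it should not be read as PyOD's own implementation.

```python
import statistics


class ToyZScoreDetector:
    """Hypothetical 1-D detector (not part of PyOD) mimicking the shared interface."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def decision_function(self, X):
        # Higher score = more anomalous.
        return [abs(x - self.mean_) / self.std_ for x in X]

    def fit(self, X, y=None):
        self.mean_ = statistics.fmean(X)
        self.std_ = statistics.pstdev(X) or 1.0  # avoid division by zero
        self.decision_scores_ = self.decision_function(X)
        # Simplified rule: flag the top `contamination` fraction of training scores.
        n_outliers = max(1, round(len(X) * self.contamination))
        self.threshold_ = sorted(self.decision_scores_)[-n_outliers]
        self.labels_ = [int(s >= self.threshold_) for s in self.decision_scores_]
        return self

    def predict(self, X):
        return [int(s >= self.threshold_) for s in self.decision_function(X)]


train = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.05, 0.95, 1.1, 9.0]
detector = ToyZScoreDetector(contamination=0.1).fit(train)
print(detector.labels_)               # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(detector.predict([1.0, 12.0]))  # [0, 1]
```

Because every detector follows this shape, code written against `fit()`, `decision_function()`, and `predict()` does not need to know which algorithm sits behind them.
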
## Capabilities

### Classical Detection Models

Traditional outlier detection algorithms, including Local Outlier Factor, Isolation Forest, One-Class SVM, k-Nearest Neighbors, and statistical methods. These well-established algorithms form the foundation of anomaly detection and are used across many domains.

```python { .api }
class LOF:
    def __init__(self, n_neighbors=20, algorithm='auto', leaf_size=30,
                 metric='minkowski', p=2, metric_params=None,
                 contamination=0.1, n_jobs=1, novelty=True, **kwargs): ...

class IForest:
    def __init__(self, n_estimators=100, max_samples='auto', contamination=0.1, **kwargs): ...

class OCSVM:
    def __init__(self, kernel='rbf', degree=3, gamma='auto', contamination=0.1, **kwargs): ...

class KNN:
    def __init__(self, contamination=0.1, n_neighbors=5, method='largest', **kwargs): ...
```

[Classical Models](./classical-models.md)

### Modern Detection Models

State-of-the-art outlier detection algorithms including ECOD, COPOD, SUOD, and other recent advances. These methods often provide better performance and scalability compared to classical approaches.

```python { .api }
class ECOD:
    def __init__(self, contamination=0.1, n_jobs=1): ...

class COPOD:
    def __init__(self, contamination=0.1, n_jobs=1): ...

class SUOD:
    def __init__(self, base_estimators=None, n_jobs=1, contamination=0.1, **kwargs): ...
```

[Modern Models](./modern-models.md)

### Deep Learning Models

Neural network-based outlier detection including autoencoders, variational autoencoders, Deep SVDD, and generative adversarial models. These models excel with high-dimensional data and complex patterns.

```python { .api }
class AutoEncoder:
    def __init__(self, hidden_neurons=[64, 32, 32, 64], contamination=0.1, **kwargs): ...

class VAE:
    def __init__(self, encoder_neurons=[32, 16], decoder_neurons=[16, 32], contamination=0.1, **kwargs): ...

class DeepSVDD:
    def __init__(self, hidden_neurons=[64, 32], contamination=0.1, **kwargs): ...
```

[Deep Learning Models](./deep-learning-models.md)

### Ensemble Models

Combination methods that leverage multiple base detectors to improve detection performance through diversity and aggregation strategies.

```python { .api }
class FeatureBagging:
    def __init__(self, base_estimator=None, n_estimators=10, contamination=0.1, **kwargs): ...

class LSCP:
    def __init__(self, detector_list, local_region_size=30, contamination=0.1, **kwargs): ...
```

[Ensemble Models](./ensemble-models.md)

### Data Utilities

Utilities for data generation, preprocessing, evaluation, and visualization that support the complete outlier detection workflow.

```python { .api }
def generate_data(n_train=1000, n_test=500, n_features=2, contamination=0.1, **kwargs):
    """Generate synthetic datasets for outlier detection"""

def evaluate_print(clf_name, y, y_scores):
    """Print ROC AUC and precision @ rank n for the given scores"""

def standardizer(X, X_t=None, method='minmax', keep_scalar=False):
    """Standardize datasets using various methods"""
```

[Data Utilities](./data-utilities.md)

## Core Types

```python { .api }
class BaseDetector:
    """Abstract base class for all outlier detection algorithms."""

    def __init__(self, contamination=0.1):
        """
        Parameters:
        - contamination (float): Proportion of outliers in dataset (0 < contamination <= 0.5)
        """

    def fit(self, X, y=None):
        """
        Fit detector on training data.

        Parameters:
        - X (array-like): Training data of shape (n_samples, n_features)
        - y: Ignored (present for API consistency)

        Returns:
        - self: Fitted estimator
        """

    def predict(self, X, return_confidence=False):
        """
        Binary prediction on test data.

        Parameters:
        - X (array-like): Test data of shape (n_samples, n_features)
        - return_confidence (bool): Whether to return confidence scores

        Returns:
        - y_pred (array): Binary labels (0: inlier, 1: outlier)
        - confidence (array): Returned as well only if return_confidence=True
        """

    def decision_function(self, X):
        """
        Raw anomaly scores on test data.

        Parameters:
        - X (array-like): Test data of shape (n_samples, n_features)

        Returns:
        - scores (array): Anomaly scores (higher = more anomalous)
        """

    def predict_proba(self, X, method='linear', return_confidence=False):
        """
        Probability of being an outlier.

        Parameters:
        - X (array-like): Test data of shape (n_samples, n_features)
        - method (str): Probability conversion method ('linear' or 'unify')
        - return_confidence (bool): If True, also return confidence scores

        Returns:
        - proba (array): Probability matrix of shape (n_samples, 2)
        """

    # Fitted attributes (available after calling fit())
    decision_scores_: array  # Outlier scores of training data
    labels_: array  # Binary labels of training data
    threshold_: float  # Decision threshold
```
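
The relationship between `contamination`, `threshold_`, `labels_`, and the `'linear'` probabilities can be sketched in plain Python. This is an approximation for intuition, assuming the threshold is the `100 * (1 - contamination)` percentile of the training scores and that linear probabilities are a clipped min-max rescaling of those scores; the helper names are hypothetical and PyOD's own implementation may differ in detail.

```python
def percentile(values, q):
    """Percentile with linear interpolation between ranks (q in [0, 100])."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def threshold_from_contamination(train_scores, contamination):
    # Scores above this cut-off are labelled as outliers.
    return percentile(train_scores, 100 * (1 - contamination))


def linear_outlier_probability(train_scores, new_scores):
    # Min-max rescale against the training range, clipped to [0, 1].
    lo, hi = min(train_scores), max(train_scores)
    span = (hi - lo) or 1.0
    probs = []
    for s in new_scores:
        p_outlier = min(1.0, max(0.0, (s - lo) / span))
        probs.append((1.0 - p_outlier, p_outlier))  # (P(inlier), P(outlier))
    return probs


train_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 5.0]
threshold = threshold_from_contamination(train_scores, contamination=0.1)
labels = [int(s > threshold) for s in train_scores]
probs = linear_outlier_probability(train_scores, [0.1, 5.0, 7.0])

print(round(threshold, 2))  # 1.31
print(labels)               # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(probs)                # [(1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
```

The two columns of each probability row mirror the `(n_samples, 2)` shape documented for `predict_proba`.
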