Tessl Tile for pypi/pyod@2.0.0

tessl/pypi-pyod

A comprehensive Python library for detecting anomalous/outlying objects in multivariate data with 45+ algorithms.

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/pyod@2.0.x

To install, run

npx @tessl/cli install tessl/pypi-pyod@2.0.0

# PyOD

A comprehensive Python library for detecting anomalous (outlying) objects in multivariate data. PyOD provides 45+ detection algorithms, ranging from classical methods such as Local Outlier Factor (LOF) to newer approaches such as ECOD and deep learning models, all behind a unified scikit-learn-compatible interface.

## Package Information

- **Package Name**: pyod
- **Language**: Python
- **Installation**: `pip install pyod==2.0.5`
- **Documentation**: https://pyod.readthedocs.io/
- **License**: BSD 2-Clause

## Core Imports

```python
import pyod
```

Each model lives in its own module under `pyod.models`:

```python
from pyod.models.lof import LOF
from pyod.models.iforest import IForest
from pyod.models.ecod import ECOD
```

Utilities:

```python
from pyod.utils.data import generate_data, evaluate_print
from pyod.utils.utility import standardizer, score_to_label
```

## Basic Usage

```python
from pyod.models.lof import LOF
from pyod.utils.data import generate_data, evaluate_print

# Generate sample data
X_train, X_test, y_train, y_test = generate_data(
    n_train=200, n_test=100, n_features=2,
    contamination=0.1, random_state=42
)

# Initialize and fit detector
clf = LOF(contamination=0.1)
clf.fit(X_train)

# Access fitted results
y_train_pred = clf.labels_          # Training labels (0: inlier, 1: outlier)
y_train_scores = clf.decision_scores_  # Training anomaly scores
threshold = clf.threshold_          # Decision threshold

# Predict on new data
y_test_pred = clf.predict(X_test)
y_test_scores = clf.decision_function(X_test)
y_test_proba = clf.predict_proba(X_test)

# Evaluate results
evaluate_print('LOF', y_test, y_test_scores)
```
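The three fitted attributes are closely related. The plain-Python sketch below assumes the usual PyOD convention that `threshold_` is the `100 * (1 - contamination)` percentile of the training scores and that `labels_` marks scores strictly above it. The helper names `percentile` and `labels_from_scores` are hypothetical and exist only for this illustration; they are not part of PyOD.

```python
def percentile(values, q):
    """Percentile with linear interpolation between ranks (q in [0, 100])."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def labels_from_scores(scores, contamination=0.1):
    """Return (threshold, labels); label 1 means the score exceeds the threshold."""
    threshold = percentile(scores, 100.0 * (1.0 - contamination))
    labels = [1 if s > threshold else 0 for s in scores]
    return threshold, labels


# Ten toy anomaly scores; the last one is far larger than the rest.
scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.12, 0.18, 0.22, 0.28, 5.0]
threshold, labels = labels_from_scores(scores, contamination=0.1)
print(threshold, labels)
```

With `contamination=0.1` and ten samples, only the single largest score ends up above the threshold, so raising `contamination` is the direct way to flag more training points.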

## Architecture

PyOD follows a consistent architecture based on the `BaseDetector` abstract class:

- **BaseDetector**: Abstract base class providing a unified interface for all detectors
- **Fit/Predict Pattern**: scikit-learn compatible interface with `fit()`, `predict()`, and `decision_function()`
- **Model Categories**: Classical, modern, deep learning, and ensemble methods
- **Utility Functions**: Data generation, evaluation metrics, preprocessing, and visualization tools

All detectors inherit from `BaseDetector` and implement the same core methods, ensuring consistent behavior across different algorithms. This design enables easy model comparison, ensemble creation, and integration into machine learning pipelines.
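To illustrate why a shared base class makes model comparison simple, here is a toy sketch in plain Python. `ToyDetector`, `MeanDistance`, and `MedianDistance` are hypothetical classes written for this example; they only mimic the `fit()` / `decision_function()` / `predict()` contract on 1-D data and use a simplified rank-based threshold, so they should not be read as PyOD's implementation.

```python
from statistics import mean, median


class ToyDetector:
    """Hypothetical stand-in for a shared base class (not PyOD's BaseDetector)."""

    def __init__(self, contamination=0.1):
        if not 0.0 < contamination <= 0.5:
            raise ValueError("contamination must be in (0, 0.5]")
        self.contamination = contamination

    def fit(self, X):
        self._fit_center(X)  # subclass hook: learn the reference point
        self.decision_scores_ = self.decision_function(X)
        # Simplified threshold: the k-th largest training score,
        # where k is the expected number of outliers.
        ranked = sorted(self.decision_scores_, reverse=True)
        n_outliers = max(1, int(round(self.contamination * len(X))))
        self.threshold_ = ranked[n_outliers - 1]
        self.labels_ = self.predict(X)
        return self

    def decision_function(self, X):
        # Higher score = farther from the learned center = more anomalous
        return [abs(x - self.center_) for x in X]

    def predict(self, X):
        return [1 if s >= self.threshold_ else 0 for s in self.decision_function(X)]


class MeanDistance(ToyDetector):
    def _fit_center(self, X):
        self.center_ = mean(X)


class MedianDistance(ToyDetector):
    def _fit_center(self, X):
        self.center_ = median(X)


X = [1.0, 1.1, 0.9, 1.2, 0.8, 1.05, 0.95, 1.15, 0.85, 9.0]
results = {}
# The same loop works for every detector because they share one interface.
for detector in (MeanDistance(), MedianDistance()):
    detector.fit(X)
    results[type(detector).__name__] = detector.predict([1.0, 9.5])
print(results)
```

The comparison loop never inspects which detector it is handling. With real PyOD models the same pattern applies: build a list of detector instances and call the shared methods on each.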

## Capabilities

### Classical Detection Models

Traditional outlier detection algorithms, including Local Outlier Factor, Isolation Forest, One-Class SVM, k-Nearest Neighbors, and statistical methods. These well-established algorithms are common baselines for anomaly detection.

```python { .api }
class LOF:
    def __init__(self, n_neighbors=20, algorithm='auto', leaf_size=30,
                 metric='minkowski', p=2, metric_params=None,
                 contamination=0.1, n_jobs=1, novelty=True, **kwargs): ...

class IForest:
    def __init__(self, n_estimators=100, max_samples='auto', contamination=0.1, **kwargs): ...

class OCSVM:
    def __init__(self, kernel='rbf', degree=3, gamma='scale', contamination=0.1, **kwargs): ...

class KNN:
    def __init__(self, contamination=0.1, n_neighbors=5, method='largest', **kwargs): ...
```

[Classical Models](./classical-models.md)
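As a rough illustration of what the `KNN` parameters `n_neighbors` and `method` control, the sketch below scores each point by its distance to its nearest neighbours using brute force. `knn_scores` is a hypothetical helper for this example only. It covers the `'largest'` and `'mean'` options and leaves out `'median'` and the tree-based neighbour search a real implementation would use.

```python
import math


def knn_scores(points, n_neighbors=2, method="largest"):
    """Score each point by its distance to its nearest neighbours in `points`."""
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:n_neighbors]
        if method == "largest":
            scores.append(nearest[-1])  # distance to the k-th neighbour
        elif method == "mean":
            scores.append(sum(nearest) / len(nearest))
        else:
            raise ValueError(f"unsupported method: {method}")
    return scores


# Four corners of a unit square plus one distant point
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (6.0, 6.0)]
scores = knn_scores(points, n_neighbors=2)
print(scores)
```

The distant point receives by far the largest score because even its closest neighbours are several units away, while each corner of the square has two neighbours at distance 1.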

### Modern Detection Models

More recent outlier detection algorithms, including ECOD, COPOD, and SUOD. ECOD and COPOD are parameter-free, while SUOD is a framework for accelerating training and prediction across large collections of detectors.

```python { .api }
class ECOD:
    def __init__(self, contamination=0.1, n_jobs=1): ...

class COPOD:
    def __init__(self, contamination=0.1, n_jobs=1): ...

class SUOD:
    def __init__(self, base_estimators=None, n_jobs=1, contamination=0.1, **kwargs): ...
```

[Modern Models](./modern-models.md)
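The idea behind ECOD can be sketched in a few lines: estimate a left-tail and a right-tail empirical CDF for each feature, take the negative log of those tail probabilities, and sum them over features. The version below is a deliberately simplified illustration. The function name `ecod_style_scores` is hypothetical, and the sketch omits the skewness-corrected aggregate that the published algorithm also considers, so its scores will not match `pyod.models.ecod.ECOD` exactly.

```python
import math


def ecod_style_scores(X):
    """Tail-probability outlier scores from per-feature empirical CDFs."""
    n = len(X)
    n_features = len(X[0])
    scores = []
    for row in X:
        left_total = 0.0
        right_total = 0.0
        for j in range(n_features):
            column = [r[j] for r in X]
            # Fraction of samples at or below / at or above this value.
            # Both are at least 1/n because the row itself is counted.
            left_tail = sum(1 for v in column if v <= row[j]) / n
            right_tail = sum(1 for v in column if v >= row[j]) / n
            left_total += -math.log(left_tail)
            right_total += -math.log(right_tail)
        # A sample is suspicious if it is extreme in either direction
        scores.append(max(left_total, right_total))
    return scores


X = [(1.0, 2.2), (1.2, 2.1), (0.9, 2.0), (1.1, 1.9), (1.05, 2.05), (8.0, 9.0)]
scores = ecod_style_scores(X)
print(scores)
```

Because the score depends only on ranks within each feature, no hyperparameters need tuning, which is the property the "parameter-free" label refers to.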

### Deep Learning Models

Neural network-based outlier detection, including autoencoders, variational autoencoders, Deep SVDD, and generative adversarial models. These models are suited to high-dimensional data with complex, nonlinear structure.

```python { .api }
class AutoEncoder:
    def __init__(self, hidden_neurons=[64, 32, 32, 64], contamination=0.1, **kwargs): ...

class VAE:
    def __init__(self, encoder_neurons=[32, 16], decoder_neurons=[16, 32], contamination=0.1, **kwargs): ...

class DeepSVDD:
    def __init__(self, hidden_neurons=[64, 32], contamination=0.1, **kwargs): ...
```

[Deep Learning Models](./deep-learning-models.md)

### Ensemble Models

Combination methods that leverage multiple base detectors to improve detection performance through diversity and aggregation strategies.

```python { .api }
class FeatureBagging:
    def __init__(self, base_estimator=None, n_estimators=10, contamination=0.1, **kwargs): ...

class LSCP:
    def __init__(self, detector_list, local_region_size=30, contamination=0.1, **kwargs): ...
```

[Ensemble Models](./ensemble-models.md)
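The simplest aggregation strategies are averaging and taking the maximum of per-detector scores. Since different detectors score on different scales, the scores are usually standardized first. The sketch below shows that idea in plain Python; `zscore` and `combine` are hypothetical helpers for illustration, not PyOD functions (PyOD ships its own combination utilities in `pyod.models.combination`).

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardize one detector's scores to zero mean and unit variance."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def combine(score_lists, how="average"):
    """Combine per-detector score lists sample by sample."""
    standardized = [zscore(scores) for scores in score_lists]
    per_sample = list(zip(*standardized))  # one tuple of scores per sample
    if how == "average":
        return [mean(sample) for sample in per_sample]
    if how == "maximization":
        return [max(sample) for sample in per_sample]
    raise ValueError(f"unsupported combination: {how}")


# Scores from two hypothetical detectors on very different scales
detector_a = [0.1, 0.2, 0.1, 0.9]
detector_b = [10.0, 12.0, 11.0, 40.0]
avg = combine([detector_a, detector_b], how="average")
mx = combine([detector_a, detector_b], how="maximization")
print(avg, mx)
```

Averaging tends to smooth out a single detector's mistakes, while maximization flags a sample as soon as any one detector finds it unusual.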

### Data Utilities

Utilities for data generation, preprocessing, evaluation, and visualization that support the complete outlier detection workflow.

```python { .api }
def generate_data(n_train=1000, n_test=500, n_features=2, contamination=0.1, **kwargs):
    """Generate synthetic datasets for outlier detection"""

def evaluate_print(clf_name, y, y_scores):
    """Print ROC AUC and precision @ rank n for the given scores"""

def standardizer(X, X_t=None, keep_scalar=False):
    """Z-normalize X (and optionally X_t, using the statistics of X)"""
```

[Data Utilities](./data-utilities.md)
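The two metrics reported by `evaluate_print`, ROC AUC and precision @ rank n, can be computed by hand on small inputs, which helps when interpreting its output. The functions `roc_auc` and `precision_at_rank_n` below are hypothetical plain-Python sketches; PyOD's own implementation may handle tied scores differently.

```python
def roc_auc(y_true, y_scores):
    """Probability that a random outlier scores higher than a random inlier."""
    pos = [s for y, s in zip(y_true, y_scores) if y == 1]
    neg = [s for y, s in zip(y_true, y_scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    # Count outlier/inlier pairs ranked correctly; ties count as half
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_rank_n(y_true, y_scores):
    """Precision among the top-n scores, where n is the number of true outliers."""
    n = sum(y_true)
    if n == 0:
        raise ValueError("y_true contains no outliers")
    ranked = sorted(zip(y_scores, y_true), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:n]) / n


y_true = [0, 0, 0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
auc = roc_auc(y_true, y_scores)
p_at_n = precision_at_rank_n(y_true, y_scores)
print(auc, p_at_n)  # 0.875 0.5
```

Here one inlier (score 0.8) outranks one of the two outliers, so 7 of the 8 outlier/inlier pairs are ordered correctly and only one of the top two scores is a true outlier.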

## Core Types

```python { .api }
class BaseDetector:
    """Abstract base class for all outlier detection algorithms."""

    def __init__(self, contamination=0.1):
        """
        Parameters:
        - contamination (float): Proportion of outliers in dataset (0 < contamination <= 0.5)
        """

    def fit(self, X, y=None):
        """
        Fit detector on training data.

        Parameters:
        - X (array-like): Training data of shape (n_samples, n_features)
        - y: Ignored (present for API consistency)

        Returns:
        - self: Fitted estimator
        """
    
    def predict(self, X, return_confidence=False):
        """
        Binary prediction on test data.

        Parameters:
        - X (array-like): Test data of shape (n_samples, n_features)
        - return_confidence (bool): Whether to return confidence scores

        Returns:
        - y_pred (array): Binary labels (0: inlier, 1: outlier)
        - confidence (array): Prediction confidence, returned only if return_confidence=True
        """

    def decision_function(self, X):
        """
        Raw anomaly scores on test data.

        Parameters:
        - X (array-like): Test data of shape (n_samples, n_features)

        Returns:
        - scores (array): Anomaly scores (higher = more anomalous)
        """
    
    def predict_proba(self, X, method='linear', return_confidence=False):
        """
        Probability of being an outlier.

        Parameters:
        - X (array-like): Test data of shape (n_samples, n_features)
        - method (str): Probability conversion method ('linear' or 'unify')
        - return_confidence (bool): If True, also return confidence scores

        Returns:
        - proba (array): Probability matrix of shape (n_samples, 2); columns are [inlier, outlier]
        - confidence (array): Prediction confidence, returned only if return_confidence=True
        """

    # Fitted attributes (available after calling fit())
    decision_scores_: array  # Outlier scores of training data
    labels_: array          # Binary labels of training data
    threshold_: float       # Decision threshold
```
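The `'linear'` conversion in `predict_proba` can be pictured as min-max scaling: test scores are rescaled by the range of the training scores and clipped to [0, 1], and the two columns are the inlier and outlier probabilities. The sketch below assumes that behaviour. `linear_proba` is a hypothetical helper written for illustration, not PyOD code, and the `'unify'` method is not covered.

```python
def linear_proba(train_scores, test_scores):
    """Min-max scale test scores by the training range, clipped to [0, 1]."""
    lo, hi = min(train_scores), max(train_scores)
    span = hi - lo
    rows = []
    for s in test_scores:
        p_outlier = 0.0 if span == 0 else min(1.0, max(0.0, (s - lo) / span))
        rows.append([1.0 - p_outlier, p_outlier])  # [P(inlier), P(outlier)]
    return rows


train_scores = [1.0, 2.0, 3.0, 5.0]
proba = linear_proba(train_scores, [0.5, 3.0, 9.0])
print(proba)  # [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
```

Test scores below the smallest or above the largest training score are clipped, so these values are best read as relative rankings against the training data rather than calibrated probabilities.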