A comprehensive Python library for detecting anomalous/outlying objects in multivariate data with 45+ algorithms.
State-of-the-art outlier detection algorithms that, on many datasets, offer better detection performance or scalability than classical approaches. These methods incorporate recent advances in machine learning and statistical theory.
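ECOD (Empirical-Cumulative-distribution-based Outlier Detection) is a hyperparameter-free, highly interpretable algorithm. It estimates the empirical cumulative distribution function of each feature and scores a point by how far into the tails its values fall, so each feature's contribution to the score can be inspected directly. Apart from contamination, which only sets the decision threshold, there is nothing to tune. ECOD is also fast and performs robustly across a wide range of datasets.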
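The tail-probability idea can be sketched in plain Python. This is a minimal illustration, not PyOD's implementation: the helpers ecdf_tails, skewness and ecod_scores are hypothetical names, and the score follows the published ECOD recipe of summing per-feature negative log tail probabilities for the left tail, the right tail, and a skewness-chosen tail, then keeping the largest of the three.

```python
import math


def ecdf_tails(values):
    """Left- and right-tail empirical probabilities for one feature."""
    n = len(values)
    left = [sum(v <= x for v in values) / n for x in values]   # P(X <= x)
    right = [sum(v >= x for v in values) / n for x in values]  # P(X >= x)
    return left, right


def skewness(values):
    """Sample skewness; its sign indicates which tail is the long one."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def ecod_scores(X):
    """Outlier scores for the rows of X (a list of lists); higher = more outlying."""
    n, d = len(X), len(X[0])
    o_left, o_right, o_auto = [0.0] * n, [0.0] * n, [0.0] * n
    for j in range(d):
        col = [row[j] for row in X]
        left, right = ecdf_tails(col)
        use_left = skewness(col) < 0  # negative skew: the left tail is the long one
        for i in range(n):
            # Tail probabilities are >= 1/n (each point counts itself), so log is safe
            o_left[i] -= math.log(left[i])
            o_right[i] -= math.log(right[i])
            o_auto[i] -= math.log(left[i] if use_left else right[i])
    return [max(a, b, c) for a, b, c in zip(o_left, o_right, o_auto)]


X = [[0.1, 0.2], [0.0, -0.1], [-0.2, 0.1], [0.2, 0.0], [5.0, 5.0]]
scores = ecod_scores(X)
print(scores.index(max(scores)))  # 4: the planted point (5.0, 5.0)
```

Counting with a full pass per value keeps the sketch short at O(n^2) per feature; sorting each feature once would reduce that to O(n log n).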
class ECOD:
    def __init__(self, contamination=0.1, n_jobs=1):
        """
        Parameters:
        - contamination (float): Proportion of outliers in dataset
        - n_jobs (int): Number of parallel jobs for computation
        """

Usage example:
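from pyod.models.ecod import ECOD
from pyod.utils.data import generate_data

# Synthetic data with 10% outliers; labels are 0 (inlier) or 1 (outlier)
X_train, X_test, y_train, y_test = generate_data(contamination=0.1, random_state=42)

clf = ECOD(contamination=0.1, n_jobs=2)
clf.fit(X_train)                        # unsupervised: labels are not used
y_pred = clf.predict(X_test)            # binary labels, 1 = outlier
scores = clf.decision_function(X_test)  # raw scores, higher = more outlying

COPOD (Copula-Based Outlier Detection) uses an empirical copula to model the dependence structure among features and scores each point by its estimated tail probability. Like ECOD, it has no hyperparameters to tune beyond contamination.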
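Because COPOD works with the empirical copula, that is, rank-based pseudo-observations for each feature, its tail-probability scores depend only on the ordering of values within each feature. The minimal sketch below illustrates that property with just the left-tail term; the function names are hypothetical, and COPOD itself also combines right-tail and skewness-corrected terms.

```python
import math


def pseudo_observations(col):
    """Empirical copula transform of one feature: u_i = #{v <= x_i} / n."""
    n = len(col)
    return [sum(v <= x for v in col) / n for x in col]


def left_tail_scores(X):
    """Sum over features of -log u_ij, using the empirical copula of X."""
    n, d = len(X), len(X[0])
    scores = [0.0] * n
    for j in range(d):
        u = pseudo_observations([row[j] for row in X])
        for i in range(n):
            scores[i] -= math.log(u[i])
    return scores


X = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [0.2, 3.0]]
X_exp = [[math.exp(v) for v in row] for row in X]  # strictly increasing transform
print(left_tail_scores(X) == left_tail_scores(X_exp))  # True
```

Invariance to monotone rescaling is one reason copula-based scores need no per-feature normalization.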
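class COPOD:
    def __init__(self, contamination=0.1, n_jobs=1):
        """
        Parameters:
        - contamination (float): Proportion of outliers in dataset
        - n_jobs (int): Number of parallel jobs for computation
        """

SUOD (Scalable Unsupervised Outlier Detection) is a framework for accelerating the training and prediction of many base detectors at once. It combines random projection of high-dimensional data, pseudo-supervised approximation of costly detectors, and balanced parallel scheduling, which yields substantial speedups while largely preserving detection quality.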
class SUOD:
    def __init__(self, base_estimators=None, n_jobs=1, rp_clf_list=None,
                 rp_ng_clf_list=None, rp_flag_global=True, jl_method='basic',
                 jl_proj_nums=None, cost_forecast_loc_fit=None,
                 cost_forecast_loc_pred=None, approx_flag_global=False,
                 approx_clf_list=None, approx_ng_clf_list=None,
                 contamination=0.1, combination='average', verbose=False,
                 random_state=None):
        """
        Parameters:
        - base_estimators (list): List of base detectors
        - n_jobs (int): Number of parallel jobs
        - rp_clf_list (list): List of detectors for random projection
        - jl_method (str): Johnson-Lindenstrauss method ('basic', 'discrete', 'circulant')
        - contamination (float): Proportion of outliers in dataset
        - combination (str): Combination method for scores ('average', 'maximization')
        - verbose (bool): Whether to print progress information
        """

LUNAR (Learnable Unified Neighbourhood-based Anomaly Ranking) uses a graph neural network to learn how to combine information from each point's nearest neighbours. Classical local methods such as kNN and LOF correspond to fixed aggregation rules, so learning the rule from data (with negative sampling to generate artificial outliers for training) makes LUNAR more flexible than any single one of them.
class LUNAR:
    def __init__(self, model_type='regressor', n_neighbours=5,
                 negative_sampling=1, val_size=0.1, scaler='MinMaxScaler',
                 contamination=0.1):
        """
        Parameters:
        - model_type (str): Type of base model ('regressor', 'classifier')
        - n_neighbours (int): Number of neighbors for local modeling
        - negative_sampling (int): Negative sampling ratio
        - val_size (float): Validation set size fraction
        - scaler (str): Scaler type for preprocessing
        - contamination (float): Proportion of outliers in dataset
        """

LMDD (Linear Method for Deviation-based Outlier Detection) looks for the points whose removal most reduces the dissimilarity of the dataset, measured by a dispersion function such as average absolute deviation ('aad'), variance ('var') or interquartile range ('iqr'). Points that cause large reductions receive high outlier scores.
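class LMDD:
    def __init__(self, contamination=0.1, n_iter=50, dis_measure='aad',
                 random_state=None):
        """
        Parameters:
        - contamination (float): Proportion of outliers in dataset
        - n_iter (int): Number of iterations, each processing the data in a new random order
        - dis_measure (str): Dissimilarity measure ('aad': average absolute deviation, 'var': variance, 'iqr': interquartile range)
        - random_state (int): Random number generator seed
        """

LODA (Lightweight On-line Detector of Anomalies) builds an ensemble of one-dimensional histograms, each on a sparse random projection of the data, and averages the resulting negative log-densities. It is fast, handles high-dimensional data well, and its design suits streaming applications.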
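class LODA:
    def __init__(self, contamination=0.1, n_bins=10, n_random_cuts=100):
        """
        Parameters:
        - contamination (float): Proportion of outliers in dataset
        - n_bins (int): Number of bins for histogram
        - n_random_cuts (int): Number of random projections
        """

INNE (Isolation using Nearest-Neighbour Ensembles) combines isolation-based detection with nearest-neighbour reasoning. Around each randomly sampled point it builds a hypersphere whose radius is the distance to that point's nearest neighbour in the sample; points covered only by hyperspheres that are large relative to their neighbours' hyperspheres, or by none at all, score as more isolated. Scores are averaged over an ensemble of subsamples, giving robust detection across various data distributions.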
class INNE:
    def __init__(self, n_estimators=200, max_samples=256, contamination=0.1,
                 random_state=None):
        """
        Parameters:
        - n_estimators (int): Number of estimators in ensemble
        - max_samples (int): Maximum number of samples per estimator
        - contamination (float): Proportion of outliers in dataset
        - random_state (int): Random number generator seed
        """

SOD (Subspace Outlier Detection) detects outliers in relevant subspaces rather than the full feature space, making it effective for high-dimensional data where outliers may only be visible in certain dimensions.
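class SOD:
    def __init__(self, n_neighbors=20, ref_set=10, alpha=0.8, contamination=0.1):
        """
        Parameters:
        - n_neighbors (int): Number of neighbors to consider
        - ref_set (int): Size of reference set
        - alpha (float): Weight parameter for subspace selection
        - contamination (float): Proportion of outliers in dataset
        """

SOS (Stochastic Outlier Selection) turns pairwise distances into affinities, calibrated so that each point's affinity distribution has a user-chosen perplexity, and normalizes them into binding probabilities. A point's outlier probability is the probability that no other point binds to it, so SOS returns probabilities rather than unbounded scores.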
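class SOS:
    def __init__(self, perplexity=4.5, metric='euclidean', eps=1e-5,
                 contamination=0.1):
        """
        Parameters:
        - perplexity (float): Perplexity parameter for probability computation
        - metric (str): Distance metric to use
        - eps (float): Numerical stability parameter
        - contamination (float): Proportion of outliers in dataset
        """

ROD (Rotation-based Outlier Detection) is a parameter-free method that makes no distributional assumptions. It works in 3D subspaces: the data points are rotated about the geometric median using the Rodrigues rotation formula, the volumes of the resulting parallelepipeds are converted into outlier scores via median absolute deviation, and for more than three dimensions the scores of all 3D subspaces are averaged.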
class ROD:
    def __init__(self, contamination=0.1, parallel_execution=False):
        """
        Parameters:
        - contamination (float): Proportion of outliers in dataset
        - parallel_execution (bool): Whether to compute subspace scores in parallel (worthwhile mainly for high-dimensional data)
        """

class LOCI:
    """Local Correlation Integral"""
    def __init__(self, contamination=0.1, alpha=0.5, k=3): ...

class CD:
    """Cook's Distance"""
    def __init__(self, contamination=0.1, whitening=True): ...

class QMCD:
    """Quasi-Monte Carlo Discrepancy"""
    def __init__(self, contamination=0.1, ref_set=10): ...

class Sampling:
    """Sampling-based outlier detection"""
    def __init__(self, contamination=0.1, subset_size=20, metric='euclidean'): ...

Modern models follow the same interface as classical models:
# Example with ECOD
from pyod.models.ecod import ECOD
from pyod.utils.data import generate_data
# Generate data
X_train, X_test, y_train, y_test = generate_data(
n_train=500, n_test=200, contamination=0.1, random_state=42
)
# Initialize and fit
clf = ECOD(contamination=0.1, n_jobs=2)
clf.fit(X_train)
# Get results
train_scores = clf.decision_scores_
test_scores = clf.decision_function(X_test)
test_labels = clf.predict(X_test)