Identify anomalies and outliers in datasets using machine learning algorithms. Use when analyzing data for unusual patterns, outliers, or unexpected deviations from normal behavior. Trigger with phrases like "detect anomalies", "find outliers", or "identify unusual patterns".
Identify anomalies and outliers in datasets using statistical and machine learning algorithms including Isolation Forest, One-Class SVM, Local Outlier Factor, and autoencoders. This skill handles the full detection pipeline from data ingestion and feature scaling through algorithm selection, threshold tuning, and result interpretation with anomaly scoring.
- scikit-learn (`pip install scikit-learn`)
- pandas and NumPy (`pip install pandas numpy`)
- matplotlib and seaborn (`pip install matplotlib seaborn`)

See ${CLAUDE_SKILL_DIR}/references/implementation.md for the detailed implementation guide.
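The pipeline stages named in the overview (feature scaling, algorithm selection, threshold tuning, anomaly scoring) can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the synthetic data, the `contamination` value of 0.02, and the variable names are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in data (an assumption): 500 normal rows plus 10 shifted rows.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(500, 3)),
    rng.normal(8.0, 1.0, size=(10, 3)),
])

# Feature scaling: distance- and kernel-based detectors are scale sensitive.
X_scaled = StandardScaler().fit_transform(X)

contamination = 0.02  # assumed expected share of anomalies

# Algorithm selection: three candidate detectors fitted on the same data.
detectors = {
    "isolation_forest": IsolationForest(contamination=contamination, random_state=0),
    "one_class_svm": OneClassSVM(nu=contamination, gamma="scale"),
    "local_outlier_factor": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
}

# fit_predict returns +1 for inliers and -1 for anomalies in all three estimators.
labels = {name: det.fit_predict(X_scaled) for name, det in detectors.items()}
for name, y in labels.items():
    print(f"{name}: flagged {int((y == -1).sum())} of {len(y)} rows")

# Anomaly scoring: score_samples is lower for more anomalous rows,
# so negate it to get a "higher means more anomalous" score.
scores = -detectors["isolation_forest"].score_samples(X_scaled)

# Threshold tuning: a quantile cut-off on the scores that can be adjusted.
threshold = np.quantile(scores, 1.0 - contamination)
flagged = np.flatnonzero(scores > threshold)
```

Comparing the flagged counts and the overlap between detectors is one way to choose an algorithm when no labels are available. The quantile used for `threshold` is the knob to revisit after reviewing the flagged rows.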
| Error | Cause | Solution |
|---|---|---|
| Insufficient data volume | Fewer than 100 data points for model fitting | Collect additional data or switch to simple statistical methods (z-score, IQR) |
| High false positive rate | Contamination parameter set too high or features not scaled | Lower contamination to 0.01; verify StandardScaler applied; refine feature selection |
| Algorithm OOM on large dataset | Isolation Forest or LOF exceeds available memory | Subsample data for training; use max_samples parameter; switch to streaming approach |
| Feature scaling mismatch | Mixed numeric and categorical features without proper encoding | One-hot encode categoricals separately; scale numeric features independently |
| No ground truth for validation | Unlabeled dataset prevents accuracy measurement | Use domain expert review on top-N anomalies; implement feedback loop to refine threshold |
See ${CLAUDE_SKILL_DIR}/references/errors.md for the full error reference.
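Several of the remedies in the table can be expressed in a few lines. The sketch below shows an IQR fallback for small datasets and a preprocessing pipeline that scales numeric columns and one-hot encodes categoricals independently, with `max_samples` capping memory use. The helper names `iqr_outliers` and `build_detector`, the column names, and the sample data are hypothetical and not part of the skill.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

MIN_ROWS_FOR_MODEL = 100  # row count below which the table suggests simple statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def build_detector(numeric_cols, categorical_cols, contamination=0.01):
    """Scale numeric columns and one-hot encode categoricals independently."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    # max_samples caps the rows drawn per tree, which bounds memory use.
    model = IsolationForest(
        contamination=contamination, max_samples=256, random_state=0
    )
    return make_pipeline(preprocess, model)


# Small dataset: fall back to the IQR rule.
small = [10.1, 9.8, 10.3, 9.9, 10.0, 25.0]
if len(small) < MIN_ROWS_FOR_MODEL:
    small_flags = iqr_outliers(small)  # only the last value is flagged

# Larger mixed-type dataset (synthetic, for illustration only).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "amount": rng.normal(50, 10, 300),
    "duration": rng.normal(5, 1, 300),
    "channel": rng.choice(["web", "app", "pos"], 300),
})
pipe = build_detector(["amount", "duration"], ["channel"])
pred = pipe.fit_predict(df)  # -1 marks anomalies, +1 marks inliers
```

Keeping the preprocessing inside the pipeline means the same scaling and encoding are reapplied whenever the detector scores new rows.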
Scenario 1: Network Intrusion Detection -- Apply Isolation Forest to 50K network flow records with features: packet count, byte volume, duration, protocol type. Expected contamination: 2%. Target: flag port-scan and DDoS patterns with precision above 0.85.
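A rough sketch of Scenario 1 follows. Everything about the data is an assumption: the flows are synthetic, scaled down to 5,000 records for speed, and the attack rows are generated with exaggerated packet counts and byte volumes so that labels exist for computing precision. Real port-scan and DDoS traffic will be harder to separate.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score

rng = np.random.default_rng(42)
n_normal, n_attack = 4_900, 100  # 2% attacks, scaled down from 50K records

# Hypothetical normal traffic.
normal = pd.DataFrame({
    "packet_count": rng.poisson(40, n_normal),
    "byte_volume": rng.normal(30_000, 5_000, n_normal),
    "duration": rng.gamma(2.0, 1.0, n_normal),
    "protocol": rng.choice(["tcp", "udp"], n_normal, p=[0.8, 0.2]),
})

# Hypothetical attack traffic: short, high-volume bursts.
attack = pd.DataFrame({
    "packet_count": rng.integers(500, 5_000, n_attack),
    "byte_volume": rng.uniform(2e5, 5e6, n_attack),
    "duration": rng.uniform(0.001, 0.05, n_attack),
    "protocol": rng.choice(["tcp", "udp", "icmp"], n_attack),
})

flows = pd.concat([normal, attack], ignore_index=True)
y_true = np.r_[np.zeros(n_normal, dtype=int), np.ones(n_attack, dtype=int)]

# One-hot encode the protocol type; tree-based Isolation Forest
# does not require the numeric columns to be scaled.
X = pd.get_dummies(flows, columns=["protocol"]).astype(float)

model = IsolationForest(n_estimators=200, contamination=0.02, random_state=0)
y_pred = (model.fit_predict(X) == -1).astype(int)  # 1 = flagged as anomalous

precision = precision_score(y_true, y_pred)
print(f"flagged {int(y_pred.sum())} flows, precision = {precision:.2f}")
```

On real traffic, labels for computing precision would come from an incident log or analyst review rather than from the generator, and the 0.85 target should be checked on held-out data.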
Scenario 2: Manufacturing Quality Control -- Run LOF on sensor readings (temperature, vibration, pressure) from 10K production cycles. Detect equipment degradation anomalies. Visualize flagged cycles on a time-series plot with normal operating bands.
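Scenario 2 might be approached along these lines. The sensor readings are synthetic, the degradation is injected by hand into the last 50 cycles, and the "normal operating band" is taken to be mean ± 3 standard deviations of the unflagged cycles, which is an assumption rather than something the scenario specifies. `n_neighbors` is set above the size of the degraded group because LOF can miss anomalies that form a cluster larger than the neighborhood.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n_cycles = 10_000

# Hypothetical sensor readings: temperature, vibration, pressure.
readings = np.column_stack([
    rng.normal(70.0, 2.0, n_cycles),
    rng.normal(0.5, 0.05, n_cycles),
    rng.normal(30.0, 1.0, n_cycles),
])

# Injected degradation (assumption): the last 50 cycles drift upward.
readings[-50:, 0] += np.linspace(10.0, 25.0, 50)
readings[-50:, 1] += np.linspace(0.3, 0.8, 50)

# LOF is distance based, so put all sensors on a common scale first.
X = StandardScaler().fit_transform(readings)

lof = LocalOutlierFactor(n_neighbors=100, contamination=0.01)
labels = lof.fit_predict(X)  # -1 = anomalous cycle
flagged_cycles = np.flatnonzero(labels == -1)

# Normal operating band per sensor, estimated from unflagged cycles.
normal = readings[labels == 1]
band_low = normal.mean(axis=0) - 3 * normal.std(axis=0)
band_high = normal.mean(axis=0) + 3 * normal.std(axis=0)


def plot_sensor(sensor_idx=0, name="temperature"):
    """Plot one sensor over cycles with its normal band and flagged cycles."""
    import matplotlib.pyplot as plt  # imported lazily; detection runs without it

    fig, ax = plt.subplots(figsize=(10, 3))
    ax.plot(np.arange(n_cycles), readings[:, sensor_idx], lw=0.5, label=name)
    ax.axhspan(band_low[sensor_idx], band_high[sensor_idx],
               alpha=0.2, label="normal band")
    ax.scatter(flagged_cycles, readings[flagged_cycles, sensor_idx],
               color="red", s=10, label="flagged")
    ax.set_xlabel("production cycle")
    ax.set_ylabel(name)
    ax.legend()
    return fig
```

Calling `plot_sensor(0, "temperature")` or `plot_sensor(1, "vibration")` returns a figure that can be saved with `fig.savefig(...)`.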
Scenario 3: Financial Transaction Monitoring -- Train an autoencoder on 100K legitimate transactions. Reconstruct test transactions and flag those with reconstruction error above the 99th percentile. Report flagged transactions with amount, merchant category, and time-of-day features.
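The reconstruction-error approach in Scenario 3 can be illustrated without a deep learning framework. The sketch below swaps in scikit-learn's `MLPRegressor`, trained to reproduce its own input through a two-unit bottleneck, as a small stand-in for an autoencoder. The transactions are synthetic and scaled down to 5,000 rows, the merchant category is a plain integer code, and the 99th-percentile threshold is computed on the training errors, all of which are assumptions of this example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)


def make_transactions(n):
    """Hypothetical legitimate transactions: amount, category code, hour."""
    amount = rng.lognormal(mean=3.5, sigma=0.6, size=n)
    category = rng.integers(0, 5, size=n).astype(float)
    hour = rng.normal(14, 3, size=n) % 24
    return np.column_stack([amount, category, hour])


X_train = make_transactions(5_000)  # scaled down from 100K for speed
X_test = np.vstack([
    make_transactions(500),
    # Injected suspicious rows (assumption): very large amounts at 3 a.m.
    np.column_stack([rng.uniform(5_000, 20_000, 10),
                     rng.integers(0, 5, 10),
                     np.full(10, 3.0)]),
])

scaler = StandardScaler().fit(X_train)
Z_train, Z_test = scaler.transform(X_train), scaler.transform(X_test)

# Autoencoder stand-in: the network is trained to reproduce its input
# through a 2-unit bottleneck layer.
ae = MLPRegressor(hidden_layer_sizes=(8, 2, 8), activation="tanh",
                  max_iter=300, random_state=0)
ae.fit(Z_train, Z_train)


def reconstruction_error(Z):
    """Mean squared reconstruction error per row."""
    return np.mean((Z - ae.predict(Z)) ** 2, axis=1)


train_err = reconstruction_error(Z_train)
threshold = np.percentile(train_err, 99)  # 99th percentile of legitimate errors

test_err = reconstruction_error(Z_test)
flagged = test_err > threshold

# Report flagged transactions with their original features.
for amount, category, hour in X_test[flagged]:
    print(f"amount={amount:.2f} category={int(category)} hour={hour:.1f}")
```

A production autoencoder would more likely be built in a framework such as Keras or PyTorch, with the merchant category one-hot encoded or embedded. The thresholding and reporting steps would stay the same.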