World-class senior data scientist skill specialising in statistical modeling, experiment design, causal inference, and predictive analytics. Covers A/B testing (sample sizing, two-proportion z-tests, Bonferroni correction), difference-in-differences, feature engineering pipelines (Scikit-learn, XGBoost), cross-validated model evaluation (AUC-ROC, AUC-PR) with SHAP-based interpretation, and MLflow experiment tracking, using Python (NumPy, Pandas, Scikit-learn), R, and SQL. Use when designing or analysing controlled experiments, building and evaluating classification or regression models, performing causal analysis on observational data, engineering features for structured tabular datasets, or translating statistical findings into data-driven business decisions.
81
Does it follow best practices? 88%
Impact: 73% (1.25x average score across 6 eval scenarios)
Passed: no known issues
Experiment design with provided scripts
Uses experiment_designer script: 0% / 0%
Correct script flags: 0% / 0%
Power analysis script has type hints: 100% / 0%
Alpha = 0.05 used: 100% / 100%
Power = 0.80 used: 100% / 100%
Monitoring plan present: 100% / 100%
MLflow or W&B mentioned: 0% / 0%
Uptime/error rate target: 0% / 0%
Latency SLO referenced: 0% / 100%
Scikit-learn or statsmodels used: 0% / 100%
Batch processing mentioned: 0% / 0%
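The alpha = 0.05 and power = 0.80 criteria refer to the standard per-arm sample-size calculation for a two-proportion z-test. Below is a minimal stdlib-only sketch, assuming a two-sided test, equal allocation between arms, and the pooled-variance normal approximation without continuity correction. The function name is hypothetical, and this is not the skill's experiment_designer script, whose flags are not shown here.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per arm for a two-sided, two-proportion z-test.

    Hypothetical helper (not the experiment_designer script); uses the
    normal approximation without continuity correction.
    """
    if not (0 < p_control < 1 and 0 < p_treatment < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    if p_control == p_treatment:
        raise ValueError("minimum detectable effect must be non-zero")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # quantile for target power
    p_bar = (p_control + p_treatment) / 2        # pooled rate under H0
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control)
                             + p_treatment * (1 - p_treatment))
    ) ** 2
    return math.ceil(numerator / (p_treatment - p_control) ** 2)


if __name__ == "__main__":
    # 10% baseline conversion, aiming to detect a lift to 12%
    print(sample_size_per_group(0.10, 0.12))
```

Because the required sample scales with the inverse square of the effect, halving the minimum detectable effect roughly quadruples it, so baseline rate and effect size should be fixed before launch.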
Feature engineering pipeline with reliability patterns
Uses feature pipeline script: 0% / 0%
Correct script flags: 0% / 0%
Type hints in pipeline: 100% / 100%
Batch processing design: 25% / 100%
Retry logic present: 0% / 0%
Circuit breaker or failure design: 28% / 71%
Data quality validation: 100% / 100%
Comprehensive tests written: 100% / 100%
Pandas or NumPy used: 100% / 100%
Comprehensive logging: 100% / 100%
10x scalability noted: 0% / 0%
Feature catalog complete: 100% / 100%
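Retry logic, batch processing, and a data-quality gate from the reliability criteria above can be sketched with the standard library alone. Everything here is illustrative: `with_retry`, `batches`, and `validate` are hypothetical helpers, the field names `user_id` and `amount` are assumed, and the sketch is not the skill's feature pipeline script.

```python
import logging
import random
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
logger = logging.getLogger("feature_pipeline")  # hypothetical logger name


def with_retry(fn: Callable[[], T], max_attempts: int = 3,
               base_delay: float = 0.5) -> T:
    """Call fn, retrying failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:  # in practice, catch only transient errors
            if attempt == max_attempts:
                logger.error("giving up after %d attempts: %s", attempt, exc)
                raise
            # delay grows 1x, 2x, 4x ... with up to 100% random jitter
            delay = base_delay * 2 ** (attempt - 1) * (1 + random.random())
            logger.warning("attempt %d failed (%s); retrying in %.2fs",
                           attempt, exc, delay)
            time.sleep(delay)
    raise AssertionError("loop always returns or raises")


def batches(rows: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield fixed-size chunks so memory stays bounded as volume grows."""
    batch: list[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def validate(row: dict) -> bool:
    """Data-quality gate: user_id present and amount non-negative."""
    amount = row.get("amount")
    return row.get("user_id") is not None and amount is not None and amount >= 0


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    rows = [{"user_id": i, "amount": float(i % 7) - 1.0} for i in range(10)]
    for batch in batches(rows, 4):
        clean = [r for r in batch if validate(r)]
        logger.info("batch of %d rows, %d passed validation",
                    len(batch), len(clean))
```

The random jitter spreads retries from many workers so they do not hit a recovering upstream source in lockstep.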
Model evaluation with security and monitoring
Uses model eval script: 0% / 0%
Correct script flags: 0% / 0%
PII anonymization addressed: 100% / 100%
Data encryption addressed: 100% / 100%
GDPR/CCPA compliance: 100% / 100%
Latency SLOs specified: 100% / 100%
Error rate target specified: 100% / 100%
MLflow or W&B for tracking: 0% / 100%
Canary or feature flag deployment: 100% / 100%
Type hints in eval script: 0% / 100%
Comprehensive logging in code: 0% / 0%
SSN/PII not logged raw: 100% / 100%
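The "SSN/PII not logged raw" and "PII anonymization" criteria can be met by pseudonymizing identifiers with a keyed hash and scrubbing anything SSN-shaped before it reaches a log handler. The sketch assumes US-format SSNs (`ddd-dd-dddd`) and a secret key loaded from outside the code; the helper names are hypothetical. Note that a keyed hash is pseudonymization, not anonymization: under GDPR the output is still personal data.

```python
import hashlib
import hmac
import logging
import re

# US-format SSN such as 123-45-6789 (assumed format)
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def pseudonymize(value: str, key: bytes) -> str:
    """HMAC-SHA256 of value: stable join key, not reversible without key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def redact(text: str) -> str:
    """Mask every SSN-shaped substring."""
    return SSN_PATTERN.sub("***-**-****", text)


class RedactingFilter(logging.Filter):
    """Scrub SSN-like strings from each record before it is formatted."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())  # merge args, then mask
        record.args = ()
        return True


if __name__ == "__main__":
    logger = logging.getLogger("model_eval")  # hypothetical logger name
    handler = logging.StreamHandler()
    handler.addFilter(RedactingFilter())
    logger.addHandler(handler)
    key = b"load-me-from-a-secret-manager"  # placeholder, never hard-code
    logger.warning("scored applicant %s with ssn %s",
                   pseudonymize("applicant-42", key), "123-45-6789")
```

Attaching the filter to the handler rather than the logger also covers records propagated from child loggers, which logger-level filters do not see.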
A/B test multi-metric analysis with corrections
Two-proportion z-test: 100% / 100%
Lift reported: 100% / 100%
Confidence interval reported: 100% / 100%
Bonferroni correction applied: 0% / 100%
Corrected alpha value: 0% / 100%
Sample ratio mismatch check: 100% / 100%
SRM threshold referenced: 0% / 100%
User-level randomization noted: 25% / 0%
Business cycle duration concern: 0% / 0%
Primary metric identified: 100% / 100%
scipy or statsmodels used: 100% / 100%
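The A/B criteria above (two-proportion z-test, lift, confidence interval, Bonferroni-corrected alpha, and a sample ratio mismatch check) fit in a short stdlib-only sketch. It assumes counts are per user, so the randomization unit matches the analysis unit, and it treats p < 0.001 as the SRM alarm threshold, a common convention rather than a value taken from the skill. Function names and the metric counts are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist

STD_NORMAL = NormalDist()


@dataclass
class ZTestResult:
    diff: float                # absolute difference, treatment minus control
    lift: float                # relative lift over control
    ci: tuple[float, float]    # confidence interval for diff
    p_value: float
    significant: bool


def two_proportion_ztest(conv_c: int, n_c: int, conv_t: int, n_t: int,
                         alpha: float = 0.05) -> ZTestResult:
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c
    # pooled standard error under H0: p_c == p_t
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(diff / se_pool)))
    # unpooled standard error for the Wald interval on the difference
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return ZTestResult(diff, diff / p_c, (diff - z_crit * se, diff + z_crit * se),
                       p_value, p_value < alpha)


def srm_p_value(n_c: int, n_t: int, expected_share_c: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 df) on the observed split."""
    total = n_c + n_t
    exp_c = total * expected_share_c
    exp_t = total - exp_c
    chi2 = (n_c - exp_c) ** 2 / exp_c + (n_t - exp_t) ** 2 / exp_t
    # with 1 df, chi2 is the square of a standard normal variable
    return 2 * (1 - STD_NORMAL.cdf(math.sqrt(chi2)))


if __name__ == "__main__":
    if srm_p_value(10_000, 10_000) < 0.001:
        raise SystemExit("sample ratio mismatch: fix assignment before analysing")
    metrics = {  # metric -> (conv_c, n_c, conv_t, n_t); illustrative counts
        "checkout_rate": (1_000, 10_000, 1_100, 10_000),  # primary metric
        "signup_rate": (2_000, 10_000, 2_050, 10_000),
        "refund_rate": (300, 10_000, 290, 10_000),
    }
    corrected_alpha = 0.05 / len(metrics)  # Bonferroni across 3 metrics
    for name, counts in metrics.items():
        r = two_proportion_ztest(*counts, alpha=corrected_alpha)
        print(f"{name}: lift={r.lift:+.1%} p={r.p_value:.4f} "
              f"CI=({r.ci[0]:+.4f}, {r.ci[1]:+.4f}) significant={r.significant}")
```

Passing the corrected alpha into the test also widens each confidence interval to the matching family-wise level, so the interval and the significance decision stay consistent.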
Causal inference with difference-in-differences
statsmodels formula API: 100% / 100%
Interaction term in formula: 100% / 100%
HC3 robust standard errors: 0% / 100%
Clustered standard errors: 100% / 100%
Parallel trends validation: 100% / 100%
ATT reported: 100% / 100%
Confidence interval for ATT: 100% / 100%
Propensity score matching considered: 100% / 0%
Baseline group comparison: 100% / 100%
Not just p-value: 100% / 100%
pandas used for data: 100% / 100%
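In the canonical two-group, two-period design without extra covariates, the interaction coefficient in `y ~ treated * post` equals a difference of four cell means, which is the ATT this scenario asks for. The stdlib sketch below computes only that point estimate on hypothetical data; `Obs` and `did_att` are illustrative names. With the statsmodels formula API the same coefficient comes from `smf.ols("y ~ treated * post", data=df).fit(cov_type="cluster", cov_kwds={"groups": df["unit"]})`, and `cov_type="HC3"` is the heteroskedasticity-robust alternative for the confidence interval.

```python
from statistics import mean
from typing import Iterable, NamedTuple


class Obs(NamedTuple):
    unit: str        # cluster id, e.g. store or region
    treated: bool    # unit is in the treatment group
    post: bool       # observation falls after the intervention
    y: float         # outcome


def did_att(rows: Iterable[Obs]) -> float:
    """2x2 DiD: (treated post - treated pre) - (control post - control pre)."""
    cells: dict[tuple[bool, bool], list[float]] = {}
    for r in rows:
        cells.setdefault((r.treated, r.post), []).append(r.y)
    if len(cells) != 4:
        raise ValueError("need observations in all four treated x post cells")
    m = {cell: mean(ys) for cell, ys in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


if __name__ == "__main__":
    data = [  # hypothetical outcomes for two treated and two control units
        Obs("a", True, False, 10.0), Obs("b", True, False, 12.0),
        Obs("a", True, True, 15.0), Obs("b", True, True, 17.0),
        Obs("c", False, False, 8.0), Obs("d", False, False, 10.0),
        Obs("c", False, True, 10.0), Obs("d", False, True, 12.0),
    ]
    print(did_att(data))  # 3.0
```

Parallel trends cannot be checked with a single pre-period; with several, the same cell-mean difference computed between two pre-periods (a placebo test) should be close to zero.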
Feature engineering specifics and imbalanced classification
Log-transform applied: 100% / 100%
High-cardinality target encoding: 0% / 20%
Cyclical time feature (sin/cos): 0% / 100%
is_weekend feature: 0% / 100%
Fit on train only: 40% / 100%
Lag features before split: 100% / 100%
Feature business meaning documented: 100% / 100%
AUC-PR reported: 100% / 100%
AUC-ROC reported: 100% / 100%
DummyClassifier baseline: 100% / 100%
SHAP values computed: 0% / 100%
StratifiedKFold used: 100% / 100%
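Several feature criteria above (log transform, cyclical sin/cos time encoding, is_weekend, and target encoding fitted on training rows only) are small enough to show without libraries. The encoder, its `smoothing` parameter, and the helper names are hypothetical. In a real pipeline the encoder would be fitted inside each `StratifiedKFold` training fold, while scikit-learn's `DummyClassifier` and `average_precision_score` would supply the baseline and the AUC-PR estimate.

```python
import math
from collections import defaultdict
from datetime import datetime


def time_features(ts: datetime) -> dict[str, float]:
    """Cyclical hour-of-day encoding so 23:00 and 00:00 end up close together."""
    angle = 2 * math.pi * ts.hour / 24
    return {
        "hour_sin": math.sin(angle),
        "hour_cos": math.cos(angle),
        "is_weekend": float(ts.weekday() >= 5),  # Saturday=5, Sunday=6
    }


def log_amount(amount: float) -> float:
    """log1p compresses a right-skewed, non-negative amount."""
    return math.log1p(amount)


class SmoothedTargetEncoder:
    """Per-category target mean, shrunk toward the global mean.

    Fit on training rows only; unseen categories map to the global mean.
    """

    def __init__(self, smoothing: float = 10.0) -> None:
        self.smoothing = smoothing
        self.global_mean = 0.0
        self.mapping: dict[str, float] = {}

    def fit(self, categories: list[str], targets: list[int]) -> "SmoothedTargetEncoder":
        sums: dict[str, float] = defaultdict(float)
        counts: dict[str, int] = defaultdict(int)
        for c, y in zip(categories, targets):
            sums[c] += y
            counts[c] += 1
        self.global_mean = sum(targets) / len(targets)
        k = self.smoothing
        # rare categories get pulled strongly toward the global mean
        self.mapping = {
            c: (sums[c] + k * self.global_mean) / (counts[c] + k) for c in counts
        }
        return self

    def transform(self, categories: list[str]) -> list[float]:
        return [self.mapping.get(c, self.global_mean) for c in categories]


if __name__ == "__main__":
    train_cats, train_y = ["m1", "m1", "m2", "m3"], [1, 0, 0, 0]
    enc = SmoothedTargetEncoder(smoothing=5.0).fit(train_cats, train_y)
    print(enc.transform(["m1", "unseen_merchant"]))
    print(time_features(datetime(2024, 1, 6, 23)), log_amount(99.0))
```

Smoothing matters most for high-cardinality columns, where many categories have only a handful of rows and a raw mean would leak noise as signal.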
967fe01