Designs and implements production-grade ML pipeline infrastructure: configures experiment tracking with MLflow or Weights & Biases, creates Kubeflow or Airflow DAGs for training orchestration, builds feature store schemas with Feast, deploys model registries, and automates retraining and validation workflows. Use when building ML pipelines, orchestrating training workflows, automating model lifecycle, implementing feature stores, managing experiment tracking systems, setting up DVC for data versioning, tuning hyperparameters, or configuring MLOps tooling like Kubeflow, Airflow, MLflow, or Prefect.
- Overall score: 93
- Does it follow best practices? 100%
- Impact: 87%
- Average score across 6 eval scenarios: 1.12x
- Status: Passed; no known issues

Each scenario below lists its criteria with two pass rates.
Experiment tracking and reproducibility
- No hardcoded hyperparams: 83% / 100%
- Seed set comprehensively: 30% / 100%
- MLflow tracking used: 100% / 100%
- Required params logged: 70% / 80%
- Required metrics logged: 40% / 40%
- Model artifact logged: 100% / 100%
- Config artifact logged: 0% / 0%
- MLflow tags used: 0% / 100%
- Training/inference separation: 75% / 100%
- training_report.json written: 100% / 100%
- requirements.txt present: 100% / 100%
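Taken together, the tracking criteria amount to a small contract: seeds set in one place, no hardcoded hyperparameters (everything flows through a config dict), and a training_report.json artifact written at the end. A minimal stdlib sketch of that contract — the MLflow calls are indicated as comments, and all names and metric values are hypothetical:

```python
import json
import random
from pathlib import Path

SEED = 42

def set_seeds(seed: int = SEED) -> None:
    """Seed every RNG the run touches in one place.

    When numpy or torch are in use, numpy.random.seed(seed) and
    torch.manual_seed(seed) would be added here as well.
    """
    random.seed(seed)

def train(config: dict) -> dict:
    set_seeds(config["seed"])
    # Stand-in for a real training loop; the metric values are illustrative.
    metrics = {"accuracy": 0.91, "f1": 0.89}
    # With MLflow available, this is where params, metrics, and tags go:
    # mlflow.log_params(config); mlflow.log_metrics(metrics);
    # mlflow.set_tags({"stage": "training"})
    report = {"params": config, "metrics": metrics}
    Path("training_report.json").write_text(json.dumps(report, indent=2))
    return report

report = train({"seed": SEED, "learning_rate": 0.01, "n_estimators": 200})
```

Because every hyperparameter lives in the config dict, the same dict can be logged, serialized, and replayed, which is what makes the run reproducible.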
Feature engineering pipeline with data validation
- sklearn ColumnTransformer used: 100% / 100%
- Numeric imputer present: 100% / 100%
- StandardScaler for numeric: 100% / 100%
- Categorical imputer present: 100% / 100%
- OneHotEncoder for categorical: 100% / 100%
- Data validation before transform: 100% / 100%
- joblib serialization used: 0% / 100%
- Metadata JSON saved alongside pipeline: 30% / 100%
- Descriptive feature naming: 60% / 50%
- pipeline_summary.json written: 100% / 100%
- Pipeline artifact in artifacts/ dir: 100% / 100%
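These criteria describe one standard sklearn shape: a ColumnTransformer with an impute-then-scale branch for numeric columns and an impute-then-encode branch for categoricals, validated before transform and serialized with joblib next to a metadata JSON. A sketch under assumed data (column layout, file names, and the toy array are all hypothetical):

```python
import json
import joblib
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical layout: columns 0-1 numeric, column 2 categorical.
NUMERIC_COLS, CATEGORICAL_COLS = [0, 1], [2]

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer([
    ("num", numeric, NUMERIC_COLS),
    ("cat", categorical, CATEGORICAL_COLS),
])

X = np.array([[1.0, 10.0, "a"],
              [2.0, np.nan, "b"],
              [np.nan, 30.0, "a"]], dtype=object)

# Validate before transform: the simplest form of the validation gate.
assert X.ndim == 2 and X.shape[1] == 3, "unexpected input shape"

Xt = preprocessor.fit_transform(X)

# Serialize the fitted pipeline, with metadata JSON alongside it so the
# artifact is self-describing.
joblib.dump(preprocessor, "pipeline.joblib")
with open("pipeline_meta.json", "w") as f:
    json.dump({"numeric_cols": NUMERIC_COLS,
               "categorical_cols": CATEGORICAL_COLS}, f)
```

With two numeric columns and two observed categories, the fitted transformer emits four output columns for the three toy rows.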
Pipeline orchestration with deployment controls
- Retry config on tasks: 100% / 100%
- Retry delay configured: 100% / 100%
- Conditional deployment: 58% / 50%
- Explicit accuracy threshold: 100% / 100%
- Failure alerting configured: 40% / 0%
- Idempotent stage design: 100% / 100%
- Training/inference separation: 100% / 100%
- No credentials in code: 100% / 100%
- pipeline_design.md present: 100% / 100%
- No silent failure: 100% / 100%
- Execution timeout set: 0% / 0%
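The orchestration checks are framework-agnostic: whether the DAG runs on Airflow, Kubeflow, or Prefect, each task needs bounded retries with a delay, and deployment sits behind an explicitly named metric threshold. A library-free sketch of those two controls (threshold value, delay, and return strings are hypothetical):

```python
import time

MAX_RETRIES = 3
RETRY_DELAY_S = 0.01        # seconds-to-minutes in a real DAG
ACCURACY_THRESHOLD = 0.90   # explicit, named deployment gate

def run_with_retry(task, retries=MAX_RETRIES, delay=RETRY_DELAY_S):
    """Re-run a task up to `retries` times, sleeping between attempts."""
    for attempt in range(1, retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                # No silent failure: re-raise on the last attempt. A real
                # pipeline would also fire an alert here (Slack, PagerDuty).
                raise
            time.sleep(delay)

def maybe_deploy(metrics: dict) -> str:
    """Conditional deployment: refuse without metrics or below threshold."""
    if not metrics:
        return "blocked: no metrics"
    if metrics["accuracy"] < ACCURACY_THRESHOLD:
        return "blocked: below threshold"
    return "deployed"
```

In Airflow the same pair maps onto task-level `retries`/`retry_delay` plus a branch operator in front of the deploy task; in Prefect, onto `@task(retries=..., retry_delay_seconds=...)`.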
Model registry and version promotion
- Model signature inferred: 0% / 100%
- registered_model_name set: 100% / 100%
- Staging stage used: 0% / 0%
- Production promotion: 100% / 100%
- Archive existing on promotion: 50% / 100%
- No promotion without metrics: 20% / 100%
- Validation threshold check: 55% / 100%
- MLflow tags on runs: 0% / 100%
- Model comparison documented: 100% / 100%
- registry_report.json written: 100% / 100%
- requirements.txt present: 100% / 100%
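The registry criteria reduce to one promotion rule: a version reaches Production only when it carries validation metrics that clear a threshold, and the previously live version is archived in the same step. In MLflow this maps onto `MlflowClient.transition_model_version_stage(..., archive_existing_versions=True)`; a dependency-free sketch of the gate itself (threshold and registry records are hypothetical):

```python
ACCURACY_THRESHOLD = 0.90

# Toy in-memory registry standing in for the MLflow model registry.
registry = [
    {"version": 1, "stage": "Production", "metrics": {"accuracy": 0.88}},
    {"version": 2, "stage": "Staging",    "metrics": {"accuracy": 0.93}},
]

def promote(registry, version):
    """Promote `version` to Production, archiving whatever was live."""
    candidate = next(v for v in registry if v["version"] == version)
    if not candidate["metrics"]:
        raise ValueError("no promotion without metrics")
    if candidate["metrics"]["accuracy"] < ACCURACY_THRESHOLD:
        raise ValueError("validation threshold not met")
    for v in registry:
        if v["stage"] == "Production":
            v["stage"] = "Archived"   # archive existing on promotion
    candidate["stage"] = "Production"
    return candidate

promote(registry, 2)
```

Because archiving and promotion happen in the same call, the registry can never hold two Production versions of the same model.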
Hyperparameter tuning with Optuna pruning
- Optuna used: 100% / 100%
- MedianPruner configured: 100% / 100%
- Pruner startup trials = 5: 100% / 0%
- Pruner warmup steps = 3: 0% / 0%
- SQLite storage used: 100% / 100%
- load_if_exists=True: 100% / 100%
- trial.report called: 100% / 100%
- trial.should_prune checked: 100% / 100%
- Best params logged to MLflow: 100% / 100%
- best_params.json written: 100% / 100%
- requirements.txt present: 100% / 100%
Cross-validation and model validation gates
- StratifiedKFold used: 100% / 100%
- CV mean and std reported: 100% / 100%
- McNemar's test applied: 70% / 80%
- p-value computed and reported: 100% / 100%
- Explicit ValidationStatus or equivalent: 100% / 100%
- Accuracy threshold defined as variable: 100% / 100%
- No deployment without metrics: 100% / 100%
- Segment-level evaluation present: 100% / 100%
- validation_report.json written: 100% / 100%
- requirements.txt present: 100% / 100%
- Reproducibility (seeds set): 100% / 100%
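The statistical core of this scenario is McNemar's test on the disagreement counts between two candidate models, reported alongside CV mean/std and an explicit pass/fail status. A stdlib sketch (the fold scores, disagreement counts, and threshold are hypothetical; in a real pipeline the fold scores would come from sklearn's StratifiedKFold):

```python
import json
import math

ACCURACY_THRESHOLD = 0.85   # named gate, hypothetical value

def mcnemar(b: int, c: int):
    """McNemar's test with continuity correction.

    b = cases model A got right and model B got wrong; c = the reverse.
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2))   # survival function of chi2(1)
    return stat, p

# Hypothetical per-fold CV accuracies and disagreement counts.
cv_scores = [0.90, 0.88, 0.91, 0.89, 0.90]
mean = sum(cv_scores) / len(cv_scores)
std = (sum((s - mean) ** 2 for s in cv_scores) / len(cv_scores)) ** 0.5
stat, p = mcnemar(b=15, c=5)

# Explicit validation status: no deployment decision without metrics.
status = "PASSED" if mean >= ACCURACY_THRESHOLD else "FAILED"
with open("validation_report.json", "w") as f:
    json.dump({"cv_mean": mean, "cv_std": std,
               "mcnemar_stat": stat, "mcnemar_p": p, "status": status}, f)
```

With b=15 and c=5 the corrected statistic is 81/20 = 4.05, giving a p-value just under 0.05, so the two models' error patterns differ at the conventional significance level.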