Designs and implements production-grade ML pipeline infrastructure: configures experiment tracking with MLflow or Weights & Biases, creates Kubeflow or Airflow DAGs for training orchestration, builds feature store schemas with Feast, deploys model registries, and automates retraining and validation workflows. Use when building ML pipelines, orchestrating training workflows, automating model lifecycle, implementing feature stores, managing experiment tracking systems, setting up DVC for data versioning, tuning hyperparameters, or configuring MLOps tooling like Kubeflow, Airflow, MLflow, or Prefect.
Install with the Tessl CLI:

```shell
npx tessl i github:jeffallan/claude-skills --skill ml-pipeline93
```
Does it follow best practices?

Evaluation — 87%
↑ 1.12× agent success when using this skill
Validation for skill structure
Experiment tracking and reproducibility

| Criterion | Without context | With context |
|---|---|---|
| No hardcoded hyperparams | 83% | 100% |
| Seed set comprehensively | 30% | 100% |
| MLflow tracking used | 100% | 100% |
| Required params logged | 70% | 80% |
| Required metrics logged | 40% | 40% |
| Model artifact logged | 100% | 100% |
| Config artifact logged | 0% | 0% |
| MLflow tags used | 0% | 100% |
| Training/inference separation | 75% | 100% |
| training_report.json written | 100% | 100% |
| requirements.txt present | 100% | 100% |
Without context: $0.5319 · 2s · 1 turns · 3 in / 30 out tokens
With context: $1.1610 · 7s · 2 turns · 4 in / 266 out tokens
Feature engineering pipeline with data validation

| Criterion | Without context | With context |
|---|---|---|
| sklearn ColumnTransformer used | 100% | 100% |
| Numeric imputer present | 100% | 100% |
| StandardScaler for numeric | 100% | 100% |
| Categorical imputer present | 100% | 100% |
| OneHotEncoder for categorical | 100% | 100% |
| Data validation before transform | 100% | 100% |
| joblib serialization used | 0% | 100% |
| Metadata JSON saved alongside pipeline | 30% | 100% |
| Descriptive feature naming | 60% | 50% |
| pipeline_summary.json written | 100% | 100% |
| Pipeline artifact in artifacts/ dir | 100% | 100% |
Without context: $0.3976 · 1m 39s · 19 turns · 22 in / 5,970 out tokens
With context: $1.3815 · 4m 2s · 48 turns · 299 in / 14,747 out tokens
Pipeline orchestration with deployment controls

| Criterion | Without context | With context |
|---|---|---|
| Retry config on tasks | 100% | 100% |
| Retry delay configured | 100% | 100% |
| Conditional deployment | 58% | 50% |
| Explicit accuracy threshold | 100% | 100% |
| Failure alerting configured | 40% | 0% |
| Idempotent stage design | 100% | 100% |
| Training/inference separation | 100% | 100% |
| No credentials in code | 100% | 100% |
| pipeline_design.md present | 100% | 100% |
| No silent failure | 100% | 100% |
| Execution timeout set | 0% | 0% |
Without context: $0.6523 · 4m 3s · 27 turns · 33 in / 10,317 out tokens
With context: $1.0029 · 3m 10s · 28 turns · 9,142 in / 11,286 out tokens
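The orchestration controls above are orchestrator-agnostic; an Airflow or Prefect task just wires them into operator parameters. A plain-Python sketch of the two controls the scenario scores, retry-with-delay and threshold-gated deployment (the threshold value and function names are invented here):

```python
import time

ACCURACY_THRESHOLD = 0.85  # explicit, named threshold rather than a magic number

def with_retries(fn, retries=3, delay_s=60.0):
    # Retry config + retry delay, mirroring Airflow's `retries`/`retry_delay`.
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # no silent failure: surface the last error to the scheduler
            time.sleep(delay_s)

def maybe_deploy(accuracy: float) -> str:
    # Conditional deployment: promote only when the gate passes.
    return "deployed" if accuracy >= ACCURACY_THRESHOLD else "skipped"
```

In Airflow the same controls map onto operator arguments such as `retries=3`, `retry_delay=timedelta(minutes=5)`, `execution_timeout=timedelta(hours=1)` (the 0% criterion above), and `on_failure_callback` for alerting, with a branch operator handling the conditional deploy step.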
Model registry and version promotion

| Criterion | Without context | With context |
|---|---|---|
| Model signature inferred | 0% | 100% |
| registered_model_name set | 100% | 100% |
| Staging stage used | 0% | 0% |
| Production promotion | 100% | 100% |
| Archive existing on promotion | 50% | 100% |
| No promotion without metrics | 20% | 100% |
| Validation threshold check | 55% | 100% |
| MLflow tags on runs | 0% | 100% |
| Model comparison documented | 100% | 100% |
| registry_report.json written | 100% | 100% |
| requirements.txt present | 100% | 100% |
Without context: $0.4536 · 2m 11s · 19 turns · 21 in / 7,226 out tokens
With context: $1.7841 · 6m 23s · 39 turns · 73 in / 21,745 out tokens
Hyperparameter tuning with Optuna pruning

| Criterion | Without context | With context |
|---|---|---|
| Optuna used | 100% | 100% |
| MedianPruner configured | 100% | 100% |
| Pruner startup trials = 5 | 100% | 0% |
| Pruner warmup steps = 3 | 0% | 0% |
| SQLite storage used | 100% | 100% |
| load_if_exists=True | 100% | 100% |
| trial.report called | 100% | 100% |
| trial.should_prune checked | 100% | 100% |
| Best params logged to MLflow | 100% | 100% |
| best_params.json written | 100% | 100% |
| requirements.txt present | 100% | 100% |
Without context: $0.7792 · 3m 26s · 33 turns · 40 in / 10,326 out tokens
With context: $1.6936 · 4m 48s · 43 turns · 9,000 in / 15,698 out tokens
Cross-validation and model validation gates

| Criterion | Without context | With context |
|---|---|---|
| StratifiedKFold used | 100% | 100% |
| CV mean and std reported | 100% | 100% |
| McNemar's test applied | 70% | 80% |
| p-value computed and reported | 100% | 100% |
| Explicit ValidationStatus or equivalent | 100% | 100% |
| Accuracy threshold defined as variable | 100% | 100% |
| No deployment without metrics | 100% | 100% |
| Segment-level evaluation present | 100% | 100% |
| validation_report.json written | 100% | 100% |
| requirements.txt present | 100% | 100% |
| Reproducibility: seeds set | 100% | 100% |
Without context: $0.7786 · 3m 4s · 30 turns · 592 in / 12,318 out tokens
With context: $1.2506 · 4m 10s · 31 turns · 281 in / 17,534 out tokens
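The validation-gate criteria combine seeded StratifiedKFold scoring with a McNemar's test between candidate models and an explicit pass/fail status. A sketch assuming `statsmodels` for the McNemar implementation; the threshold value and model pairing are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from statsmodels.stats.contingency_tables import mcnemar

SEED = 42                    # reproducibility: seeds set everywhere
ACCURACY_THRESHOLD = 0.80    # gate defined as a named variable

X, y = make_classification(n_samples=600, random_state=SEED)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")  # mean and std

# McNemar's test: do two models disagree more than chance on the same test set?
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=SEED)
a = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te)
b = RandomForestClassifier(random_state=SEED).fit(X_tr, y_tr).predict(X_te)
table = [[np.sum((a == y_te) & (b == y_te)), np.sum((a == y_te) & (b != y_te))],
         [np.sum((a != y_te) & (b == y_te)), np.sum((a != y_te) & (b != y_te))]]
p_value = mcnemar(table, exact=True).pvalue      # p-value computed and reported

# Explicit validation status: deployment logic reads this, never raw scores.
status = "PASSED" if scores.mean() >= ACCURACY_THRESHOLD else "FAILED"
```

The 2x2 table counts agreement/disagreement on the *same* examples, which is what makes McNemar's test appropriate for paired model comparison where independent-sample tests would be invalid.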