
ml-pipeline

Designs and implements production-grade ML pipeline infrastructure: configures experiment tracking with MLflow or Weights & Biases, creates Kubeflow or Airflow DAGs for training orchestration, builds feature store schemas with Feast, deploys model registries, and automates retraining and validation workflows. Use when building ML pipelines, orchestrating training workflows, automating model lifecycle, implementing feature stores, managing experiment tracking systems, setting up DVC for data versioning, tuning hyperparameters, or configuring MLOps tooling like Kubeflow, Airflow, MLflow, or Prefect.

Install with Tessl CLI

npx tessl i github:jeffallan/claude-skills --skill ml-pipeline

93 · 1.12x

Does it follow best practices?

Evaluation: 87% (1.12x agent success when using this skill)
Validation for skill structure


Evaluation results

Customer Churn Prediction Training Script: 84% (+20% with context)

Experiment tracking and reproducibility

| Criteria | Without context | With context |
| --- | --- | --- |
| No hardcoded hyperparams | 83% | 100% |
| Seed set comprehensively | 30% | 100% |
| MLflow tracking used | 100% | 100% |
| Required params logged | 70% | 80% |
| Required metrics logged | 40% | 40% |
| Model artifact logged | 100% | 100% |
| Config artifact logged | 0% | 0% |
| MLflow tags used | 0% | 100% |
| Training/inference separation | 75% | 100% |
| training_report.json written | 100% | 100% |
| requirements.txt present | 100% | 100% |

Without context: $0.5319 · 2s · 1 turn · 3 in / 30 out tokens

With context: $1.1610 · 7s · 2 turns · 4 in / 266 out tokens

E-Commerce Feature Engineering Pipeline: 95% (+16% with context)

Feature engineering pipeline with data validation

| Criteria | Without context | With context |
| --- | --- | --- |
| sklearn ColumnTransformer used | 100% | 100% |
| Numeric imputer present | 100% | 100% |
| StandardScaler for numeric | 100% | 100% |
| Categorical imputer present | 100% | 100% |
| OneHotEncoder for categorical | 100% | 100% |
| Data validation before transform | 100% | 100% |
| joblib serialization used | 0% | 100% |
| Metadata JSON saved alongside pipeline | 30% | 100% |
| Descriptive feature naming | 60% | 50% |
| pipeline_summary.json written | 100% | 100% |
| Pipeline artifact in artifacts/ dir | 100% | 100% |

Without context: $0.3976 · 1m 39s · 19 turns · 22 in / 5,970 out tokens

With context: $1.3815 · 4m 2s · 48 turns · 299 in / 14,747 out tokens

Automated Fraud Detection Retraining Pipeline: 74% (-5% with context)

Pipeline orchestration with deployment controls

| Criteria | Without context | With context |
| --- | --- | --- |
| Retry config on tasks | 100% | 100% |
| Retry delay configured | 100% | 100% |
| Conditional deployment | 58% | 50% |
| Explicit accuracy threshold | 100% | 100% |
| Failure alerting configured | 40% | 0% |
| Idempotent stage design | 100% | 100% |
| Training/inference separation | 100% | 100% |
| No credentials in code | 100% | 100% |
| pipeline_design.md present | 100% | 100% |
| No silent failure | 100% | 100% |
| Execution timeout set | 0% | 0% |

Without context: $0.6523 · 4m 3s · 27 turns · 33 in / 10,317 out tokens

With context: $1.0029 · 3m 10s · 28 turns · 9,142 in / 11,286 out tokens

Model Promotion Workflow: 91% (+35% with context)

Model registry and version promotion

| Criteria | Without context | With context |
| --- | --- | --- |
| Model signature inferred | 0% | 100% |
| registered_model_name set | 100% | 100% |
| Staging stage used | 0% | 0% |
| Production promotion | 100% | 100% |
| Archive existing on promotion | 50% | 100% |
| No promotion without metrics | 20% | 100% |
| Validation threshold check | 55% | 100% |
| MLflow tags on runs | 0% | 100% |
| Model comparison documented | 100% | 100% |
| registry_report.json written | 100% | 100% |
| requirements.txt present | 100% | 100% |

Without context: $0.4536 · 2m 11s · 19 turns · 21 in / 7,226 out tokens

With context: $1.7841 · 6m 23s · 39 turns · 73 in / 21,745 out tokens

Automated Hyperparameter Search for Credit Scoring Model: 82% (-9% with context)

Hyperparameter tuning with Optuna pruning

| Criteria | Without context | With context |
| --- | --- | --- |
| Optuna used | 100% | 100% |
| MedianPruner configured | 100% | 100% |
| Pruner startup trials = 5 | 100% | 0% |
| Pruner warmup steps = 3 | 0% | 0% |
| SQLite storage used | 100% | 100% |
| load_if_exists=True | 100% | 100% |
| trial.report called | 100% | 100% |
| trial.should_prune checked | 100% | 100% |
| Best params logged to MLflow | 100% | 100% |
| best_params.json written | 100% | 100% |
| requirements.txt present | 100% | 100% |

Without context: $0.7792 · 3m 26s · 33 turns · 40 in / 10,326 out tokens

With context: $1.6936 · 4m 48s · 43 turns · 9,000 in / 15,698 out tokens

New Model Validation Before Production Deployment: 98% (+1% with context)

Cross-validation and model validation gates

| Criteria | Without context | With context |
| --- | --- | --- |
| StratifiedKFold used | 100% | 100% |
| CV mean and std reported | 100% | 100% |
| McNemar's test applied | 70% | 80% |
| p-value computed and reported | 100% | 100% |
| Explicit ValidationStatus or equivalent | 100% | 100% |
| Accuracy threshold defined as variable | 100% | 100% |
| No deployment without metrics | 100% | 100% |
| Segment-level evaluation present | 100% | 100% |
| validation_report.json written | 100% | 100% |
| requirements.txt present | 100% | 100% |
| Reproducibility: seeds set | 100% | 100% |

Without context: $0.7786 · 3m 4s · 30 turns · 592 in / 12,318 out tokens

With context: $1.2506 · 4m 10s · 31 turns · 281 in / 17,534 out tokens
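A sketch of a validation gate meeting these criteria: stratified CV with mean and std, an exact McNemar's test on the disagreements between candidate and baseline (computed here via scipy's `binomtest`), and a named threshold. The models and threshold are illustrative:

```python
import numpy as np
from scipy.stats import binomtest
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_predict,
    cross_val_score,
)

SEED = 42                   # seeds set for reproducibility
ACCURACY_THRESHOLD = 0.80   # gate defined as a named variable, not a magic number

X, y = make_classification(n_samples=500, random_state=SEED)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)

candidate = RandomForestClassifier(random_state=SEED)
baseline = LogisticRegression(max_iter=1000)

# Report the CV mean AND std, not a single point estimate.
scores = cross_val_score(candidate, X, y, cv=cv)
cv_mean, cv_std = scores.mean(), scores.std()

# Exact McNemar's test on disagreements between candidate and baseline:
# the test statistic only uses the discordant pairs b and c.
cand_right = cross_val_predict(candidate, X, y, cv=cv) == y
base_right = cross_val_predict(baseline, X, y, cv=cv) == y
b = int(np.sum(cand_right & ~base_right))  # candidate right, baseline wrong
c = int(np.sum(~cand_right & base_right))  # baseline right, candidate wrong
p_value = binomtest(min(b, c), n=b + c, p=0.5).pvalue if (b + c) else 1.0

# Explicit validation status: no deployment decision without metrics.
status = "PASSED" if cv_mean >= ACCURACY_THRESHOLD else "FAILED"
```

A full script for this scenario would also add segment-level evaluation and write the results to `validation_report.json`.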

Evaluated with: Agent Claude Code · Model Claude Sonnet 4.6

