ml-pipeline-workflow

Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.

Quality: 37% (Does it follow best practices?)

Impact: 73%, 0.98x (average score across 3 eval scenarios)

Security: Passed (no known issues)

Optimize this skill with Tessl

npx tessl skill review --optimize ./tests/ext_conformance/artifacts/agents-wshobson/machine-learning-ops/skills/ml-pipeline-workflow/SKILL.md

Quality

Discovery

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is structurally sound with a clear 'what' and explicit 'when' clause, which is its strongest aspect. However, it operates at a high level of abstraction—listing pipeline phases rather than concrete actions—and could benefit from more specific trigger terms and concrete capabilities to better differentiate it from adjacent skills in data science, DevOps, or data engineering.

Suggestions

Add more specific concrete actions such as 'configure experiment tracking, set up model registries, implement A/B testing, create feature stores, build CI/CD for models'.

Expand trigger terms in the 'Use when' clause to include natural user phrases like 'machine learning', 'model registry', 'experiment tracking', 'model serving', 'ML CI/CD', or specific tools like 'MLflow', 'Kubeflow'.

Dimension	Reasoning	Score
Specificity	Names the domain (MLOps) and lists high-level stages (data preparation, model training, validation, deployment), but these are broad phases rather than multiple specific concrete actions like 'fill forms, merge documents'. It doesn't specify concrete tools, formats, or granular operations.	2 / 3
Completeness	Clearly answers both 'what' (build end-to-end MLOps pipelines from data preparation through deployment) and 'when' with an explicit 'Use when...' clause covering ML pipelines, MLOps practices, and automating training/deployment workflows.	3 / 3
Trigger Term Quality	Includes relevant terms like 'ML pipelines', 'MLOps', 'model training', 'deployment workflows', but misses common user variations such as 'machine learning', 'CI/CD for models', 'model serving', 'experiment tracking', 'feature engineering', or specific framework names users might mention.	2 / 3
Distinctiveness Conflict Risk	The MLOps focus provides some distinctiveness, but terms like 'data preparation', 'model training', and 'deployment' could overlap with general data science skills, deployment/DevOps skills, or data engineering skills. The scope is broad enough to potentially conflict with more specialized skills.	2 / 3
	Total	9 / 12 (Passed)

Implementation

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is essentially a high-level outline or table of contents for MLOps concepts, not an actionable skill. It describes what an ML pipeline should contain rather than showing how to build one, with virtually no executable code, no concrete validation steps, and extensive padding with information Claude already knows. The referenced supporting files don't exist, leaving the skill as an empty framework.

Suggestions

Replace the descriptive lists with a single concrete, executable end-to-end pipeline example (e.g., a working Airflow DAG or Dagster pipeline with real code for each stage).

Add explicit validation checkpoints with concrete commands, such as data validation with Great Expectations code, model validation with specific metric thresholds, and pre-deployment checks.

Remove sections that describe concepts Claude already knows (tool listings, generic best practices, 'When to Use' section) and replace with specific, opinionated guidance unique to this skill.

Either provide the referenced bundle files (data-preparation.md, model-training.md, etc.) with real content, or inline the critical actionable content directly in the SKILL.md.
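
As an illustration of the validation checkpoints suggested above, here is a minimal sketch in plain Python. It is not taken from the reviewed skill: the names `Thresholds`, `GateError`, `validate_data`, `validate_model`, and `run_pipeline` are hypothetical, the threshold values are placeholders, and the majority-label "model" is a stand-in for real training. A real pipeline would likely delegate these checks to an orchestrator and a data validation library.

```python
from dataclasses import dataclass


class GateError(Exception):
    """Raised when a pipeline stage fails its validation checkpoint."""


@dataclass
class Thresholds:
    # Hypothetical limits; real values depend on the project.
    max_null_fraction: float = 0.05
    min_accuracy: float = 0.80


def validate_data(rows, limits):
    """Data gate: reject the batch if too many labels are missing."""
    if not rows:
        raise GateError("data gate: empty dataset")
    null_fraction = sum(r["label"] is None for r in rows) / len(rows)
    if null_fraction > limits.max_null_fraction:
        raise GateError(f"data gate: null fraction {null_fraction:.2f} too high")
    return [r for r in rows if r["label"] is not None]


def train(rows):
    """Stand-in for training: always predict the majority label."""
    labels = [r["label"] for r in rows]
    majority = max(set(labels), key=labels.count)
    return lambda row: majority


def validate_model(model, holdout, limits):
    """Model gate: block deployment below the accuracy threshold."""
    correct = sum(model(r) == r["label"] for r in holdout)
    accuracy = correct / len(holdout)
    if accuracy < limits.min_accuracy:
        raise GateError(f"model gate: accuracy {accuracy:.2f} below threshold")
    return accuracy


def run_pipeline(rows, holdout, limits=None):
    """Run data gate -> train -> model gate; any failed gate raises GateError."""
    limits = limits or Thresholds()
    clean = validate_data(rows, limits)
    model = train(clean)
    accuracy = validate_model(model, holdout, limits)
    return {"status": "ready_to_deploy", "accuracy": accuracy}
```

Each gate raises instead of returning a flag, so a failed check stops the run before the next stage starts. That is the behavior an explicit checkpoint is meant to guarantee.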

Dimension	Reasoning	Score
Conciseness	Extremely verbose and padded with information Claude already knows. Lists of orchestration tools, deployment platforms, experiment tracking tools, and best practices like 'modularity' and 'idempotency' are all general knowledge. The 'When to Use This Skill' and 'What This Skill Provides' sections are meta-descriptions that waste tokens describing the skill rather than teaching it. The entire document reads like a table of contents or overview document rather than actionable instruction.	1 / 3
Actionability	Almost no executable code or concrete commands. The Python code block is just a list of stage names as strings. The YAML example is a skeleton with no real content. Multiple code blocks contain only comments pointing to other files (e.g., '# See references/model-training.md'). There is nothing copy-paste ready or directly executable anywhere in the skill.	1 / 3
Workflow Clarity	The 'Production Workflow' section lists phases with bullet points but provides no concrete commands, no validation checkpoints with specific tools or scripts, and no feedback loops for error recovery. The troubleshooting section is entirely generic ('check dependencies and data availability'). For a skill involving complex multi-step pipeline operations, there are no explicit validation gates or retry mechanisms demonstrated.	1 / 3
Progressive Disclosure	The skill references external files in references/ and assets/ directories with clear descriptions, which is good structure. However, no bundle files are provided, so these references lead nowhere. The main document itself is a monolithic wall of high-level descriptions that should have been split into the referenced files, with the SKILL.md being a lean overview. The 'Progressive Disclosure' section ironically describes the concept rather than implementing it.	2 / 3
	Total	5 / 12 (Passed)
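
The Workflow Clarity row notes the absence of feedback loops and retry mechanisms. The sketch below shows one possible shape for such a loop, under stated assumptions: `run_with_retry` is a hypothetical helper, not part of the reviewed skill or of any orchestration library, and the convention that `validate` returns `None` on success is chosen only for this example.

```python
import time


def run_with_retry(stage, validate, max_attempts=3, base_delay=0.0, sleep=time.sleep):
    """Run `stage`, check its output with `validate`, and retry on failure.

    `validate` returns None when the output passes, or a message describing
    the problem. Returns (result, attempts). Raises RuntimeError once the
    attempts are exhausted.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = stage()
            problem = validate(result)
            if problem is None:
                return result, attempt
            last_error = problem  # output produced but rejected by the gate
        except Exception as exc:  # the stage itself crashed
            last_error = repr(exc)
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"stage failed after {max_attempts} attempts: {last_error}")
```

A caller might wrap a data-loading step as `run_with_retry(load_batch, check_batch, base_delay=1.0)`, where `load_batch` and `check_batch` are placeholders for project-specific functions. The last recorded problem is included in the final error so a failed run reports why the gate rejected it.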

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: Dicklesworthstone/pi_agent_rust
Commit: b09ec7f

Reviewed: about 7 hours ago
