Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.
53
37%
Does it follow best practices?
Impact
73%
0.98x
Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./tests/ext_conformance/artifacts/agents-wshobson/machine-learning-ops/skills/ml-pipeline-workflow/SKILL.md
Quality
Discovery
67%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is structurally sound: it has a clear 'what' and an explicit 'when' clause, which is its strongest aspect. However, it operates at a high level of abstraction, listing pipeline phases rather than concrete actions. More specific trigger terms and concrete capabilities would better differentiate it from adjacent skills in data science, DevOps, or data engineering.
Suggestions
Add more specific concrete actions such as 'configure experiment tracking, set up model registries, implement A/B testing, create feature stores, build CI/CD for models'.
Expand trigger terms in the 'Use when' clause to include natural user phrases like 'machine learning', 'model registry', 'experiment tracking', 'model serving', 'ML CI/CD', or specific tools like 'MLflow', 'Kubeflow'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (MLOps) and lists high-level stages (data preparation, model training, validation, deployment), but these are broad phases rather than multiple specific concrete actions like 'fill forms, merge documents'. It doesn't specify concrete tools, formats, or granular operations. | 2 / 3 |
| Completeness | Clearly answers both 'what' (build end-to-end MLOps pipelines from data preparation through deployment) and 'when' with an explicit 'Use when...' clause covering ML pipelines, MLOps practices, and automating training/deployment workflows. | 3 / 3 |
| Trigger Term Quality | Includes relevant terms like 'ML pipelines', 'MLOps', 'model training', 'deployment workflows', but misses common user variations such as 'machine learning', 'CI/CD for models', 'model serving', 'experiment tracking', 'feature engineering', or specific framework names users might mention. | 2 / 3 |
| Distinctiveness Conflict Risk | The MLOps focus provides some distinctiveness, but terms like 'data preparation', 'model training', and 'deployment' could overlap with general data science skills, deployment/DevOps skills, or data engineering skills. The scope is broad enough to potentially conflict with more specialized skills. | 2 / 3 |
| Total | | 9 / 12 Passed |
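The trigger-term gap noted above can be checked mechanically. The following is a minimal sketch, not part of Tessl's tooling: it takes the skill's description and the terms suggested in this review, and reports which ones are absent. The function name `missing_terms` and the case-insensitive substring match are assumptions made for illustration; real discovery scoring is likely more nuanced.

```python
# The skill description as quoted at the top of this review.
DESCRIPTION = (
    "Build end-to-end MLOps pipelines from data preparation through model "
    "training, validation, and production deployment. Use when creating ML "
    "pipelines, implementing MLOps practices, or automating model training "
    "and deployment workflows."
)

# Trigger terms proposed in the Discovery suggestions.
SUGGESTED_TERMS = [
    "machine learning",
    "model registry",
    "experiment tracking",
    "model serving",
    "ML CI/CD",
    "MLflow",
    "Kubeflow",
]


def missing_terms(description: str, terms: list[str]) -> list[str]:
    """Return the terms that do not occur in the description.

    Hypothetical helper: uses a naive case-insensitive substring match.
    """
    haystack = description.lower()
    return [term for term in terms if term.lower() not in haystack]


if __name__ == "__main__":
    for term in missing_terms(DESCRIPTION, SUGGESTED_TERMS):
        print(f"missing trigger term: {term}")
```

Run against the current description, every suggested term is reported as missing, which is consistent with the 2 / 3 score for Trigger Term Quality.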
Implementation
7%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is essentially a high-level outline or table of contents for MLOps concepts, not an actionable skill. It describes what an ML pipeline should contain rather than showing how to build one, with virtually no executable code, no concrete validation steps, and extensive padding with information Claude already knows. The referenced supporting files don't exist, leaving the skill as an empty framework.
Suggestions
Replace the descriptive lists with a single concrete, executable end-to-end pipeline example (e.g., a working Airflow DAG or Dagster pipeline with real code for each stage).
Add explicit validation checkpoints with concrete commands, such as data validation with Great Expectations code, model validation with specific metric thresholds, and pre-deployment checks.
Remove sections that describe concepts Claude already knows (tool listings, generic best practices, 'When to Use' section) and replace with specific, opinionated guidance unique to this skill.
Either provide the referenced bundle files (data-preparation.md, model-training.md, etc.) with real content, or inline the critical actionable content directly in the SKILL.md.
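To illustrate what an executable pipeline with explicit validation checkpoints might look like, here is a minimal standard-library sketch. It is not taken from the skill and does not use Airflow, Dagster, or Great Expectations; the stage names, the `GateFailure` exception, the majority-label stand-in model, and the `min_rows` and `min_accuracy` thresholds are all hypothetical. The point is only the shape: each stage is followed by a gate that stops the run when its output is unacceptable.

```python
from dataclasses import dataclass
from typing import Callable


class GateFailure(Exception):
    """Raised when a validation checkpoint rejects the pipeline state."""


@dataclass
class Stage:
    name: str
    run: Callable[[dict], dict]   # produces the next pipeline state
    gate: Callable[[dict], None]  # raises GateFailure if that state is unacceptable


def prepare_data(state: dict) -> dict:
    # Drop rows without a label; a real pipeline would load and clean data here.
    rows = [r for r in state["raw_rows"] if r.get("label") is not None]
    return {**state, "rows": rows}


def check_data(state: dict) -> None:
    if len(state["rows"]) < state["min_rows"]:
        raise GateFailure(f"data gate: only {len(state['rows'])} labelled rows")


def train_model(state: dict) -> dict:
    # Stand-in model: always predict the most common label.
    # Accuracy is measured on the training rows, purely for illustration.
    labels = [r["label"] for r in state["rows"]]
    majority = max(set(labels), key=labels.count)
    return {**state, "model": majority, "accuracy": labels.count(majority) / len(labels)}


def check_model(state: dict) -> None:
    if state["accuracy"] < state["min_accuracy"]:
        raise GateFailure(f"model gate: accuracy {state['accuracy']:.2f} below threshold")


STAGES = [
    Stage("prepare_data", prepare_data, check_data),
    Stage("train_model", train_model, check_model),
]


def run_pipeline(stages: list, state: dict) -> dict:
    completed = []
    for stage in stages:
        state = stage.run(state)
        stage.gate(state)  # stop here rather than pass bad output downstream
        completed.append(stage.name)
    return {**state, "completed": completed}
```

A deployment stage would slot in as a third `Stage` whose gate performs the pre-deployment checks. In a real skill the gates would presumably wrap a data-validation library and evaluation on held-out data rather than these toy checks.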
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose and padded with information Claude already knows. Lists of orchestration tools, deployment platforms, experiment tracking tools, and best practices like 'modularity' and 'idempotency' are all general knowledge. The 'When to Use This Skill' and 'What This Skill Provides' sections are meta-descriptions that waste tokens describing the skill rather than teaching it. The entire document reads like a table of contents or overview document rather than actionable instruction. | 1 / 3 |
| Actionability | Almost no executable code or concrete commands. The Python code block is just a list of stage names as strings. The YAML example is a skeleton with no real content. Multiple code blocks contain only comments pointing to other files (e.g., '# See references/model-training.md'). There is nothing copy-paste ready or directly executable anywhere in the skill. | 1 / 3 |
| Workflow Clarity | The 'Production Workflow' section lists phases with bullet points but provides no concrete commands, no validation checkpoints with specific tools or scripts, and no feedback loops for error recovery. The troubleshooting section is entirely generic ('check dependencies and data availability'). For a skill involving complex multi-step pipeline operations, there are no explicit validation gates or retry mechanisms demonstrated. | 1 / 3 |
| Progressive Disclosure | The skill references external files in references/ and assets/ directories with clear descriptions, which is good structure. However, no bundle files are provided, so these references lead nowhere. The main document itself is a monolithic wall of high-level descriptions that should have been split into the referenced files, with the SKILL.md being a lean overview. The 'Progressive Disclosure' section ironically describes the concept rather than implementing it. | 2 / 3 |
| Total | | 5 / 12 Passed |
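The Workflow Clarity row notes that no retry mechanism is demonstrated. As a rough sketch of what such a feedback loop could look like, the helper below retries a failing step with exponential backoff. The name `run_with_retry`, its parameters, and the backoff schedule are assumptions for illustration only; an orchestrator would normally supply its own retry configuration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    step: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `step` until it succeeds or `attempts` is exhausted.

    Hypothetical helper: waits base_delay, 2 * base_delay, 4 * base_delay, ...
    between attempts. `sleep` is injectable so the delay can be skipped in tests.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:  # a real pipeline would catch narrower errors
            last_error = exc
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"step failed after {attempts} attempts") from last_error
```

Catching every `Exception` keeps the sketch short. In practice only transient failures (for example, a data source that is temporarily unavailable) should be retried, while validation failures should stop the pipeline immediately.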
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation: 11 / 11 Passed
Validation for skill structure
No warnings or errors.
b09ec7f