Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.
Quality: 43%. Does it follow best practices?
Impact: 73%
0.98x average score across 3 eval scenarios
Passed, no known issues

Optimize this skill with Tessl:

`npx tessl skill review --optimize ./tests/ext_conformance/artifacts/agents-wshobson/machine-learning-ops/skills/ml-pipeline-workflow/SKILL.md`

Quality
Discovery: 67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is structurally sound with a clear 'what' and explicit 'when' clause, which is its strongest aspect. However, it operates at a high level of abstraction—listing pipeline phases rather than concrete actions—and could benefit from more specific trigger terms and concrete capabilities to better differentiate it from adjacent skills in data science, DevOps, or data engineering.
Suggestions
Add more specific concrete actions such as 'configure experiment tracking, set up model registries, implement A/B testing, create feature stores, build CI/CD for models'.
Expand trigger terms in the 'Use when' clause to include natural user phrases like 'machine learning', 'model registry', 'experiment tracking', 'model serving', 'ML CI/CD', or specific tools like 'MLflow', 'Kubeflow'.
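Taken together, the two suggestions might yield a description along these lines (an illustrative sketch built only from the terms listed above, not canonical wording):

```
Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Covers experiment tracking (MLflow), orchestration (Kubeflow), model registries, feature stores, model serving, and CI/CD for models. Use when creating machine learning pipelines, setting up experiment tracking or a model registry, implementing MLOps practices, or automating model training and deployment workflows.
```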
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (MLOps) and lists high-level stages (data preparation, model training, validation, deployment), but these are broad phases rather than multiple specific concrete actions like 'fill forms, merge documents'. It doesn't specify concrete tools, formats, or granular operations. | 2 / 3 |
| Completeness | Clearly answers both 'what' (build end-to-end MLOps pipelines from data preparation through deployment) and 'when' with an explicit 'Use when...' clause covering ML pipelines, MLOps practices, and automating training/deployment workflows. | 3 / 3 |
| Trigger Term Quality | Includes relevant terms like 'ML pipelines', 'MLOps', 'model training', 'deployment workflows', but misses common user variations such as 'machine learning', 'CI/CD for models', 'model serving', 'experiment tracking', 'feature engineering', or specific framework names users might mention. | 2 / 3 |
| Distinctiveness / Conflict Risk | The MLOps focus provides some distinctiveness, but terms like 'data preparation', 'model training', and 'deployment' could overlap with general data science skills, deployment/DevOps skills, or data engineering skills. The scope is broad enough to potentially conflict with more specialized skills. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
Implementation: 20%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads like a high-level overview document or course outline rather than an actionable skill file. It extensively catalogs MLOps concepts, tools, and patterns that Claude already knows, while providing almost no executable code, specific commands, or concrete implementation details. The content would benefit enormously from being condensed to a lean overview with actual working examples and deferring detailed content to the referenced files.
Suggestions
Replace the abstract descriptions with executable, copy-paste-ready code examples — e.g., a minimal working Airflow DAG, a concrete MLflow experiment tracking snippet, or a real validation script with Great Expectations.
Cut sections that merely list concepts Claude already knows (Integration Points, Deployment Strategies bullet lists, 'When to Use This Skill') and replace with a concise overview pointing to reference files.
Add explicit validation checkpoints with concrete commands in the workflow (e.g., 'Run `great_expectations checkpoint run my_checkpoint` — only proceed if all expectations pass').
Move the bulk of the content into the referenced files (data-preparation.md, model-training.md, etc.) and keep SKILL.md as a lean entry point with one concrete quick-start example and clear navigation.
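To make the third suggestion concrete, here is a minimal sketch of the "explicit validation checkpoint" pattern: a pipeline step that only proceeds when every expectation passes. It is framework-free so it runs anywhere; a real Great Expectations checkpoint follows the same gate-then-proceed shape, and all names here (`validate_training_data`, `run_checkpoint`, the individual checks) are hypothetical illustrations, not part of the skill under review.

```python
def validate_training_data(rows):
    """Return (check_name, passed) pairs for a batch of training rows."""
    checks = [
        ("non_empty", len(rows) > 0),
        ("no_missing_labels", all(r.get("label") is not None for r in rows)),
        ("feature_in_range", all(0.0 <= r["feature"] <= 1.0 for r in rows)),
    ]
    return checks

def run_checkpoint(rows):
    """Gate the pipeline: raise if any expectation fails, so downstream
    steps only run when all expectations pass."""
    failed = [name for name, ok in validate_training_data(rows) if not ok]
    if failed:
        raise ValueError(f"Validation checkpoint failed: {failed}")
    return True

rows = [{"label": 1, "feature": 0.4}, {"label": 0, "feature": 0.9}]
run_checkpoint(rows)  # passes; a batch with missing labels would raise
```

The point of the pattern is that failure halts the workflow with a named reason, rather than letting a vague "run validation test suite" step pass silently.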
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose and padded with high-level descriptions Claude already knows. Sections like 'When to Use This Skill', 'What This Skill Provides', 'Integration Points', and 'Common Patterns' are largely enumerations of concepts (canary deployments, A/B testing, DAG orchestration) without adding actionable knowledge. The document reads like a table of contents or course syllabus rather than a lean skill file. | 1 / 3 |
| Actionability | Almost no executable code or concrete commands. The Python snippets are either trivial lists of strings, comments pointing to other files ('See assets/...'), or empty pseudocode placeholders. There are no copy-paste-ready examples of actually building a pipeline step, configuring an orchestrator, or deploying a model. The 'Real-time Feature Pipeline' and 'Continuous Training' sections are literally just comments. | 1 / 3 |
| Workflow Clarity | The Production Workflow section provides a clear four-phase sequence with sub-steps, and the Debugging Steps section offers a reasonable troubleshooting sequence. However, there are no explicit validation checkpoints with concrete commands, no feedback loops for error recovery, and the validation phase is described abstractly ('Run validation test suite') rather than with specific tools or commands. | 2 / 3 |
| Progressive Disclosure | References to external files (references/ directory, assets/ directory) are clearly signaled and appear to be one level deep, which is good. However, the main SKILL.md itself is a monolithic wall of text with extensive inline content that should be in those reference files. The 'Progressive Disclosure' section ironically describes levels of complexity rather than actually implementing progressive disclosure in the document structure. | 2 / 3 |
| Total | | 6 / 12 (Passed) |
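The Actionability row asks for copy-paste-ready examples such as an experiment-tracking snippet. As a framework-free illustration of the pattern (the `Run` class here is a stand-in invented for this sketch; real MLflow code has the same shape using `mlflow.start_run()`, `mlflow.log_param()`, and `mlflow.log_metric()`):

```python
class Run:
    """Stand-in experiment-tracking run: records params once and
    metrics as time series, mirroring the tracking-API pattern."""
    def __init__(self, name):
        self.name, self.params, self.metrics = name, {}, {}

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics.setdefault(key, []).append(value)

run = Run("baseline-logreg")
run.log_param("learning_rate", 0.01)
for loss in [0.9, 0.6, 0.4]:   # one logged value per training epoch
    run.log_metric("loss", loss)
```

A snippet of this size is what the review means by "copy-paste-ready": it shows the exact calls a pipeline step would make, instead of a bullet list naming the concept.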
Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation: 11 / 11 checks passed
Validation for skill structure: no warnings or errors.