Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.
- Does it follow best practices? 43%
- Impact: 73% (0.98x average score across 3 eval scenarios)
- Status: Passed, no known issues
Optimize this skill with Tessl:

npx tessl skill review --optimize ./tests/ext_conformance/artifacts/agents-wshobson/machine-learning-ops/skills/ml-pipeline-workflow/SKILL.md

Quality
Discovery
67%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description has a solid structure with both 'what' and 'when' clauses clearly present, which is its strongest aspect. However, the capabilities listed are high-level pipeline stages rather than specific concrete actions, and the trigger terms, while relevant, lack the breadth of natural language variations users might employ. It could benefit from more granular action descriptions and additional trigger keywords to better distinguish it from adjacent ML or DevOps skills.
Suggestions
- Add more specific concrete actions beyond pipeline stages, e.g., 'configure experiment tracking, set up model registries, implement feature stores, automate retraining schedules'.
- Expand trigger terms with natural variations users would say: 'machine learning workflow', 'model serving', 'CI/CD for ML', 'experiment tracking', 'model monitoring', 'retraining pipeline'. An illustrative rewrite combining both suggestions follows this list.
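Putting the two suggestions together, a sharpened description might read something like the following. This is an illustrative rewrite, not prescribed wording:

"Build end-to-end MLOps pipelines: configure experiment tracking, set up model registries, implement feature stores, orchestrate training DAGs, and automate retraining and deployment. Use when creating machine learning workflows, setting up CI/CD for ML, model serving, model monitoring, or building retraining pipelines."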
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (MLOps) and lists some actions ('data preparation', 'model training', 'validation', 'production deployment'), but these are high-level pipeline stages rather than multiple specific concrete actions like 'create feature stores, configure hyperparameter tuning, set up model registries'. | 2 / 3 |
| Completeness | Clearly answers both 'what' (build end-to-end MLOps pipelines from data preparation through deployment) and 'when' with an explicit 'Use when...' clause covering creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows. | 3 / 3 |
| Trigger Term Quality | Includes relevant terms like 'ML pipelines', 'MLOps', 'model training', 'deployment workflows', but misses common natural variations users might say such as 'machine learning', 'CI/CD for models', 'model serving', 'experiment tracking', 'feature engineering', or specific tools like 'Kubeflow', 'MLflow'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The MLOps focus provides some distinctiveness, but terms like 'model training' and 'deployment' could overlap with general ML skills, data science skills, or DevOps/deployment skills. More specific triggers (e.g., pipeline orchestration tools, model registry, experiment tracking) would reduce conflict risk. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
Implementation
20%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads more like a table of contents or a high-level overview document than actionable guidance. It extensively lists well-known MLOps concepts (canary deployments, A/B testing, model registries) without providing the concrete, executable instructions that would actually help Claude build a pipeline. The code examples are either trivial or just comments pointing elsewhere, making the skill largely ineffective despite its length.
Suggestions
- Replace the placeholder code blocks with complete, executable examples — e.g., a working Airflow DAG definition, a concrete training script with experiment tracking, or a deployment script with validation checks. (Two sketches of what this could look like follow this list.)
- Remove sections that enumerate concepts Claude already knows (e.g., lists of orchestration tools, deployment platforms, experiment tracking tools) and replace them with specific configuration patterns or decision criteria unique to this project/context.
- Add explicit validation checkpoints with concrete commands in the workflow — e.g., 'Run `great_expectations checkpoint run data_quality` before proceeding to training' — with expected output and error handling.
- Consolidate the 'When to Use', 'What This Skill Provides', and 'Core Capabilities' sections into a 2-3 line overview, moving the bulk of content into actionable quick-start instructions.
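For instance, the first and third suggestions could be satisfied by one self-contained example. The sketch below is only an illustration of the pattern, assuming Airflow 2.4+ and a Great Expectations checkpoint named `data_quality`; the DAG name, task names, and callables are hypothetical, not taken from the reviewed skill:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def train_model(**context):
    # Placeholder training step: a real pipeline would load features,
    # fit a model, and log metrics to an experiment tracker here.
    print("training model...")


with DAG(
    dag_id="ml_pipeline",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # `schedule` requires Airflow 2.4+
    catchup=False,
):
    # Gate training on an explicit data-quality checkpoint, as the third
    # suggestion recommends; training never runs if the checkpoint fails.
    validate_data = BashOperator(
        task_id="validate_data",
        bash_command="great_expectations checkpoint run data_quality",
    )
    train = PythonOperator(task_id="train", python_callable=train_model)

    validate_data >> train
```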
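Similarly, 'a concrete training script with experiment tracking' might look like the sketch below, assuming MLflow and scikit-learn are available; the dataset, model, and parameters are placeholders chosen for brevity:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Toy dataset standing in for the pipeline's real feature set.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 5}
    mlflow.log_params(params)

    model = RandomForestRegressor(**params, random_state=42)
    model.fit(X_train, y_train)

    # Log the evaluation metric and the fitted model so the run is
    # reproducible and comparable across experiments.
    mse = mean_squared_error(y_test, model.predict(X_test))
    mlflow.log_metric("mse", mse)
    mlflow.sklearn.log_model(model, "model")
```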
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose and padded with high-level descriptions Claude already knows. Sections like 'When to Use This Skill', 'What This Skill Provides', 'Integration Points', and 'Common Patterns' are largely enumerations of well-known concepts (canary deployments, A/B testing, DAG orchestration) without adding novel, specific guidance. The document is ~180 lines but delivers very little actionable content. | 1 / 3 |
| Actionability | Almost no executable code or concrete commands. The Python snippets are either trivial lists of strings, comments pointing to other files ('See assets/...'), or literal pseudocode comments ('# Stream processing for real-time features'). There is nothing copy-paste ready or directly usable. | 1 / 3 |
| Workflow Clarity | The Production Workflow section provides a clear 4-phase sequence with sub-steps, which is reasonable. However, there are no explicit validation checkpoints with commands, no feedback loops for error recovery, and the troubleshooting section is generic. For a pipeline involving destructive/batch operations, the lack of concrete validation steps is a significant gap. | 2 / 3 |
| Progressive Disclosure | References to external files (references/, assets/) are present and organized, which is good. However, the main document itself is a monolithic wall of high-level bullet points that could be significantly trimmed. The 'Progressive Disclosure' section ironically describes the concept rather than implementing it in the document structure. | 2 / 3 |
| Total | | 6 / 12 (Passed) |
Validation
100%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation of skill structure: no warnings or errors.