
ml-pipeline-workflow

Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.


Quality: 43% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security (by Snyk): Passed (No known issues)

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/machine-learning-ops/skills/ml-pipeline-workflow/SKILL.md

Quality

Discovery: 67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description has a solid structure, with an explicit 'Use when' clause, and covers the general scope of MLOps pipelines. However, it stays at a high level of abstraction, listing pipeline stages rather than specific concrete actions, and its trigger terms could be expanded to cover more natural variations of user language. It would benefit from more specific capabilities and additional keywords to improve both specificity and distinctiveness.

Suggestions

Add more specific concrete actions such as 'configure experiment tracking, set up model registries, implement feature stores, create CI/CD pipelines for models, automate model retraining'.

Expand trigger terms in the 'Use when' clause to include natural variations like 'machine learning workflow', 'model serving', 'experiment tracking', 'ML CI/CD', 'model monitoring', or specific framework names.

Dimension scores

Specificity: 2 / 3. Names the domain (MLOps) and lists some actions ('data preparation', 'model training', 'validation', 'production deployment'), but these are high-level pipeline stages rather than multiple specific concrete actions like 'create feature stores, configure hyperparameter tuning, set up model registries, implement A/B testing'.

Completeness: 3 / 3. Clearly answers both 'what' (build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment) and 'when' (explicit 'Use when' clause covering creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows).

Trigger Term Quality: 2 / 3. Includes relevant keywords like 'ML pipelines', 'MLOps', 'model training', 'deployment workflows', but misses common natural variations users might say, such as 'machine learning', 'CI/CD for models', 'model serving', 'experiment tracking', 'feature engineering', 'model registry', or specific tools like 'Kubeflow' or 'MLflow'.

Distinctiveness / Conflict Risk: 2 / 3. The MLOps focus provides some distinctiveness, but terms like 'model training' and 'data preparation' could overlap with general data science or machine learning skills. The description doesn't clearly delineate boundaries against adjacent skills such as a general ML skill or a deployment/DevOps skill.

Total: 9 / 12 (Passed)

Implementation: 20%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill reads more like a high-level MLOps textbook table of contents than an actionable skill file. It extensively lists concepts, tools, and patterns that Claude already knows without providing concrete, executable guidance. The content is heavily padded with descriptions and enumerations while lacking the specific code, commands, and validation steps that would make it useful in practice.

Suggestions

Replace abstract descriptions with concrete, executable code examples - e.g., provide a complete minimal Airflow DAG or Dagster pipeline definition rather than a list of stage name strings.

Remove sections that enumerate well-known concepts (e.g., listing orchestration tools, deployment platforms, experiment tracking tools) - Claude already knows these. Focus on project-specific conventions and decisions.

Add explicit validation checkpoints with concrete commands/code at each pipeline stage boundary, including what to check and how to handle failures (e.g., 'Run `great_expectations checkpoint run data_quality` - if it fails, inspect the validation report at X and fix Y').

Cut the content by at least 50% - remove 'When to Use This Skill', 'What This Skill Provides' enumerations, 'Integration Points' tool listings, and the 'Progressive Disclosure' section that describes levels without content. Move any necessary detail to the referenced files.
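To make the first suggestion concrete, here is a framework-agnostic Python sketch of what an executable pipeline definition could look like, with real stage functions chained explicitly rather than a list of stage-name strings. The stage names and return values are illustrative assumptions; in the actual skill this would more likely be an Airflow DAG or Dagster job definition.

```python
# Hypothetical minimal pipeline: each stage is a real function, and the
# pipeline is the explicit chain between them rather than a list of names.

def prepare_data() -> dict:
    # Stand-in for loading and cleaning raw data.
    return {"rows": 100}

def train_model(data: dict) -> dict:
    # Stand-in for fitting a model on the prepared data.
    return {"model": "v1", "trained_on": data["rows"]}

def validate_model(model: dict) -> dict:
    # Gate promotion on an explicit check instead of assuming success.
    if model["trained_on"] < 10:
        raise ValueError("insufficient training data")
    return {**model, "validated": True}

artifact = validate_model(train_model(prepare_data()))
print(artifact)  # {'model': 'v1', 'trained_on': 100, 'validated': True}
```

Even a skeleton like this gives an agent something copy-paste-adjacent to adapt, which is the gap the Actionability score identifies.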
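The validation-checkpoint suggestion above can also be sketched in code. The function name, report path, and accuracy threshold below are illustrative assumptions, not part of the reviewed skill; the point is the shape of a checkpoint that fails loudly and tells the agent where to look and how to recover.

```python
import json
from pathlib import Path

def run_checkpoint(name: str, metrics: dict, min_accuracy: float = 0.9) -> bool:
    """Run a validation gate; on failure, write a report and say what to do next."""
    passed = metrics.get("accuracy", 0.0) >= min_accuracy
    if not passed:
        report = Path("reports") / f"{name}.json"
        report.parent.mkdir(parents=True, exist_ok=True)
        report.write_text(json.dumps({"checkpoint": name, "metrics": metrics}))
        # The failure message names the artifact to inspect and the recovery step.
        print(f"Checkpoint '{name}' failed; inspect {report}, fix the data, rerun the stage.")
    return passed

run_checkpoint("data_quality", {"accuracy": 0.95})  # passes, returns True
```

This mirrors the 'run X, and if it fails, inspect the report and fix Y' pattern the suggestion asks for, at a stage boundary rather than buried in prose.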

Dimension scores

Conciseness: 1 / 3. Extremely verbose and padded with high-level descriptions Claude already knows. Sections like 'When to Use This Skill', 'What This Skill Provides', 'Integration Points', and 'Common Patterns' are largely enumerations of well-known concepts (canary deployments, A/B testing, DAG orchestration) without adding novel, specific guidance. The 'Progressive Disclosure' section ironically describes levels of complexity without providing any actual content for them.

Actionability: 1 / 3. Almost no executable code or concrete commands. The Python code examples are either trivial (a list of stage name strings) or are just comments pointing to other files ('See references/...', 'See assets/...'). The YAML example is a skeleton with no real configuration. Nothing is copy-paste ready or directly usable.

Workflow Clarity: 2 / 3. The Production Workflow section provides a clear four-phase sequence with sub-steps, which is reasonable. However, there are no explicit validation checkpoints with commands, no feedback loops for error recovery, and the 'Validation Phase' is described abstractly ('Run validation test suite') rather than with concrete validation steps and what to do on failure.

Progressive Disclosure: 2 / 3. References to external files (references/ and assets/ directories) are present and clearly signaled, which is good. However, the main file itself is a monolithic wall of text with extensive inline content that is mostly abstract description rather than an actionable overview. Much of the content could be trimmed or moved to reference files.

Total: 6 / 12 (Passed)

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: wshobson/agents (Reviewed)

