
ml-pipeline-workflow

Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.

65

0.98x
Quality

Does it follow best practices?

Impact

73%

0.98x

Average score across 3 eval scenarios

Security by Snyk

Passed

No known issues


Quality

Content

35%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is well-sectioned and clearly sequenced, but it is padded with generic knowledge Claude already has, its code examples are non-executable stubs, and the reference and asset files it points to do not exist.

Suggestions

Cut generic best-practice and tool-list prose (Great Expectations/TFX, MLflow, canary/blue-green) that Claude already knows, and keep only pipeline-specific, non-obvious guidance to improve conciseness.

Replace stub code blocks (comment-only python and '# See assets/...' placeholders) with complete, executable orchestration examples or remove them.

Either create the referenced references/ and assets/ files or remove the dangling pointers, and move the inline capability/best-practice detail into those files so SKILL.md stays a lean overview.

Add explicit validation/rollback checkpoints (e.g., validate artifacts before deploy, auto-rollback on metric regression) into the Production Workflow sequence.
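The second and fourth suggestions can be illustrated together with a minimal, runnable sketch. It is not the skill's actual pipeline: the stage names, the `ctx` keys, and the accuracy values are hypothetical stand-ins, and `post_deploy_accuracy` is passed in directly where a real pipeline would measure it from live traffic. The sketch runs stages in dependency order, blocks deployment when the candidate falls below a baseline, and restores the previous version on a post-deploy regression.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class GateFailed(Exception):
    """Raised when a checkpoint rejects the current run."""


def prepare_data(ctx):
    ctx["rows"] = [1, 2, 3]  # placeholder for real data preparation


def train(ctx):
    # Hypothetical: a real stage would fit a model and compute this metric.
    ctx["candidate"] = {"version": "v2", "accuracy": ctx["trained_accuracy"]}


def validate(ctx):
    # Checkpoint 1: refuse to deploy a candidate that is below the baseline.
    if ctx["candidate"]["accuracy"] < ctx["baseline_accuracy"]:
        raise GateFailed("candidate below baseline")


def deploy(ctx):
    previous = ctx["live_version"]
    ctx["live_version"] = ctx["candidate"]["version"]
    # Checkpoint 2: auto-rollback when the post-deploy metric regresses.
    if ctx["post_deploy_accuracy"] < ctx["baseline_accuracy"]:
        ctx["live_version"] = previous
        raise GateFailed("regression after deploy; rolled back")


STAGES = {"prepare_data": prepare_data, "train": train,
          "validate": validate, "deploy": deploy}

# Each stage maps to the set of stages that must finish before it.
DEPENDENCIES = {
    "prepare_data": set(),
    "train": {"prepare_data"},
    "validate": {"train"},
    "deploy": {"validate"},
}


def run_pipeline(ctx):
    """Run stages in dependency order; return the stages that completed."""
    completed = []
    try:
        for name in TopologicalSorter(DEPENDENCIES).static_order():
            STAGES[name](ctx)
            completed.append(name)
    except GateFailed as exc:
        ctx["error"] = str(exc)  # record why the run stopped
    return completed
```

Calling `run_pipeline` with a context whose `trained_accuracy` is below `baseline_accuracy` stops before `deploy` and leaves `live_version` unchanged, which is the kind of explicit checkpoint the Workflow Clarity dimension asks for.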

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The ~240-line body is padded with generic best-practice and tool lists Claude already knows ("Use data validation libraries (Great Expectations, TFX)", MLflow, canary/blue-green/rollback), matching the score-1 anchor 'verbose; explains concepts Claude knows; padded' rather than the mostly-efficient score-2. | 1 / 3 |
| Actionability | Code blocks are stubs ("# See assets/pipeline-dag.yaml.template", comment-only python), but there is some concrete structure (a stages array and a yaml dependency graph), fitting score-2 'some concrete guidance but incomplete; pseudocode instead of executable code' rather than the fully copy-paste-ready score-3. | 2 / 3 |
| Workflow Clarity | The 4-phase Production Workflow is clearly sequenced, but it lacks explicit validate->fix->retry feedback loops or rollback checkpoints for these batch/deployment operations, which the rubric caps at score-2 rather than the score-3 'explicit validation steps; feedback loops'. | 2 / 3 |
| Progressive Disclosure | References are clearly signaled in dedicated sections (one level deep), but the referenced references/ and assets/ directories do not exist and the body is bloated with content that belongs in those files, fitting score-2 'references present but content that should be separate is inline' rather than the cleanly-split score-3. | 2 / 3 |
| Total | | 7 / 12 (Passed) |

Description

77%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is specific and complete with an explicit 'Use when' trigger, but its trigger terms are narrow and its deployment framing risks overlap with related ML skills.

Suggestions

Broaden trigger terms to include natural variations users say, such as 'model serving', 'monitoring', 'retraining', or 'CI/CD for ML', to improve trigger-term coverage.

Sharpen distinctiveness by emphasizing the end-to-end orchestration scope (DAG-based pipeline coordination) over generic 'training and deployment' phrasing that overlaps with sibling skills.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | "Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment" lists multiple specific concrete actions (data prep, training, validation, deployment), matching the score-3 anchor rather than the score-2 'names domain and some actions'. | 3 / 3 |
| Completeness | It explicitly states what it does ("Build end-to-end MLOps pipelines...") and when to use it ("Use when creating ML pipelines..."), satisfying both halves with an explicit trigger clause, which clears the score-2 cap noted in the guidelines. | 3 / 3 |
| Trigger Term Quality | "creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows" offers relevant but narrow terms clustered around ML/MLOps/training/deployment, missing common variations a user might say (model serving, monitoring, retraining), so it lands at score-2 rather than the broad coverage of score-3. | 2 / 3 |
| Distinctiveness Conflict Risk | The MLOps-pipeline niche is clear, but "automating model training and deployment workflows" overlaps with sibling skills the body itself references (model-deployment-patterns), fitting score-2 'could still overlap with similar skills' rather than the conflict-free score-3. | 2 / 3 |
| Total | | 10 / 12 (Passed) |

Validation

93%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 15 / 16 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| referenced_paths_exist | Referenced path issues: 4 missing | Warning |
| Total | | 15 / 16 (Passed) |
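A check like `referenced_paths_exist` can be approximated locally before publishing. The sketch below is an assumption about how such a check might work, not the validator's actual rule: it treats any token beginning with `references/` or `assets/` as a referenced path and reports those that are missing relative to the skill directory. The function name and the regular expression are hypothetical.

```python
import re
from pathlib import Path

# Assumed convention: referenced files live under references/ or assets/.
PATH_PATTERN = re.compile(r"\b(?:references|assets)/[\w./-]+")


def missing_referenced_paths(skill_text: str, skill_dir: Path) -> list[str]:
    """Return referenced relative paths that do not exist under skill_dir."""
    # Strip a trailing period so a sentence-ending "." is not part of the path.
    referenced = sorted({m.rstrip(".") for m in PATH_PATTERN.findall(skill_text)})
    return [p for p in referenced if not (skill_dir / p).exists()]
```

Running it as `missing_referenced_paths(Path("SKILL.md").read_text(), Path("."))` from the skill's directory would list the dangling pointers, which can then be either created or removed from the body.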

Repository: Dicklesworthstone/pi_agent_rust
Reviewed

