Designs and implements production-grade ML pipeline infrastructure: configures experiment tracking with MLflow or Weights & Biases, creates Kubeflow or Airflow DAGs for training orchestration, builds feature store schemas with Feast, deploys model registries, and automates retraining and validation workflows. Use when building ML pipelines, orchestrating training workflows, automating model lifecycle, implementing feature stores, managing experiment tracking systems, setting up DVC for data versioning, tuning hyperparameters, or configuring MLOps tooling like Kubeflow, Airflow, MLflow, or Prefect.
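The lifecycle the description covers (validate data, train, gate on evaluation, promote) can be sketched in plain Python. This is an illustrative stand-in, not code from the skill itself: the function names (`validate_schema`, `run_pipeline`) and the trivial mean-label "model" are hypothetical, and a real pipeline would run these steps as Airflow or Kubeflow tasks.

```python
# Hypothetical sketch of the validate -> train -> evaluate -> promote
# sequence. All names here are illustrative, not from any real library.

def validate_schema(rows, required_columns):
    """Halt the pipeline early if the input data is malformed."""
    missing = [c for c in required_columns if any(c not in r for r in rows)]
    if missing:
        raise ValueError(f"schema check failed, missing columns: {missing}")
    return rows

def train_model(rows):
    # Stand-in for a real training step: "predict" the mean label.
    labels = [r["label"] for r in rows]
    mean = sum(labels) / len(labels)
    return {"predict": lambda _feats, m=mean: m}

def evaluate(model, rows, threshold=0.5):
    """Evaluation gate: pass only if accuracy meets the threshold."""
    correct = sum(1 for r in rows if round(model["predict"](r)) == r["label"])
    return correct / len(rows) >= threshold

def run_pipeline(rows):
    validated = validate_schema(rows, required_columns=["label"])
    model = train_model(validated)
    if not evaluate(model, validated):
        raise RuntimeError("evaluation gate failed; model not promoted")
    return "promoted"

data = [{"label": 1}, {"label": 1}, {"label": 0}]
print(run_pipeline(data))  # → promoted
```

The point of the structure is that both gates raise rather than warn, so a failed schema check or evaluation stops the run before anything reaches the registry.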
Overall score: 90

| Metric | Score | Notes |
|---|---|---|
| Quality | 92% | Does it follow best practices? |
| Impact | 87% | 1.12x average score across 6 eval scenarios |
| Validation | Passed | No known issues |

Quality

Discovery
Score: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly articulates specific capabilities with named tools and concrete actions, provides comprehensive trigger terms covering both tool names and task descriptions, and includes an explicit 'Use when...' clause. The description is well-scoped to a distinct MLOps/pipeline infrastructure niche, making it highly distinguishable from adjacent skills like general ML modeling or data analysis.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple concrete actions: configures experiment tracking with MLflow/W&B, creates Kubeflow/Airflow DAGs, builds feature store schemas with Feast, deploys model registries, and automates retraining/validation workflows. | 3 / 3 |
| Completeness | Clearly answers both 'what' (designs and implements ML pipeline infrastructure with specific tools and actions) and 'when' (explicit 'Use when...' clause listing multiple trigger scenarios such as building ML pipelines and orchestrating training workflows). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of the natural terms users would say: ML pipelines, training workflows, model lifecycle, feature stores, experiment tracking, DVC, data versioning, hyperparameters, MLOps, Kubeflow, Airflow, MLflow, Prefect, Weights & Biases, Feast. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche focused on MLOps infrastructure and pipeline orchestration, with specific tool names (MLflow, Kubeflow, Feast, Airflow, Prefect, DVC). Unlikely to conflict with general coding or data science skills given the specificity of the domain. | 3 / 3 |
| Total | | 12 / 12 (Passed) |
Implementation
Score: 85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured skill that provides actionable code templates, clear workflow sequencing with validation checkpoints, and excellent progressive disclosure through a well-organized reference table. Minor weaknesses include a somewhat unnecessary 'Knowledge Reference' keyword dump at the end and a few constraint items that state obvious best practices, but overall the content is highly effective and production-ready.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is mostly efficient but includes some unnecessary elements: the 'Knowledge Reference' section at the end is just a keyword list that adds no actionable value, and some constraint items restate obvious best practices. The code templates are well sized but could be slightly tighter. | 2 / 3 |
| Actionability | Provides three fully executable code templates (MLflow logging, Kubeflow pipeline component, Great Expectations validation) that are copy-paste ready, with concrete imports, parameters, and realistic patterns. The constraints and output format sections give specific, actionable guidance. | 3 / 3 |
| Workflow Clarity | The core workflow is clearly sequenced in six numbered steps with explicit validation checkpoints (step 2: schema checks with halt-on-failure; step 6: evaluation gates before promotion). The data validation template includes a raise-on-failure pattern, and the constraints reinforce validation-before-training as mandatory. | 3 / 3 |
| Progressive Disclosure | Excellent use of a reference table with five clearly signaled one-level-deep references, each with explicit 'Load When' conditions. The main skill provides a concise overview with templates while deferring detailed guidance to topic-specific reference files. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
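The MLflow logging template the Actionability row refers to is not reproduced in this report. As a rough stand-in for the pattern it describes (record parameters and metrics per run, then compare runs to pick a winner), here is a stdlib-only sketch; the `Tracker` class is illustrative and not a real MLflow API.

```python
# Stdlib-only stand-in for MLflow-style experiment tracking:
# one record per run, with logged params and metrics.
import json

class Tracker:
    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        """Record one training run's hyperparameters and results."""
        self.runs.append({"params": params, "metrics": metrics})

    def best_run(self, metric, maximize=True):
        """Select the run with the best value for the given metric."""
        return (max if maximize else min)(
            self.runs, key=lambda r: r["metrics"][metric]
        )

tracker = Tracker()
tracker.log_run({"lr": 0.1, "epochs": 5}, {"accuracy": 0.81})
tracker.log_run({"lr": 0.01, "epochs": 10}, {"accuracy": 0.87})

best = tracker.best_run("accuracy")
print(json.dumps(best["params"]))  # → {"lr": 0.01, "epochs": 10}
```

In real MLflow the same shape appears as `mlflow.log_param` and `mlflow.log_metric` calls inside a `mlflow.start_run()` context, with run comparison handled by the tracking server.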
Validation
Score: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

11 / 11 checks passed (validation for skill structure). No warnings or errors.