ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. Use when the user asks about deploying ML models to production, setting up MLOps infrastructure (MLflow, Kubeflow, Kubernetes, Docker), monitoring model performance or drift, building RAG pipelines, or integrating LLM APIs with retry logic and cost controls. Focused on production and operational concerns rather than model research or initial training.
Overall: 87
Quality: 78% (Does it follow best practices?)
Impact: 93% (1.57x average score across 6 eval scenarios)
Passed, no known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./engineering-team/senior-ml-engineer/SKILL.md

Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that hits all the key criteria. It provides specific concrete capabilities, includes a comprehensive 'Use when...' clause with natural trigger terms and specific tool names, and clearly delineates its scope boundary to avoid conflicts with related skills. The description is well-structured, uses third person voice throughout, and balances comprehensiveness with clarity.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: productionizing models, building MLOps pipelines, integrating LLMs, model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. These are clearly defined capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (productionizing models, MLOps pipelines, LLM integration, feature stores, drift monitoring, RAG, cost optimization) and 'when' with an explicit 'Use when...' clause listing specific trigger scenarios. Also includes a helpful scope boundary ('Focused on production and operational concerns rather than model research or initial training'). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'deploying ML models to production', 'MLOps', 'MLflow', 'Kubeflow', 'Kubernetes', 'Docker', 'model drift', 'RAG pipelines', 'LLM APIs', 'retry logic', 'cost controls'. These are terms practitioners naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Clearly carves out a distinct niche by focusing on production/operational ML concerns and explicitly distinguishing itself from model research or initial training. The specific tool mentions (MLflow, Kubeflow) and focus areas (drift monitoring, cost optimization, RAG pipelines) make it unlikely to conflict with general ML or data science skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
57%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-organized skill covering a broad ML engineering domain with good progressive disclosure and clear section structure. However, it leans toward being a reference catalog (comparison tables, tech stacks) rather than providing deeply actionable, executable guidance. The workflows list steps but lack error recovery feedback loops and concrete validation implementations, and some content (pricing tables, tech stack lists) will become stale or is unnecessary for Claude.
Suggestions
- Add explicit error recovery/feedback loops to workflows (e.g., 'If canary metrics fail: rollback with `kubectl rollout undo`, investigate logs, fix, and redeploy')
- Replace the LLM cost management table (which will become outdated) with a pattern for programmatic cost tracking, and remove the Tech Stack table, which adds no actionable value
- Provide a complete, executable implementation for at least one workflow (e.g., a full FastAPI model serving endpoint with health check and drift monitoring) rather than fragments and abstract classes
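The programmatic cost-tracking suggestion could look like the minimal sketch below. The class name, the per-1K-token rate table, and the budget values are all hypothetical, illustrative choices; real prices change often and should be loaded from configuration or the provider's API rather than hard-coded.

```python
from dataclasses import dataclass, field

# Hypothetical per-1K-token rates (input, output) in USD; illustrative only.
RATES = {"gpt-4o": (0.0025, 0.01), "claude-sonnet": (0.003, 0.015)}

@dataclass
class CostTracker:
    """Accumulates LLM spend per model and enforces a budget ceiling."""
    budget_usd: float
    spent_usd: float = 0.0
    by_model: dict = field(default_factory=dict)

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Record one call's cost; raise once the budget is exceeded."""
        in_rate, out_rate = RATES[model]
        cost = (input_tokens / 1000) * in_rate + (output_tokens / 1000) * out_rate
        self.spent_usd += cost
        self.by_model[model] = self.by_model.get(model, 0.0) + cost
        if self.spent_usd > self.budget_usd:
            raise RuntimeError(f"LLM budget exceeded: ${self.spent_usd:.2f}")
        return cost

tracker = CostTracker(budget_usd=5.0)
tracker.record("gpt-4o", input_tokens=1200, output_tokens=400)
```

Keeping the rates in data rather than prose is the point of the suggestion: the skill stays accurate as prices drift, and the budget check turns cost control into an enforced invariant instead of a table to read.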
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably efficient but includes some unnecessary content like the Tech Stack table (Claude knows these tools), the Cost Management table with specific pricing that will quickly become outdated, and comparison tables that are more reference material than actionable guidance. The serving options and vector database comparison tables add bulk without being directly actionable. | 2 / 3 |
| Actionability | Provides some executable code (Dockerfile, Feast config, drift detection, retry logic) but much of the content is high-level workflow steps and comparison tables rather than concrete, copy-paste-ready implementations. The provider abstraction is incomplete (abstract class with no concrete implementation), and the tool scripts reference files that may not exist. Many steps are descriptive rather than instructive. | 2 / 3 |
| Workflow Clarity | Each section has numbered steps with a validation checkpoint at the end, which is good. However, the validation steps are stated as goals rather than executable checks (e.g., 'Response references retrieved context, no hallucinations' - how?). There are no feedback loops for error recovery - if canary deployment fails, if drift is detected mid-pipeline, or if validation fails, there's no explicit 'fix and retry' guidance. For destructive/batch operations like model deployment, this caps the score at 2. | 2 / 3 |
| Progressive Disclosure | Well-structured with a clear table of contents, concise overview sections in the main file, and explicit one-level-deep references to detailed documentation (references/mlops_production_patterns.md, references/llm_integration_guide.md, references/rag_system_architecture.md). Each reference file's contents are clearly described, making navigation easy. | 3 / 3 |
| Total | | 9 / 12 Passed |
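The missing feedback loop the reviewer describes could be made concrete along these lines: a deploy step that checks canary metrics and rolls back on failure. The deployment name, image tag, and `kubectl` command strings below are assumptions for illustration, and the shell runner is injected so the loop can be exercised without a live cluster.

```python
import subprocess
from typing import Callable

def run(cmd: str) -> None:
    """Default runner: execute a shell command, raising on failure."""
    subprocess.run(cmd, shell=True, check=True)

def canary_deploy(
    deployment: str,
    image: str,
    canary_healthy: Callable[[], bool],
    sh: Callable[[str], None] = run,
) -> bool:
    """Roll out a new image; if canary metrics fail, undo the rollout."""
    sh(f"kubectl set image deployment/{deployment} app={image}")
    sh(f"kubectl rollout status deployment/{deployment} --timeout=120s")
    if canary_healthy():
        return True
    # Feedback loop: rollback so the caller can investigate logs,
    # fix the issue, and redeploy.
    sh(f"kubectl rollout undo deployment/{deployment}")
    return False
```

The `canary_healthy` callable is where a real metrics query (error rate, latency percentiles) would go; returning a boolean keeps the rollback decision explicit and testable.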
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |