senior-ml-engineer

ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. Use when the user asks about deploying ML models to production, setting up MLOps infrastructure (MLflow, Kubeflow, Kubernetes, Docker), monitoring model performance or drift, building RAG pipelines, or integrating LLM APIs with retry logic and cost controls. Focused on production and operational concerns rather than model research or initial training.

Overall score: 87 (1.57x)

Quality: 78% (Does it follow best practices?)

Impact: 93% (1.57x average score across 6 eval scenarios)

Security (by Snyk): Passed, no known issues

Optimize this skill with Tessl

```shell
npx tessl skill review --optimize ./engineering-team/senior-ml-engineer/SKILL.md
```

Quality

Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that hits all the key criteria. It provides specific concrete capabilities, includes a comprehensive 'Use when...' clause with natural trigger terms and specific tool names, and clearly delineates its scope boundary to avoid conflicts with related skills. The description is well-structured, uses third person voice throughout, and balances comprehensiveness with clarity.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: productionizing models, building MLOps pipelines, integrating LLMs, model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. These are clearly defined capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (productionizing models, MLOps pipelines, LLM integration, feature stores, drift monitoring, RAG, cost optimization) and 'when' with an explicit 'Use when...' clause listing specific trigger scenarios. Also includes a helpful scope boundary ('Focused on production and operational concerns rather than model research or initial training'). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'deploying ML models to production', 'MLOps', 'MLflow', 'Kubeflow', 'Kubernetes', 'Docker', 'model drift', 'RAG pipelines', 'LLM APIs', 'retry logic', 'cost controls'. These are terms practitioners naturally use. | 3 / 3 |
| Distinctiveness (Conflict Risk) | Clearly carves out a distinct niche by focusing on production/operational ML concerns and explicitly distinguishing itself from model research or initial training. The specific tool mentions (MLflow, Kubeflow) and focus areas (drift monitoring, cost optimization, RAG pipelines) make it unlikely to conflict with general ML or data science skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation: 57%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-organized skill covering a broad ML engineering domain with good progressive disclosure and clear section structure. However, it leans toward being a reference catalog (comparison tables, tech stacks) rather than providing deeply actionable, executable guidance. The workflows list steps but lack error recovery feedback loops and concrete validation implementations, and some content (pricing tables, tech stack lists) will become stale or is unnecessary for Claude.

Suggestions

Add explicit error recovery/feedback loops to workflows (e.g., 'If canary metrics fail: rollback with `kubectl rollout undo`, investigate logs, fix, and redeploy')
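One shape such a feedback loop could take is sketched below; the thresholds, metric names, and deployment name are hypothetical placeholders, not values from the skill:

```python
import subprocess

# Hypothetical thresholds; real values depend on the service's SLOs.
MAX_ERROR_RATE = 0.02
MAX_P99_LATENCY_MS = 500.0

def canary_failed(metrics: dict) -> bool:
    """Return True if the canary breaches either threshold."""
    return (metrics["error_rate"] > MAX_ERROR_RATE
            or metrics["p99_latency_ms"] > MAX_P99_LATENCY_MS)

def check_and_rollback(metrics: dict, deployment: str = "model-server") -> str:
    """Feedback loop: roll back the deployment when the canary fails."""
    if canary_failed(metrics):
        # Undo the rollout, then investigate logs before re-deploying a fix.
        subprocess.run(
            ["kubectl", "rollout", "undo", f"deployment/{deployment}"],
            check=True,
        )
        return "rolled_back"
    return "promoted"
```

Encoding the decision as code gives the agent an explicit "fix and retry" branch instead of a goal statement.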

Replace the LLM cost management table (which will become outdated) with a pattern for programmatic cost tracking, and remove the Tech Stack table which adds no actionable value
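A minimal sketch of what that programmatic pattern could look like; the model names and per-1K-token prices here are made up, standing in for values loaded from a config file or the provider's pricing page so they never go stale in the skill text:

```python
from collections import defaultdict

# Hypothetical prices: (input, output) dollars per 1K tokens.
PRICES = {"small-model": (0.0005, 0.0015), "large-model": (0.01, 0.03)}

class CostTracker:
    """Accumulate LLM spend per model from usage metadata on each response."""

    def __init__(self, prices: dict):
        self.prices = prices
        self.spend = defaultdict(float)

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> float:
        in_price, out_price = self.prices[model]
        cost = (prompt_tokens / 1000) * in_price + (completion_tokens / 1000) * out_price
        self.spend[model] += cost
        return cost

    def total(self) -> float:
        return sum(self.spend.values())
```

A budget guard (e.g. raising once `total()` crosses a limit) can then hang off the same object.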

Provide a complete, executable implementation for at least one workflow (e.g., a full FastAPI model serving endpoint with health check and drift monitoring) rather than fragments and abstract classes
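Short of a full serving endpoint, the drift-monitoring piece of such an implementation might look like the following self-contained sketch (Population Stability Index over equal-width bins; the ~0.2 drift threshold is a common rule of thumb, not a value taken from the skill):

```python
import math

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a training sample and live traffic.
    Values above ~0.2 are commonly treated as significant drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Smooth empty bins so the log term stays defined.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A health-check route could then return this statistic alongside status, e.g. `{"status": "ok", "psi": psi(train_sample, recent_inputs)}`, with the framework wiring left to the skill's full example.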

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is reasonably efficient but includes some unnecessary content like the Tech Stack table (Claude knows these tools), the Cost Management table with specific pricing that will quickly become outdated, and comparison tables that are more reference material than actionable guidance. The serving options and vector database comparison tables add bulk without being directly actionable. | 2 / 3 |
| Actionability | Provides some executable code (Dockerfile, Feast config, drift detection, retry logic) but much of the content is high-level workflow steps and comparison tables rather than concrete, copy-paste-ready implementations. The provider abstraction is incomplete (abstract class with no concrete implementation), and the tool scripts reference files that may not exist. Many steps are descriptive rather than instructive. | 2 / 3 |
| Workflow Clarity | Each section has numbered steps with a validation checkpoint at the end, which is good. However, the validation steps are stated as goals rather than executable checks (e.g., 'Response references retrieved context, no hallucinations' - how?). There are no feedback loops for error recovery - if canary deployment fails, if drift is detected mid-pipeline, or if validation fails, there's no explicit 'fix and retry' guidance. For destructive/batch operations like model deployment, this caps the score at 2. | 2 / 3 |
| Progressive Disclosure | Well-structured with a clear table of contents, concise overview sections in the main file, and explicit one-level-deep references to detailed documentation (references/mlops_production_patterns.md, references/llm_integration_guide.md, references/rag_system_architecture.md). Each reference file's contents are clearly described, making navigation easy. | 3 / 3 |
| Total | | 9 / 12 |

Passed
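To illustrate the "incomplete provider abstraction" point above, a concrete retry helper of the kind the skill could ship might look like this; the exception class and parameters are hypothetical stand-ins for a real provider's rate-limit and timeout errors:

```python
import random
import time

class TransientAPIError(Exception):
    """Stand-in for a provider's rate-limit or timeout error."""

def call_with_retry(fn, max_attempts: int = 4, base_delay: float = 0.5):
    """Retry a flaky API call with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_attempts - 1:
                raise  # Exhausted: surface the error to the caller.
            # Backoff: 0.5s, 1s, 2s, ... plus jitter to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

A concrete provider class would simply wrap its API call in `call_with_retry`, turning the abstract class the review mentions into something executable.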
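Likewise, the "no hallucinations" checkpoint flagged under Workflow Clarity could become an executable (if crude) check. The token-overlap heuristic below is one possible proxy for grounding, assumed for illustration, not a real hallucination detector:

```python
def grounding_score(response: str, context_chunks: list) -> float:
    """Fraction of response tokens that also appear in the retrieved context.
    A crude proxy: low scores flag answers that ignore the context."""
    context_tokens = set()
    for chunk in context_chunks:
        context_tokens.update(chunk.lower().split())
    response_tokens = response.lower().split()
    if not response_tokens:
        return 0.0
    hits = sum(1 for t in response_tokens if t in context_tokens)
    return hits / len(response_tokens)

def validate_rag_answer(response: str, chunks: list, threshold: float = 0.6) -> float:
    """Executable checkpoint: fail the pipeline run instead of eyeballing it."""
    score = grounding_score(response, chunks)
    if score < threshold:
        raise AssertionError(f"grounding {score:.2f} below threshold {threshold}")
    return score
```

Even a weak automated check like this gives the workflow a pass/fail signal it can branch on.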

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation checks: 10 / 11 passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 |

Passed

Repository: alirezarezvani/claude-skills (Reviewed)
