ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. Use when the user asks about deploying ML models to production, setting up MLOps infrastructure (MLflow, Kubeflow, Kubernetes, Docker), monitoring model performance or drift, building RAG pipelines, or integrating LLM APIs with retry logic and cost controls. Focused on production and operational concerns rather than model research or initial training.
Overall: 87
Quality: 78% (Does it follow best practices?)
Impact: 93% (1.57x average score across 6 eval scenarios)
Passed, no known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./engineering-team/senior-ml-engineer/SKILL.md

Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that hits all the key criteria. It provides specific concrete capabilities, includes a comprehensive 'Use when...' clause with natural trigger terms and specific tool names, and clearly delineates its scope boundary to avoid conflicts with related skills. The description is well-structured, uses third person voice throughout, and balances comprehensiveness with clarity.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: productionizing models, building MLOps pipelines, integrating LLMs, model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. These are clearly defined capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (productionizing models, MLOps pipelines, LLM integration, feature stores, drift monitoring, RAG, cost optimization) and 'when' with an explicit 'Use when...' clause listing specific trigger scenarios. Also includes a helpful scope boundary ('Focused on production and operational concerns rather than model research or initial training'). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'deploying ML models to production', 'MLOps', 'MLflow', 'Kubeflow', 'Kubernetes', 'Docker', 'model drift', 'RAG pipelines', 'LLM APIs', 'retry logic', 'cost controls'. These are terms practitioners naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Clearly carves out a distinct niche by focusing on production/operational ML concerns and explicitly distinguishing itself from model research or initial training. The specific tool mentions (MLflow, Kubeflow) and focus areas (drift monitoring, cost optimization, RAG pipelines) make it unlikely to conflict with general ML or data science skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
57%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-organized skill covering a broad ML engineering domain with good progressive disclosure and clear section structure. However, it leans toward being a reference catalog (comparison tables, tech stacks) rather than providing deeply actionable, executable guidance. The workflows list steps but lack error recovery feedback loops and concrete validation implementations, and some content (pricing tables, tech stack lists) will become stale or is unnecessary for Claude.
Suggestions
- Add explicit error recovery/feedback loops to workflows (e.g., 'If canary metrics fail: rollback with `kubectl rollout undo`, investigate logs, fix, and redeploy')
- Replace the LLM cost management table (which will become outdated) with a pattern for programmatic cost tracking, and remove the Tech Stack table, which adds no actionable value
- Provide a complete, executable implementation for at least one workflow (e.g., a full FastAPI model serving endpoint with health check and drift monitoring) rather than fragments and abstract classes
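The programmatic cost-tracking suggestion could look like the minimal sketch below. The class name, the per-1K-token rate table, and the budget values are all hypothetical, illustrative choices; real prices change often and should be loaded from configuration or the provider's API rather than hard-coded.

```python
from dataclasses import dataclass, field

# Hypothetical per-1K-token rates (input, output) in USD; illustrative only.
RATES = {"gpt-4o": (0.0025, 0.01), "claude-sonnet": (0.003, 0.015)}

@dataclass
class CostTracker:
    """Accumulates LLM spend per model and enforces a budget ceiling."""
    budget_usd: float
    spent_usd: float = 0.0
    by_model: dict = field(default_factory=dict)

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Record one call's cost; raise once the budget is exceeded."""
        in_rate, out_rate = RATES[model]
        cost = (input_tokens / 1000) * in_rate + (output_tokens / 1000) * out_rate
        self.spent_usd += cost
        self.by_model[model] = self.by_model.get(model, 0.0) + cost
        if self.spent_usd > self.budget_usd:
            raise RuntimeError(f"LLM budget exceeded: ${self.spent_usd:.2f}")
        return cost

tracker = CostTracker(budget_usd=5.0)
tracker.record("gpt-4o", input_tokens=1200, output_tokens=400)
```

Keeping the rates in data rather than prose is the point of the suggestion: the skill stays accurate as prices drift, and the budget check turns cost control into an enforced invariant instead of a table to read.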
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably efficient but includes some unnecessary content like the Tech Stack table (Claude knows these tools), the Cost Management table with specific pricing that will quickly become outdated, and comparison tables that are more reference material than actionable guidance. The serving options and vector database comparison tables add bulk without being directly actionable. | 2 / 3 |
| Actionability | Provides some executable code (Dockerfile, Feast config, drift detection, retry logic) but much of the content is high-level workflow steps and comparison tables rather than concrete, copy-paste-ready implementations. The provider abstraction is incomplete (abstract class with no concrete implementation), and the tool scripts reference files that may not exist. Many steps are descriptive rather than instructive. | 2 / 3 |
| Workflow Clarity | Each section has numbered steps with a validation checkpoint at the end, which is good. However, the validation steps are stated as goals rather than executable checks (e.g., 'Response references retrieved context, no hallucinations' - how?). There are no feedback loops for error recovery - if canary deployment fails, if drift is detected mid-pipeline, or if validation fails, there's no explicit 'fix and retry' guidance. For destructive/batch operations like model deployment, this caps the score at 2. | 2 / 3 |
| Progressive Disclosure | Well-structured with a clear table of contents, concise overview sections in the main file, and explicit one-level-deep references to detailed documentation (references/mlops_production_patterns.md, references/llm_integration_guide.md, references/rag_system_architecture.md). Each reference file's contents are clearly described, making navigation easy. | 3 / 3 |
| Total | | 9 / 12 Passed |
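The missing feedback loop the reviewer describes could be made concrete along these lines: a deploy step that checks canary metrics and rolls back on failure. The deployment name, image tag, and `kubectl` command strings below are assumptions for illustration, and the shell runner is injected so the loop can be exercised without a live cluster.

```python
import subprocess
from typing import Callable

def run(cmd: str) -> None:
    """Default runner: execute a shell command, raising on failure."""
    subprocess.run(cmd, shell=True, check=True)

def canary_deploy(
    deployment: str,
    image: str,
    canary_healthy: Callable[[], bool],
    sh: Callable[[str], None] = run,
) -> bool:
    """Roll out a new image; if canary metrics fail, undo the rollout."""
    sh(f"kubectl set image deployment/{deployment} app={image}")
    sh(f"kubectl rollout status deployment/{deployment} --timeout=120s")
    if canary_healthy():
        return True
    # Feedback loop: rollback so the caller can investigate logs,
    # fix the issue, and redeploy.
    sh(f"kubectl rollout undo deployment/{deployment}")
    return False
```

The `canary_healthy` callable is where a real metrics query (error rate, latency percentiles) would go; returning a boolean keeps the rollback decision explicit and testable.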
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |