senior-ml-engineer

ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. Use when the user asks about deploying ML models to production, setting up MLOps infrastructure (MLflow, Kubeflow, Kubernetes, Docker), monitoring model performance or drift, building RAG pipelines, or integrating LLM APIs with retry logic and cost controls. Focused on production and operational concerns rather than model research or initial training.

Overall score: 87 (1.57x)

Quality: 78% (Does it follow best practices?)

Impact: 93% (1.57x average score across 6 eval scenarios)

Security (by Snyk): Passed, no known issues

Optimize this skill with Tessl

```shell
npx tessl skill review --optimize ./engineering-team/senior-ml-engineer/SKILL.md
```

Quality

Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that hits all the key criteria. It provides specific concrete capabilities, includes a comprehensive 'Use when...' clause with natural trigger terms and specific tool names, and clearly delineates its scope boundary to avoid conflicts with related skills. The description is well-structured, uses third person voice throughout, and balances comprehensiveness with clarity.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: productionizing models, building MLOps pipelines, integrating LLMs, model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. These are clearly defined capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (productionizing models, MLOps pipelines, LLM integration, feature stores, drift monitoring, RAG, cost optimization) and 'when' with an explicit 'Use when...' clause listing specific trigger scenarios. Also includes a helpful scope boundary ('Focused on production and operational concerns rather than model research or initial training'). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'deploying ML models to production', 'MLOps', 'MLflow', 'Kubeflow', 'Kubernetes', 'Docker', 'model drift', 'RAG pipelines', 'LLM APIs', 'retry logic', 'cost controls'. These are terms practitioners naturally use. | 3 / 3 |
| Distinctiveness (Conflict Risk) | Clearly carves out a distinct niche by focusing on production/operational ML concerns and explicitly distinguishing itself from model research or initial training. The specific tool mentions (MLflow, Kubeflow) and focus areas (drift monitoring, cost optimization, RAG pipelines) make it unlikely to conflict with general ML or data science skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation: 57%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-organized skill covering a broad ML engineering domain with good progressive disclosure and clear section structure. However, it leans toward being a reference catalog (comparison tables, tech stacks) rather than providing deeply actionable, executable guidance. The workflows list steps but lack error recovery feedback loops and concrete validation implementations, and some content (pricing tables, tech stack lists) will become stale or is unnecessary for Claude.

Suggestions

Add explicit error recovery/feedback loops to workflows (e.g., 'If canary metrics fail: rollback with `kubectl rollout undo`, investigate logs, fix, and redeploy')
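One shape such a feedback loop could take is sketched below; the thresholds, metric names, and deployment name are hypothetical placeholders, not values from the skill:

```python
import subprocess

# Hypothetical thresholds; real values depend on the service's SLOs.
MAX_ERROR_RATE = 0.02
MAX_P99_LATENCY_MS = 500.0

def canary_failed(metrics: dict) -> bool:
    """Return True if the canary breaches either threshold."""
    return (metrics["error_rate"] > MAX_ERROR_RATE
            or metrics["p99_latency_ms"] > MAX_P99_LATENCY_MS)

def check_and_rollback(metrics: dict, deployment: str = "model-server") -> str:
    """Feedback loop: roll back the deployment when the canary fails."""
    if canary_failed(metrics):
        # Undo the rollout, then investigate logs before re-deploying a fix.
        subprocess.run(
            ["kubectl", "rollout", "undo", f"deployment/{deployment}"],
            check=True,
        )
        return "rolled_back"
    return "promoted"
```

Encoding the decision as code gives the agent an explicit "fix and retry" branch instead of a goal statement.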

Replace the LLM cost management table (which will become outdated) with a pattern for programmatic cost tracking, and remove the Tech Stack table which adds no actionable value
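A minimal sketch of what that programmatic pattern could look like; the model names and per-1K-token prices here are made up, standing in for values loaded from a config file or the provider's pricing page so they never go stale in the skill text:

```python
from collections import defaultdict

# Hypothetical prices: (input, output) dollars per 1K tokens.
PRICES = {"small-model": (0.0005, 0.0015), "large-model": (0.01, 0.03)}

class CostTracker:
    """Accumulate LLM spend per model from usage metadata on each response."""

    def __init__(self, prices: dict):
        self.prices = prices
        self.spend = defaultdict(float)

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> float:
        in_price, out_price = self.prices[model]
        cost = (prompt_tokens / 1000) * in_price + (completion_tokens / 1000) * out_price
        self.spend[model] += cost
        return cost

    def total(self) -> float:
        return sum(self.spend.values())
```

A budget guard (e.g. raising once `total()` crosses a limit) can then hang off the same object.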

Provide a complete, executable implementation for at least one workflow (e.g., a full FastAPI model serving endpoint with health check and drift monitoring) rather than fragments and abstract classes
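Short of a full serving endpoint, the drift-monitoring piece of such an implementation might look like the following self-contained sketch (Population Stability Index over equal-width bins; the ~0.2 drift threshold is a common rule of thumb, not a value taken from the skill):

```python
import math

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a training sample and live traffic.
    Values above ~0.2 are commonly treated as significant drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Smooth empty bins so the log term stays defined.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A health-check route could then return this statistic alongside status, e.g. `{"status": "ok", "psi": psi(train_sample, recent_inputs)}`, with the framework wiring left to the skill's full example.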

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is reasonably efficient but includes some unnecessary content like the Tech Stack table (Claude knows these tools), the Cost Management table with specific pricing that will quickly become outdated, and comparison tables that are more reference material than actionable guidance. The serving options and vector database comparison tables add bulk without being directly actionable. | 2 / 3 |
| Actionability | Provides some executable code (Dockerfile, Feast config, drift detection, retry logic) but much of the content is high-level workflow steps and comparison tables rather than concrete, copy-paste-ready implementations. The provider abstraction is incomplete (abstract class with no concrete implementation), and the tool scripts reference files that may not exist. Many steps are descriptive rather than instructive. | 2 / 3 |
| Workflow Clarity | Each section has numbered steps with a validation checkpoint at the end, which is good. However, the validation steps are stated as goals rather than executable checks (e.g., 'Response references retrieved context, no hallucinations' - how?). There are no feedback loops for error recovery - if canary deployment fails, if drift is detected mid-pipeline, or if validation fails, there's no explicit 'fix and retry' guidance. For destructive/batch operations like model deployment, this caps the score at 2. | 2 / 3 |
| Progressive Disclosure | Well-structured with a clear table of contents, concise overview sections in the main file, and explicit one-level-deep references to detailed documentation (references/mlops_production_patterns.md, references/llm_integration_guide.md, references/rag_system_architecture.md). Each reference file's contents are clearly described, making navigation easy. | 3 / 3 |
| Total | | 9 / 12 |

Passed
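To illustrate the "incomplete provider abstraction" point above, a concrete retry helper of the kind the skill could ship might look like this; the exception class and parameters are hypothetical stand-ins for a real provider's rate-limit and timeout errors:

```python
import random
import time

class TransientAPIError(Exception):
    """Stand-in for a provider's rate-limit or timeout error."""

def call_with_retry(fn, max_attempts: int = 4, base_delay: float = 0.5):
    """Retry a flaky API call with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_attempts - 1:
                raise  # Exhausted: surface the error to the caller.
            # Backoff: 0.5s, 1s, 2s, ... plus jitter to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

A concrete provider class would simply wrap its API call in `call_with_retry`, turning the abstract class the review mentions into something executable.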
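Likewise, the "no hallucinations" checkpoint flagged under Workflow Clarity could become an executable (if crude) check. The token-overlap heuristic below is one possible proxy for grounding, assumed for illustration, not a real hallucination detector:

```python
def grounding_score(response: str, context_chunks: list) -> float:
    """Fraction of response tokens that also appear in the retrieved context.
    A crude proxy: low scores flag answers that ignore the context."""
    context_tokens = set()
    for chunk in context_chunks:
        context_tokens.update(chunk.lower().split())
    response_tokens = response.lower().split()
    if not response_tokens:
        return 0.0
    hits = sum(1 for t in response_tokens if t in context_tokens)
    return hits / len(response_tokens)

def validate_rag_answer(response: str, chunks: list, threshold: float = 0.6) -> float:
    """Executable checkpoint: fail the pipeline run instead of eyeballing it."""
    score = grounding_score(response, chunks)
    if score < threshold:
        raise AssertionError(f"grounding {score:.2f} below threshold {threshold}")
    return score
```

Even a weak automated check like this gives the workflow a pass/fail signal it can branch on.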

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation checks: 10 / 11 passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 |

Passed

Repository: alirezarezvani/claude-skills (Reviewed)
