senior-ml-engineer

ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. Use when the user asks about deploying ML models to production, setting up MLOps infrastructure (MLflow, Kubeflow, Kubernetes, Docker), monitoring model performance or drift, building RAG pipelines, or integrating LLM APIs with retry logic and cost controls. Focused on production and operational concerns rather than model research or initial training.

Quality: 78% (Does it follow best practices?)

Impact: 93% (1.57x), average score across 6 eval scenarios

Security: Passed, no known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./engineering-team/senior-ml-engineer/SKILL.md

Evaluation results

84%

48%

Customer Support Bot Integration

LLM integration with retry, fallback, and cost tracking

Criteria (columns: Without context, With context)

Abstract provider class: 100%
Concrete providers: 100%
Tenacity retry decorator: 100%
Retry parameters: 100%
Fallback implementation: 100%
tiktoken token counting
Cost calculation: 100%
Correct pricing values: 50%, 100%
Pydantic response model: 100%
Response validation used: 100%
Response caching
Cost summary output: 100%

92%

35%

Fraud Detection Model Production Release

Model deployment containerization and canary workflow

Criteria (columns: Without context, With context)

python:3.11-slim base image: 100%
Health check endpoint: 37%, 100%
Uvicorn CMD: 62%, 100%
Port 8080 exposed: 100%
Model export step
Canary at 5%: 100%
1 hour canary window: 100%
p95 latency threshold: 100%
Error rate threshold: 100%
K8s memory limits: 100%
K8s CPU limits: 100%
K8s readiness probe: 100%

84%

Internal Knowledge Base Search System

RAG pipeline with hybrid search, reranking, and chunking

Criteria (columns: Without context, With context)

RecursiveCharacterTextSplitter
Chunking separators: 50%
Embedding cache using hash: 100%
Batch embedding support: 100%
BM25 sparse retrieval: 100%, 80%
Hybrid score combination: 50%, 100%
Alpha parameter: 100%
Reranking step: 90%, 100%
Reranker sorts results: 100%
Query function integration: 100%
Demo runs without errors: 100%

68%

Production Model Monitoring for Credit Scoring

Drift detection and alert thresholds

Criteria (columns: Without context, With context)

KS test for drift: 100%
KS output fields: 50%, 100%
KS drift flag threshold: 100%
p95 latency warning: 100%
p95 latency critical: 100%
Error rate warning: 100%
Error rate critical: 100%
PSI warning threshold: 100%
PSI critical threshold: 100%
Accuracy drop thresholds: 100%
Retraining at PSI>0.2: 100%
Demo output: 100%
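
For readers unfamiliar with the "Retraining at PSI>0.2" criterion, the following is a minimal sketch of a Population Stability Index check. It assumes the feature has already been binned into per-bin fractions for a reference and a current sample. The function names `psi` and `should_retrain` and the `eps` floor are hypothetical, not taken from the skill. Only the 0.2 retraining threshold comes from the criteria above.

```python
import math


def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population Stability Index over pre-binned fractions.

    expected_fracs / actual_fracs: per-bin proportions for the reference
    and current samples (same bin edges, same order).
    eps: assumed floor that avoids log(0) for empty bins.
    """
    total = 0.0
    for e, a in zip(expected_fracs, actual_fracs):
        e = max(e, eps)
        a = max(a, eps)
        # Each bin contributes (actual - expected) * ln(actual / expected).
        total += (a - e) * math.log(a / e)
    return total


def should_retrain(psi_value, threshold=0.2):
    """Flag retraining when PSI exceeds the threshold (0.2 per the criteria)."""
    return psi_value > threshold


if __name__ == "__main__":
    reference = [0.25, 0.25, 0.25, 0.25]
    current = [0.10, 0.20, 0.30, 0.40]
    value = psi(reference, current)
    print(f"PSI={value:.4f} retrain={should_retrain(value)}")
```

The comparison is strict, so a PSI of exactly 0.2 does not trigger retraining in this sketch; a real pipeline may choose differently.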

40%

Setting Up an MLOps Training Pipeline for a Churn Model

MLflow model registry and feature store

Criteria (columns: Without context, With context)

Feast Entity definition: 100%
Feast FeatureView: 100%
FileSource in FeatureView: 100%
TTL set on FeatureView: 100%
mlflow.start_run used: 100%
mlflow.log_metric called: 100%
Model logged to MLflow: 100%
mlflow.register_model called: 40%, 100%
PSI drift trigger: 80%, 100%
Scheduled retrain trigger: 100%
Performance drop trigger: 100%
Pipeline runs without error: 100%

Model Comparison Framework for a Search Ranking Experiment

A/B testing with deterministic traffic splitting

Criteria (columns: Without context, With context)

Hash-based assignment: 100%
Experiment key in hash input: 100%
Bucket conversion: 100%
Sticky assignment: 100%
Configurable control_pct: 100%
Returns control/treatment: 100%
Primary metric collection: 100%
Guardrail metric tracking: 100%
p-value < 0.05 threshold: 100%
Minimum sample size check: 70%, 100%
Demo runs without errors: 100%
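
The first six criteria in this scenario (hash-based assignment, experiment key in the hash input, bucket conversion, sticky assignment, configurable `control_pct`, and a control/treatment return value) can be illustrated with a short sketch. This is one possible reading of those criteria, not the skill's own code. The function name `assign_variant`, the choice of SHA-256, and the 10,000-bucket granularity are assumptions.

```python
import hashlib


def assign_variant(user_id: str, experiment_key: str, control_pct: float = 50.0) -> str:
    """Deterministically map a user to 'control' or 'treatment'.

    control_pct is the percentage of traffic (0 to 100) sent to control.
    """
    # Mixing the experiment key into the hash input lets the same user fall
    # into different groups across independent experiments.
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).hexdigest()
    # Bucket conversion: take the first 8 hex digits as an integer and
    # reduce it to a bucket in [0, 10000), i.e. 0.01% granularity.
    bucket = int(digest[:8], 16) % 10_000
    # The same inputs always hash to the same bucket, so assignment is sticky
    # without storing any state.
    return "control" if bucket < control_pct * 100 else "treatment"


if __name__ == "__main__":
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, assign_variant(uid, "ranking-v2", control_pct=50.0))
```

Because assignment depends only on the inputs, changing `control_pct` mid-experiment moves some users between groups; keeping the split fixed for the life of an experiment avoids that.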

Repository: alirezarezvani/claude-skills
Commit: f567c61

Evaluated: about 2 months ago
Agent: Claude Code
Model: Claude Sonnet 4.6

