senior-ml-engineer

ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. Use when the user asks about deploying ML models to production, setting up MLOps infrastructure (MLflow, Kubeflow, Kubernetes, Docker), monitoring model performance or drift, building RAG pipelines, or integrating LLM APIs with retry logic and cost controls. Focused on production and operational concerns rather than model research or initial training.

Score: 87 (1.57x)
Quality: 78% (Does it follow best practices?)
Impact: 93%, 1.57x (average score across 6 eval scenarios)
Security (by Snyk): Passed, no known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./engineering-team/senior-ml-engineer/SKILL.md

Evaluation results

84% · 48%

Customer Support Bot Integration

LLM integration with retry, fallback, and cost tracking

| Criteria | Without context | With context |
| --- | --- | --- |
| Abstract provider class | 100% | 100% |
| Concrete providers | 100% | 100% |
| Tenacity retry decorator | 0% | 100% |
| Retry parameters | 0% | 100% |
| Fallback implementation | 0% | 100% |
| tiktoken token counting | 0% | 0% |
| Cost calculation | 100% | 100% |
| Correct pricing values | 50% | 100% |
| Pydantic response model | 0% | 100% |
| Response validation used | 0% | 100% |
| Response caching | 0% | 0% |
| Cost summary output | 100% | 100% |
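
To make the retry, fallback, and cost criteria above concrete, here is a minimal sketch using only the standard library. It is not the skill's own implementation: the scenario checks for a Tenacity decorator, and the hand-rolled loop below only stands in for that idea. The names `call_with_retry`, `complete_with_fallback`, and `estimate_cost`, the default attempt count, and the per-1k-token pricing parameters are all illustrative assumptions.

```python
import time
from typing import Callable, Sequence

Provider = Callable[[str], str]  # hypothetical provider: prompt in, completion out


def call_with_retry(fn: Provider, prompt: str,
                    attempts: int = 3, base_delay: float = 1.0) -> str:
    """Call fn(prompt), retrying with exponential backoff on any exception."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn(prompt)
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the last error
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


def complete_with_fallback(providers: Sequence[Provider], prompt: str,
                           attempts: int = 3, base_delay: float = 1.0) -> str:
    """Try providers in order, moving on once one has used up its retries."""
    last_error = None
    for provider in providers:
        try:
            return call_with_retry(provider, prompt, attempts, base_delay)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost of one call; prices are caller-supplied, not real provider rates."""
    return (input_tokens / 1000) * input_price_per_1k \
        + (output_tokens / 1000) * output_price_per_1k
```

In a real integration the providers would wrap vendor SDK calls, and the retry loop would typically catch only transient errors (timeouts, rate limits) rather than every exception.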

92% · 35%

Fraud Detection Model Production Release

Model deployment containerization and canary workflow

| Criteria | Without context | With context |
| --- | --- | --- |
| python:3.11-slim base image | 100% | 100% |
| Health check endpoint | 37% | 100% |
| Uvicorn CMD | 62% | 100% |
| Port 8080 exposed | 0% | 100% |
| Model export step | 0% | 0% |
| Canary at 5% | 100% | 100% |
| 1 hour canary window | 0% | 100% |
| p95 latency threshold | 0% | 100% |
| Error rate threshold | 100% | 100% |
| K8s memory limits | 100% | 100% |
| K8s CPU limits | 100% | 100% |
| K8s readiness probe | 100% | 100% |
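
The canary criteria above (5% of traffic, a 1 hour window, a p95 latency threshold, and an error rate threshold) can be read as a promotion gate. The sketch below is one possible shape for that gate, not the skill's actual workflow. The 5% and 1 hour values come from the criteria names; the latency and error rate limits are not stated on this page, so they are left as required arguments, and the function names are hypothetical.

```python
from typing import Sequence

CANARY_TRAFFIC_PCT = 5          # from the "Canary at 5%" criterion
CANARY_WINDOW_SECONDS = 3600    # from the "1 hour canary window" criterion


def p95(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of the observed latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def canary_passes(latencies_ms: Sequence[float], error_count: int,
                  request_count: int, p95_limit_ms: float,
                  max_error_rate: float) -> bool:
    """Return True when the canary stayed within both thresholds.

    The thresholds are supplied by the caller; this sketch does not
    assume any particular values.
    """
    if request_count == 0:
        return False  # no traffic observed, so there is nothing to promote on
    error_rate = error_count / request_count
    return p95(latencies_ms) <= p95_limit_ms and error_rate <= max_error_rate
```

A rollout script would collect latencies and error counts over the canary window and call `canary_passes` once before shifting the remaining traffic.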

84% · 8%

Internal Knowledge Base Search System

RAG pipeline with hybrid search, reranking, and chunking

| Criteria | Without context | With context |
| --- | --- | --- |
| RecursiveCharacterTextSplitter | 0% | 0% |
| Chunking separators | 0% | 50% |
| Embedding cache using hash | 100% | 100% |
| Batch embedding support | 100% | 100% |
| BM25 sparse retrieval | 100% | 80% |
| Hybrid score combination | 50% | 100% |
| Alpha parameter | 100% | 100% |
| Reranking step | 90% | 100% |
| Reranker sorts results | 100% | 100% |
| Query function integration | 100% | 100% |
| Demo runs without errors | 100% | 100% |
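
The "Hybrid score combination" and "Alpha parameter" criteria refer to blending dense (embedding) and sparse (BM25) relevance scores. A common way to do this, shown here as an assumption rather than the skill's exact formula, is to min-max normalize each score set and take `alpha * dense + (1 - alpha) * sparse`. The function names and the convention that `alpha` weights the dense side are illustrative.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}  # all equal: treat as full score
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(dense: dict[str, float], sparse: dict[str, float],
                alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend normalized scores and return (doc_id, score), best first.

    alpha=1.0 uses only the dense scores, alpha=0.0 only the sparse ones.
    A document missing from one retriever contributes 0 for that side.
    """
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    combined = {
        doc: alpha * dense_n.get(doc, 0.0) + (1 - alpha) * sparse_n.get(doc, 0.0)
        for doc in set(dense_n) | set(sparse_n)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

A reranking step, as named in the criteria, would then rescore the top results of `hybrid_rank` with a stronger model and sort them again.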

100% · 68%

Production Model Monitoring for Credit Scoring

Drift detection and alert thresholds

| Criteria | Without context | With context |
| --- | --- | --- |
| KS test for drift | 100% | 100% |
| KS output fields | 50% | 100% |
| KS drift flag threshold | 100% | 100% |
| p95 latency warning | 0% | 100% |
| p95 latency critical | 0% | 100% |
| Error rate warning | 0% | 100% |
| Error rate critical | 0% | 100% |
| PSI warning threshold | 0% | 100% |
| PSI critical threshold | 0% | 100% |
| Accuracy drop thresholds | 0% | 100% |
| Retraining at PSI>0.2 | 0% | 100% |
| Demo output | 100% | 100% |
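
Several criteria above depend on the Population Stability Index (PSI). As a rough illustration, PSI compares the binned distribution of a feature in a reference sample against a live sample: the sum over bins of `(actual - expected) * ln(actual / expected)`. The sketch below assumes equal-width bins derived from the reference sample and a small epsilon for empty bins; those choices, and the function names, are assumptions. Only the 0.2 retraining cutoff is taken from the criteria list.

```python
import math
from typing import Sequence

RETRAIN_PSI_THRESHOLD = 0.2  # from the "Retraining at PSI>0.2" criterion


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference and a live sample."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(psi_value: float,
                     threshold: float = RETRAIN_PSI_THRESHOLD) -> bool:
    """Flag retraining when PSI exceeds the threshold."""
    return psi_value > threshold
```

For example, `needs_retraining(psi(training_scores, last_week_scores))` would gate a retraining job, where both sample names are placeholders for whatever data the monitoring job collects.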

100% · 40%

Setting Up an MLOps Training Pipeline for a Churn Model

MLflow model registry and feature store

| Criteria | Without context | With context |
| --- | --- | --- |
| Feast Entity definition | 0% | 100% |
| Feast FeatureView | 0% | 100% |
| FileSource in FeatureView | 0% | 100% |
| TTL set on FeatureView | 0% | 100% |
| mlflow.start_run used | 100% | 100% |
| mlflow.log_metric called | 100% | 100% |
| Model logged to MLflow | 100% | 100% |
| mlflow.register_model called | 40% | 100% |
| PSI drift trigger | 80% | 100% |
| Scheduled retrain trigger | 100% | 100% |
| Performance drop trigger | 100% | 100% |
| Pipeline runs without error | 100% | 100% |

100% · 3%

Model Comparison Framework for a Search Ranking Experiment

A/B testing with deterministic traffic splitting

| Criteria | Without context | With context |
| --- | --- | --- |
| Hash-based assignment | 100% | 100% |
| Experiment key in hash input | 100% | 100% |
| Bucket conversion | 100% | 100% |
| Sticky assignment | 100% | 100% |
| Configurable control_pct | 100% | 100% |
| Returns control/treatment | 100% | 100% |
| Primary metric collection | 100% | 100% |
| Guardrail metric tracking | 100% | 100% |
| p-value < 0.05 threshold | 100% | 100% |
| Minimum sample size check | 70% | 100% |
| Demo runs without errors | 100% | 100% |
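
The first six criteria above describe deterministic traffic splitting: hash the user together with the experiment key, convert the hash to a bucket, and compare the bucket against a configurable control percentage. A minimal sketch of that idea follows. The function name `assign_variant`, the choice of SHA-256, the `experiment_key:user_id` input format, and the 100-bucket granularity are assumptions; only `control_pct` and the `control`/`treatment` return values are named on this page.

```python
import hashlib


def assign_variant(user_id: str, experiment_key: str,
                   control_pct: float = 50.0) -> str:
    """Deterministically assign a user to 'control' or 'treatment'.

    Including the experiment key in the hash input means the same user
    can land in different arms across different experiments, while the
    assignment within one experiment stays sticky.
    """
    payload = f"{experiment_key}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    bucket = int(digest[:8], 16) % 100  # bucket in the range 0..99
    return "control" if bucket < control_pct else "treatment"
```

Because the assignment depends only on its inputs, no lookup table is needed to keep users in the same arm between requests.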

Repository: alirezarezvani/claude-skills

Evaluated
Agent: Claude Code
Model: Claude Sonnet 4.6
