
# ai-engineer

Trains and fine-tunes ML models, builds data preprocessing and feature engineering pipelines, deploys models as REST APIs, integrates inference into production applications, and designs RAG and LLM-powered systems. Covers MLOps workflows including experiment tracking, drift detection, retraining triggers, and A/B testing. Use when the user asks about training or fine-tuning a model, building ML pipelines, model serving or inference optimization, evaluating model performance, working with frameworks like PyTorch, TensorFlow, scikit-learn, or Hugging Face, setting up vector databases, prompt engineering, or taking an ML prototype to production.

Quality: 88% (Does it follow best practices?)
Impact: 81%, 1.09x (average score across 3 eval scenarios)
Security (by Snyk): Passed, no known issues

## Evaluation results

### Churn Prediction Model Comparison

MLflow experiment tracking and model evaluation summary

Scenario score: 62% (3%)

| Criteria | Without context | With context |
| --- | --- | --- |
| MLflow experiment name | 0% | 0% |
| MLflow run name | 0% | 0% |
| MLflow log_params | 0% | 0% |
| MLflow log_metrics | 0% | 0% |
| MLflow log_model with registered name | 0% | 0% |
| Evaluation summary task description | 40% | 100% |
| Evaluation summary dataset info | 100% | 100% |
| Evaluation summary baseline vs candidate | 100% | 100% |
| Latency in evaluation summary | 100% | 100% |
| Slice evaluation present | 100% | 100% |
| Failure mode identified | 100% | 100% |
| Deployment recommendation | 100% | 100% |
| Stratified split used | 100% | 100% |
| No fabricated numbers in prose | 100% | 100% |

### Deploy a Trained Model as a REST API

FastAPI model serving endpoint with MLflow model registry

Scenario score: 100% (22%)

| Criteria | Without context | With context |
| --- | --- | --- |
| FastAPI used | 100% | 100% |
| Pydantic request schema | 100% | 100% |
| Pydantic response schema | 100% | 100% |
| response_model used | 100% | 100% |
| POST /predict endpoint | 100% | 100% |
| Health check endpoint | 100% | 100% |
| MLflow model registry URI | 100% | 100% |
| mlflow.sklearn.load_model used | 0% | 100% |
| numpy array reshape | 100% | 100% |
| Score extraction | 0% | 100% |

### Prepare a Model for Production Promotion

Pre-deployment checklist and post-launch monitoring thresholds

Scenario score: 82% (-3%)

| Criteria | Without context | With context |
| --- | --- | --- |
| Checklist: held-out metrics | 100% | 66% |
| Checklist: slice evaluation | 100% | 100% |
| Checklist: artifact versioning | 100% | 100% |
| Checklist: smoke test | 100% | 100% |
| Checklist: latency benchmark | 66% | 50% |
| Checklist: rollback artifact | 100% | 100% |
| Checklist: monitoring dashboards | 100% | 100% |
| Checklist: rollback runbook | 100% | 100% |
| Checklist: privacy controls | 100% | 100% |
| Drift metric: PSI threshold | 100% | 58% |
| Retraining trigger: performance drop | 30% | 70% |
| Retraining trigger: data volume | 100% | 50% |
| Rollback trigger: error rate | 40% | 100% |
| Rollback trigger: latency duration | 100% | 100% |
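The drift metric and retraining triggers in this scenario can be sketched as a PSI computation plus a trigger check. The specific thresholds below (PSI 0.2, 5-point AUC drop, 100k new rows) are illustrative assumptions drawn from common conventions; the eval does not publish the values it scores against.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline (training) sample and a live sample.

    Common rule of thumb: < 0.1 stable, 0.1-0.2 moderate shift, > 0.2 drift.
    (These bands are a convention, not values from this eval.)
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Bin edges come from quantiles of the baseline distribution.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # Clip both samples into the baseline range so every point lands in a bin.
    expected = np.clip(expected, edges[0], edges[-1])
    actual = np.clip(actual, edges[0], edges[-1])
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    # Small epsilon avoids log(0) for empty bins.
    eps = 1e-6
    e_frac = np.clip(e_frac, eps, None)
    a_frac = np.clip(a_frac, eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))


def should_retrain(psi, auc_drop, new_rows, *,
                   psi_threshold=0.2, auc_drop_threshold=0.05,
                   volume_threshold=100_000):
    """Fire a retraining trigger on drift, performance drop, or data volume."""
    return (psi > psi_threshold
            or auc_drop > auc_drop_threshold
            or new_rows >= volume_threshold)
```

In practice each trigger would also feed the rollback runbook: a PSI breach or error-rate spike pages the on-call, while a volume trigger just schedules a retrain.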

Repository: OpenRoster-ai/awesome-agents
Evaluated with agent: Claude Code
Model: Claude Sonnet 4.6

