Trains and fine-tunes ML models, builds data preprocessing and feature engineering pipelines, deploys models as REST APIs, integrates inference into production applications, and designs RAG and LLM-powered systems. Covers MLOps workflows including experiment tracking, drift detection, retraining triggers, and A/B testing. Use when the user asks about training or fine-tuning a model, building ML pipelines, model serving or inference optimization, evaluating model performance, working with frameworks like PyTorch, TensorFlow, scikit-learn, or Hugging Face, setting up vector databases, prompt engineering, or taking an ML prototype to production.
**Evaluation summary**

| Metric | Value |
| --- | --- |
| Does it follow best practices? | 88% |
| Impact | 81% |
| Average score across 3 eval scenarios | 1.09x |
| Result | Passed; no known issues |
**MLflow experiment tracking and model evaluation summary**

| Criterion | Score A | Score B |
| --- | --- | --- |
| MLflow experiment name | 0% | 0% |
| MLflow run name | 0% | 0% |
| MLflow log_params | 0% | 0% |
| MLflow log_metrics | 0% | 0% |
| MLflow log_model with registered name | 0% | 0% |
| Evaluation summary task description | 40% | 100% |
| Evaluation summary dataset info | 100% | 100% |
| Evaluation summary baseline vs candidate | 100% | 100% |
| Latency in evaluation summary | 100% | 100% |
| Slice evaluation present | 100% | 100% |
| Failure mode identified | 100% | 100% |
| Deployment recommendation | 100% | 100% |
| Stratified split used | 100% | 100% |
| No fabricated numbers in prose | 100% | 100% |
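One of the criteria above is "Stratified split used". As a minimal illustration of what that check expects, here is a pure-Python sketch of a stratified train/test split; the function name, seed, and 20% test fraction are illustrative choices, not part of the evaluated skill:

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) such that every class keeps roughly
    the same train/test proportion as in the full label list."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Hold out test_frac of each class, at least one example.
        n_test = max(1, round(len(idxs) * test_frac))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

In practice the same effect is usually achieved with scikit-learn's `train_test_split(X, y, stratify=y)`.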
**FastAPI model serving endpoint with MLflow model registry**

| Criterion | Score A | Score B |
| --- | --- | --- |
| FastAPI used | 100% | 100% |
| Pydantic request schema | 100% | 100% |
| Pydantic response schema | 100% | 100% |
| response_model used | 100% | 100% |
| POST /predict endpoint | 100% | 100% |
| Health check endpoint | 100% | 100% |
| MLflow model registry URI | 100% | 100% |
| mlflow.sklearn.load_model used | 0% | 100% |
| numpy array reshape | 100% | 100% |
| Score extraction | 0% | 100% |
**Pre-deployment checklist and post-launch monitoring thresholds**

| Criterion | Score A | Score B |
| --- | --- | --- |
| Checklist: held-out metrics | 100% | 66% |
| Checklist: slice evaluation | 100% | 100% |
| Checklist: artifact versioning | 100% | 100% |
| Checklist: smoke test | 100% | 100% |
| Checklist: latency benchmark | 66% | 50% |
| Checklist: rollback artifact | 100% | 100% |
| Checklist: monitoring dashboards | 100% | 100% |
| Checklist: rollback runbook | 100% | 100% |
| Checklist: privacy controls | 100% | 100% |
| Drift metric: PSI threshold | 100% | 58% |
| Retraining trigger: performance drop | 30% | 70% |
| Retraining trigger: data volume | 100% | 50% |
| Rollback trigger: error rate | 40% | 100% |
| Rollback trigger: latency duration | 100% | 100% |
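The "Drift metric: PSI threshold" row refers to the Population Stability Index, which compares a feature's live distribution against its training-time distribution; a widely used convention treats PSI above 0.2 as significant drift. A stdlib-only sketch, where the bin count and epsilon floor are illustrative choices:

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference (training) sample
    and a live sample of the same numeric feature."""
    lo, hi = min(expected), max(expected)
    # Equal-width bin edges over the reference range.
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def fractions(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        # Floor at eps so the log below never sees a zero.
        return [max(c / len(values), eps) for c in counts]

    e_frac, a_frac = fractions(expected), fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_frac, a_frac))
```

Identical distributions give PSI near 0, and a shifted live sample pushes it well past the 0.2 alert threshold.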