ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. Use when the user asks about deploying ML models to production, setting up MLOps infrastructure (MLflow, Kubeflow, Kubernetes, Docker), monitoring model performance or drift, building RAG pipelines, or integrating LLM APIs with retry logic and cost controls. Focused on production and operational concerns rather than model research or initial training.
Production ML engineering patterns for model deployment, MLOps infrastructure, and LLM integration.
Deploy a trained model to production with monitoring:
```dockerfile
FROM python:3.11-slim

# curl is needed for the HEALTHCHECK below; slim images do not include it
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model/ /app/model/
COPY src/ /app/src/

HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1
EXPOSE 8080
CMD ["uvicorn", "src.server:app", "--host", "0.0.0.0", "--port", "8080"]
```

| Option | Latency | Throughput | Use Case |
|---|---|---|---|
| FastAPI + Uvicorn | Low | Medium | REST APIs, small models |
| Triton Inference Server | Very Low | Very High | GPU inference, batching |
| TensorFlow Serving | Low | High | TensorFlow models |
| TorchServe | Low | High | PyTorch models |
| Ray Serve | Medium | High | Complex pipelines, multi-model |
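For the FastAPI + Uvicorn option, a minimal `src/server.py` matching the Dockerfile above might look like the following sketch; the pickle path follows the Dockerfile, while the flat feature-vector schema and scalar prediction are assumptions to adapt to your model:

```python
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup, not per request
with open("model/model.pkl", "rb") as f:
    model = pickle.load(f)


class PredictRequest(BaseModel):
    features: list[float]  # assumed flat feature vector; adapt to your schema


@app.get("/health")
def health():
    # Target of the Dockerfile HEALTHCHECK
    return {"status": "ok"}


@app.post("/predict")
def predict(req: PredictRequest):
    # Assumes a scikit-learn-style estimator returning one scalar per row
    prediction = model.predict([req.features])[0]
    return {"prediction": float(prediction)}
```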
Establish automated training and deployment:
```python
from datetime import timedelta

from feast import Entity, Feature, FeatureView, FileSource, ValueType

# Entity keys join stored feature rows to incoming prediction requests
user = Entity(name="user_id", value_type=ValueType.INT64)

# Feature definitions (Feast's pre-0.21 Feature/ValueType API)
user_features = FeatureView(
    name="user_features",
    entities=["user_id"],
    ttl=timedelta(days=1),
    features=[
        Feature(name="purchase_count_30d", dtype=ValueType.INT64),
        Feature(name="avg_order_value", dtype=ValueType.FLOAT),
    ],
    online=True,
    source=FileSource(path="data/user_features.parquet"),
)
```
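At inference time, the online store is read back with the same feature names; a minimal sketch (the repo path and entity value are illustrative):

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # directory containing feature_store.yaml

# Fetch the latest online feature values for one user
features = store.get_online_features(
    features=[
        "user_features:purchase_count_30d",
        "user_features:avg_order_value",
    ],
    entity_rows=[{"user_id": 123}],
).to_dict()
```

Common retraining triggers: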
| Trigger | Detection | Action |
|---|---|---|
| Scheduled | Cron (weekly/monthly) | Full retrain |
| Performance drop | Accuracy < threshold | Immediate retrain |
| Data drift | PSI > 0.2 | Evaluate, then retrain |
| New data volume | X new samples | Incremental update |
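A sketch of how these triggers could be evaluated in a scheduled job; the thresholds mirror the table, and the 10,000-sample cutoff is a stand-in for the illustrative "X new samples":

```python
from typing import Optional


def should_retrain(
    current_accuracy: float,
    baseline_accuracy: float,
    psi_score: float,
    new_samples: int,
    accuracy_drop_threshold: float = 0.02,  # performance-drop trigger
    psi_threshold: float = 0.2,             # data-drift trigger
    sample_threshold: int = 10_000,         # stand-in for "X new samples"
) -> Optional[str]:
    """Map observed signals to the retraining actions in the table above."""
    if baseline_accuracy - current_accuracy > accuracy_drop_threshold:
        return "full_retrain"           # performance drop: retrain immediately
    if psi_score > psi_threshold:
        return "evaluate_then_retrain"  # data drift: investigate before retraining
    if new_samples >= sample_threshold:
        return "incremental_update"     # enough new data for an incremental pass
    return None                         # otherwise wait for the scheduled cron run
```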
Integrate LLM APIs into production applications:
```python
from abc import ABC, abstractmethod

from tenacity import retry, stop_after_attempt, wait_exponential


class LLMProvider(ABC):
    """Provider-agnostic interface so callers are not coupled to one vendor."""

    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> str:
        ...


# Retry transient failures (rate limits, timeouts) with exponential backoff
@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def call_llm_with_retry(provider: LLMProvider, prompt: str, **kwargs) -> str:
    return provider.complete(prompt, **kwargs)
```

| Provider | Input (per 1K tokens) | Output (per 1K tokens) |
|---|---|---|
| GPT-4 | $0.03 | $0.06 |
| GPT-3.5 | $0.0005 | $0.0015 |
| Claude 3 Opus | $0.015 | $0.075 |
| Claude 3 Haiku | $0.00025 | $0.00125 |
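As a concrete implementation of the `LLMProvider` interface above, a provider backed by the OpenAI Python SDK might look like this sketch; the model name is illustrative, and the client reads `OPENAI_API_KEY` from the environment:

```python
from openai import OpenAI


class OpenAIProvider(LLMProvider):
    """LLMProvider backed by the OpenAI chat completions API."""

    def __init__(self, model: str = "gpt-3.5-turbo"):
        self.client = OpenAI()
        self.model = model

    def complete(self, prompt: str, **kwargs) -> str:
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            **kwargs,
        )
        return response.choices[0].message.content


# Routed through the retry wrapper defined above:
# text = call_llm_with_retry(OpenAIProvider(), "Summarize this document: ...")
```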
Build a retrieval-augmented generation (RAG) pipeline:
| Database | Hosting | Scale | Latency | Best For |
|---|---|---|---|---|
| Pinecone | Managed | High | Low | Production, managed |
| Qdrant | Both | High | Very Low | Performance-critical |
| Weaviate | Both | High | Low | Hybrid search |
| Chroma | Self-hosted | Medium | Low | Prototyping |
| pgvector | Self-hosted | Medium | Medium | Existing Postgres |
| Strategy | Chunk Size | Overlap | Best For |
|---|---|---|---|
| Fixed | 500-1000 tokens | 50-100 | General text |
| Sentence | 3-5 sentences | 1 sentence | Structured text |
| Semantic | Variable | Based on meaning | Research papers |
| Recursive | Hierarchical | Parent-child | Long documents |
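A minimal end-to-end sketch using fixed-size chunking and Chroma (per the tables above, Chroma suits prototyping); the corpus is hypothetical, chunk sizes are counted in words as a rough token proxy, and Chroma's default embedding function is used for brevity:

```python
import chromadb


def chunk_fixed(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Fixed-size chunks with overlap, measured in words as a token proxy."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path=...) in production
collection = client.create_collection("docs")

docs = ["...long document text..."]  # hypothetical corpus
for doc_id, doc in enumerate(docs):
    chunks = chunk_fixed(doc)
    collection.add(
        documents=chunks,
        ids=[f"doc{doc_id}-chunk{i}" for i in range(len(chunks))],
    )

# Retrieve the top chunks for a query, then pass them to the LLM as context
results = collection.query(query_texts=["What is the refund policy?"], n_results=3)
context = "\n\n".join(results["documents"][0])
```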
Monitor production models for drift and degradation:
```python
from scipy.stats import ks_2samp


def detect_drift(reference, current, threshold: float = 0.05):
    """Two-sample Kolmogorov-Smirnov test between training and live distributions."""
    statistic, p_value = ks_2samp(reference, current)
    return {
        "drift_detected": p_value < threshold,  # reject H0: same distribution
        "ks_statistic": statistic,
        "p_value": p_value,
    }
```

| Metric | Warning | Critical |
|---|---|---|
| p95 latency | > 100ms | > 200ms |
| Error rate | > 0.1% | > 1% |
| PSI (drift) | > 0.1 | > 0.2 |
| Accuracy drop | > 2% | > 5% |
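PSI, the drift trigger in both threshold tables above, compares binned reference and live distributions; a common implementation sketch (the bin count and small epsilon are conventional choices):

```python
import numpy as np


def population_stability_index(reference, current, bins: int = 10) -> float:
    """PSI over equal-width bins fit on the reference distribution.

    Rule of thumb: < 0.1 stable, 0.1-0.2 warning, > 0.2 significant drift,
    matching the thresholds in the table above.
    """
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Clip empty bins to avoid division by zero and log(0)
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))
```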
Reference files:
- references/mlops_production_patterns.md
- references/llm_integration_guide.md
- references/rag_system_architecture.md
- `python scripts/model_deployment_pipeline.py --model model.pkl --target staging` generates deployment artifacts: Dockerfile, Kubernetes manifests, health checks.
- `python scripts/rag_system_builder.py --config rag_config.yaml --analyze` scaffolds a RAG pipeline with vector store integration and retrieval logic.
- `python scripts/ml_monitoring_suite.py --config monitoring.yaml --deploy` sets up drift detection, alerting, and performance dashboards.
| Category | Tools |
|---|---|
| ML Frameworks | PyTorch, TensorFlow, Scikit-learn, XGBoost |
| LLM Frameworks | LangChain, LlamaIndex, DSPy |
| MLOps | MLflow, Weights & Biases, Kubeflow |
| Data | Spark, Airflow, dbt, Kafka |
| Deployment | Docker, Kubernetes, Triton |
| Databases | PostgreSQL, BigQuery, Pinecone, Redis |