
ai-engineer

Trains and fine-tunes ML models, builds data preprocessing and feature engineering pipelines, deploys models as REST APIs, integrates inference into production applications, and designs RAG and LLM-powered systems. Covers MLOps workflows including experiment tracking, drift detection, retraining triggers, and A/B testing. Use when the user asks about training or fine-tuning a model, building ML pipelines, model serving or inference optimization, evaluating model performance, working with frameworks like PyTorch, TensorFlow, scikit-learn, or Hugging Face, setting up vector databases, prompt engineering, or taking an ML prototype to production.
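The RAG and vector-database portion of this scope is the least self-describing; as a minimal sketch of the retrieval step such a system builds on, assuming a sentence-transformers embedding model (the model name, corpus, and `retrieve` helper are illustrative, not taken from the skill):

```python
# Minimal retrieval step for a RAG system: embed a corpus, then rank
# documents by cosine similarity to the query. Illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
corpus = [
    "Drift detection compares live feature distributions to training data.",
    "Feature pipelines transform raw records into model-ready inputs.",
]
# normalize_embeddings=True makes the dot product a cosine similarity
doc_vecs = encoder.encode(corpus, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    q_vec = encoder.encode([query], normalize_embeddings=True)[0]
    order = np.argsort(doc_vecs @ q_vec)[::-1]  # highest similarity first
    return [corpus[i] for i in order[:k]]

print(retrieve("How do I detect data drift?"))
```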

Overall score: 90

Quality: 88%
Does it follow best practices?

Impact: Pending
No eval scenarios have been run.

Security (by Snyk): Passed
No known issues.


Quality

Discovery: 92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong description that excels in specificity and completeness, with a well-structured 'Use when...' clause containing numerous natural trigger terms. Its main weakness is its extremely broad scope—covering nearly the entire ML lifecycle from data preprocessing to production deployment to LLM systems—which could create overlap with more specialized skills in a large skill library. The description uses proper third-person voice throughout.

Specificity (3 / 3)

Lists multiple specific, concrete actions: trains/fine-tunes ML models, builds preprocessing and feature engineering pipelines, deploys models as REST APIs, integrates inference into production, designs RAG and LLM-powered systems, experiment tracking, drift detection, retraining triggers, and A/B testing.

Completeness (3 / 3)

Clearly answers both 'what' (trains models, builds pipelines, deploys as REST APIs, designs RAG systems, covers MLOps workflows) and 'when', with an explicit 'Use when...' clause listing specific trigger scenarios like training a model, building ML pipelines, working with specific frameworks, or taking prototypes to production.

Trigger Term Quality (3 / 3)

Excellent coverage of natural terms users would say: 'training', 'fine-tuning', 'ML pipelines', 'model serving', 'inference optimization', 'PyTorch', 'TensorFlow', 'scikit-learn', 'Hugging Face', 'vector databases', 'prompt engineering', 'ML prototype to production'. These are terms users would naturally use when seeking ML help.

Distinctiveness / Conflict Risk (2 / 3)

While the ML/MLOps focus is fairly specific, the scope is extremely broad, covering everything from data preprocessing to prompt engineering to RAG systems to deployment. The 'prompt engineering' trigger could easily conflict with general LLM/coding skills, and 'vector databases' could overlap with database skills. The breadth increases conflict risk with more specialized skills.

Total: 11 / 12 (Passed)

Implementation: 85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, well-structured skill that provides actionable guidance across the ML lifecycle. Its greatest strengths are the concrete worked examples (evaluation summary, serving template, experiment tracking, deployment checklist) and the explicit validation checkpoints with measurable thresholds. Minor verbosity in the mission/workflow preamble and some explanatory text around the checkpoints could be trimmed, but overall token efficiency is reasonable given the breadth of the skill.
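For reference, the experiment-tracking pattern credited here typically reduces to a few lines of MLflow. A minimal sketch, assuming MLflow is the tracker (the experiment name, parameters, and metric values are placeholders, not results from the skill):

```python
# Minimal MLflow experiment-tracking sketch. All names and numbers are
# placeholders; the point is the log-params/log-metrics/log-artifact shape.
import mlflow

mlflow.set_experiment("demo-experiment")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_params({"model": "xgboost", "max_depth": 6, "learning_rate": 0.1})
    mlflow.log_metrics({"auc": 0.91, "f1": 0.84})  # placeholder values
    mlflow.log_artifact("model.joblib")  # artifact path assumed to exist
```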

Conciseness (2 / 3)

The content is generally efficient, but some sections could be tightened: the mission statement and some workflow descriptions are slightly verbose for what Claude already knows. However, the examples and checklists earn their space; the deployment checklist and evaluation summary, while long, provide genuinely useful templates.

Actionability (3 / 3)

The skill provides fully executable Python code for model serving (FastAPI), experiment tracking (MLflow), concrete metric thresholds (PSI > 0.2, p99 < 200 ms), and copy-paste-ready templates. The evaluation summary example and deployment checklist are specific and immediately usable.
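For context, a FastAPI serving template of the kind cited here usually looks like the following. This is a generic sketch, assuming a joblib-serialized scikit-learn model; the endpoint name, schema, and artifact path are illustrative, not the skill's actual code:

```python
# Generic FastAPI model-serving sketch: load the model once at startup,
# validate inputs with Pydantic, return a typed response.
from contextlib import asynccontextmanager

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

class PredictRequest(BaseModel):
    features: list[float]

class PredictResponse(BaseModel):
    prediction: float

state = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    state["model"] = joblib.load("model.joblib")  # assumed artifact path
    yield
    state.clear()

app = FastAPI(lifespan=lifespan)

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    pred = state["model"].predict([req.features])[0]
    return PredictResponse(prediction=float(pred))
```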

Workflow Clarity (3 / 3)

The workflow is clearly sequenced, with explicit validation checkpoints at three stages (after training, before deployment, after launch). Each checkpoint has concrete go/no-go criteria with specific thresholds. The relationship between per-phase gates and the final deployment checklist is explicitly clarified, and rollback/retraining triggers are defined.
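As an illustration of what such a go/no-go gate can reduce to in code, here is a sketch combining the two thresholds this review quotes (PSI > 0.2 for drift, p99 < 200 ms for latency). The function names are hypothetical, not taken from the skill:

```python
# Hypothetical go/no-go gate using the thresholds cited in this review:
# block launch if feature drift (PSI) exceeds 0.2 or p99 latency >= 200 ms.
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between training-time (expected) and live (actual) values."""
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf  # catch out-of-range live values
    e = np.histogram(expected, cuts)[0] / len(expected)
    a = np.histogram(actual, cuts)[0] / len(actual)
    e = np.clip(e, 1e-6, None)  # avoid log(0) for empty bins
    a = np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

def passes_launch_gate(expected: np.ndarray, actual: np.ndarray,
                       latencies_ms: np.ndarray) -> bool:
    psi = population_stability_index(expected, actual)
    p99 = float(np.percentile(latencies_ms, 99))
    return psi <= 0.2 and p99 < 200.0
```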

Progressive Disclosure (3 / 3)

The skill cleanly separates concerns with one-level-deep references to RAG_SYSTEMS.md, VECTOR_DATABASES.md, and FRAMEWORK_GUIDES.md. The main content stays focused on the core workflow and examples, with clear internal navigation via anchor links between the validation checkpoints and the deployment checklist.

Total: 11 / 12 (Passed)

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation checks: 10 / 11 passed

Validation for skill structure

frontmatter_unknown_keys (Warning)

Unknown frontmatter key(s) found; consider removing or moving to metadata.

Total: 10 / 11 (Passed)

Repository: OpenRoster-ai/awesome-agents (Reviewed)

