
ai-engineer

Trains and fine-tunes ML models, builds data preprocessing and feature engineering pipelines, deploys models as REST APIs, integrates inference into production applications, and designs RAG and LLM-powered systems. Covers MLOps workflows including experiment tracking, drift detection, retraining triggers, and A/B testing. Use when the user asks about training or fine-tuning a model, building ML pipelines, model serving or inference optimization, evaluating model performance, working with frameworks like PyTorch, TensorFlow, scikit-learn, or Hugging Face, setting up vector databases, prompt engineering, or taking an ML prototype to production.

Overall score: 88

Quality: 88%
Does it follow best practices?

Impact: 81% (1.09x)
Average score across 3 eval scenarios

Security (by Snyk): Passed
No known issues


Quality

Discovery: 92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong, well-structured skill description that clearly articulates specific capabilities and includes an explicit 'Use when...' clause with rich trigger terms. Its main weakness is its extremely broad scope, covering nearly the entire ML lifecycle from data preprocessing to production deployment to LLM systems, which could create overlap with more specialized skills in a large skill library. The description uses appropriate third-person voice throughout.

Dimension | Reasoning | Score

Specificity

Lists multiple specific concrete actions: trains/fine-tunes models, builds preprocessing and feature engineering pipelines, deploys models as REST APIs, integrates inference, designs RAG/LLM systems, experiment tracking, drift detection, retraining triggers, and A/B testing.

3 / 3

Completeness

Clearly answers both 'what' (trains models, builds pipelines, deploys as REST APIs, designs RAG systems, covers MLOps workflows) and 'when' with an explicit 'Use when...' clause listing specific trigger scenarios like training models, building pipelines, working with specific frameworks, and taking prototypes to production.

3 / 3

Trigger Term Quality

Excellent coverage of terms users would naturally say: 'training', 'fine-tuning', 'ML pipelines', 'model serving', 'inference optimization', 'PyTorch', 'TensorFlow', 'scikit-learn', 'Hugging Face', 'vector databases', 'prompt engineering', 'ML prototype to production'.

3 / 3

Distinctiveness / Conflict Risk

While the ML/AI domain is well-defined, the scope is extremely broad—covering everything from data preprocessing to RAG systems to prompt engineering to MLOps. The 'prompt engineering' trigger could conflict with general LLM usage skills, and 'REST APIs' could overlap with web development skills. The breadth increases conflict risk with more specialized skills.

2 / 3

Total: 11 / 12 (Passed)

Implementation: 85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, well-structured skill that provides actionable guidance with executable code examples, clear multi-phase workflows with explicit validation gates, and appropriate progressive disclosure to supplementary files. The main weakness is moderate verbosity — some sections (mission, constraint explanations, overlapping checklist/checkpoint content) could be tightened to better respect token budget. Overall it serves as an effective operational guide for AI engineering tasks.

Suggestions

Trim the Mission section to 1 line or remove it — Claude doesn't need a philosophical framing of the role.

Consider consolidating the Validation Checkpoints and Deployment Checklist into a single artifact to reduce redundancy, or more clearly differentiate them with less explanatory prose.

Dimension | Reasoning | Score

Conciseness

The content is generally efficient but has some areas of verbosity — the mission section restates things Claude already knows, and the deployment checklist + validation checkpoints have some overlap that could be tightened. The worked examples are useful but collectively make the file longer than necessary for a SKILL.md overview.

2 / 3

Actionability

The skill provides fully executable Python code for model serving (FastAPI), experiment tracking (MLflow), concrete metric thresholds (PSI > 0.2, p99 < 200ms), and a detailed evaluation summary format. All code examples are copy-paste ready with real libraries and realistic patterns.

3 / 3

Workflow Clarity

The workflow is clearly sequenced with explicit validation checkpoints at three stages (after training, before deployment, after launch). Each checkpoint has concrete pass/fail criteria, and the deployment checklist provides a comprehensive sign-off artifact. Feedback loops for drift detection and rollback triggers are well-defined.

3 / 3

Progressive Disclosure

The skill cleanly separates concerns with one-level-deep references to RAG_SYSTEMS.md, VECTOR_DATABASES.md, and FRAMEWORK_GUIDES.md. The main file serves as an effective overview with worked examples inline (appropriate for a skill file) and clear navigation via internal anchors and external references.

3 / 3

Total: 11 / 12 (Passed)
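The concrete thresholds the Actionability review credits (PSI > 0.2 for drift, p99 < 200ms for latency) are straightforward to operationalize. As a minimal sketch of the drift side, using the standard Population Stability Index formula with made-up bin counts (illustrative only, not the skill's actual code):

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected_counts: per-bin counts from the baseline (training) data.
    actual_counts: per-bin counts from live (serving) data.
    eps guards against log(0) when a bin is empty.
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        value += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return value

baseline = [50, 30, 20]  # hypothetical training-time feature histogram
stable = [48, 31, 21]    # serving data that resembles training
shifted = [20, 30, 50]   # serving data whose distribution has inverted

print(psi(baseline, stable) > 0.2)   # False: no retraining trigger
print(psi(baseline, shifted) > 0.2)  # True: crosses the PSI > 0.2 gate
```

A PSI under 0.1 is conventionally read as stable and above 0.2 as significant drift, which matches the retraining trigger the reviewer quotes.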

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria | Description | Result

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total: 10 / 11 (Passed)
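The frontmatter_unknown_keys check flagged above amounts to an allow-list comparison over the SKILL.md frontmatter. A minimal sketch, assuming a hypothetical allow-list of `name` and `description` (the spec's real key set may differ):

```python
def unknown_frontmatter_keys(frontmatter, allowed=("name", "description")):
    """Return frontmatter keys that are not in the spec's allow-list.

    `allowed` here is a guess for illustration; the actual validator's
    allow-list is defined by the skill spec.
    """
    return sorted(set(frontmatter) - set(allowed))

fm = {"name": "ai-engineer", "description": "...", "author": "example"}
print(unknown_frontmatter_keys(fm))  # ['author'] -> triggers the warning
```

Moving such keys under a dedicated metadata block, as the warning text suggests, would clear the check without losing the information.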

Repository: OpenRoster-ai/awesome-agents (Reviewed)

