
evaluating-machine-learning-models

Build this skill allows AI assistant to evaluate machine learning models using a comprehensive suite of metrics. it should be used when the user requests model performance analysis, validation, or testing. AI assistant can use this skill to assess model accuracy, p... Use when appropriate context detected. Trigger with relevant phrases based on skill purpose.

46

Quality

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues


Quality

Content

50%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is well-sectioned and conveys a coherent evaluation workflow, but it relies on generic template filler, never provides executable commands or code, omits validation checkpoints, and fails to surface the real bundle files it ships with. It sits at the mid level on every dimension.

Suggestions

Replace the generic 'Instructions', 'Output', 'Error Handling', and 'Prerequisites' filler with a concrete, runnable example of the `/eval-model` command including arguments and expected output.

Add an explicit validation/feedback step to the workflow (e.g., run evaluation, inspect metrics, re-run with adjusted parameters on failure) so the sequence has checkpoints.

Link the bundled scripts and assets from the body (e.g., 'See scripts/evaluate_model.py for the evaluation runner; assets/visualization_script.py for metric plots') instead of the placeholder 'Resources' list.
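The validation/feedback step suggested above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual `/eval-model` behavior or the contents of scripts/evaluate_model.py: the names `evaluate_with_checkpoints` and `toy_eval`, the toy data, and the choice of a decision threshold as the adjusted parameter are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy data standing in for a real validation set (hypothetical values).
Y_TRUE = [1, 0, 1, 1, 0, 0]
SCORES = [0.9, 0.6, 0.55, 0.4, 0.3, 0.1]


def toy_eval(threshold: float) -> Dict[str, float]:
    """Hypothetical evaluation run: F1 at a given decision threshold."""
    preds = [1 if s >= threshold else 0 for s in SCORES]
    tp = sum(1 for t, p in zip(Y_TRUE, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(Y_TRUE, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(Y_TRUE, preds) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return {"f1": 2 * tp / denom if denom else 0.0}


def evaluate_with_checkpoints(
    run_eval: Callable[[float], Dict[str, float]],
    thresholds: List[float],
    min_f1: float,
) -> Tuple[Optional[float], List[Dict[str, float]]]:
    """Run, inspect, and re-run with the next parameter until the check passes."""
    history: List[Dict[str, float]] = []
    for threshold in thresholds:
        metrics = run_eval(threshold)              # 1. run the evaluation
        history.append({"threshold": threshold, **metrics})
        if metrics["f1"] >= min_f1:                # 2. inspect the metrics
            return threshold, history              #    checkpoint passed
        # 3. checkpoint failed: fall through and retry with the next value
    return None, history                           # no setting passed


if __name__ == "__main__":
    chosen, history = evaluate_with_checkpoints(toy_eval, [0.7, 0.5, 0.35], min_f1=0.8)
    print("chosen threshold:", chosen)
    for entry in history:
        print(entry)
```

Returning the full history alongside the chosen setting keeps every failed attempt visible, which is the point of an explicit checkpoint: a failure is recorded and acted on rather than silently skipped.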

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The body avoids explaining basic ML concepts but is padded with template filler that adds no value ('The skill produces structured output relevant to the task.', 'Invoke this skill when the trigger conditions are met', 'Appropriate file access permissions'), fitting 'mostly efficient but includes some unnecessary explanation or could be tightened'. | 2 / 3 |
| Actionability | It names a concrete command (`/eval-model`) but never shows it with arguments or executable code, and examples only describe steps ('Invoke the `/eval-model` command', 'Analyze the model's performance') rather than giving copy-paste-ready guidance, matching 'some concrete guidance but incomplete; missing key details'. | 2 / 3 |
| Workflow Clarity | The 'How It Works' section provides a 3-step sequence and examples add numbered steps, but there are no validation checkpoints or error-recovery feedback loops, fitting 'steps listed but validation gaps; sequence present but checkpoints missing or implicit'. | 2 / 3 |
| Progressive Disclosure | Bundle files exist (assets/visualization_script.py, scripts/evaluate_model.py, metrics_calculator.py, data_loader.py) but the body never links to or navigates them; the 'Resources' section only lists generic 'Project documentation', and inline template content that belongs in references is kept in SKILL.md, matching 'some structure but could be better organized; references present but not clearly signaled'. | 2 / 3 |

Total: 8 / 12 (Passed)

Description

50%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description communicates a clear ML-evaluation purpose with third-person voice and some natural trigger terms, but is undermined by a truncated/garbled fragment and generic trigger boilerplate. It lands at the mid level across all dimensions rather than achieving crisp, explicit, conflict-free trigger guidance.

Suggestions

Replace the boilerplate 'Use when appropriate context detected. Trigger with relevant phrases based on skill purpose' with concrete user phrasings such as 'Use when the user asks to evaluate, benchmark, or validate a model, or to compare model accuracy/F1/precision/recall.'

Fix the truncated 'assess model accuracy, p...' fragment and list the concrete metrics (accuracy, precision, recall, F1-score) instead of 'comprehensive suite of metrics'.

Tighten the opening 'Build this skill allows AI assistant to...' to a clean third-person action statement like 'Evaluates machine learning models using accuracy, precision, recall, and F1 metrics.'
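For reference, the four metrics these suggestions name (accuracy, precision, recall, F1-score) can be computed from binary labels as in the sketch below. It is a plain-Python illustration under the assumption of binary 0/1 labels; the function name `binary_metrics` is hypothetical and does not correspond to the bundled metrics_calculator.py.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard each ratio against a zero denominator (e.g. no predicted positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs)

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(binary_metrics([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
```

Naming these quantities in the description, rather than 'comprehensive suite of metrics', tells an agent exactly what output to expect.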

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the ML domain and some actions ('evaluate machine learning models', 'assess model accuracy, p...'), but the action list is truncated ('p...') and padded with vague phrasing like 'comprehensive suite of metrics' rather than a comprehensive enumeration, matching 'names domain and some actions, but not comprehensive'. | 2 / 3 |
| Completeness | It states what the skill does and offers a partial when ('it should be used when the user requests model performance analysis, validation, or testing'), but the explicit trigger is undercut by generic boilerplate and a garbled truncation ('p...'), so it does not 'clearly answer both what AND when with explicit triggers' as required for a 3. | 2 / 3 |
| Trigger Term Quality | Includes some natural terms ('model performance analysis, validation, or testing') but then degenerates into boilerplate ('Use when appropriate context detected. Trigger with relevant phrases based on skill purpose') that users would never actually say, fitting 'some relevant keywords but missing common variations'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The 'evaluate machine learning models' niche is reasonably distinct, yet the generic trigger boilerplate ('Use when appropriate context detected') broadens the activation surface and risks overlap with other ML/testing skills, matching 'somewhat specific but could still overlap with similar skills'. | 2 / 3 |

Total: 8 / 12 (Passed)

Validation

87%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 14 / 16 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |

Total: 14 / 16 (Passed)

Repository
jeremylongshore/claude-code-plugins-plus-skills
Reviewed
