
model-evaluation-metrics

Model Evaluation Metrics - Auto-activating skill for ML Training. Triggers on: "model evaluation metrics", "model evaluation metrics". Part of the ML Training skill category.

Quality: 0%
Does it follow best practices?

Impact: 92% (1.00x)
Average score across 3 eval scenarios

Security (by Snyk): Passed
No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./planned-skills/generated/07-ml-training/model-evaluation-metrics/SKILL.md

Quality

Discovery

0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an extremely weak description that essentially just restates its title with no concrete actions, no meaningful trigger terms, and no explicit guidance on when to use it. The trigger terms are duplicated rather than varied, and the description reads as auto-generated boilerplate rather than a useful skill selector.

Suggestions

Add specific concrete actions the skill performs, e.g., 'Calculates precision, recall, F1 score, AUC-ROC, and confusion matrices for trained models. Compares model performance across experiments.'

Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks about accuracy, precision, recall, F1, confusion matrix, ROC curve, model performance, or evaluation results.'

Remove the duplicated trigger term and replace with diverse natural language variations users would actually say, such as 'model accuracy', 'how well does my model perform', 'classification report', 'loss curves', etc.
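Putting these suggestions together, a revised description block might look like the following sketch. This is illustrative only: the metric names and trigger phrases are examples, not content drawn from the actual skill.

```yaml
---
name: model-evaluation-metrics
description: >
  Calculates precision, recall, F1 score, AUC-ROC, and confusion matrices
  for trained models, and compares model performance across experiments.
  Use when the user asks about accuracy, precision, recall, F1, confusion
  matrix, ROC curve, model accuracy, classification report, loss curves,
  or how well a model performs.
---
```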

Dimension scores (reasoning and score per dimension):

Specificity: 1 / 3
The description names a domain ('model evaluation metrics') but lists no concrete actions. There are no specific capabilities like 'calculate precision/recall', 'generate confusion matrices', or 'compare model performance'. It is essentially a label, not a description of what the skill does.

Completeness: 1 / 3
The description fails to answer 'what does this do' beyond naming the topic, and the 'when' clause is just a redundant restatement of the title. There is no explicit 'Use when...' guidance with meaningful triggers.

Trigger Term Quality: 1 / 3
The trigger terms listed are just 'model evaluation metrics' repeated twice. There are no natural keyword variations users might say, such as 'accuracy', 'precision', 'recall', 'F1 score', 'confusion matrix', 'ROC curve', 'AUC', or 'loss function'.

Distinctiveness / Conflict Risk: 1 / 3
The description is so vague that it could overlap with any ML-related skill. 'Model evaluation metrics' without specifying which metrics, what actions, or what context makes it indistinguishable from other ML training or evaluation skills.

Total: 4 / 12

Passed

Implementation

0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is an empty shell with no actual content. It consists entirely of auto-generated boilerplate that describes what the skill would do without providing any actionable information about model evaluation metrics. There is no code, no specific metrics discussed, no formulas, no examples, and no workflows.

Suggestions

Add concrete, executable code examples for computing common evaluation metrics (accuracy, precision, recall, F1, AUC-ROC) using sklearn or PyTorch

Include a workflow for model evaluation: train/val/test split → compute metrics → interpret results → iterate, with specific validation checkpoints

Remove all meta-description sections ('Purpose', 'When to Use', 'Example Triggers') and replace with actual technical content such as metric selection guidance, code snippets, and common pitfalls

Add a quick-reference table mapping task types (classification, regression, ranking) to appropriate metrics with one-liner code examples for each
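The first suggestion above calls for executable metric computations. A minimal sketch of what such content could look like, assuming scikit-learn as the metrics library (the dataset, model, and variable names here are synthetic placeholders, not part of the reviewed skill):

```python
# Illustrative sketch: compute common classification metrics with
# scikit-learn on synthetic data standing in for a real dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (placeholder for real data).
X, y = make_classification(n_samples=1000, random_state=0)

# Hold out a test split; a fuller workflow would also keep a validation split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # class-1 probabilities for AUC-ROC

acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_prob)
cm = confusion_matrix(y_test, y_pred)

print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} "
      f"f1={f1:.3f} auc_roc={auc:.3f}")
print(cm)
```

A snippet like this, paired with a short note on when each metric is appropriate (e.g. precision/recall for imbalanced classes, AUC-ROC for threshold-free comparison), would give the skill the concrete, actionable content the review finds missing.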

Dimension scores (reasoning and score per dimension):

Conciseness: 1 / 3
The content is entirely filler and meta-description. It explains what the skill does in abstract terms without providing any actual knowledge or instructions. Every section restates the same vague concept ('model evaluation metrics') without adding substance.

Actionability: 1 / 3
There is zero concrete guidance: no code, no commands, no specific metrics, no formulas, no examples of computing precision/recall/F1/AUC or any other evaluation metric. It describes rather than instructs.

Workflow Clarity: 1 / 3
No workflow or steps are defined. The 'step-by-step guidance' is merely claimed in a bullet point but never actually provided. There are no sequences, validation checkpoints, or processes of any kind.

Progressive Disclosure: 1 / 3
The content has section headers, but they contain no meaningful information, just repeated boilerplate. There are no references to detailed files, no navigation structure, and no actual content to disclose progressively.

Total: 4 / 12

Passed

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 passed

Validation for skill structure

Criteria results (criterion, description, result):

allowed_tools_field: Warning
'allowed-tools' contains unusual tool name(s).

frontmatter_unknown_keys: Warning
Unknown frontmatter key(s) found; consider removing or moving to metadata.

Total: 9 / 11

Passed

Repository: jeremylongshore/claude-code-plugins-plus-skills (Reviewed)
