
model-evaluation-metrics

Model Evaluation Metrics - Auto-activating skill for ML Training. Triggers on: model evaluation metrics, model evaluation metrics Part of the ML Training skill category.

Install with Tessl CLI

npx tessl i github:jeremylongshore/claude-code-plugins-plus-skills --skill model-evaluation-metrics

Overall score: 19%


Activation: 7%

This description is severely underdeveloped, functioning more as a category label than a useful skill description. It lacks any concrete actions, meaningful trigger terms, or explicit guidance on when to use it. The redundant trigger term and absence of specific ML metrics (accuracy, precision, recall, etc.) make it nearly useless for skill selection.

Suggestions

- Add specific actions the skill performs, e.g., 'Calculates accuracy, precision, recall, F1 scores, and generates confusion matrices for trained models.'

- Include a 'Use when...' clause with natural trigger terms: 'Use when evaluating model performance, checking accuracy, computing validation metrics, or analyzing prediction quality.'

- Add common metric names and variations users would mention: 'accuracy', 'precision', 'recall', 'F1', 'AUC', 'ROC curve', 'confusion matrix', 'loss metrics', 'validation scores'. (A combined sketch follows this list.)
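
Taken together, these suggestions might yield frontmatter along the following lines. This is an illustrative sketch only: the wording is assembled from the suggestions above, and the field layout assumes a standard SKILL.md with YAML frontmatter rather than anything taken from the skill itself.

```yaml
---
name: model-evaluation-metrics
description: >
  Calculates accuracy, precision, recall, F1 scores, and generates confusion
  matrices for trained models. Use when evaluating model performance, checking
  accuracy, computing validation metrics, or analyzing prediction quality.
  Covers accuracy, precision, recall, F1, AUC, ROC curves, confusion matrices,
  loss metrics, and validation scores.
---
```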

Dimension scores:

- Specificity (1 / 3): The description contains no concrete actions; it only states it is an 'auto-activating skill for ML Training' without describing what it actually does (e.g., calculate accuracy, generate confusion matrices, compute loss metrics).

- Completeness (1 / 3): Fails to answer 'what does this do' (no actions described) and 'when should Claude use it' (no explicit use-when clause). Only provides category membership, which is insufficient.

- Trigger Term Quality (1 / 3): The trigger terms are redundant ('model evaluation metrics' repeated twice) and overly generic. Missing natural variations users would say, like 'accuracy', 'precision', 'recall', 'F1 score', 'confusion matrix', 'loss', 'validation metrics'.

- Distinctiveness / Conflict Risk (2 / 3): The phrase 'model evaluation metrics' provides some domain specificity within ML, but without concrete actions or specific metric types, it could overlap with general ML skills or data analysis skills.

Total: 5 / 12 (Passed)

Implementation: 0%

This skill content is a placeholder template with no actual instructional value. It contains zero information about model evaluation metrics (accuracy, precision, recall, F1, AUC-ROC, etc.), no code examples, no formulas, and no concrete guidance. The entire content describes what the skill claims to do rather than actually doing it.

Suggestions

- Add concrete code examples for common metrics (e.g., sklearn.metrics usage for classification/regression metrics with executable Python code); see the sketch after this list.

- Include a quick reference table of metrics with their formulas, use cases, and when to prefer each (e.g., precision vs recall tradeoffs).

- Provide specific workflow steps for model evaluation: train/test split, cross-validation, metric calculation, and interpretation of results.

- Add examples showing metric calculation for different ML tasks (classification, regression, ranking) with sample inputs and expected outputs.
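
As a rough illustration of the first and third suggestions, here is a minimal sketch of the kind of executable example the skill could ship, assuming scikit-learn and a binary classification task. The synthetic dataset, model choice, and printed format are placeholders for illustration, not part of the skill:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, roc_auc_score, confusion_matrix,
)

# 1. Hold out a test set so metrics reflect unseen data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Fit on the training split only, then predict on the held-out split.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # scores needed for ROC-AUC

# 3. Core classification metrics:
#    precision = TP / (TP + FP)  -> how many flagged positives are real
#    recall    = TP / (TP + FN)  -> how many real positives were found
#    F1        = harmonic mean of precision and recall
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_prob))
print("confusion matrix:\n", confusion_matrix(y_test, y_pred))

# 4. Cross-validation gives a more stable estimate than a single split.
cv_f1 = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="f1")
print(f"5-fold F1: {cv_f1.mean():.3f} +/- {cv_f1.std():.3f}")
```

A skill body built around examples like this, plus the suggested metric reference table, would give Claude something concrete to execute rather than a description of capabilities.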

Dimension scores:

- Conciseness (1 / 3): The content is padded with generic boilerplate that provides no actual information about model evaluation metrics. Every section describes what the skill does abstractly rather than providing concrete guidance Claude could use.

- Actionability (1 / 3): No concrete code, commands, formulas, or specific examples are provided. The content only describes capabilities in vague terms ('provides step-by-step guidance') without actually providing any guidance, metric definitions, or executable code.

- Workflow Clarity (1 / 3): No workflow is defined. There are no steps for calculating metrics, no validation checkpoints, and no actual process to follow. The skill merely states it 'provides step-by-step guidance' without including any steps.

- Progressive Disclosure (1 / 3): The content is a flat structure with no references to detailed materials, no links to examples or API references, and no organization of content by complexity or use case. It is essentially an empty shell with no substance to disclose.

Total: 4 / 12 (Passed)

Validation: 69% (11 / 16 checks passed)

Validation for skill structure

Flagged criteria:

- description_trigger_hint (Warning): Description may be missing an explicit 'when to use' trigger hint (e.g., 'Use when...').

- allowed_tools_field (Warning): 'allowed-tools' contains unusual tool name(s).

- metadata_version (Warning): 'metadata' field is not a dictionary.

- frontmatter_unknown_keys (Warning): Unknown frontmatter key(s) found; consider removing or moving to metadata.

- body_steps (Warning): No step-by-step structure detected (no ordered list); consider adding a simple workflow (a sketch follows below).

Total: 11 / 16 Passed
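
For the body_steps warning specifically, a short ordered list in the SKILL.md body would likely clear the check. A hypothetical sketch, with step wording invented here for illustration:

```markdown
## Workflow

1. Identify the task type (classification, regression, ranking) and pick matching metrics.
2. Split the data into train and test sets, or define cross-validation folds.
3. Fit the model on the training split only.
4. Compute the metrics on the held-out split and report them alongside a baseline.
5. Interpret tradeoffs (e.g., precision vs recall) before accepting the model.
```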
