model-evaluation-metrics

Model Evaluation Metrics - Auto-activating skill for ML Training. Triggers on: model evaluation metrics, model evaluation metrics Part of the ML Training skill category.

Install with Tessl CLI

npx tessl i github:jeremylongshore/claude-code-plugins-plus-skills --skill model-evaluation-metrics

What are skills?

Overall
score

19%

Review — 4%

Does it follow best practices?

Validation — 11 / 16 Passed

Validation for skill structure

SKILL.md

Review

Evals

Activation

This description is severely underdeveloped, functioning more as a category label than a useful skill description. It lacks any concrete actions, meaningful trigger terms, or explicit guidance on when to use it. The redundant trigger term and absence of specific ML metrics (accuracy, precision, recall, etc.) make it nearly useless for skill selection.

Suggestions

Add specific actions the skill performs, e.g., 'Calculates accuracy, precision, recall, F1 scores, and generates confusion matrices for trained models.'

Include a 'Use when...' clause with natural trigger terms: 'Use when evaluating model performance, checking accuracy, computing validation metrics, or analyzing prediction quality.'

Add common metric names and variations users would mention: 'accuracy', 'precision', 'recall', 'F1', 'AUC', 'ROC curve', 'confusion matrix', 'loss metrics', 'validation scores'.

Dimension	Reasoning	Score
Specificity	The description contains no concrete actions - it only states it's an 'auto-activating skill for ML Training' without describing what it actually does (e.g., calculate accuracy, generate confusion matrices, compute loss metrics).	1 / 3
Completeness	Fails to answer 'what does this do' (no actions described) and 'when should Claude use it' (no explicit use-when clause). Only provides category membership which is insufficient.	1 / 3
Trigger Term Quality	The trigger terms are redundant ('model evaluation metrics' repeated twice) and overly generic. Missing natural variations users would say like 'accuracy', 'precision', 'recall', 'F1 score', 'confusion matrix', 'loss', 'validation metrics'.	1 / 3
Distinctiveness Conflict Risk	The phrase 'model evaluation metrics' provides some domain specificity within ML, but without concrete actions or specific metric types, it could overlap with general ML skills or data analysis skills.	2 / 3
	Total	5 / 12 Passed

Implementation

This skill content is a placeholder template with no actual instructional value. It contains zero information about model evaluation metrics (accuracy, precision, recall, F1, AUC-ROC, etc.), no code examples, no formulas, and no concrete guidance. The entire content describes what the skill claims to do rather than actually doing it.

Suggestions

Add concrete code examples for common metrics (e.g., sklearn.metrics usage for classification/regression metrics with executable Python code)

Include a quick reference table of metrics with their formulas, use cases, and when to prefer each (e.g., precision vs recall tradeoffs)

Provide specific workflow steps for model evaluation: train/test split, cross-validation, metric calculation, and interpretation of results

Add examples showing metric calculation for different ML tasks (classification, regression, ranking) with sample inputs and expected outputs

Dimension	Reasoning	Score
Conciseness	The content is padded with generic boilerplate that provides no actual information about model evaluation metrics. Every section describes what the skill does abstractly rather than providing concrete guidance Claude could use.	1 / 3
Actionability	No concrete code, commands, formulas, or specific examples are provided. The content only describes capabilities in vague terms ('provides step-by-step guidance') without actually providing any guidance, metrics definitions, or executable code.	1 / 3
Workflow Clarity	No workflow is defined. There are no steps for calculating metrics, no validation checkpoints, and no actual process to follow. The skill merely states it 'provides step-by-step guidance' without including any steps.	1 / 3
Progressive Disclosure	The content is a flat structure with no references to detailed materials, no links to examples or API references, and no organization of content by complexity or use case. It's essentially an empty shell with no substance to disclose.	1 / 3
	Total	4 / 12 Passed

Validation

69%

Warnings & errors only

Validation — 11 / 16 Passed

Validation for skill structure

Criteria	Description	Result
description_trigger_hint	Description may be missing an explicit 'when to use' trigger hint (e.g., 'Use when...')	Warning
allowed_tools_field	'allowed-tools' contains unusual tool name(s)	Warning
metadata_version	'metadata' field is not a dictionary	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
body_steps	No step-by-step structure detected (no ordered list); consider adding a simple workflow	Warning

	Total	11 / 16 Passed

Reviewed

22 days ago

Table of Contents

Activation Implementation Validation

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.