Model Evaluation Metrics - Auto-activating skill for ML Training. Triggers on: model evaluation metrics, model evaluation metrics Part of the ML Training skill category.
Install with Tessl CLI:

```
npx tessl i github:jeremylongshore/claude-code-plugins-plus-skills --skill model-evaluation-metrics
```

Overall score: 19%
Does it follow best practices?
Activation: 7%

This description is severely underdeveloped, functioning more as a category label than a useful skill description. It lacks any concrete actions, meaningful trigger terms, or explicit guidance on when to use it. The redundant trigger term and the absence of specific ML metrics (accuracy, precision, recall, etc.) make it nearly useless for skill selection.
Suggestions

- Add specific actions the skill performs, e.g., 'Calculates accuracy, precision, recall, F1 scores, and generates confusion matrices for trained models.'
- Include a 'Use when...' clause with natural trigger terms: 'Use when evaluating model performance, checking accuracy, computing validation metrics, or analyzing prediction quality.'
- Add common metric names and variations users would mention: 'accuracy', 'precision', 'recall', 'F1', 'AUC', 'ROC curve', 'confusion matrix', 'loss metrics', 'validation scores'. A rewritten description along these lines is sketched after this list.
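To make these suggestions concrete, a rewritten description might read as follows. This is an illustrative sketch, not the skill's published frontmatter; the key names follow the usual SKILL.md layout and the wording is only one possible phrasing.

```yaml
# Hypothetical frontmatter sketch - not the skill's actual published description
name: model-evaluation-metrics
description: >
  Calculates accuracy, precision, recall, F1, ROC AUC, and confusion matrices for
  trained models and explains how to interpret them. Use when evaluating model
  performance, checking accuracy, computing validation metrics, or analyzing
  prediction quality for classification, regression, or ranking tasks.
```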
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description contains no concrete actions - it only states it's an 'auto-activating skill for ML Training' without describing what it actually does (e.g., calculate accuracy, generate confusion matrices, compute loss metrics). | 1 / 3 |
| Completeness | Fails to answer 'what does this do' (no actions described) and 'when should Claude use it' (no explicit use-when clause). Only provides category membership, which is insufficient. | 1 / 3 |
| Trigger Term Quality | The trigger terms are redundant ('model evaluation metrics' repeated twice) and overly generic. Missing natural variations users would say, such as 'accuracy', 'precision', 'recall', 'F1 score', 'confusion matrix', 'loss', 'validation metrics'. | 1 / 3 |
| Distinctiveness / Conflict Risk | The phrase 'model evaluation metrics' provides some domain specificity within ML, but without concrete actions or specific metric types it could overlap with general ML skills or data analysis skills. | 2 / 3 |
| Total | | 5 / 12 |
Implementation: 0%

This skill content is a placeholder template with no actual instructional value. It contains zero information about model evaluation metrics (accuracy, precision, recall, F1, AUC-ROC, etc.), no code examples, no formulas, and no concrete guidance. The entire content describes what the skill claims to do rather than actually doing it.
Suggestions

- Add concrete code examples for common metrics (e.g., sklearn.metrics usage for classification/regression metrics with executable Python code); a sketch follows this list.
- Include a quick-reference table of metrics with their formulas, use cases, and when to prefer each (e.g., precision vs. recall tradeoffs).
- Provide specific workflow steps for model evaluation: train/test split, cross-validation, metric calculation, and interpretation of results.
- Add examples showing metric calculation for different ML tasks (classification, regression, ranking) with sample inputs and expected outputs.
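As a starting point for the first suggestion, here is a minimal sketch of the kind of executable example the skill could embed. It assumes scikit-learn is installed; the dataset, model, and split are placeholders, not part of the skill as published.

```python
# A minimal, illustrative sketch (not the skill's published content):
# assumes scikit-learn is installed; dataset and model are placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Train/test split on a small built-in binary-classification dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# 2. Fit a simple baseline model
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

# 3. Core classification metrics
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_test, y_pred))     # TP / (TP + FN)
print("F1       :", f1_score(y_test, y_pred))         # harmonic mean of precision and recall
print("ROC AUC  :", roc_auc_score(y_test, y_prob))    # needs scores/probabilities, not labels
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# 4. Cross-validated score as a sanity check against a lucky split
cv_f1 = cross_val_score(model, X, y, cv=5, scoring="f1")
print(f"5-fold CV F1: {cv_f1.mean():.3f} +/- {cv_f1.std():.3f}")
```

The same pattern extends to regression (mean_absolute_error, mean_squared_error, r2_score) and, with appropriate scorers, to ranking tasks; interpretation guidance (e.g., preferring recall when false negatives are costly) would sit alongside the code in the skill body.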
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is padded with generic boilerplate that provides no actual information about model evaluation metrics. Every section describes what the skill does abstractly rather than providing concrete guidance Claude could use. | 1 / 3 |
| Actionability | No concrete code, commands, formulas, or specific examples are provided. The content only describes capabilities in vague terms ('provides step-by-step guidance') without actually providing any guidance, metric definitions, or executable code. | 1 / 3 |
| Workflow Clarity | No workflow is defined. There are no steps for calculating metrics, no validation checkpoints, and no actual process to follow. The skill merely states it 'provides step-by-step guidance' without including any steps. | 1 / 3 |
| Progressive Disclosure | The content is a flat structure with no references to detailed materials, no links to examples or API references, and no organization of content by complexity or use case. It is essentially an empty shell with no substance to disclose. | 1 / 3 |
| Total | | 4 / 12 |
Validation: 69%

Validation for skill structure: 11 / 16 checks passed.
| Criteria | Description | Result |
|---|---|---|
| description_trigger_hint | Description may be missing an explicit 'when to use' trigger hint (e.g., 'Use when...') | Warning |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| metadata_version | 'metadata' field is not a dictionary | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| body_steps | No step-by-step structure detected (no ordered list); consider adding a simple workflow | Warning |
| Total | 11 / 16 Passed | |
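A hedged sketch of how these warnings could be resolved in SKILL.md follows. Key names other than those quoted in the warnings ('allowed-tools', 'metadata') are assumptions, and the tool names, metadata entries, and workflow steps are illustrative only.

```markdown
---
name: model-evaluation-metrics
description: >
  Calculates and interprets model evaluation metrics. Use when evaluating model
  performance or computing validation metrics (see the description sketch above).
allowed-tools: Read, Write, Bash
metadata:
  category: ml-training
  former-unknown-key: moved under metadata rather than left at the top level
---

## Workflow

1. Split the data into train and held-out sets, or define a cross-validation scheme.
2. Generate predictions on the held-out data.
3. Compute task-appropriate metrics (accuracy/precision/recall/F1/AUC for classification; MAE/RMSE/R2 for regression).
4. Interpret the numbers against a baseline and check for class imbalance before reporting.
```

Keeping `metadata` as a dictionary, moving non-standard keys under it, restricting `allowed-tools` to standard tool names, and giving the body an ordered-list workflow would address four of the five warnings; the "Use when..." clause in the description addresses the fifth.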