
evaluating-machine-learning-models

Build this skill allows AI assistant to evaluate machine learning models using a comprehensive suite of metrics. it should be used when the user requests model performance analysis, validation, or testing. AI assistant can use this skill to assess model accuracy, p... Use when appropriate context detected. Trigger with relevant phrases based on skill purpose.

34

Quality

20%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/ai-ml/model-evaluation-suite/skills/evaluating-machine-learning-models/SKILL.md

Quality

Discovery

40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description suffers from truncation (the capabilities list is cut off at 'accuracy, p...'), uses first/second person framing ('this allows AI assistant'), and ends with meaningless boilerplate filler ('Use when appropriate context detected. Trigger with relevant phrases based on skill purpose.'). The core idea of ML model evaluation is identifiable but poorly articulated, with no specific metrics, file types, or concrete actions listed.

Suggestions

Complete the truncated capabilities list with specific metrics and actions (e.g., 'Computes accuracy, precision, recall, F1 score, ROC-AUC, confusion matrices, and cross-validation scores for classification and regression models').

Replace the generic boilerplate 'Use when appropriate context detected. Trigger with relevant phrases based on skill purpose.' with explicit trigger guidance like 'Use when the user asks to evaluate, validate, or benchmark a trained ML model, or mentions terms like confusion matrix, F1 score, precision-recall, or model metrics.'

Rewrite in third person active voice (e.g., 'Evaluates machine learning models...') and remove the 'Build this skill allows AI assistant' preamble.
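Taken together, these suggestions point toward a frontmatter description along the following lines. This is an illustrative sketch, not the skill's actual content; the metric list and trigger phrases are assumptions drawn from the review's own examples:

```yaml
---
name: evaluating-machine-learning-models
description: >
  Evaluates trained machine learning models. Computes accuracy, precision,
  recall, F1 score, ROC-AUC, confusion matrices, and cross-validation
  scores for classification and regression models. Use when the user asks
  to evaluate, validate, or benchmark a trained model, or mentions terms
  like confusion matrix, F1 score, precision-recall, or model metrics.
---
```

A description in this shape addresses all three suggestions at once: it is third person, it enumerates concrete capabilities, and it names the trigger terms users would naturally say.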

Dimension | Reasoning | Score

Specificity

The description mentions 'evaluate machine learning models' and 'comprehensive suite of metrics' but is extremely vague about what those metrics or actions actually are. 'Assess model accuracy, p...' is truncated and the rest is filler language with no concrete actions listed.

1 / 3

Completeness

The 'what' is partially addressed (evaluate ML models) but truncated and incomplete. The 'when' clause exists ('when the user requests model performance analysis, validation, or testing') but the appended generic filler ('Use when appropriate context detected') is meaningless boilerplate that doesn't add real trigger guidance.

2 / 3

Trigger Term Quality

Contains some relevant keywords like 'model performance analysis', 'validation', 'testing', and 'model accuracy' that users might naturally say, but the generic filler at the end ('Use when appropriate context detected. Trigger with relevant phrases based on skill purpose.') adds no useful trigger terms and many common variations are missing (e.g., 'confusion matrix', 'F1 score', 'ROC', 'AUC', 'precision', 'recall').

2 / 3

Distinctiveness / Conflict Risk

The ML model evaluation domain is somewhat specific, but the vague language and truncated content could overlap with general data analysis or ML training skills. The generic filler at the end further reduces distinctiveness.

2 / 3

Total

7 / 12

Passed

Implementation

0%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is almost entirely generic boilerplate with no actionable content. It references a `model-evaluation-suite` plugin and `/eval-model` command but never shows how to actually use them—no code, no real commands, no concrete parameters. The majority of sections (Error Handling, Output, Instructions, Resources) contain placeholder text that provides zero value.

Suggestions

Replace the abstract workflow description with actual executable code showing how to invoke model evaluation (e.g., Python code using sklearn.metrics or the referenced plugin with real parameters and expected output format).

Remove all generic boilerplate sections (Error Handling, Output, Resources, Instructions) that contain only placeholder text, and remove the 'Overview' and 'When to Use' sections that explain concepts Claude already knows.

Provide concrete examples with actual input data structures and expected output formats (e.g., show a confusion matrix, classification report, or specific metric calculations with sample code).

If the `model-evaluation-suite` plugin is real, document its actual API: specific commands, required parameters, configuration options, and output schema with a complete working example.
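As a concrete illustration of the first and third suggestions, a minimal evaluation snippet the skill could embed might look like the following. This is a hedged sketch using standard scikit-learn APIs with toy data standing in for the user's trained model; it is not taken from the skill or the `model-evaluation-suite` plugin:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Toy binary-classification data in place of the user's real dataset.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Any fitted estimator works here; logistic regression keeps the example small.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)
scores = model.predict_proba(X_test)[:, 1]

# The metric suite the review asks the skill to document explicitly.
report = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "roc_auc": roc_auc_score(y_test, scores),
    "confusion_matrix": confusion_matrix(y_test, pred).tolist(),
    # Mean 5-fold cross-validated accuracy on the full dataset.
    "cv_accuracy": cross_val_score(model, X, y, cv=5).mean(),
}
print(report)
```

Even a short, runnable block like this would replace the abstract "Executing Evaluation" step with concrete commands, parameters, and an output format the agent can reproduce.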

Dimension | Reasoning | Score

Conciseness

Extremely verbose with extensive padding. Explains obvious concepts Claude already knows (what model evaluation is, when to use it), includes generic boilerplate sections like 'Error Handling', 'Output', 'Resources', and 'Instructions' that add no actionable value. The 'Overview' paragraph is pure filler.

1 / 3

Actionability

No executable code, no concrete commands, no actual implementation details. References a `/eval-model` command and `model-evaluation-suite` plugin without showing how to actually use them. Examples describe what 'the skill will do' in abstract terms rather than providing concrete steps or code. The 'Instructions' section is entirely generic ('Invoke this skill when trigger conditions are met').

1 / 3

Workflow Clarity

The workflow steps are vague abstractions ('Analyzing Context', 'Executing Evaluation', 'Presenting Results') with no concrete details. No validation checkpoints, no error recovery loops, no specific commands or parameters. The examples describe outcomes without showing actual steps to achieve them.

1 / 3

Progressive Disclosure

Monolithic wall of text with no references to external files. Multiple sections contain generic filler content that could be removed entirely. The 'Resources' section lists 'Project documentation' and 'Related skills and commands' without any actual links or file references.

1 / 3

Total

4 / 12

Passed

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

Criteria | Description | Result

allowed_tools_field

'allowed-tools' contains unusual tool name(s)

Warning

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total

9 / 11

Passed

Repository
jeremylongshore/claude-code-plugins-plus-skills
Reviewed

