evaluating-machine-learning-models

tessl i github:jeremylongshore/claude-code-plugins-plus-skills --skill evaluating-machine-learning-models

Skill description: "this skill allows AI assistant to evaluate machine learning models using a comprehensive suite of metrics. it should be used when the user requests model performance analysis, validation, or testing. AI assistant can use this skill to assess model accuracy, p... Use when appropriate context detected. Trigger with relevant phrases based on skill purpose."

Overall: 27%


Validation: 81%
Criteria | Description | Result
allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning
metadata_version | 'metadata' field is not a dictionary | Warning
frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning

Total: 13 / 16 passed

Implementation: 7%

This skill is a template-like document filled with generic boilerplate that provides no actionable guidance. It describes what the skill does conceptually but never shows how to actually use it - no real code, no actual command syntax, no concrete examples. The content wastes tokens explaining obvious concepts while failing to deliver the specific, executable instructions Claude needs.

Suggestions

Replace abstract descriptions with actual executable code showing how to invoke the model evaluation suite with specific parameters and expected output format (see the sketch after this list)

Remove generic sections like 'Prerequisites', 'Instructions', 'Error Handling' that contain only placeholder text with no real content

Provide concrete command syntax for '/eval-model' including all parameters, input formats, and example output JSON/data structures

Cut the 'Overview', 'How It Works', and 'When to Use' sections entirely - Claude doesn't need explanations of when to evaluate models
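As an illustration of the kind of content the first suggestion asks for, here is a minimal sketch of an executable evaluation snippet the skill could embed. It is only a hypothetical example, assuming scikit-learn is available; the metric names come from the suggestions above, and the evaluate_model helper and variable names are invented for illustration, not taken from the skill itself.

```python
# Hypothetical sketch of executable evaluation code a SKILL.md could embed.
# Assumes scikit-learn; y_true, y_pred, and y_scores are placeholders for the
# user's labels, predicted classes, and predicted probabilities.
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    confusion_matrix,
)

def evaluate_model(y_true, y_pred, y_scores=None):
    """Return a dictionary of common classification metrics."""
    report = {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="weighted", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="weighted", zero_division=0),
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
    }
    if y_scores is not None:
        # ROC-AUC only applies when probability scores are provided (binary case here).
        report["roc_auc"] = roc_auc_score(y_true, y_scores)
    return report

if __name__ == "__main__":
    # Tiny hard-coded example so the snippet runs on its own.
    y_true = [0, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 0, 0, 1, 1]
    y_scores = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6]
    print(evaluate_model(y_true, y_pred, y_scores))
```

The returned dictionary is also the sort of "example output JSON/data structures" the third suggestion refers to.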

Dimension scores

Conciseness: 1/3. Extremely verbose with extensive padding and explanations of obvious concepts. Sections like 'How It Works', 'When to Use', and generic boilerplate ('This skill provides automated assistance...') explain things Claude already knows and waste tokens.

Actionability: 1/3. No executable code, no concrete commands, no actual implementation details. References the '/eval-model' command but never shows its syntax, parameters, or real usage. Examples describe what 'the skill will do' abstractly rather than providing copy-paste ready instructions.

Workflow Clarity: 1/3. Steps are vague and abstract ('Invoke the /eval-model command', 'Analyze the model's performance'). No validation checkpoints, no error recovery, no concrete sequence of actual commands or code to execute.

Progressive Disclosure: 2/3. Content is organized into sections with headers, but it's a monolithic document with no references to external files. The structure exists but contains too much inline content that could be split or removed entirely.

Total: 5 / 12 passed

Activation: 17%

This description suffers from critical issues: it appears truncated mid-sentence, and the trigger guidance is entirely placeholder boilerplate text that provides no actual value. While it identifies the ML model evaluation domain, the lack of specific metrics, concrete actions, and meaningful trigger terms makes it nearly unusable for skill selection.

Suggestions

Replace the placeholder trigger text ('Use when appropriate context detected...') with specific trigger phrases like 'Use when user asks about model accuracy, precision, recall, F1 score, confusion matrix, ROC curves, or model validation' (a combined rewrite follows this list)

Complete the truncated description and list specific metrics/actions: 'Calculate accuracy, precision, recall, F1 score, AUC-ROC, confusion matrices, and cross-validation scores'

Add natural user phrases as triggers: 'evaluate my model', 'check model performance', 'test accuracy', 'validate predictions', 'compare models'
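Taken together, these suggestions point toward a description along these lines (an illustrative rewrite assembled from the items above, not wording from the skill): "Evaluate machine learning models by calculating accuracy, precision, recall, F1 score, AUC-ROC, confusion matrices, and cross-validation scores. Use when the user asks about model accuracy, precision, recall, F1 score, confusion matrices, ROC curves, or model validation, or says things like 'evaluate my model', 'check model performance', 'test accuracy', 'validate predictions', or 'compare models'."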

Dimension scores

Specificity: 2/3. Names the domain (machine learning model evaluation) and mentions some actions like 'assess model accuracy', but the description is truncated ('p...') and uses vague phrases like 'comprehensive suite of metrics' without listing specific metrics or concrete actions.

Completeness: 1/3. The 'what' is partially addressed but truncated. The 'when' section is pure placeholder text ('Use when appropriate context detected. Trigger with relevant phrases based on skill purpose') that provides no actual guidance on when to use this skill.

Trigger Term Quality: 1/3. The 'Use when' clause is completely generic boilerplate ('appropriate context detected', 'relevant phrases based on skill purpose') with no actual trigger terms. While 'model performance analysis, validation, or testing' appear earlier, the trigger section provides zero natural keywords users would say.

Distinctiveness / Conflict Risk: 2/3. The ML model evaluation domain is somewhat specific, but the generic trigger language and incomplete description could cause overlap with other data analysis or ML-related skills. Terms like 'model performance' could conflict with other analytics skills.

Total: 6 / 12 passed


