Install: `tessl i github:jeremylongshore/claude-code-plugins-plus-skills --skill evaluating-machine-learning-models`

Skill description (as published): "this skill allows AI assistant to evaluate machine learning models using a comprehensive suite of metrics. it should be used when the user requests model performance analysis, validation, or testing. AI assistant can use this skill to assess model accuracy, p... Use when appropriate context detected. Trigger with relevant phrases based on skill purpose."
Validation
Score: 81%

| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| metadata_version | 'metadata' field is not a dictionary | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 13 / 16 Passed | |
Implementation
Score: 7%

This skill is a template-like document filled with generic boilerplate that provides no actionable guidance. It describes what the skill does conceptually but never shows how to actually use it - no real code, no actual command syntax, no concrete examples. The content wastes tokens explaining obvious concepts while failing to deliver the specific, executable instructions Claude needs.
Suggestions
- Replace abstract descriptions with actual executable code showing how to invoke the model evaluation suite with specific parameters and expected output format (a minimal sketch of what this could look like follows this list)
- Remove generic sections like 'Prerequisites', 'Instructions', 'Error Handling' that contain only placeholder text with no real content
- Provide concrete command syntax for '/eval-model' including all parameters, input formats, and example output JSON/data structures
- Cut the 'Overview', 'How It Works', and 'When to Use' sections entirely - Claude doesn't need explanations of when to evaluate models
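As a concrete illustration of the first and third suggestions, here is a minimal sketch of the kind of executable content the skill could embed, assuming a scikit-learn binary classifier; the `evaluate_classifier` helper, the library choice, and the output field names are illustrative and not taken from the skill or its '/eval-model' command. It computes the metrics this review references (accuracy, precision, recall, F1, AUC-ROC, confusion matrix) and returns them as a JSON-serializable structure of the sort the 'example output JSON' suggestion asks for.

```python
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    confusion_matrix,
)

def evaluate_classifier(model, X_test, y_test):
    """Return a JSON-serializable dict of common classification metrics.

    Assumes a fitted binary classifier exposing predict() and predict_proba().
    """
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # positive-class probabilities
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "auc_roc": roc_auc_score(y_test, y_score),
        "confusion_matrix": confusion_matrix(y_test, y_pred).tolist(),
    }
```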
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose with extensive padding and explanations of obvious concepts. Sections like 'How It Works', 'When to Use', and generic boilerplate ('This skill provides automated assistance...') explain things Claude already knows and waste tokens. | 1 / 3 |
| Actionability | No executable code, no concrete commands, no actual implementation details. References '/eval-model' command but never shows syntax, parameters, or real usage. Examples describe what 'the skill will do' abstractly rather than providing copy-paste ready instructions. | 1 / 3 |
| Workflow Clarity | Steps are vague and abstract ('Invoke the /eval-model command', 'Analyze the model's performance'). No validation checkpoints, no error recovery, no concrete sequence of actual commands or code to execute. | 1 / 3 |
| Progressive Disclosure | Content is organized into sections with headers, but it's a monolithic document with no references to external files. The structure exists but contains too much inline content that could be split or removed entirely. | 2 / 3 |
| Total | 5 / 12 Passed | |
Activation
Score: 17%

This description suffers from critical issues: it appears truncated mid-sentence, and the trigger guidance is entirely placeholder boilerplate text that provides no actual value. While it identifies the ML model evaluation domain, the lack of specific metrics, concrete actions, and meaningful trigger terms makes it nearly unusable for skill selection.
Suggestions
- Replace the placeholder trigger text ('Use when appropriate context detected...') with specific trigger phrases like 'Use when user asks about model accuracy, precision, recall, F1 score, confusion matrix, ROC curves, or model validation'
- Complete the truncated description and list specific metrics/actions: 'Calculate accuracy, precision, recall, F1 score, AUC-ROC, confusion matrices, and cross-validation scores'
- Add natural user phrases as triggers: 'evaluate my model', 'check model performance', 'test accuracy', 'validate predictions', 'compare models'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (machine learning model evaluation) and mentions some actions like 'assess model accuracy', but the description is truncated ('p...') and uses vague phrases like 'comprehensive suite of metrics' without listing specific metrics or concrete actions. | 2 / 3 |
| Completeness | The 'what' is partially addressed but truncated. The 'when' section is pure placeholder text ('Use when appropriate context detected. Trigger with relevant phrases based on skill purpose') that provides no actual guidance on when to use this skill. | 1 / 3 |
| Trigger Term Quality | The 'Use when' clause is completely generic boilerplate ('appropriate context detected', 'relevant phrases based on skill purpose') with no actual trigger terms. While 'model performance analysis, validation, or testing' appear earlier, the trigger section provides zero natural keywords users would say. | 1 / 3 |
| Distinctiveness / Conflict Risk | The ML model evaluation domain is somewhat specific, but the generic trigger language and incomplete description could cause overlap with other data analysis or ML-related skills. Terms like 'model performance' could conflict with other analytics skills. | 2 / 3 |
| Total | 6 / 12 Passed | |