
pantheon-ai/pick-model

Recommend the optimal Claude model (Haiku/Sonnet/Opus) for a task using a decision matrix with complexity escalators.

Score: 80

| Metric | Status | Detail |
| --- | --- | --- |
| Quality | 80% | Does it follow best practices? |
| Impact | Pending | No eval scenarios have been run |
| Security (by Snyk) | Passed | No known issues |


Quality

Discovery: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a solid description with clear 'what' and 'when' clauses, good trigger terms, and a distinctive niche. Its main weakness is that the capability description is somewhat thin — it could benefit from listing more specific actions like comparing cost/performance tradeoffs or providing reasoning for recommendations. Overall it performs well for skill selection purposes.

Suggestions

Add more specific concrete actions beyond 'recommend', such as 'Compares cost, speed, and capability tradeoffs between haiku, sonnet, and opus models' to improve specificity.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the domain (Claude model selection) and the core action (recommend optimal model), and mentions it covers tech and non-tech tasks, but doesn't list specific concrete actions beyond 'recommend'. Could mention comparing tradeoffs, estimating costs, or explaining reasoning. | 2 / 3 |
| Completeness | Clearly answers both 'what' (recommend optimal Claude model for a task) and 'when' (explicit 'Use when' clause with specific trigger phrases and a situational trigger for costly/complex tasks). | 3 / 3 |
| Trigger Term Quality | Includes natural trigger phrases users would actually say: 'which model', 'pick model', 'model for', plus the contextual trigger of 'costly/complex tasks'. These are realistic user phrasings and cover good variation. | 3 / 3 |
| Distinctiveness / Conflict Risk | This is a very specific niche — model selection among haiku/sonnet/opus — with distinct trigger terms that are unlikely to conflict with other skills. The mention of specific model names further narrows the scope. | 3 / 3 |
| Total | | 11 / 12 |

Passed

Implementation: 62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill provides excellent actionability with a well-structured decision matrix, clear escalator rules, and a concrete output format. However, it is significantly over-length — the examples, anti-patterns, philosophy, when-to-use/when-not-to-use, and usage examples sections are largely redundant with the decision matrix and add substantial token cost without proportional value. The content would benefit greatly from aggressive trimming, with detailed examples and anti-patterns moved to the reference file.

Suggestions

Cut the Examples section entirely or reduce to 2-3 examples — the decision matrix tables already contain the same information in a more structured form.

Move Anti-Patterns, Philosophy, When to Use/When Not to Use, and Usage Examples sections to references/reference.md — these are supplementary context, not core decision logic.

Remove explanatory 'Why:' clauses in anti-patterns — Claude understands these implications without explanation.

Consolidate the four decision matrix category tables into two (Technical + Non-Technical) to reduce redundancy between Business/Creative/Commands categories.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is extremely verbose at over 250 lines. The decision matrix tables, extensive examples, anti-patterns, philosophy section, when-to-use/when-not-to-use sections, and usage examples are heavily redundant. The examples section largely restates what the decision matrix already covers. Claude doesn't need explanations of why hardcoding model names is bad or what benchmarks are. | 1 / 3 |
| Actionability | The decision matrix is concrete and directly actionable — given any task description, Claude can classify it against specific criteria and produce a recommendation. The output format template is copy-paste ready, escalator rules are specific, and the tie-breaking guidance is clear and executable. | 3 / 3 |
| Workflow Clarity | The 3-step workflow (parse → classify against matrix → output recommendation) is clear and unambiguous. The escalator system provides explicit upgrade logic with a clear cap. Decision guidance for uncertain cases adds a well-defined tiebreaker. For a classification/recommendation task, this is sufficient — no destructive operations require validation loops. | 3 / 3 |
| Progressive Disclosure | There is a reference to references/reference.md for extended content, which is good. However, the main file itself is bloated with content that could be split out — the extensive examples tables, anti-patterns, philosophy, and usage examples sections could live in reference files. The inline content is too heavy for an overview. | 2 / 3 |
| Total | | 9 / 12 |

Passed
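The decision-matrix-with-escalators pattern the review praises can be sketched roughly as follows. This is a hypothetical illustration only — the category names, base assignments, and escalator triggers are assumptions, not the skill's actual tables:

```python
# Hypothetical sketch of a "decision matrix with complexity escalators".
# Category names, escalator conditions, and base assignments are assumed
# for illustration; the skill's real matrix may differ.

TIERS = ["haiku", "sonnet", "opus"]

# Base recommendation per task category (assumed values).
BASE_MODEL = {
    "simple_command": "haiku",
    "business": "haiku",
    "technical": "sonnet",
    "creative": "sonnet",
}

# Each matched escalator bumps the recommendation up one tier.
ESCALATORS = [
    ("multi-step reasoning", lambda task: task.get("steps", 1) > 3),
    ("high cost of error", lambda task: task.get("high_stakes", False)),
    ("large/ambiguous context", lambda task: task.get("context_tokens", 0) > 50_000),
]

def recommend(category: str, task: dict) -> str:
    """Classify a task against the matrix and apply escalators, capped at the top tier."""
    tier = TIERS.index(BASE_MODEL.get(category, "sonnet"))
    for _name, triggered in ESCALATORS:
        if triggered(task):
            tier += 1
    # Escalation is capped: it can never exceed the top tier (opus).
    return TIERS[min(tier, len(TIERS) - 1)]
```

The "clear cap" noted under Workflow Clarity corresponds to the final `min(...)` clamp: no combination of escalators can push a recommendation past the top tier.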

Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 |

Passed
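The one warning above comes from an unknown-key check on the skill's frontmatter. A minimal sketch of such a check, assuming a hypothetical allowed-key list (the actual spec's key set may differ):

```python
# Hypothetical sketch of a frontmatter_unknown_keys validation check.
# ALLOWED_KEYS is an assumption for illustration, not the real spec.
ALLOWED_KEYS = {"name", "description", "license", "metadata"}

def check_frontmatter_unknown_keys(frontmatter: dict) -> list:
    """Return one warning per frontmatter key the spec does not recognize."""
    return [
        f"Unknown frontmatter key '{key}'; consider removing it or moving it under 'metadata'"
        for key in frontmatter
        if key not in ALLOWED_KEYS
    ]
```

A check like this is typically emitted as a non-blocking Warning (as here) rather than a failure, since extra keys don't break the skill's structure.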

Reviewed
