Automatically applies when choosing LLM models and providers. Ensures proper model comparison, provider selection, cost optimization, fallback patterns, and multi-model strategies.
Overall score: 78

- Quality: 57% (does it follow best practices?)
- Impact: 93%, 1.25x average score across 6 eval scenarios
- Passed: no known issues

Optimize this skill with Tessl:
`npx tessl skill review --optimize ./skills/ai-llm/model-selection/SKILL.md`

## Quality
### Discovery: 50%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description establishes a clear domain around LLM model and provider selection but relies on abstract capability categories rather than concrete actions. The 'Automatically applies when' phrasing is circular and doesn't provide the explicit trigger guidance needed for reliable skill selection. The description would benefit from more natural user-facing keywords and specific use case examples.
**Suggestions**

- Add explicit trigger guidance with natural phrases users would say, e.g., 'Use when the user asks which model to use, compares API providers, discusses token pricing, or needs help with OpenAI/Anthropic/Claude/GPT selection'
- Replace abstract categories with concrete actions, e.g., 'Compare token costs across providers, configure fallback chains when APIs fail, select optimal models for specific tasks'
- Include common variations and brand names users might mention: 'GPT-4', 'Claude', 'Gemini', 'API costs', 'rate limiting', 'model benchmarks'
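Combining these suggestions, a rewritten frontmatter description might read as follows; the wording is illustrative only, not a tested replacement:

```yaml
description: >
  Guides LLM model and provider selection. Use when the user asks which model
  to use, compares API providers (OpenAI vs Anthropic vs Google), discusses
  token pricing or rate limits, needs fallback chains for API failures, or
  mentions GPT-4, Claude, or Gemini by name.
```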
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (LLM models/providers) and lists some actions (model comparison, provider selection, cost optimization, fallback patterns, multi-model strategies), but these are somewhat abstract categories rather than concrete specific actions like 'compare token costs' or 'configure retry logic'. | 2 / 3 |
| Completeness | The 'what' is addressed with the list of capabilities, but the 'when' clause ('Automatically applies when choosing LLM models and providers') is vague and circular; it doesn't provide explicit trigger guidance or user-facing scenarios that would help Claude distinguish when to select this skill. | 2 / 3 |
| Trigger Term Quality | Includes relevant terms like 'LLM models', 'providers', 'cost optimization', and 'fallback patterns', but misses common natural variations users might say such as 'which model should I use', 'API pricing', 'OpenAI vs Anthropic', 'rate limits', or 'model selection'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The focus on LLM models and providers creates a reasonable niche, but terms like 'cost optimization' and 'multi-model strategies' could overlap with general architecture or cost management skills. The lack of specific file types or explicit trigger terms increases conflict risk. | 2 / 3 |
| **Total** | | **8 / 12 (Passed)** |
### Implementation: 64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides highly actionable, production-ready code for model selection and provider management with comprehensive patterns for routing, fallbacks, and cost optimization. However, it's verbose for a skill file—the complete class implementations with full type hints and docstrings could be condensed or moved to reference files. The workflow guidance lacks explicit validation steps for testing the model selection infrastructure.
**Suggestions**

- Add explicit validation steps in the Auto-Apply workflow, such as 'Test fallback chain with simulated failures before deployment' and 'Verify routing rules match expected models for sample prompts'
- Move complete class implementations to a separate REFERENCE.md or EXAMPLES.md file, keeping only concise pattern summaries and key code snippets in the main skill
- Reduce verbosity by removing obvious docstrings and comments that explain what Claude already knows (e.g., 'Supported LLM providers', 'Model capabilities and constraints')
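The first suggestion, testing the fallback chain with simulated failures, can be sketched in a few lines. The `FallbackChain` interface and provider names below are hypothetical stand-ins, not the skill's actual classes:

```python
# Minimal fallback-chain sketch with one simulated provider outage.
# Provider names and the FallbackChain interface are illustrative only.

class ProviderError(Exception):
    """Raised when a provider call fails (rate limit, outage, etc.)."""

class FallbackChain:
    def __init__(self, providers):
        # providers: ordered list of (name, callable) pairs, tried in order
        self.providers = providers

    def complete(self, prompt):
        errors = []
        for name, call in self.providers:
            try:
                return name, call(prompt)
            except ProviderError as exc:
                errors.append((name, exc))
        raise RuntimeError(f"all providers failed: {errors}")

def flaky_primary(prompt):
    # Simulate an outage on the primary provider.
    raise ProviderError("rate limited")

def stable_backup(prompt):
    return f"echo: {prompt}"

used, reply = FallbackChain(
    [("primary", flaky_primary), ("backup", stable_backup)]
).complete("hi")
```

With the primary simulated as failing, the chain should report `backup` as the provider that answered; running a check like this before deployment is the kind of validation step the suggestion asks for.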
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill provides extensive code examples that are useful, but includes some redundancy (e.g., full Pydantic models with docstrings Claude already understands, verbose type hints throughout). The content could be tightened by showing patterns more concisely rather than complete implementations. | 2 / 3 |
| Actionability | Excellent actionability with fully executable Python code throughout. The ModelRegistry, ModelRouter, FallbackChain, CostOptimizer, and ModelEnsemble classes are complete, copy-paste ready implementations with clear usage examples. | 3 / 3 |
| Workflow Clarity | The Auto-Apply section provides a 7-step workflow, but lacks explicit validation checkpoints. For operations involving model selection and fallback chains, there's no guidance on verifying the setup works correctly before production use or testing fallback behavior. | 2 / 3 |
| Progressive Disclosure | The skill references related skills at the end but presents all content inline in one large file. The extensive code examples (ModelRegistry, Router, FallbackChain, CostOptimizer, Ensemble) could be split into separate reference files with the main skill providing a concise overview. | 2 / 3 |
| **Total** | | **9 / 12 (Passed)** |
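The cost-optimization pattern the table credits reduces to a simple rule: pick the cheapest model whose capabilities cover the request. A minimal sketch, with made-up model names and prices (not the skill's actual CostOptimizer API):

```python
# Hypothetical model catalog: (name, usd_per_1k_tokens, capabilities).
CATALOG = [
    ("small-fast",  0.0002, {"chat"}),
    ("mid-general", 0.0010, {"chat", "tools"}),
    ("big-reason",  0.0150, {"chat", "tools", "reasoning"}),
]

def cheapest_for(required: set) -> str:
    """Return the cheapest catalog model whose capabilities cover `required`."""
    candidates = [
        (price, name) for name, price, caps in CATALOG if required <= caps
    ]
    if not candidates:
        raise ValueError(f"no model supports {required}")
    return min(candidates)[1]  # lowest price wins
```

For example, `cheapest_for({"chat"})` selects the cheapest chat-capable model, while `cheapest_for({"tools", "reasoning"})` is forced up to the most capable tier.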
## Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 9 / 11 checks passed.
| Criteria | Description | Result |
|---|---|---|
| `skill_md_line_count` | SKILL.md is long (713 lines); consider splitting into references/ and linking | Warning |
| `frontmatter_unknown_keys` | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| **Total** | | **9 / 11 (Passed)** |
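Both warnings are mechanical enough to check locally before re-running the review. A rough sketch, where the allowed-key set and the 500-line threshold are assumptions rather than the spec's actual limits:

```python
import re

# Assumed allowlist and threshold for illustration; the real validator's
# limits may differ (the review flags 713 lines as long).
ALLOWED_KEYS = {"name", "description", "metadata"}
MAX_LINES = 500

def check_skill(text: str) -> list:
    """Return warning strings mirroring the two validation checks above."""
    warnings = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        warnings.append(
            f"skill_md_line_count: {len(lines)} lines; "
            "consider splitting into references/"
        )
    # Naive frontmatter scan: top-level 'key: value' lines between --- fences.
    m = re.match(r"---\n(.*?)\n---", text, re.S)
    if m:
        keys = {
            ln.split(":", 1)[0].strip()
            for ln in m.group(1).splitlines()
            if ":" in ln and not ln.startswith(" ")
        }
        unknown = keys - ALLOWED_KEYS
        if unknown:
            warnings.append(f"frontmatter_unknown_keys: {sorted(unknown)}")
    return warnings
```

Feeding the skill's SKILL.md through a check like this catches both warnings before they show up in a hosted review.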