Automatically applies when choosing LLM models and providers. Ensures proper model comparison, provider selection, cost optimization, fallback patterns, and multi-model strategies.
Overall score: 77

- Quality: 54% (Does it follow best practices?)
- Impact: 93% (1.25x average score across 6 eval scenarios)
- Status: Passed, no known issues

Optimize this skill with Tessl:

`npx tessl skill review --optimize ./ai-llm/model-selection/SKILL.md`

Quality
Discovery
67%: Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description adequately communicates its domain and includes an explicit 'when' clause, which is a strength. However, the listed capabilities read more like category headings than concrete actions, and the trigger terms could be expanded to include natural user language variations (e.g., 'which model to use', 'API pricing', specific provider names). The description is functional but could be more specific and keyword-rich.
Suggestions
- Add more natural trigger terms users would actually say, such as 'which model should I use', 'OpenAI vs Anthropic', 'GPT', 'token pricing', 'API costs', and 'rate limits'.
- Make capabilities more concrete by replacing category labels with specific actions, e.g. 'compare token pricing across providers, configure fallback chains when primary models are unavailable, select optimal models based on task requirements and budget constraints' (see the example below).
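For illustration, a description folding in both suggestions might read as follows (hypothetical wording, not the skill's current text):

> Applies when choosing LLM models and providers, or when asked 'which model should I use', 'OpenAI vs Anthropic', or about GPT, token pricing, API costs, or rate limits. Compares token pricing across providers, configures fallback chains when primary models are unavailable, and selects optimal models based on task requirements and budget constraints.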
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (LLM models and providers) and lists some actions (model comparison, provider selection, cost optimization, fallback patterns, multi-model strategies), but these are more like category labels than concrete specific actions. For example, 'cost optimization' is vague compared to something like 'compare token pricing across providers' or 'calculate cost per request'. | 2 / 3 |
| Completeness | Clearly answers both 'what' (model comparison, provider selection, cost optimization, fallback patterns, multi-model strategies) and 'when' ('Automatically applies when choosing LLM models and providers'). The trigger condition is explicitly stated upfront. | 3 / 3 |
| Trigger Term Quality | Includes some relevant keywords like 'LLM models', 'providers', 'cost optimization', 'fallback patterns', and 'multi-model strategies'. However, it misses many natural user terms like 'OpenAI', 'Anthropic', 'GPT', 'API', 'token pricing', 'rate limits', 'which model should I use', 'cheapest model', etc. | 2 / 3 |
| Distinctiveness / Conflict Risk | The domain of LLM model/provider selection is reasonably specific, but terms like 'cost optimization' and 'multi-model strategies' could overlap with general architecture or infrastructure skills. The niche is identifiable but not sharply delineated with unique trigger terms. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
Implementation
42%: Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides highly actionable, executable Python code for model selection and provider management, but is severely over-engineered for a SKILL.md file. The content is a monolithic ~500-line document that includes complete class implementations with full docstrings and type hints—content that Claude can generate on its own. The time-sensitive pricing data will become stale, and the lack of progressive disclosure means this consumes significant context window for what could be conveyed in a fraction of the space.
Suggestions
- Reduce the content to key patterns and interfaces (50-100 lines max), moving full implementations to separate reference files such as MODEL_REGISTRY.md, ROUTING.md, and FALLBACK.md (a pattern-level sketch follows the dimension table below).
- Remove or move pricing data to a separate, clearly dated reference file, since specific dollar amounts will become stale quickly.
- Strip docstrings and verbose inline comments from code examples; Claude knows Python conventions and can infer parameter purposes from type hints.
- Add validation checkpoints to the Auto-Apply workflow, e.g. 'Test fallback chain with simulated failures before deploying' and 'Verify routing rules against sample prompts' (see the sketch below).
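To make the last suggestion concrete, here is a minimal, hypothetical sketch of such a checkpoint. `FallbackChain` and its `call` method are illustrative stand-ins; the skill's actual class may expose a different interface:

```python
# Hypothetical sketch: exercise a fallback chain against simulated provider
# failures before deploying. Names here are illustrative, not the skill's API.
from typing import Callable, List


class FallbackChain:
    """Try providers in order until one succeeds."""

    def __init__(self, providers: List[Callable[[str], str]]):
        self.providers = providers

    def call(self, prompt: str) -> str:
        errors = []
        for provider in self.providers:
            try:
                return provider(prompt)
            except Exception as exc:  # real code would catch narrower errors
                errors.append(exc)
        raise RuntimeError(f"all providers failed: {errors}")


def flaky_provider(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def stable_provider(prompt: str) -> str:
    return f"ok: {prompt}"


# Checkpoint 1: with the primary down, the fallback must still answer.
chain = FallbackChain([flaky_provider, stable_provider])
assert chain.call("ping") == "ok: ping"

# Checkpoint 2: a total outage must surface a clear error, not hang silently.
try:
    FallbackChain([flaky_provider]).call("ping")
except RuntimeError:
    pass
else:
    raise AssertionError("expected total-outage failure to raise")
```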
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~500+ lines. The ModelRegistry, CostOptimizer, and ModelEnsemble classes are fully spelled out with extensive docstrings, type hints, and inline comments that Claude already knows how to write. The pricing data is time-sensitive and will become stale. Much of this could be condensed to patterns and key interfaces rather than complete implementations. | 1 / 3 |
| Actionability | All code is fully executable Python with complete class definitions, type annotations, and usage examples. The code is copy-paste ready with concrete implementations for registry, routing, fallback chains, cost optimization, and ensembles. | 3 / 3 |
| Workflow Clarity | The Auto-Apply section provides a 7-step sequence, but there are no validation checkpoints or feedback loops. For operations involving model selection and cost optimization in production, there is no guidance on verifying that routing rules work correctly or that fallback chains are functioning before deployment. | 2 / 3 |
| Progressive Disclosure | The entire skill is a monolithic wall of code with no content split into separate files. Hundreds of lines of implementation detail (full class definitions for ModelRegistry, ModelRouter, FallbackChain, CostOptimizer, ModelEnsemble) are inline when they should be referenced as separate files. The Related Skills section exists, but the main content itself is not appropriately structured. | 1 / 3 |
| Total | | 7 / 12 (Passed) |
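As a rough illustration of the conciseness and progressive-disclosure points above, a pattern-level SKILL.md might inline only interfaces like the following (the names and file references are hypothetical) and link full implementations from reference files:

```python
# Hypothetical sketch of a pattern-level SKILL.md snippet: interfaces only,
# with complete implementations moved out to linked reference files.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ModelSpec:
    name: str
    provider: str
    input_cost_per_1k: float   # dated pricing lives in MODEL_REGISTRY.md
    output_cost_per_1k: float


class Router(Protocol):
    def select(self, prompt: str, budget_usd: float) -> ModelSpec:
        """Pick the cheapest model meeting the task's requirements."""
        ...

# Full routing rules: ROUTING.md; fallback chains: FALLBACK.md
```

The idea is that Claude can regenerate method bodies from signatures like these, so only the contracts and file pointers need to live in SKILL.md itself.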
Validation
81%: Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (713 lines); consider splitting into references/ and linking | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 (Passed) |