Automatically applies when choosing LLM models and providers. Ensures proper model comparison, provider selection, cost optimization, fallback patterns, and multi-model strategies.
77
54%
Does it follow best practices?
Impact
93%
1.25x
Average score across 6 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./ai-llm/model-selection/SKILL.md
Quality
Discovery
67%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description adequately communicates its domain and includes an explicit 'when' clause, which is a strength. However, the listed capabilities read more like high-level categories than concrete actions, and the trigger terms lack the natural language variations users would actually use when seeking help with model selection. It would benefit from more specific actions and richer keyword coverage.
Suggestions
Add more specific concrete actions, e.g., 'compare token pricing across providers, select optimal models for latency vs cost tradeoffs, configure retry and fallback chains'.
Include natural user trigger terms such as 'which model to use', 'OpenAI vs Anthropic', 'GPT', 'API costs', 'token pricing', 'rate limits', 'model benchmarks'.
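As an illustration of what a concrete action such as "compare token pricing across providers" or "calculate cost per request" might look like, here is a minimal sketch. The model names and prices are hypothetical placeholders, not real provider pricing, and `ModelPrice`, `cost_per_request`, and `cheapest_model` are illustrative names that do not come from the reviewed skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per million tokens. Every value used below is a placeholder."""
    input_per_mtok: float
    output_per_mtok: float


# Hypothetical model identifiers and prices, for illustration only.
PRICES = {
    "provider-a/small": ModelPrice(0.50, 1.50),
    "provider-a/large": ModelPrice(5.00, 15.00),
    "provider-b/medium": ModelPrice(2.00, 6.00),
}


def cost_per_request(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    price = PRICES[model]
    total = (input_tokens * price.input_per_mtok
             + output_tokens * price.output_per_mtok)
    return total / 1_000_000


def cheapest_model(input_tokens: int, output_tokens: int) -> str:
    """Return the model with the lowest estimated cost for this request shape."""
    return min(PRICES, key=lambda m: cost_per_request(m, input_tokens, output_tokens))


if __name__ == "__main__":
    for name in PRICES:
        print(name, round(cost_per_request(name, 2_000, 500), 6))
    print("cheapest:", cheapest_model(2_000, 500))
```

A description that names actions at this level of detail gives an agent more to match against than a category label like "cost optimization".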
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (LLM models and providers) and lists some actions (model comparison, provider selection, cost optimization, fallback patterns, multi-model strategies), but these are more like category labels than concrete specific actions. For example, 'cost optimization' is vague compared to something like 'compare token pricing across providers' or 'calculate cost per request'. | 2 / 3 |
| Completeness | Clearly answers both 'what' (model comparison, provider selection, cost optimization, fallback patterns, multi-model strategies) and 'when' ('Automatically applies when choosing LLM models and providers'). The trigger condition is explicitly stated upfront. | 3 / 3 |
| Trigger Term Quality | Includes some relevant keywords like 'LLM models', 'providers', 'cost optimization', 'fallback patterns', and 'multi-model strategies'. However, it misses many natural user terms like 'OpenAI', 'Anthropic', 'GPT', 'API', 'token pricing', 'rate limits', 'which model should I use', 'cheapest model', etc. | 2 / 3 |
| Distinctiveness / Conflict Risk | The domain of LLM model/provider selection is reasonably specific, but terms like 'cost optimization' and 'multi-model strategies' could overlap with general architecture or infrastructure skills. The niche is identifiable but not sharply delineated with unique trigger terms. | 2 / 3 |
| Total | | 9 / 12 Passed |
Implementation
42%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill provides highly actionable, executable code covering model selection, routing, fallback, cost optimization, and ensembles. However, it is extremely verbose—most of the content is boilerplate class implementations that Claude can generate from concise patterns. The lack of progressive disclosure (no bundle files, everything inline) and absence of validation checkpoints in the workflow significantly weaken the skill's effectiveness as a context-window-efficient guide.
Suggestions
Extract the full class implementations (ModelRegistry, ModelRouter, FallbackChain, CostOptimizer, ModelEnsemble) into separate bundle files and replace them in SKILL.md with concise pattern descriptions and interface summaries.
Remove or drastically shorten docstrings, type hint explanations, and inline comments that Claude already understands—focus on the non-obvious design decisions and constraints.
Add explicit validation checkpoints to the workflow, e.g., 'Test fallback chain with simulated failures before deploying' and 'Verify routing rules cover all expected prompt categories'.
Move time-sensitive pricing data into a separate configuration file or note that prices should be verified at runtime, rather than hardcoding specific dollar amounts that will become stale.
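The validation checkpoint suggested above ("Test fallback chain with simulated failures before deploying") could be sketched roughly as follows. This is not the skill's `FallbackChain` class, whose interface is not shown in this review. `call_with_fallback`, `AllProvidersFailed`, and the stub providers are hypothetical names chosen for illustration, and each provider is assumed to be a plain callable that takes a prompt and returns text.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def call_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, reply) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose in this sketch
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers that simulate an outage and a healthy backend.
def failing(prompt: str) -> str:
    raise TimeoutError("simulated timeout")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


def test_falls_through_to_backup() -> None:
    name, reply = call_with_fallback("hi", [("primary", failing), ("backup", healthy)])
    assert name == "backup"
    assert reply == "echo: hi"


def test_raises_when_all_fail() -> None:
    try:
        call_with_fallback("hi", [("primary", failing), ("backup", failing)])
    except AllProvidersFailed as exc:
        # Both failures should be reported, not just the last one.
        assert "primary" in str(exc) and "backup" in str(exc)
    else:
        raise AssertionError("expected AllProvidersFailed")


if __name__ == "__main__":
    test_falls_through_to_backup()
    test_raises_when_all_fail()
    print("fallback checks passed")
```

Collecting every provider's error in the final exception is one way to keep a cascade from failing silently. The skill's actual chain may report failures differently.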
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~500+ lines. The ModelRegistry, ModelRouter, FallbackChain, CostOptimizer, and ModelEnsemble classes are fully spelled out with extensive docstrings, type hints, and inline comments that Claude already knows how to write. The pricing data is time-sensitive and will become stale. Much of this could be condensed to patterns and key interfaces rather than complete class implementations. | 1 / 3 |
| Actionability | All code is fully executable Python with complete class definitions, Pydantic models, type annotations, and usage examples. The code is copy-paste ready with concrete model IDs, pricing, and working routing/fallback/cost logic. | 3 / 3 |
| Workflow Clarity | The Auto-Apply section provides a 7-step sequence, but there are no validation checkpoints or feedback loops. For operations like model routing and fallback chains (which can fail silently or cascade), there's no guidance on verifying that routing rules work correctly or that fallback chains are tested before deployment. | 2 / 3 |
| Progressive Disclosure | The entire skill is a monolithic wall of code with no bundle files to offload detail into. The complete class implementations for ModelRegistry, ModelRouter, FallbackChain, CostOptimizer, and ModelEnsemble should be in separate reference files, with SKILL.md providing only the patterns, key interfaces, and navigation links. | 1 / 3 |
| Total | | 7 / 12 Passed |
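For the Workflow Clarity gap noted in the table above (no guidance on verifying that routing rules work correctly), one possible checkpoint is a coverage check run before deployment. The sketch below assumes routing is keyed by a fixed set of prompt categories. The `Category` enum, the `ROUTING_RULES` mapping, and the model names are all hypothetical and are not taken from the skill's `ModelRouter`.

```python
from enum import Enum
from typing import Dict, Set


class Category(Enum):
    """Hypothetical prompt categories the router is expected to handle."""
    CODE = "code"
    SUMMARIZATION = "summarization"
    CHAT = "chat"


# Hypothetical routing table; CHAT is deliberately missing to show the check firing.
ROUTING_RULES: Dict[Category, str] = {
    Category.CODE: "provider-a/large",
    Category.SUMMARIZATION: "provider-a/small",
}


def uncovered_categories(rules: Dict[Category, str]) -> Set[Category]:
    """Return every expected category that has no routing rule."""
    return set(Category) - set(rules)


if __name__ == "__main__":
    missing = uncovered_categories(ROUTING_RULES)
    if missing:
        names = ", ".join(sorted(c.value for c in missing))
        print(f"routing rules missing for: {names}")
    else:
        print("all categories covered")
```

Running a check like this in CI would turn a silent routing gap into a visible failure.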
Validation
81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (713 lines); consider splitting into references/ and linking | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |