Cost optimization patterns for LLM API usage — model routing by task complexity, budget tracking, retry logic, and prompt caching.
57
72%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Quality
Discovery
67%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is strong in specificity and distinctiveness, clearly enumerating concrete cost optimization patterns for LLM APIs. Its main weakness is the absence of an explicit 'Use when...' clause, which would help Claude know exactly when to select this skill. The trigger terms are somewhat technical and could benefit from more user-facing language variations.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when the user asks about reducing LLM API costs, saving money on API calls, or optimizing token usage.'
Include more natural user-facing trigger terms such as 'save money', 'reduce API costs', 'token usage', 'API spending', or specific provider names like 'OpenAI', 'Anthropic API'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: model routing by task complexity, budget tracking, retry logic, and prompt caching. These are distinct, well-defined capabilities. | 3 / 3 |
| Completeness | Clearly answers 'what does this do' with specific capabilities, but lacks an explicit 'Use when...' clause or equivalent trigger guidance, which caps this dimension at 2 per the rubric. | 2 / 3 |
| Trigger Term Quality | Includes relevant terms like 'cost optimization', 'LLM API', 'model routing', 'budget tracking', 'retry logic', 'prompt caching', but misses common user-facing variations like 'save money', 'reduce costs', 'API spending', 'token usage', 'cheaper', or specific provider names. | 2 / 3 |
| Distinctiveness / Conflict Risk | The combination of 'cost optimization' specifically for 'LLM API usage' with the enumerated patterns (model routing, budget tracking, retry logic, prompt caching) creates a clear niche that is unlikely to conflict with other skills. | 3 / 3 |
| Total | 10 / 12 Passed | |
Implementation
64%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a solid, actionable skill with executable code examples covering four composable cost-optimization patterns. Its main weaknesses are moderate redundancy (overlapping sections for when-to-use and anti-patterns vs best practices) and missing validation/feedback loops for batch processing scenarios where partial failures or budget overruns need recovery. The content would benefit from trimming duplicate guidance and adding explicit checkpoints for error recovery in multi-item pipelines.
Suggestions
Merge 'When to Activate' and 'When to Use' into a single section, and consolidate 'Anti-Patterns' into 'Best Practices' as brief 'avoid' notes to reduce redundancy.
Add explicit validation/feedback loop guidance in the Composition section: e.g., what to do when budget is exceeded mid-batch (save progress, report partial results), and how to verify cost tracking accuracy after a run.
Consider extracting the pricing table and detailed code implementations into separate referenced files to keep the main SKILL.md as a concise overview with pointers to detail.
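The mid-batch recovery suggested above (save progress, report partial results) could look roughly like the following. This is a minimal sketch and not the skill's own code: `BatchResult`, `run_batch`, `call_model`, and `estimate_cost` are hypothetical names introduced here for illustration, and the budget check assumes a per-item cost estimate is available before each call.

```python
from dataclasses import dataclass, field


@dataclass
class BatchResult:
    """Outcome of a batch run that may stop early."""
    completed: dict[str, str] = field(default_factory=dict)  # item id -> output
    failed: dict[str, str] = field(default_factory=dict)     # item id -> error message
    skipped: list[str] = field(default_factory=list)         # never attempted
    spent_usd: float = 0.0


def run_batch(items, call_model, estimate_cost, budget_usd):
    """Process items until done or until the next call would exceed the budget.

    items:         dict mapping item id -> prompt
    call_model:    hypothetical callable(prompt) -> (output, actual_cost_usd)
    estimate_cost: hypothetical callable(prompt) -> estimated cost in USD
    """
    result = BatchResult()
    ids = list(items)
    for index, item_id in enumerate(ids):
        prompt = items[item_id]
        # Checkpoint: stop before a call that would push spend past the limit,
        # and record every remaining item so the run can be resumed later.
        if result.spent_usd + estimate_cost(prompt) > budget_usd:
            result.skipped = ids[index:]
            break
        try:
            output, cost = call_model(prompt)
        except Exception as exc:
            # A single failure is recorded and does not abort the batch.
            result.failed[item_id] = str(exc)
            continue
        result.completed[item_id] = output
        result.spent_usd += cost
    return result
```

Returning completed, failed, and skipped items separately lets the caller report partial results and retry only what is left, and comparing `spent_usd` against the provider's billing figures after a run is one way to check cost-tracking accuracy.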
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is mostly efficient with good code examples, but includes some redundancy: 'When to Activate' and 'When to Use' sections overlap significantly, 'Anti-Patterns to Avoid' largely mirrors 'Best Practices' in negated form, and some explanatory text (e.g., 'Track cumulative spend with frozen dataclasses') tells Claude things it can infer from the code. | 2 / 3 |
| Actionability | All four core patterns include fully executable Python code with concrete implementations. The composition section shows how to wire them together in a real pipeline function, and the pricing table provides specific numbers needed for cost calculations. | 3 / 3 |
| Workflow Clarity | The composition section shows a clear 4-step sequence for the pipeline, but lacks explicit validation checkpoints: there is no verification that the budget check is accurate before proceeding, no guidance on what to do when a batch partially fails mid-budget, and no feedback loop for tuning model routing thresholds based on results. | 2 / 3 |
| Progressive Disclosure | The content is well-structured with clear headers and logical sections, but everything is inline in a single file. The pricing table, anti-patterns, and detailed code examples could be split into referenced files to keep the main skill leaner. No bundle files exist to offload detail. | 2 / 3 |
| Total | 9 / 12 Passed | |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |