Audits Claude Code context window consumption across agents, skills, MCP servers, and rules. Identifies bloat, redundant components, and produces prioritized token-savings recommendations.
Does it follow best practices? Overall score: 71%

Impact: Pending (no eval scenarios have been run)
Quality: Passed (no known issues)
Discovery: 67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is specific and distinctive, clearly articulating what the skill does and covering a well-defined niche. Its main weakness is the absence of an explicit 'Use when...' clause, which would help Claude know exactly when to select this skill. The trigger terms are somewhat technical and could benefit from including more natural user phrasings.
Suggestions
- Add an explicit 'Use when...' clause, e.g., 'Use when the user asks about context window usage, token consumption, context optimization, or wants to reduce token bloat in their Claude Code setup.'
- Include more natural trigger-term variations such as 'token usage', 'running out of context', 'context too large', 'optimize tokens', or 'reduce context size' to improve matching with how users naturally phrase these requests.
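Taken together, these two suggestions could yield a description like the following sketch. The wording and the `name` value are illustrative, not the skill's actual frontmatter:

```yaml
---
name: context-audit   # hypothetical skill name
description: >-
  Audits Claude Code context window consumption across agents, skills,
  MCP servers, and rules. Identifies bloat and redundant components, and
  produces prioritized token-savings recommendations. Use when the user
  asks about token usage, context optimization, running out of context,
  reducing context size, or trimming token bloat in their Claude Code setup.
---
```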
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'audits context window consumption', 'identifies bloat, redundant components', and 'produces prioritized token-savings recommendations'. Also specifies the domains it operates across: agents, skills, MCP servers, and rules. | 3 / 3 |
| Completeness | Clearly answers 'what does this do' (audits context window consumption, identifies bloat, produces recommendations), but lacks an explicit 'Use when...' clause or equivalent trigger guidance, which per the rubric caps completeness at 2. | 2 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'context window', 'token-savings', 'bloat', and 'MCP servers', but misses common natural-language variations users might say, such as 'token usage', 'context length', 'too many tokens', 'running out of context', or 'optimize context'. The terms lean somewhat technical. | 2 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: auditing Claude Code context window consumption is a very specific task unlikely to overlap with other skills. The combination of 'context window', 'token-savings', and the specific scope (agents, skills, MCP servers, rules) makes it clearly distinguishable. | 3 / 3 |
| Total | | 10 / 12 Passed |
Implementation: 62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides a well-structured conceptual framework for auditing context window consumption with clear phases, specific thresholds, and a useful output format. Its main weakness is the lack of executable code or concrete commands—it describes what to do at a high level but doesn't provide the actual implementation to scan directories, count tokens, or generate reports. The workflow clarity is strong but actionability suffers from being descriptive rather than prescriptive.
Suggestions
- Add executable code snippets for the core operations: scanning directories, counting tokens (e.g., a shell one-liner or Python snippet for `words × 1.3`), and parsing .mcp.json for tool counts.
- Remove the 'When to Use' section or reduce it to 1-2 lines; Claude can infer when to audit context budgets from the skill description alone.
- Move the verbose-mode report format and per-file breakdown details to a separate reference file to keep the main SKILL.md focused on the core workflow.
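The first suggestion could look roughly like the Python sketch below. The `words × 1.3` heuristic and the ~500-tokens-per-tool estimate come from the skill itself, and `mcpServers` is the standard top-level key in a Claude Code `.mcp.json`; the function names and overall shape are illustrative, not part of the skill:

```python
import json
import pathlib

WORDS_TO_TOKENS = 1.3  # heuristic from the skill: tokens ~ words * 1.3

def estimate_tokens(text: str) -> int:
    """Approximate token count using the words-times-1.3 heuristic."""
    return round(len(text.split()) * WORDS_TO_TOKENS)

def audit_markdown(root: str) -> dict[str, int]:
    """Estimated token cost for every markdown file under root."""
    return {
        str(path): estimate_tokens(path.read_text(encoding="utf-8"))
        for path in pathlib.Path(root).rglob("*.md")
    }

def count_mcp_servers(mcp_json: str) -> int:
    """Number of configured MCP servers in a .mcp.json file.

    Each server typically contributes several tool schemas
    (~500 tokens per tool, per the skill's estimate)."""
    config = json.loads(pathlib.Path(mcp_json).read_text(encoding="utf-8"))
    return len(config.get("mcpServers", {}))
```

A real audit would sum `audit_markdown` results per component class (agents, skills, rules) before ranking them for the report.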
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably well-structured but includes some unnecessary verbosity. The 'When to Use' section has 5 bullet points that could be trimmed, and some explanations (like what MCP tool schemas cost) are repeated in both the main body and Best Practices. The report template is useful but adds length. Overall mostly efficient, with some tightening possible. | 2 / 3 |
| Actionability | The skill describes a clear process and provides specific thresholds (>200 lines, >30 words, ~500 tokens per tool, words × 1.3), but lacks executable code. There are no actual scripts or commands to run the inventory; it is a conceptual workflow rather than a copy-paste-ready implementation. The examples show the expected output format but not how to actually compute the values. | 2 / 3 |
| Workflow Clarity | The four-phase workflow (Inventory → Classify → Detect Issues → Report) is clearly sequenced with explicit criteria at each step. The classification table provides clear decision criteria, the detection phase lists specific problem patterns with thresholds, and the report phase shows the exact output format. For an analytical/audit skill, this is well structured with clear checkpoints. | 3 / 3 |
| Progressive Disclosure | The content is well organized with clear sections and a logical flow, but it is all inline in a single file, with no references to external files for detailed content. The verbose-mode output format and the full report template could be split out. For a skill of this length (~120 lines), it is borderline acceptable, but the report template and detailed examples add bulk that could be referenced. | 2 / 3 |
| Total | | 9 / 12 Passed |
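The >200-line threshold cited under Actionability could be checked with a short script like this sketch. The limit and the `*.md` glob come from the review; the function name and sorting are illustrative:

```python
import pathlib

LINE_LIMIT = 200  # "long file" threshold cited in the review

def oversized_files(root: str, limit: int = LINE_LIMIT) -> list[tuple[str, int]]:
    """Markdown files under root exceeding the line limit, largest first."""
    counts = [
        (str(path), sum(1 for _ in path.open(encoding="utf-8")))
        for path in pathlib.Path(root).rglob("*.md")
    ]
    return sorted(
        [(name, n) for name, n in counts if n > limit],
        key=lambda item: item[1],
        reverse=True,
    )
```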
Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure: 10 / 11 passed
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing them or moving them to metadata | Warning |
| Total | 10 / 11 Passed | |
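The single warning concerns unknown frontmatter keys. The fix the validator suggests (moving them under `metadata`) might look like this sketch; the key names below are hypothetical examples of the pattern, not the skill's actual frontmatter:

```yaml
# Before: unknown top-level keys trigger the warning
---
name: context-audit
description: Audits Claude Code context window consumption...
author: jane          # unknown key
version: 1.2.0        # unknown key
---

# After: unrecognized keys moved under metadata
---
name: context-audit
description: Audits Claude Code context window consumption...
metadata:
  author: jane
  version: 1.2.0
---
```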