This skill should be used when the user asks to "compress context", "summarize conversation history", "implement compaction", "reduce token usage", or mentions context compression, structured summarization, tokens-per-task optimization, or long-running agent sessions exceeding context limits.
Does it follow best practices?

- Impact: Pending. No eval scenarios have been run.
- Issues: Passed. No known issues.

Optimize this skill with Tessl:

npx tessl skill review --optimize ./skills/context-compression/SKILL.md

Quality (36%)
Discovery (37%)

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is heavily lopsided — it provides excellent trigger terms and 'when to use' guidance but completely omits what the skill actually does. A reader knows exactly when to invoke it but has no idea what actions it performs, what outputs it produces, or how it accomplishes context compression. This is essentially half a description.
Suggestions
- Add a 'what it does' clause listing specific actions, e.g., 'Generates structured summaries of conversation history, extracts key decisions and context, and produces compacted representations to reduce token usage in long-running sessions.'
- Restructure to lead with capabilities before the 'Use when...' clause, following the pattern: '[What it does]. Use when [triggers].'
- Mention concrete outputs or artifacts the skill produces (e.g., 'compacted context files', 'structured conversation summaries', 'priority-ranked context blocks') to clarify what the user will receive.
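Applying these suggestions, the restructured frontmatter might read as below. This is an illustrative sketch only: the wording is invented here, not the required text, and only the '[What it does]. Use when [triggers].' shape matters.

```yaml
# Hypothetical rewrite: capabilities first, then the trigger terms.
description: >-
  Generates structured summaries of conversation history, extracts key
  decisions and context, and produces compacted context files to reduce
  token usage in long-running sessions. Use when the user asks to
  "compress context", "summarize conversation history", "implement
  compaction", or "reduce token usage", or mentions context compression,
  structured summarization, tokens-per-task optimization, or agent
  sessions exceeding context limits.
```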
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description lacks concrete actions. It never explains what the skill actually does — there are no specific capabilities listed like 'generates structured summaries' or 'creates compacted context files'. It only describes when to use it, not what it does. | 1 / 3 |
| Completeness | The description answers 'when' extensively but completely fails to answer 'what does this do'. There is no explanation of the skill's actual capabilities or outputs. The 'what' component is entirely missing, which is a critical gap. | 1 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms: 'compress context', 'summarize conversation history', 'implement compaction', 'reduce token usage', 'context compression', 'structured summarization', 'tokens-per-task optimization', 'long-running agent sessions exceeding context limits'. These are terms users would naturally say. | 3 / 3 |
| Distinctiveness / Conflict Risk | The trigger terms are fairly specific to context compression and token management, which helps distinguish it. However, without describing what the skill actually does, it could overlap with general summarization skills or conversation management tools. | 2 / 3 |
| Total |  | 7 / 12 (Passed) |
Implementation (35%)

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill contains genuinely valuable domain knowledge about context compression strategies, but suffers significantly from verbosity — it reads more like a research summary than an actionable skill file. The lack of executable code and the repetition of key points (artifact trail weakness, tokens-per-task optimization) inflate the token cost without proportional benefit. The workflow sections would benefit from explicit validation checkpoints and the bulk of reference material should be moved to separate files.
Suggestions
- Cut content by 50-60%: remove explanations of why things matter (Claude can infer this), eliminate repeated points about artifact trail weakness, and move evaluation dimensions/tables to a referenced file like references/evaluation-framework.md.
- Add executable code examples for the key operations: implement a concrete compression trigger function, a summary merge function, and a probe evaluation function rather than describing them in prose.
- Add explicit validation checkpoints to the iterative summarization workflow: after step 4 (merge), include a verification step that runs probes against the merged summary before discarding the original context.
- Move the detailed comparison tables, probe type definitions, and scoring dimensions to reference files, keeping only the decision matrix (which method to use when) in the main skill body.
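A minimal sketch of what those executable examples and the merge checkpoint could look like. Every name here (`compress_needed`, `merge_summaries`, `run_probes`, `compact`) is hypothetical, and the substring-match probe is a deliberately crude stand-in for a real probe evaluator; this illustrates the validate-then-proceed shape, not the skill's actual implementation.

```python
def compress_needed(token_count: int, budget: int, threshold: float = 0.8) -> bool:
    """Compression trigger: fire once usage crosses a fraction of the budget."""
    return token_count >= budget * threshold

def merge_summaries(anchor: str, new_summary: str) -> str:
    """Anchored iterative merge: keep the stable anchor summary first,
    then append the newly summarized span."""
    return f"{anchor}\n\n{new_summary}".strip()

def run_probes(summary: str, probes: list[str]) -> float:
    """Fraction of probe strings (key facts, decisions) still present in
    the summary. A crude recall check, not a real evaluator."""
    if not probes:
        return 1.0
    hits = sum(1 for p in probes if p.lower() in summary.lower())
    return hits / len(probes)

def compact(original: str, anchor: str, new_summary: str,
            probes: list[str], min_recall: float = 0.9) -> str:
    """Validate-then-proceed: only discard the original context if the
    merged summary passes the probe checkpoint."""
    merged = merge_summaries(anchor, new_summary)
    if run_probes(merged, probes) < min_recall:
        return original  # merge lost information; keep the full context
    return merged
```

The key design point is that `compact` returns the untouched original whenever the probes fail, so information is never destroyed by an unverified merge.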
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | At ~350+ lines, this skill is extremely verbose. It explains concepts Claude already understands (what compression ratios mean, what ROUGE scores are, how compounding percentages work), includes extensive tables and evaluation frameworks that could be referenced externally, and repeats the same points across multiple sections (artifact trail weakness is mentioned at least 3 times). Much of the content reads like a research paper summary rather than actionable instructions. | 1 / 3 |
| Actionability | The skill provides structured markdown templates and decision tables which are somewhat actionable, but lacks executable code. The 'step by step' implementation section for anchored iterative summarization is still abstract prose rather than concrete implementation. There are no actual code snippets for implementing compression triggers, probe evaluation, or artifact tracking — just descriptions of what to do. | 2 / 3 |
| Workflow Clarity | The three-phase compression workflow and the anchored iterative summarization steps are sequenced, but validation checkpoints are largely missing. There's no explicit 'validate then proceed' pattern — for instance, the iterative summarization steps don't include verification that the merge preserved critical information before discarding the original. The probe-based evaluation is described as a concept but not integrated into the workflows as explicit checkpoints. | 2 / 3 |
| Progressive Disclosure | The skill references external files (evaluation-framework.md, related skills) and has a References section with clear 'Read when' annotations, which is good. However, the main body is monolithic — the detailed evaluation dimensions, compression ratio tables, probe types, and examples are all inline when much of this could be split into reference documents. The content that should be in separate files (evaluation framework details, method comparison data) is embedded in the main skill. | 2 / 3 |
| Total |  | 7 / 12 (Passed) |
Validation (100%)

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 11 / 11 checks passed, with no warnings or errors.