This skill should be used when long-running agent sessions need context compression, structured summarization, compaction, token-per-task optimization, or durable handoff summaries that preserve decisions, files, risks, and next actions.
60%: Does it follow best practices?
Impact: not scored; no eval scenarios have been run
Passed: no known issues
Optimize this skill with Tessl:
npx tessl skill review --optimize ./skills/context-compression/SKILL.md

Quality
Discovery
Score: 57%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description targets a clear niche (context compression for long-running agent sessions) but relies heavily on technical jargon and fails to clearly articulate what concrete outputs or actions the skill produces. The 'when' and 'what' are blended together, making it harder for Claude to distinguish the skill's capabilities from its trigger conditions. It would benefit from separating concrete actions from usage triggers and using more natural language.
Suggestions
Separate the 'what' from the 'when': Start with concrete actions like 'Compresses long conversation histories into structured summaries preserving decisions, modified files, risks, and next actions.' Then follow with 'Use when...'
Add more natural trigger terms that users might actually say, such as 'summarize this conversation', 'reduce context length', 'session too long', 'running out of context window'
Replace jargon like 'token-per-task optimization' and 'durable handoff summaries' with plainer language, or at minimum supplement them with simpler synonyms
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (context compression, summarization) and some actions (compaction, token-per-task optimization, handoff summaries), but these are more like abstract categories than concrete, specific actions. It doesn't list discrete operations like 'generates a structured summary of decisions made' or 'compresses conversation history into key points'. | 2 / 3 |
| Completeness | The description has a 'Use when' clause ('should be used when long-running agent sessions need...'), but the 'what does this do' part is weak: it lists trigger scenarios rather than clearly stating what the skill produces or does. The 'when' is present but blended with the 'what', making both less clear. | 2 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'context compression', 'summarization', 'compaction', 'handoff summaries', but many are technical jargon (e.g., 'token-per-task optimization', 'durable handoff summaries') rather than natural phrases a user would say. Missing simpler terms like 'summarize conversation', 'reduce context', 'session summary'. | 2 / 3 |
| Distinctiveness Conflict Risk | The description targets a clear niche: long-running agent sessions needing context compression and handoff summaries. This is quite specific and unlikely to conflict with other skills, as the combination of 'agent sessions', 'context compression', and 'handoff summaries' is distinctive. | 3 / 3 |
| Total | | 9 / 12 (Passed) |
Implementation
Score: 62%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, domain-specific skill that covers context compression comprehensively with clear decision frameworks, concrete examples, and useful gotchas. Its main weaknesses are length (could be more concise by reducing redundancy between sections) and the lack of executable code implementations — the guidance is specific but stays at the instructional level rather than providing runnable scripts or function implementations. The workflow clarity is strong with good sequencing and validation through probe-based evaluation.
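The review credits the skill with probe-based evaluation but does not reproduce it. As a minimal sketch of the idea, assuming a probe is a question paired with the facts a later turn would need to answer it, the code below checks whether a compressed summary still contains those facts. `Probe` and `run_probes` are hypothetical names, and plain substring matching stands in for the model-graded answering a real probe harness would more likely use.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Probe:
    """A question a later turn might ask, plus the facts needed to answer it."""
    question: str
    expected: list[str]


def run_probes(summary: str, probes: list[Probe]) -> tuple[float, list[str]]:
    """Return the fraction of probes the summary can answer, and the failed questions.

    Simplification (assumption): a probe passes when every expected fact appears
    verbatim (case-insensitive) in the summary, instead of asking a model.
    """
    text = summary.lower()
    failed = [
        p.question
        for p in probes
        if not all(fact.lower() in text for fact in p.expected)
    ]
    score = 1 - len(failed) / len(probes) if probes else 1.0
    return score, failed


summary = "Decisions: use SQLite for the cache. Files: cache.py. Next: benchmark the cache."
probes = [
    Probe("Which storage backend was chosen?", ["SQLite"]),
    Probe("Which file implements the cache?", ["cache.py"]),
    Probe("What risk is still open?", ["Windows"]),
]
score, failed = run_probes(summary, probes)
print(f"{score:.2f}", failed)  # 0.67 ['What risk is still open?']
```

A failed probe points at a fact the compression step dropped; in this example the summary never recorded the open risk, so that probe fails.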
Suggestions
Reduce redundancy by consolidating the compression method descriptions — Core Concepts and the 'Select the Right Approach' section largely repeat the same information. Keep one authoritative description and reference it.
Add executable code examples for at least one key operation, such as a Python function implementing the anchored iterative summarization merge step or a probe evaluation script, to move from instructional to actionable.
Move the detailed evaluation dimensions, compression ratio tables, and gotchas into separate referenced files to improve progressive disclosure and reduce the monolithic feel of the main skill file.
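The second suggestion asks for a Python function implementing the anchored iterative summarization merge step. The review does not show the skill's own procedure, so the following is a minimal sketch under stated assumptions: the anchored summary is a fixed set of sections (the decisions, files, risks, and next actions named in the skill description), and a separate summarization call, not shown, has already produced a `delta` covering only the newly truncated span. `HandoffSummary`, `merge_summary`, and `render` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

SECTIONS = ("decisions", "files", "risks", "next_actions")


@dataclass
class HandoffSummary:
    """Persistent (anchored) summary; the section names are an assumption."""
    decisions: list[str] = field(default_factory=list)
    files: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    next_actions: list[str] = field(default_factory=list)


def _merge_unique(old: list[str], new: list[str]) -> list[str]:
    """Keep every old item, then append new items not already present."""
    merged = list(old)
    for item in new:
        if item not in merged:
            merged.append(item)
    return merged


def merge_summary(
    anchor: HandoffSummary,
    delta: HandoffSummary,
    completed: set[str] | None = None,
) -> HandoffSummary:
    """Fold the summary of the new span into the anchor instead of regenerating it."""
    done = completed or set()
    actions = _merge_unique(anchor.next_actions, delta.next_actions)
    return HandoffSummary(
        decisions=_merge_unique(anchor.decisions, delta.decisions),
        files=_merge_unique(anchor.files, delta.files),
        risks=_merge_unique(anchor.risks, delta.risks),
        next_actions=[a for a in actions if a not in done],  # drop finished work
    )


def render(summary: HandoffSummary) -> str:
    """Render the summary as markdown sections for the next context window."""
    lines: list[str] = []
    for name in SECTIONS:
        lines.append(f"## {name.replace('_', ' ').title()}")
        items = getattr(summary, name)
        lines.extend(f"- {item}" for item in items or ["(none)"])
    return "\n".join(lines)


anchor = HandoffSummary(
    decisions=["Use SQLite for the cache"],
    files=["cache.py"],
    next_actions=["Write cache tests"],
)
delta = HandoffSummary(
    files=["cache.py", "tests/test_cache.py"],
    risks=["Cache invalidation untested on Windows"],
    next_actions=["Benchmark the cache"],
)
merged = merge_summary(anchor, delta, completed={"Write cache tests"})
print(render(merged))
```

In this sketch, earlier sections are carried forward verbatim rather than re-summarized, so facts recorded in previous compressions are not pushed through another lossy summarization pass. The trade-off is that the anchor only grows unless items are explicitly retired, which `completed` does here for next actions only.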
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is thorough and mostly avoids explaining things Claude already knows, but it's quite long (~400 lines) with some redundancy. The three compression approaches are described in Core Concepts and then re-described in the selection guide. The guidelines section largely restates points already made in detail above. Some trimming would improve token efficiency without losing information. | 2 / 3 |
| Actionability | The skill provides structured markdown templates, decision tables, and a step-by-step workflow for anchored iterative summarization, which are concrete and useful. However, there is no executable code: no scripts, no function implementations, no tool invocations. The guidance is specific but remains at the instructional/conceptual level rather than providing copy-paste-ready implementations for compression triggers, probe evaluation, or artifact indexing. | 2 / 3 |
| Workflow Clarity | Multi-step processes are clearly sequenced: the three-phase compression workflow has explicit phases with outputs, the anchored iterative summarization has a numbered step-by-step process, and compression triggers are presented in a decision table. The probe-based evaluation provides a feedback loop for validating compression quality. The gotchas section serves as validation checkpoints for common failure modes. | 3 / 3 |
| Progressive Disclosure | The skill references one bundle file (./references/evaluation-framework.md) and several related skills, which is good navigation. However, no bundle files were provided, so the reference may be broken. More importantly, the skill itself is quite long and monolithic: the detailed topics, practical guidance, examples, gotchas, and integration sections could benefit from being split into separate referenced files rather than all being inline in a single document. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
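The Actionability row notes that the skill gives compression triggers as a decision table but no code. The table itself is not reproduced in this review, so the sketch below only illustrates how such a table could become an executable check; the function name `should_compress`, the 60% and 80% thresholds, and the task-boundary rule are all assumptions, not values taken from the skill.

```python
def should_compress(
    used_tokens: int,
    window_tokens: int,
    at_task_boundary: bool,
    soft_limit: float = 0.6,  # hypothetical: compress early if a task just finished
    hard_limit: float = 0.8,  # hypothetical: always compress past this usage
) -> bool:
    """Decide whether to compress the context now.

    Hypothetical policy: compress early (past soft_limit) only at a natural
    task boundary, and always compress once usage reaches hard_limit.
    """
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    usage = used_tokens / window_tokens
    if usage >= hard_limit:
        return True
    return at_task_boundary and usage >= soft_limit


print(should_compress(70_000, 100_000, at_task_boundary=False))  # False
print(should_compress(70_000, 100_000, at_task_boundary=True))   # True
print(should_compress(85_000, 100_000, at_task_boundary=False))  # True
```

One plausible rationale for the two limits: compressing at a task boundary tends to discard less in-flight state, while the hard limit keeps the window from filling when no boundary arrives.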
Validation
Score: 100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
25e1fa7