This skill should be used when the user asks to "diagnose context problems", "fix lost-in-middle issues", "debug agent failures", "understand context poisoning", or mentions context degradation, attention patterns, context clash, context confusion, or agent performance degradation. Provides patterns for recognizing and mitigating context failures.
Overall score: 70
Impact: Pending (no eval scenarios have been run)
Validation: Passed (no known issues)

Optimize this skill with Tessl:
`npx tessl skill review --optimize ./skills/context-degradation/SKILL.md`

Quality — 62%
Does it follow best practices?
Discovery — 89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description excels at trigger term coverage and completeness, with a clear 'when to use' clause and highly specific domain terminology. Its main weakness is that the 'what it does' portion is somewhat vague — 'provides patterns for recognizing and mitigating' doesn't convey concrete actions or deliverables. The niche is well-defined and distinctive.
Suggestions
- Replace 'Provides patterns for recognizing and mitigating context failures' with more concrete actions, e.g., 'Diagnoses context window issues, identifies attention degradation patterns, recommends context restructuring strategies, and fixes lost-in-middle problems in agent pipelines.'
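Concretely, the rewritten description might sit in the skill's frontmatter like this (a sketch, assuming the conventional SKILL.md YAML frontmatter; the wording merges the suggested actions with the skill's existing trigger list):

```yaml
---
name: context-degradation
description: >
  Diagnoses context window issues, identifies attention degradation patterns,
  recommends context restructuring strategies, and fixes lost-in-middle
  problems in agent pipelines. Use when the user asks to "diagnose context
  problems", "fix lost-in-middle issues", or "debug agent failures", or
  mentions context poisoning, context clash, context confusion, or agent
  performance degradation.
---
```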
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description mentions 'recognizing and mitigating context failures' and lists several problem types (lost-in-middle issues, context poisoning, agent failures), but the actual actions are vague — 'provides patterns' is not a concrete action like 'extracts', 'generates', or 'rewrites'. It names the domain well but lacks specific deliverables. | 2 / 3 |
| Completeness | The description explicitly answers both 'what' (provides patterns for recognizing and mitigating context failures) and 'when' (with a clear 'Use when' equivalent listing specific trigger phrases and concepts). The 'when' guidance is thorough. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms: 'diagnose context problems', 'fix lost-in-middle issues', 'debug agent failures', 'context poisoning', 'context degradation', 'attention patterns', 'context clash', 'context confusion', 'agent performance degradation'. These are terms users working with LLM agents would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | The skill occupies a very specific niche — context window debugging and agent context failures. The trigger terms are highly specialized (lost-in-middle, context poisoning, attention patterns) and unlikely to conflict with general coding, document, or other common skills. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation — 35%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill covers context degradation patterns comprehensively but suffers from significant verbosity — it reads more like a research survey or tutorial than a concise operational skill for Claude. Key concepts are repeated across Core Concepts, Detailed Topics, and Practical Guidance sections. The content would benefit greatly from aggressive trimming, moving detailed explanations behind references, and replacing prose descriptions with concrete diagnostic procedures and executable examples.
Suggestions
- Cut content by 50-60%: remove the Core Concepts section entirely (it duplicates Detailed Topics), eliminate explanations of well-known concepts like attention mechanics, and consolidate repeated points about the U-curve and non-linear degradation.
- Replace the illustrative YAML/markdown examples with actionable diagnostic procedures — e.g., a concrete checklist: 'Run the same prompt at 2K tokens. If it fails → prompt problem. If it succeeds → measure at 8K, 16K, 32K to find the cliff edge.' (A sketch of this probe follows this list.)
- Move the Detailed Topics subsections (each degradation pattern's full explanation), Empirical Benchmarks, and Counterintuitive Findings into separate reference files, keeping only a 2-3 line summary of each pattern in the main skill with links.
- Add an explicit diagnostic workflow with validation steps: 'Step 1: Establish a baseline at low context → Step 2: Identify which pattern matches the symptoms → Step 3: Apply the specific mitigation → Step 4: Verify improvement by re-running the baseline comparison.' (The sketch below implements Steps 1 and 4.)
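As a concrete shape for that checklist and workflow, here is a minimal sketch of a context-cliff probe. It is illustrative only: `run_prompt` and `passes` are hypothetical placeholders for the pipeline's actual model call and success check, and context size is approximated by slicing filler text rather than counting real tokens.

```python
from typing import Optional

def run_prompt(prompt: str, context: str) -> str:
    """Placeholder: call the model with the prompt plus context, return its output."""
    raise NotImplementedError

def passes(output: str) -> bool:
    """Placeholder: task-specific success check (exact match, rubric score, etc.)."""
    raise NotImplementedError

def find_context_cliff(prompt: str, filler: str,
                       sizes=(2_000, 8_000, 16_000, 32_000)) -> Optional[int]:
    """Step 1: baseline at low context; then escalate to locate the cliff edge.

    Returns the first size at which the prompt stops passing, or None if it
    never fails in the tested range. A real harness would measure sizes in
    tokens; character slicing stands in for that here.
    """
    # Baseline: if the task fails even at the smallest size, it is a prompt
    # problem, not a context-length problem.
    if not passes(run_prompt(prompt, filler[: sizes[0]])):
        raise RuntimeError("Fails at baseline; fix the prompt before blaming context length.")

    # Escalate through the remaining sizes to find where degradation begins.
    for size in sizes[1:]:
        if not passes(run_prompt(prompt, filler[:size])):
            return size
    return None
```

Step 4 of the suggested workflow falls out of the same harness: apply a mitigation, re-run `find_context_cliff` with the identical prompt and filler, and check whether the cliff moved past the context size the agent actually has to operate at.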
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | This skill is extremely verbose at ~2000+ lines of content. It explains concepts Claude already understands (what attention is, how context windows work, what RAG is), includes extensive prose explanations where bullet points would suffice, and repeats key points multiple times across sections (e.g., the U-curve and lost-in-middle phenomenon is explained at least 3 times). The 'Core Concepts' section alone restates what the detailed sections cover. | 1 / 3 |
| Actionability | The skill provides conceptual frameworks and heuristics (the four-bucket framework, placement strategies) that are somewhat actionable, but lacks executable code or concrete commands. The examples are illustrative YAML/markdown rather than copy-paste-ready implementations. Detection signals and mitigation strategies are described in prose rather than as specific, implementable procedures. | 2 / 3 |
| Workflow Clarity | The four-bucket mitigation framework provides a reasonable decision structure, and the guidelines section lists steps. However, there are no explicit validation checkpoints or feedback loops for the diagnostic process itself — no 'if you see X, do Y, then verify Z' sequences. The diagnostic workflow is implicit rather than explicitly sequenced with verification steps. | 2 / 3 |
| Progressive Disclosure | The skill references external files (./references/patterns.md) and related skills with 'Read when' annotations, which is good. However, the main file itself is monolithic — the detailed subsections on each pattern (Lost-in-Middle, Poisoning, Distraction, Confusion, Clash) could be split into separate reference files. The 'Empirical Benchmarks', 'Counterintuitive Findings', and 'When Larger Contexts Hurt' sections add significant bulk that could be referenced rather than inlined. | 2 / 3 |
| Total | | 7 / 12 Passed |
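One way to act on the Progressive Disclosure note above: shrink each pattern's entry in the main file to a short summary plus a 'Read when' pointer, along these lines (the summary wording is illustrative; only ./references/patterns.md is named in the skill itself):

```markdown
## Lost-in-Middle (summary)
Facts buried mid-context get less attention than facts near the edges,
producing the U-curve in recall. Full detection signals and mitigations:
[references/patterns.md](./references/patterns.md). Read when diagnosing
position-dependent recall failures.
```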
Validation — 100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure: 11 / 11 passed. No warnings or errors.