Monitors context window health throughout a session and rides peak context quality for maximum output fidelity. Activates automatically after plan-interview and intent-framed-agent. Stays active through execution and hands off cleanly to simplify-and-harden and self-improvement when the wave completes naturally or exits via handoff. Use this skill whenever a multi-step agent task is underway and session continuity or context drift is a concern. Especially important for long-running tasks, complex refactors, or any work where degraded context would silently corrupt the output. Trigger even if the user doesn't say "context surfing" — if an agent task is running across multiple steps with intent and a plan already established, this skill is live.
47
48%
Does it follow best practices?
Impact: not scored (no eval scenarios have been run)
Passed
No known issues
Optimize this skill with Tessl:
npx tessl skill review --optimize ./skills/context-surfing/SKILL.md
Quality
Discovery
57%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description attempts to be comprehensive with explicit 'when to use' guidance, which is its strongest aspect. However, it suffers from vague, metaphorical language ('rides peak context quality', 'wave completes naturally') that obscures what the skill actually does in concrete terms. The internal jargon and abstract framing make it difficult to understand the specific actions performed and could cause confusion in a multi-skill selection scenario.
Suggestions
Replace metaphorical language with concrete actions — specify exactly what 'monitoring context window health' entails (e.g., 'tracks token usage, summarizes prior context, checkpoints key decisions, prunes irrelevant history').
Remove or explain internal jargon like 'rides peak context quality' and 'wave completes naturally' — these are not meaningful to skill selection and add confusion.
Clarify the boundary with related skills (plan-interview, intent-framed-agent, simplify-and-harden) more precisely to reduce overlap risk — state what this skill does that those do not.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses highly abstract, metaphorical language ('rides peak context quality', 'wave completes naturally') without listing concrete, actionable capabilities. There are no specific actions like 'summarizes context', 'checkpoints state', or 'prunes irrelevant history' — just vague concepts about monitoring and quality. | 1 / 3 |
| Completeness | The description does explicitly address both 'what' (monitors context window health, maintains output fidelity) and 'when' (multi-step agent tasks, long-running tasks, complex refactors, when context drift is a concern). It includes a clear 'Use this skill whenever...' clause with explicit trigger guidance. | 3 / 3 |
| Trigger Term Quality | It includes some relevant terms like 'multi-step agent task', 'long-running tasks', 'complex refactors', 'context drift', and 'session continuity', which users might naturally reference. However, much of the language is internal jargon ('context surfing', 'intent-framed-agent', 'simplify-and-harden') that users would not say. | 2 / 3 |
| Distinctiveness Conflict Risk | The skill occupies a somewhat unique niche around context management, but its vague framing ('monitors context window health') and broad activation criteria ('any multi-step agent task is underway') could easily overlap with general session management, planning, or orchestration skills. | 2 / 3 |
| Total | | 8 / 12 Passed |
Implementation
39%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill demonstrates strong workflow design with clear sequencing, validation checkpoints, and recovery protocols, but is severely undermined by verbosity. The ocean wave metaphor, while creative, consumes significant tokens explaining concepts Claude already understands. The core drift detection mechanism relies heavily on abstract behavioral descriptions rather than concrete, executable examples, making the most critical part of the skill the least actionable.
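The central complaint is that drift detection stays abstract. A minimal sketch of one concrete signal, assuming drift is flagged when the agent's own output accumulates hedging phrases (the review quotes the skill's 'monitor for hedging language'), might look like the following. The phrase list, `scan_for_drift`, and the threshold of two are hypothetical choices, not taken from the skill.

```python
import re
from dataclasses import dataclass

# Hypothetical hedging phrases; the skill's actual drift-signal list is not reproduced here.
DRIFT_PHRASES = (
    "i think",
    "probably",
    "if i recall",
    "as mentioned earlier",
    "i believe",
)


@dataclass
class DriftReport:
    hits: list[str]   # distinct hedging phrases found in the text
    drifting: bool    # True when the number of distinct hits reaches the threshold


def scan_for_drift(text: str, threshold: int = 2) -> DriftReport:
    """Collect the distinct hedging phrases present in agent output and flag drift."""
    lowered = text.lower()
    # Word boundaries keep a phrase from matching inside a longer word.
    hits = [p for p in DRIFT_PHRASES if re.search(r"\b" + re.escape(p) + r"\b", lowered)]
    return DriftReport(hits=hits, drifting=len(hits) >= threshold)


if __name__ == "__main__":
    report = scan_for_drift("I think the config is probably in settings.py.")
    print(report.hits, report.drifting)  # ['i think', 'probably'] True
```

Counting distinct phrases rather than raw occurrences keeps one repeated verbal tic from tripping the check on its own; a real signal list would come from the skill's own drift section.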
Suggestions
Cut the Mental Model section entirely or reduce to 2 lines — the metaphor adds flavor but not actionability, and Claude doesn't need analogies to understand context degradation.
Add a concrete example of a pre-commit anchor check: show an actual reasoning trace with a quoted anchor line, a pending action, and the traceability confirmation, so Claude knows exactly what this looks like in practice.
Split into multiple files: keep SKILL.md as a concise overview (<100 lines) with the activation rules and drift signal list, then move the Recovery Protocol, Exit Protocol/handoff template, and Interoperability details into separate referenced files.
Remove explanatory prose like 'This is not failure. This is the system working correctly.' and the Principles section — these restate what the protocol already implies and waste tokens.
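The anchor-check suggestion can be made concrete along these lines. This is a sketch under stated assumptions: anchors are the verbatim intent and plan lines established before execution, and every pending action must quote one of them. `PendingAction` and `anchor_check` are hypothetical names, and the example anchors are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PendingAction:
    description: str
    quoted_anchor: str  # the anchor line the agent cites as justification (hypothetical field)


def anchor_check(action: PendingAction, anchors: list[str]) -> str:
    """Return a trace line that confirms or rejects traceability to a known anchor."""
    quote = action.quoted_anchor.strip()
    known = {a.strip() for a in anchors}
    if quote and quote in known:
        return f'TRACEABLE: "{action.description}" <- anchor: "{quote}"'
    return f'DRIFT: "{action.description}" cites no known anchor; re-anchor before committing'


if __name__ == "__main__":
    # Illustrative anchors, not taken from the skill.
    anchors = [
        "Goal: migrate auth from sessions to JWT",
        "Constraint: do not change the public login API",
    ]
    ok = PendingAction(
        "Replace session lookup in middleware with JWT verification",
        "Goal: migrate auth from sessions to JWT",
    )
    stray = PendingAction("Refactor the logging format", "Improve code quality")
    print(anchor_check(ok, anchors))
    print(anchor_check(stray, anchors))
```

Requiring a verbatim quote only proves the citation is real, not that the action actually serves it; the value is that the reasoning chain now contains a concrete line to inspect before committing.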
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~400+ lines. The ocean wave metaphor is extensively elaborated when a simple table would suffice. Concepts like 'what drift is' and 'why clean exits matter' are explained at length — Claude already understands these. The principles section restates what was already said. Significant portions (mental model, lifecycle explanations, interoperability details) could be cut by 60%+ without losing actionable content. | 1 / 3 |
| Actionability | The handoff file template is concrete and copy-paste ready, and the hook integration provides executable config. However, the core skill — drift detection and recovery — is described in abstract behavioral terms ('monitor for hedging language', 'run the pre-commit anchor check in your reasoning chain') rather than executable steps with concrete examples. The pre-commit anchor check is described but no example of what it looks like in practice is given. The drift signals are listed but there's no concrete example of detecting one. | 2 / 3 |
| Workflow Clarity | The multi-step workflows are clearly sequenced with explicit validation checkpoints. The Recovery Protocol has a clear step-by-step flow with branching logic (re-read → reconcile → subagent check → escalate → exit). The Exit Protocol has numbered steps with explicit ordering. The precedence rule between intent-framed-agent and context-surfing is clearly stated. Feedback loops are present (re-anchor → check → resume or exit). | 3 / 3 |
| Progressive Disclosure | Everything is in one monolithic file with no references to supporting files. The handoff-checker.sh script is referenced but not provided in the bundle. The content could benefit enormously from splitting — the drift detection signals, the handoff template, the interoperability matrix, and the hook setup could all be separate files. The task complexity table and mental model add bulk to what should be a focused execution guide. | 1 / 3 |
| Total | | 7 / 12 Passed |
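The Recovery Protocol's branching, as summarized above (re-read → reconcile → subagent check → escalate → exit), can be sketched as a fall-through sequence. The assumption here, not confirmed by the review, is that each check either clears the drift and resumes work or passes control to the next step; `recovery_protocol` and its three callables are hypothetical stand-ins for whatever the skill actually does at each stage.

```python
from typing import Callable


def recovery_protocol(
    reconciles: Callable[[], bool],
    subagent_confirms: Callable[[], bool],
    escalation_resolves: Callable[[], bool],
) -> tuple[str, list[str]]:
    """Walk the recovery steps in order; return ("resume" | "exit", steps taken)."""
    trace = ["re-read anchors"]  # always the first step
    trace.append("reconcile current work against anchors")
    if reconciles():
        return "resume", trace
    trace.append("independent subagent check")
    if subagent_confirms():
        return "resume", trace
    trace.append("escalate to user")
    if escalation_resolves():
        return "resume", trace
    # Nothing cleared the drift: stop and hand off rather than continue degraded.
    trace.append("exit via handoff")
    return "exit", trace


if __name__ == "__main__":
    outcome, steps = recovery_protocol(lambda: False, lambda: True, lambda: False)
    print(outcome, steps)
```

Keeping the steps as a fixed sequence with a single exit at the end makes the "escalate before exiting" ordering explicit, which is the property the Workflow Clarity score credits.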
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
f6c5d7b