Monitors context window health throughout a session and rides peak context quality for maximum output fidelity. Activates automatically after plan-interview and intent-framed-agent. Stays active through execution and hands off cleanly to simplify-and-harden and self-improvement when the wave completes naturally or exits via handoff. Use this skill whenever a multi-step agent task is underway and session continuity or context drift is a concern. Especially important for long-running tasks, complex refactors, or any work where degraded context would silently corrupt the output. Trigger even if the user doesn't say "context surfing" — if an agent task is running across multiple steps with intent and a plan already established, this skill is live.
Overall score: 65
Quality: 57% (Does it follow best practices?)
Impact: Pending (no eval scenarios have been run)
Validation: Passed (no known issues)
Optimize this skill with Tessl:

```
npx tessl skill review --optimize ./skills/context-surfing/SKILL.md
```

Quality
Discovery: 67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is thorough in explaining when to activate and includes explicit trigger guidance, which is a strength. However, it relies heavily on internal pipeline jargon and buzzwords ('rides peak context quality', 'maximum output fidelity', 'wave completes naturally') that obscure what the skill concretely does. The overly broad activation criteria risk making it fire too frequently and conflict with other skills.
Suggestions
- Replace abstract phrases like 'rides peak context quality for maximum output fidelity' with concrete actions — e.g., 'Tracks token usage, summarizes completed steps, prunes stale context, and signals when to checkpoint or hand off work.'
- Reduce internal pipeline jargon (plan-interview, intent-framed-agent, simplify-and-harden) or explain it briefly, and add more natural user-facing trigger terms like 'long session', 'running out of context', or 'session getting too long'.
- Narrow the activation criteria to reduce conflict risk — instead of 'any multi-step agent task', specify the conditions that distinguish this skill from general task-execution skills.
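To make the first suggestion concrete, here is one hypothetical rewrite of the description, assuming the skill keeps its metadata in YAML frontmatter at the top of SKILL.md; the listed behaviors are illustrative and should be checked against what the skill actually does:

```markdown
---
name: context-surfing
description: >
  Tracks token usage and context health during multi-step agent tasks:
  summarizes completed steps, prunes stale context, and signals when to
  checkpoint or hand off work. Trigger when a session is getting long,
  context is running out, or output quality is drifting during a
  long-running task or complex refactor.
---
```

A description in this shape keeps the 'what' and 'when' from the original while replacing the surfing metaphor with checkable actions.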
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names a domain (context window health monitoring) and some actions (monitors context, rides peak context quality, hands off to other skills), but the actual concrete actions are vague — 'rides peak context quality' and 'maximum output fidelity' are abstract and buzzwordy rather than specific, actionable capabilities. | 2 / 3 |
| Completeness | The description clearly answers both 'what' (monitors context window health, manages context quality during execution) and 'when' (multi-step agent tasks, long-running tasks, complex refactors, when context drift is a concern), with explicit trigger guidance including the note about activating even without the user saying 'context surfing'. | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'multi-step agent task', 'context drift', 'long-running tasks', 'complex refactors', and 'session continuity', but many are internal/technical jargon (e.g., 'plan-interview', 'intent-framed-agent', 'simplify-and-harden') rather than natural user language. A user would more likely say 'my session is getting long' or 'context is getting stale' than these skill-pipeline terms. | 2 / 3 |
| Distinctiveness / Conflict Risk | The skill occupies a somewhat unique niche around context window management, but the broad triggers ('any multi-step agent task is underway') could cause it to activate for nearly every complex task, potentially conflicting with other execution-phase skills. The description's scope is quite wide. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
Implementation: 47%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill has excellent workflow clarity with well-defined protocols for drift detection, recovery, and exit, including a thorough handoff template. However, it is severely over-long and verbose — the ocean wave metaphor is belabored, concepts are repeated across sections, and philosophical justifications (e.g., 'The Monitoring Paradox') explain things Claude already understands. The core actionable content could be delivered in roughly 1/3 the tokens.
Suggestions
- Cut the content by 60-70%: remove the extended wave-metaphor explanations, the 'Monitoring Paradox' philosophical section, the 'Not drift' section (Claude knows what normal iteration is), and the repeated reassurances that 'exit is not failure'. Keep the signal lists, protocols, and templates.
- Move the detailed handoff template, drift-signal lists, and interoperability matrix into separate referenced files (e.g., HANDOFF-TEMPLATE.md, DRIFT-SIGNALS.md) to improve progressive disclosure and reduce the main file to an actionable overview.
- Remove explanations of why things matter (e.g., 'A shorter session with high fidelity beats a long session with gradual corruption') — Claude doesn't need motivational framing, just the rules and procedures.
- Consolidate the repeated wave-anchor descriptions — the concept of 'read from artifacts, not memory' is stated at least four times across different sections.
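One possible layout for the suggested split, as a sketch (the file names come from the suggestion above; the exact structure is the maintainer's call):

```
skills/context-surfing/
├── SKILL.md             # trimmed overview: activation conditions, protocols in brief
├── DRIFT-SIGNALS.md     # strong vs. weak drift signals and their response paths
├── HANDOFF-TEMPLATE.md  # copy-paste handoff file template
└── INTEROP.md           # pipeline and interoperability matrix
```

SKILL.md would then reference the other files, so an agent loads the detailed material only when it actually needs it.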
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | This skill is extremely verbose at ~400+ lines. It over-explains metaphors (the ocean wave analogy), includes extensive philosophical justification ('The Monitoring Paradox'), explains concepts Claude already understands (what context degradation is, why clean exits matter), and repeats the same ideas multiple times across sections. The principles section restates what was already said. Much of this could be cut to 1/3 the length. | 1 / 3 |
| Actionability | The handoff file template is concrete and copy-paste ready, and the hook integration has executable config/commands. However, the core skill — drift detection and recovery — is largely behavioral guidance rather than executable steps. The 'entire status' and 'ls .context-surfing/' commands are concrete but minor. Most of the skill describes how to think rather than what to do. | 2 / 3 |
| Workflow Clarity | The multi-step processes are clearly sequenced with explicit validation checkpoints. The Recovery Protocol has clear steps (pause → re-read → reconcile → escalate or exit). The Exit Protocol has numbered steps with a detailed handoff template. Strong vs. weak signals are clearly differentiated with different response paths. Feedback loops are present (re-anchor → check → resume or exit). | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear headers and sections, but it is essentially a monolithic document. The pipeline table and interoperability section are good organizational elements, but the sheer volume of inline content (drift detection details, recovery protocol, monitoring-paradox discussion) could be split into referenced files. No external references are used despite the length warranting them. | 2 / 3 |
| Total | | 8 / 12 (Passed) |
Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
All 11 validation checks for skill structure passed, with no warnings or errors.
d6c68fa