Skill description under review (context-surfing):

> Monitors context window health throughout a session and rides peak context quality for maximum output fidelity. Activates automatically after plan-interview and intent-framed-agent. Stays active through execution and hands off cleanly to simplify-and-harden and self-improvement when the wave completes naturally or exits via handoff. Use this skill whenever a multi-step agent task is underway and session continuity or context drift is a concern. Especially important for long-running tasks, complex refactors, or any work where degraded context would silently corrupt the output. Trigger even if the user doesn't say "context surfing" — if an agent task is running across multiple steps with intent and a plan already established, this skill is live.
Quality: 57% (does it follow best practices?)
Impact: not scored; no eval scenarios have been run
Risk: risky; do not use without reviewing

To optimize this skill with Tessl, run:

`npx tessl skill review --optimize ./skills/context-surfing/SKILL.md`

Quality
Discovery: 67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description excels at completeness by clearly stating both what the skill does and when to use it, with explicit trigger guidance. However, it suffers from vague, metaphorical language ('rides peak context quality', 'wave completes naturally') that obscures the concrete actions performed, and its broad activation criteria ('any multi-step agent task') create significant overlap risk with other skills. The internal skill-chain references (plan-interview, intent-framed-agent, simplify-and-harden) add orchestration context but aren't terms users would naturally use.
Suggestions
- Replace metaphorical language ('rides peak context quality', 'wave completes naturally') with concrete actions the skill performs, e.g., 'tracks token usage, summarizes completed steps, prunes stale context, checkpoints progress'.
- Narrow the activation triggers to reduce conflict risk — instead of 'any multi-step agent task', specify measurable conditions like 'tasks exceeding N steps' or 'when context usage exceeds 50% of the window'.
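The second suggestion's measurable conditions could be expressed as a concrete activation gate. A minimal sketch, assuming hypothetical `used_tokens` / `window_size` counters supplied by the agent runtime (the names and thresholds are illustrative, not part of the reviewed skill):

```python
def should_activate(used_tokens: int, window_size: int,
                    steps_completed: int,
                    usage_threshold: float = 0.5,
                    step_threshold: int = 5) -> bool:
    """Activate context monitoring only under measurable conditions,
    rather than for 'any multi-step agent task'."""
    usage = used_tokens / window_size
    # Trigger when either the context window is half full
    # or the task has run long enough to risk drift.
    return usage >= usage_threshold or steps_completed >= step_threshold
```

A gate like this keeps the skill dormant for short tasks with light context use, which directly addresses the over-activation risk called out in the Distinctiveness row below.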
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names a domain (context window health monitoring) and some actions (monitors context, rides peak context quality, hands off to other skills), but the concrete actions are vague and metaphorical ('rides peak context quality', 'wave completes naturally'). It doesn't list specific, tangible operations. | 2 / 3 |
| Completeness | The description clearly answers both 'what' (monitors context window health, manages context quality during execution) and 'when' (multi-step agent tasks, long-running tasks, complex refactors, when context drift is a concern, after plan-interview and intent-framed-agent are active). It has explicit trigger guidance including the 'Use this skill whenever...' and 'Trigger even if...' clauses. | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'multi-step agent task', 'context drift', 'long-running tasks', 'complex refactors', and 'session continuity', but relies heavily on internal jargon ('plan-interview', 'intent-framed-agent', 'simplify-and-harden') that users would never naturally say. The explicit note about not needing 'context surfing' is helpful, but the natural trigger coverage is incomplete. | 2 / 3 |
| Distinctiveness / Conflict Risk | The concept of 'context window health monitoring' is somewhat niche, but the broad triggers ('any multi-step agent task is underway') could cause this skill to activate too aggressively and overlap with many other execution-phase skills. The description positions it as almost always-on during agent tasks, which increases conflict risk. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
Implementation: 47%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill demonstrates strong workflow design with clear sequencing, validation checkpoints, and error recovery paths, but is severely undermined by verbosity. The surfing metaphor, while creative, consumes significant tokens explaining concepts Claude can infer. Actionability is moderate: there are some concrete artifacts, but the core monitoring behavior is described in prose rather than as executable templates or checklists.
Suggestions
- Cut the Mental Model section entirely or reduce it to 2 lines — the metaphor doesn't add actionable value, and Claude doesn't need it explained at length.
- Replace the prose-heavy drift detection and recovery sections with a compact decision tree or checklist format (e.g., 'Strong signal → exit immediately; Weak signal → Step 1: re-read anchor cold, Step 2: spawn context-monitor if unresolved, Step 3: escalate or exit').
- Extract the handoff file template, hook configuration, and interoperability matrix into separate reference files and link to them from the main skill.
- Remove explanatory justifications throughout (e.g., 'This is not failure. This is the system working correctly.' and 'The weak signals are more reliably self-detectable precisely because...') — these explain reasoning Claude doesn't need.
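The suggested decision tree compresses into a few lines of logic. A minimal sketch, assuming hypothetical boolean inputs (`anchor_consistent`, `monitor_resolved`) that the skill would derive from its own checks; the signal names follow the strong/weak categorization described in the review:

```python
def handle_drift(signal: str, anchor_consistent: bool,
                 monitor_resolved: bool) -> str:
    """Compact decision tree for drift signals, mirroring the
    suggested checklist format."""
    if signal == "strong":
        return "exit"                 # strong signal: exit immediately
    # Weak signal path, in order:
    if anchor_consistent:             # step 1: re-read anchor cold
        return "resume"
    if monitor_resolved:              # step 2: spawn context-monitor
        return "resume"
    return "escalate-or-exit"         # step 3: escalate or exit
```

Expressed this way, the whole drift-response policy fits in one screen, replacing the prose-heavy sections the review flags.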
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | This skill is extremely verbose at ~400+ lines. It extensively explains metaphorical concepts (ocean waves, surfing) that don't add actionable value, over-explains the monitoring paradox, and includes lengthy philosophical justifications. Much of the content describes concepts Claude can infer (what drift is, why clean exits matter) rather than providing lean instructions. The mental model section, principles section, and extensive prose throughout could be cut by 60%+ without losing actionable content. | 1 / 3 |
| Actionability | The skill provides some concrete artifacts (handoff file template, hook configuration JSON, bash commands for Entire CLI), but the core guidance is largely procedural prose rather than executable steps. The drift detection signals are well-enumerated, but the pre-commit anchor check is described abstractly rather than as a concrete template. Much of the skill describes behavioral patterns rather than providing copy-paste-ready implementations. | 2 / 3 |
| Workflow Clarity | The multi-step workflows are clearly sequenced with explicit validation checkpoints. The Recovery Protocol has clear step-by-step branching (pause → re-read → reconcile → escalate or exit). The Exit Protocol has numbered steps with explicit validation (stop → write handoff → notify). The drift detection has clear strong/weak signal categorization with different response paths. Feedback loops are present (re-anchor → check → resume or escalate). | 3 / 3 |
| Progressive Disclosure | The content is structured with clear headers and sections, but it's essentially a monolithic document with no references to external files for detailed content. The handoff template, hook configuration, and interoperability details could be split into separate reference files. The pipeline table and relationship explanations are inline when they could be referenced. However, the sections are well-organized and navigable within the single file. | 2 / 3 |
| Total | | 8 / 12 (Passed) |
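The Actionability row mentions a handoff file template worth extracting into a reference file. As a purely hypothetical illustration of what such an extracted artifact might look like (the field names are invented for this sketch, not taken from the reviewed skill):

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Hypothetical shape for an extracted handoff reference file;
    field names are illustrative, not the skill's actual template."""
    intent: str
    completed_steps: list[str] = field(default_factory=list)
    pending_steps: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        # Render the handoff as a small markdown document the
        # next agent session can re-read cold.
        lines = [f"# Handoff: {self.intent}", "", "## Completed"]
        lines += [f"- {s}" for s in self.completed_steps]
        lines += ["", "## Pending"]
        lines += [f"- {s}" for s in self.pending_steps]
        lines += ["", "## Risks"]
        lines += [f"- {s}" for s in self.risks]
        return "\n".join(lines)
```

Keeping the template in a separate, structured file like this supports the progressive-disclosure suggestion: the main skill links to it instead of inlining it.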
Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

11 / 11 checks passed (skill structure). No warnings or errors.