Pause for review every N tasks - selective autonomy pattern
28
21%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./agent-skills/checkpoint-mode/SKILL.md
Quality
Discovery
7%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is too abstract and pattern-oriented to be effective for skill selection. It reads more like a design pattern label than a functional skill description, lacking concrete actions, natural trigger terms, and any 'use when' guidance.
Suggestions
Add a 'Use when...' clause specifying triggers, e.g., 'Use when the user wants Claude to check in periodically during multi-step or batch operations, or mentions terms like checkpoint, batch review, or confirm every N steps.'
Replace the abstract pattern name with concrete actions, e.g., 'Inserts review checkpoints during multi-step task execution, pausing after every N operations to request user confirmation before continuing.'
Include natural trigger terms users would say, such as 'checkpoint', 'batch approval', 'confirm before continuing', 'stop and check', 'review progress', or 'don't do everything at once'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description is vague: 'pause for review every N tasks' and 'selective autonomy pattern' are abstract concepts without concrete actions. It doesn't specify what actions the skill performs (e.g., counting tasks, prompting for confirmation, batching operations). | 1 / 3 |
| Completeness | The description barely addresses 'what' (pause for review every N tasks) and completely lacks a 'when' clause. There is no explicit trigger guidance for when Claude should select this skill. | 1 / 3 |
| Trigger Term Quality | The terms 'selective autonomy pattern' are jargon unlikely to be used by a user naturally. 'Pause for review' is somewhat natural but lacks common variations like 'checkpoint', 'confirm before continuing', 'batch approval', or 'stop and ask'. | 1 / 3 |
| Distinctiveness Conflict Risk | The concept of pausing for review every N tasks is somewhat distinctive as a workflow pattern, but the vague phrasing could overlap with other skills related to task management, automation control, or workflow orchestration. | 2 / 3 |
| Total | 5 / 12 Passed | |
Implementation
35%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is overly verbose, spending significant tokens on philosophy, motivational quotes, and explanations of concepts Claude already understands. While it provides a reasonable structure for a checkpoint-based autonomy pattern, the code examples are pseudocode referencing an undefined 'loki' system rather than executable instructions. The workflow lacks error recovery paths and the referenced bundle file doesn't exist.
Suggestions
Remove the Philosophy section, 'When to Use' comparison, and motivational quote — these explain concepts Claude already knows and waste ~40% of the token budget.
Make the agent instructions executable rather than pseudocode: define or reference the actual functions (generate_checkpoint_summary, load_completed_tasks) or replace with concrete shell/file operations Claude can perform.
Add explicit error recovery to the workflow: what should happen when the user rejects a checkpoint or requests course corrections, rather than only handling the happy path.
Either provide the referenced `references/production-patterns.md` bundle file or remove the reference to avoid dead links.
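To make the second and third suggestions concrete, here is a minimal Python sketch of a checkpoint loop that pauses every N tasks and handles rejection and course corrections as well as approval. It is an illustration under assumptions, not the skill's actual code: `run_with_checkpoints`, `CheckpointResult`, the `execute` and `ask_user` callbacks, and the decision strings ("approve", "revise: ...", anything else meaning stop) are all hypothetical. `generate_checkpoint_summary` borrows the name the review mentions, but its signature and body are invented here.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class CheckpointResult:
    completed: List[str] = field(default_factory=list)
    corrections: List[str] = field(default_factory=list)
    stopped_early: bool = False


def generate_checkpoint_summary(completed: List[str], batch: List[str]) -> str:
    # Hypothetical summary format; the real skill's template is not shown in the review.
    return f"Completed {len(completed)} task(s) so far; last batch: {', '.join(batch)}"


def run_with_checkpoints(
    tasks: Iterable[str],
    execute: Callable[[str], None],
    ask_user: Callable[[str], str],
    every_n: int = 3,
) -> CheckpointResult:
    """Run tasks in order, pausing for review after every `every_n` tasks."""
    result = CheckpointResult()
    batch: List[str] = []
    for task in tasks:
        execute(task)
        result.completed.append(task)
        batch.append(task)
        if len(batch) < every_n:
            continue
        decision = ask_user(generate_checkpoint_summary(result.completed, batch))
        batch = []
        if decision == "approve":
            continue
        if decision.startswith("revise:"):
            # Record the course correction and keep going.
            result.corrections.append(decision[len("revise:"):].strip())
            continue
        # Any other answer (including "reject") stops before the next task.
        result.stopped_early = True
        break
    return result
```

In this sketch a trailing batch shorter than `every_n` finishes without a final checkpoint; a real skill would need to decide whether that is acceptable.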
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Significant verbosity: explains philosophy Claude already understands (why perpetual autonomy is bad), includes a motivational quote, explains when to use vs not use the pattern at length, and has sections like 'Metrics' and 'Comparison with Other Modes' that add bulk without actionable value. The 'Problem with Perpetual Autonomy' bullet points are obvious to Claude. | 1 / 3 |
| Actionability | Provides some concrete guidance (config variables, Python pseudocode for checkpoint logic, signal file paths), but the code is illustrative pseudocode referencing a hypothetical 'loki' system rather than executable instructions. The checkpoint summary is a template but relies on undefined functions like `generate_checkpoint_summary()` and `load_completed_tasks()`. | 2 / 3 |
| Workflow Clarity | The checkpoint workflow has a clear sequence (work → pause → summary → approval → resume) with a visual diagram, but lacks validation checkpoints. There's no error handling for what happens if approval is denied, if the user wants course corrections, or if the signal file is malformed. The feedback loop for 'course corrections' is mentioned in metrics but never described. | 2 / 3 |
| Progressive Disclosure | References `references/production-patterns.md` but no bundle files exist to support it. The content is somewhat monolithic with sections that could be split out (metrics, comparison table, configuration). The structure has clear headings but everything is inline in one large file. | 2 / 3 |
| Total | 7 / 12 Passed | |
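The Workflow Clarity finding about malformed signal files could be addressed with defensive parsing. The review does not say what format the skill's signal file uses, so the sketch below assumes one: a JSON object with a "decision" field and an optional "note". The function name `read_decision`, the set of decisions, and the "pending" and "invalid" states are hypothetical too.

```python
import json
from pathlib import Path
from typing import Tuple

# Assumed decision vocabulary; the real skill may use different values.
VALID_DECISIONS = {"approve", "reject", "revise"}


def read_decision(signal_path: str) -> Tuple[str, str]:
    """Return (decision, note); never raise on a missing or malformed file."""
    try:
        payload = json.loads(Path(signal_path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        # No answer yet: the caller should keep waiting.
        return ("pending", "signal file not found")
    except (OSError, ValueError) as exc:
        # ValueError covers both invalid JSON and invalid UTF-8.
        return ("invalid", f"unreadable signal file: {exc}")
    if not isinstance(payload, dict):
        return ("invalid", "signal file must contain a JSON object")
    decision = payload.get("decision")
    if not isinstance(decision, str) or decision not in VALID_DECISIONS:
        return ("invalid", f"unknown decision: {decision!r}")
    note = payload.get("note", "")
    return (decision, note if isinstance(note, str) else "")
```

Treating "invalid" as distinct from "reject" lets the agent ask the user to fix the file rather than silently abandoning the run.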
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
0cf4f4d