
checkpoint-mode

Pause for review every N tasks - selective autonomy pattern


Quality: 21% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security by Snyk: Passed (No known issues)

Optimize this skill with Tessl

npx tessl skill review --optimize ./agent-skills/checkpoint-mode/SKILL.md

Quality

Discovery (7%)

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description is too terse and abstract to be effective for skill selection. It reads more like a design pattern name than a functional description, lacking concrete actions, natural user trigger terms, and any explicit guidance on when Claude should select this skill. It would be very difficult for Claude to reliably choose this skill from a pool of available options.

Suggestions

Expand the description to list concrete actions, e.g., 'Pauses autonomous execution after every N completed tasks to present a summary and request user approval before continuing.'

Add a 'Use when...' clause with natural trigger terms like 'check in with me periodically', 'pause between tasks', 'don't do everything at once', 'ask before continuing', 'batch review'.

Replace jargon like 'selective autonomy pattern' with plain language describing the behavior, such as 'Controls how many tasks are completed before stopping for human review.'

Specificity (1 / 3): The description is vague and abstract. 'Pause for review every N tasks' hints at a pattern but doesn't describe concrete actions like 'stops execution after N steps', 'prompts user for confirmation', or 'batches tasks into reviewable chunks'. 'Selective autonomy pattern' is jargon, not a concrete capability.

Completeness (1 / 3): The description partially implies 'what' (pausing for review) but is extremely thin on detail, and there is no 'when' clause or explicit trigger guidance whatsoever. Missing a 'Use when...' clause would cap this at 2, but the 'what' is also very weak, so it scores 1.

Trigger Term Quality (1 / 3): No natural user keywords are present. Users would not typically say 'selective autonomy pattern'. They might say 'check in with me', 'pause between tasks', 'ask before continuing', or 'batch review'. The description uses technical, design-pattern language rather than natural trigger terms.

Distinctiveness / Conflict Risk (2 / 3): The concept of pausing for review every N tasks is somewhat niche and wouldn't directly conflict with most other skills. However, the vague phrasing could overlap with general workflow management, task orchestration, or human-in-the-loop skills.

Total: 5 / 12 (Passed)

Implementation (35%)

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill reads more like a design document or blog post than an actionable skill for Claude. It spends significant tokens on philosophy, motivation, and comparisons while the core implementation relies on pseudocode with undefined functions. The workflow is understandable but lacks error handling and validation steps needed for a robust checkpoint system.

Suggestions

Remove the Philosophy and 'When to Use' sections entirely—Claude can infer when checkpoint mode is appropriate from the configuration and workflow alone.

Replace the illustrative Python pseudocode with either fully executable code or concrete step-by-step instructions Claude can directly follow (e.g., actual file I/O for signal creation and detection).
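As a hedged sketch of what "actual file I/O for signal creation" might look like: the directory name, file names, and JSON shape below are illustrative assumptions, not taken from the skill.

```python
import json
import time
from pathlib import Path

CHECKPOINT_DIR = Path(".checkpoints")  # hypothetical location for checkpoint artifacts

def write_checkpoint(batch_num, completed_tasks):
    """Write a human-readable summary and a machine-readable approval signal.

    Returns the path to the signal file a reviewer is expected to act on.
    """
    CHECKPOINT_DIR.mkdir(exist_ok=True)
    # Human-readable summary of what was completed in this batch.
    summary = CHECKPOINT_DIR / f"summary-{batch_num}.md"
    summary.write_text(
        f"# Checkpoint {batch_num}\n\n"
        + "\n".join(f"- {t}" for t in completed_tasks)
    )
    # Machine-readable signal the approval check can poll for.
    signal = CHECKPOINT_DIR / f"awaiting-approval-{batch_num}.json"
    signal.write_text(json.dumps(
        {"batch": batch_num, "status": "pending", "created": time.time()}
    ))
    return signal
```

With concrete I/O like this, the undefined helpers the review flags (e.g. `write_signal`) become steps Claude can follow directly rather than pseudocode to interpret.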

Add error recovery steps: what to do if a checkpoint summary fails to write, if the approval signal is never received (timeout), or if tasks fail between checkpoints.
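One possible shape for the timeout and malformed-signal cases; the signal format, polling interval, and return convention here are assumptions for illustration.

```python
import json
import time
from pathlib import Path

def wait_for_approval(signal: Path, timeout_s=3600, poll_s=5):
    """Poll a signal file until a reviewer sets its status to 'approved'.

    Returns True on approval, False on explicit rejection; raises
    TimeoutError if no decision arrives within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            status = json.loads(signal.read_text()).get("status")
        except (OSError, json.JSONDecodeError):
            status = None  # missing or malformed signal: keep waiting, don't crash
        if status == "approved":
            return True
        if status == "rejected":
            return False
        time.sleep(poll_s)
    raise TimeoutError(f"no approval decision for {signal} within {timeout_s}s")
```

Treating a malformed signal as "no decision yet" rather than an error keeps a half-written file (the reviewer saving mid-edit) from aborting the run.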

Move the comparison table, metrics schema, and references into a separate file, keeping SKILL.md focused on the actionable workflow and configuration.

Conciseness (1 / 3): Significant verbosity: the Philosophy section explains problems Claude already understands, the 'When to Use' section is largely common sense, and the comparison table and metrics sections add bulk without actionable value. The quote attribution and motivational framing waste tokens.

Actionability (2 / 3): Provides some concrete artifacts (config variables, Python pseudocode, signal file paths, summary template), but the Python code is illustrative pseudocode referencing undefined functions (load_completed_tasks, generate_checkpoint_summary, write_signal, etc.) rather than executable code. The skill describes a pattern more than it instructs on implementation.

Workflow Clarity (2 / 3): The checkpoint workflow is sequenced (generate summary → create signal → wait for approval → resume), but there are no validation checkpoints or error recovery steps. What happens if the approval signal is malformed? What if tasks fail mid-checkpoint? No feedback loop for error cases.

Progressive Disclosure (2 / 3): References to external files exist (references/production-patterns.md, external URL), but the SKILL.md itself is monolithic, with sections like Philosophy, When to Use, Comparison, and Metrics that could be separated. The content that should be inline (actual implementation) vs. separated (rationale, comparisons) is inverted.

Total: 7 / 12 (Passed)
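The sequenced workflow described above (generate summary → create signal → wait for approval → resume) could be compressed into a single driver. A minimal self-contained sketch, with the timeout handling the review says is missing; file names, the signal schema, and the polling approach are all assumptions, not part of the skill.

```python
import json
import time
from pathlib import Path

def run_with_checkpoints(tasks, n=3, workdir=Path("."), timeout_s=3600, poll_s=5):
    """Run tasks in order, pausing after every n completions until approved."""
    results = []
    for i, task in enumerate(tasks, 1):
        results.append(task())  # each task is a zero-argument callable
        if i % n == 0 and i < len(tasks):
            # Create the signal file the reviewer is expected to edit.
            sig = workdir / f"checkpoint-{i // n}.json"
            sig.write_text(json.dumps({"completed": i, "status": "pending"}))
            # Block until the reviewer flips status to "approved", or time out.
            deadline = time.monotonic() + timeout_s
            while json.loads(sig.read_text()).get("status") != "approved":
                if time.monotonic() > deadline:
                    raise TimeoutError(f"checkpoint {sig} was never approved")
                time.sleep(poll_s)
    return results
```

Even a sketch at this level of concreteness answers the feedback-loop question: an unapproved checkpoint surfaces as a `TimeoutError` instead of an indefinite silent stall.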

Validation (90%)

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 10 / 11 Passed

frontmatter_unknown_keys: Unknown frontmatter key(s) found; consider removing or moving to metadata (Warning)

Total: 10 / 11 (Passed)

Repository: asklokesh/loki-mode (Reviewed)

