checkpoint-mode

Pause for review every N tasks - selective autonomy pattern

Quality

50%

Does it follow best practices?

Security

Passed

No findings from the security scan

Fix and improve this skill with Tessl

tessl review fix ./agent-skills/checkpoint-mode/SKILL.md

Quality

Content

50%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is well-organized and provides concrete configuration plus an illustrative workflow, but the executable code relies on undefined helpers, the approval-wait loop lacks error-recovery validation, and the sole internal reference is a broken path. Time-sensitive dates and restated source commentary also reduce token efficiency.

Suggestions

Resolve the broken reference: create references/production-patterns.md or remove the citation so navigation is not misleading.

Make the orchestrator code executable by defining or stubbing the helpers (load_completed_tasks, signal_exists, write_signal) and the CHECKPOINT_FREQUENCY constant, or explicitly label it as illustrative pseudocode.

Add a timeout or error-recovery path to the approval-wait loop (e.g., max wait, escalate, or auto-summarize) so the batch pause has a proper feedback loop, and move embedded dates/version info into a clearly marked section.
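The second and third suggestions can be sketched together as follows. This is a minimal illustration, not the skill's actual implementation: the signal directory, the signal names `CHECKPOINT_APPROVED` and `CHECKPOINT_ESCALATED`, the default values, and the function `wait_for_approval` are all assumptions made for the example. Only the helper names `signal_exists` and `write_signal` and the constant `CHECKPOINT_FREQUENCY` come from the review, and their real signatures are unknown. `load_completed_tasks` is left out because the review does not say what it should return.

```python
import os
import time

# Hypothetical default: pause for review after every 5 completed tasks.
CHECKPOINT_FREQUENCY = 5
# Hypothetical location for signal files; the real skill may use another path.
SIGNAL_DIR = "signals"


def signal_exists(name, signal_dir=SIGNAL_DIR):
    """Return True if a signal file with this name is present."""
    return os.path.exists(os.path.join(signal_dir, name))


def write_signal(name, signal_dir=SIGNAL_DIR, body=""):
    """Create (or overwrite) a signal file, creating the directory if needed."""
    os.makedirs(signal_dir, exist_ok=True)
    with open(os.path.join(signal_dir, name), "w", encoding="utf-8") as fh:
        fh.write(body)


def wait_for_approval(signal_dir=SIGNAL_DIR, timeout_s=3600.0, poll_s=5.0):
    """Poll for an approval signal until it appears or the timeout expires.

    Returns "approved" or "timed_out". On timeout, an escalation signal is
    written so the pause never blocks silently.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if signal_exists("CHECKPOINT_APPROVED", signal_dir):
            return "approved"
        if time.monotonic() >= deadline:
            break
        time.sleep(poll_s)
    # Error-recovery path: record that nobody approved in time.
    write_signal("CHECKPOINT_ESCALATED", signal_dir, "No approval before timeout.\n")
    return "timed_out"
```

Returning a status string rather than raising leaves the choice of what to do on timeout (stop, retry, or summarize and continue) to the caller.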

Dimension	Reasoning	Score
Conciseness	The body is mostly efficient with tight bullets and concrete config, but it carries unnecessary explanation (the philosophy quote and 'Problem with Perpetual Autonomy' list) plus embedded time-sensitive dates ('2026-01-14', 'Tim Dettmers, 2026') not placed in a deprecated section, fitting 'mostly efficient but could be tightened' rather than a lean score 3 or a verbose score 1.	2 / 3
Actionability	Concrete config (LOKI_AUTONOMY_MODE=checkpoint) and a Python block are provided, but the code calls undefined helpers (load_completed_tasks, signal_exists, write_signal, CHECKPOINT_FREQUENCY) and the workflow diagram is ASCII pseudocode, so it is 'some concrete guidance but incomplete; pseudocode rather than executable code'.	2 / 3
Workflow Clarity	The checkpoint sequence (Work -> Pause -> Summary -> Wait for Approval -> Resume) is clearly laid out with a numbered procedure and a polling loop, but the batch pause operation has a validation gap: the 'while not signal_exists' loop has no timeout or error-recovery feedback loop, so it caps at 2 rather than 3.	2 / 3
Progressive Disclosure	The skill is well-sectioned with a References section, but the one cited internal reference 'references/production-patterns.md' does not exist (the references/ directory is absent), and inline content like philosophy quotes and comparison tables is not split out, fitting 'some structure but references not clearly signaled / content that should be separate is inline'.	2 / 3
	Total	8 / 12 Passed

Description

50%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description conveys a clear core action (pause for review every N tasks) and a distinct domain, but it lacks explicit 'Use when' trigger guidance and a fuller enumeration of concrete capabilities. It reads as adequate but not comprehensive, scoring at the midpoint across all dimensions.

Suggestions

Add an explicit 'Use when...' clause naming concrete triggers (e.g., 'Use when running long autonomous task batches, expensive API runs, or novel projects needing course correction').

Expand the capability list from one action to several concrete ones (e.g., pauses execution, generates checkpoint summaries, writes approval signals, resumes on approval).

Surface natural user-facing terms like 'checkpoint', 'approval', and 'feedback loop' alongside 'pause for review' to improve trigger-term coverage.

Dimension	Reasoning	Score
Specificity	The phrase 'Pause for review every N tasks' names a concrete action and a domain ('selective autonomy pattern'), but it does not list multiple specific concrete actions, matching the 'names domain and some actions, but not comprehensive' anchor rather than a multi-action score 3 or a vague score 1.	2 / 3
Completeness	It states what the skill does ('Pause for review every N tasks') but provides no 'Use when...' clause or explicit trigger for when Claude should invoke it, so per the judging guideline a missing trigger clause caps completeness at 2 rather than 3.	2 / 3
Trigger Term Quality	Terms like 'pause', 'review', and 'tasks' are natural, but the description omits common variations a user would actually say ('checkpoint', 'approval', 'feedback loop') and leans on the jargony 'selective autonomy', fitting the 'some relevant keywords but missing common variations' anchor.	2 / 3
Distinctiveness Conflict Risk	'selective autonomy pattern' gives a somewhat distinct niche, but without explicit distinct trigger keywords it could still overlap with general review skills, matching 'somewhat specific but could still overlap' rather than a clearly distinct score 3 or a generic score 1.	2 / 3
	Total	8 / 12 Passed

Validation

87%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 14 / 16 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
referenced_paths_exist	Referenced path issues: 1 missing	Warning

	Total	14 / 16 Passed
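
A maintainer could reproduce something like the referenced_paths_exist warning locally before re-running the review. The sketch below is an assumption about how such a check might work, not a description of the validator itself: the function name `missing_references` and the pattern for what counts as a reference (a relative `references/....md` path mentioned anywhere in the file) are hypothetical.

```python
import re
from pathlib import Path

# Assumption: references are relative Markdown paths under references/.
REF_PATTERN = re.compile(r"references/[\w./-]+\.md")


def missing_references(skill_file):
    """Return the referenced paths that do not exist next to the skill file."""
    skill_path = Path(skill_file)
    text = skill_path.read_text(encoding="utf-8")
    base = skill_path.parent
    # De-duplicate and sort so the result is stable across runs.
    found = sorted(set(REF_PATTERN.findall(text)))
    return [ref for ref in found if not (base / ref).is_file()]
```

Run against agent-skills/checkpoint-mode/SKILL.md, a non-empty result would list the paths to create or to remove from the skill body.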

Repository: asklokesh/loki-mode
Path: agent-skills/checkpoint-mode/SKILL.md
Commit: 08a09b1

Reviewed: about 15 hours ago
