Create, review, and validate an alignment checkpoint. Use when a request is ambiguous, high-stakes, multi-step, or requires explicit approval before tool use.
48
51%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./Skills/agent-ops/alignment-checkpoint/SKILL.mdQuality
Discovery
67%Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description has a clear structure with both 'what' and 'when' components, which is a strength. However, the core concept of an 'alignment checkpoint' is not well-defined in concrete terms, and the trigger conditions are broad enough to potentially conflict with many other skills. The description would benefit from more specific actions and narrower, more distinctive trigger terms.
Suggestions
Define what an 'alignment checkpoint' concretely involves—e.g., 'Pauses execution to summarize the plan, confirm assumptions, and get user approval before proceeding with destructive or irreversible actions.'
Narrow the trigger conditions to reduce conflict risk—instead of broad terms like 'ambiguous' or 'multi-step', specify scenarios like 'before deleting files, running database migrations, or executing commands with side effects.'
| Dimension | Reasoning | Score |
|---|---|---|
Specificity | Names the domain ('alignment checkpoint') and lists some actions ('create, review, and validate'), but the actions are somewhat abstract—it's unclear what concrete steps are involved in creating or validating an alignment checkpoint. | 2 / 3 |
Completeness | Clearly answers both 'what' (create, review, and validate an alignment checkpoint) and 'when' (ambiguous, high-stakes, multi-step requests, or those requiring explicit approval before tool use) with an explicit 'Use when' clause. | 3 / 3 |
Trigger Term Quality | Includes some relevant trigger terms like 'ambiguous', 'high-stakes', 'multi-step', 'approval', and 'tool use', but these are more situational descriptors than natural keywords a user would say. Users are unlikely to say 'alignment checkpoint' directly. | 2 / 3 |
Distinctiveness Conflict Risk | The concept of an 'alignment checkpoint' is somewhat niche, but the trigger conditions ('ambiguous', 'high-stakes', 'multi-step') are very broad and could overlap with many other skills that also handle complex or sensitive requests. | 2 / 3 |
Total | 9 / 12 Passed |
Implementation
35%Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads more like an abstract process philosophy document than an actionable skill. While it has reasonable structure and appropriate validation/fail-fast principles, it lacks the concrete artifacts Claude would need to actually execute an alignment checkpoint—no template for the checkpoint output, no example of a completed checkpoint, and no schema for the schema-bound outputs it mentions. The workflow steps are too abstract to be directly executable.
Suggestions
Add a concrete checkpoint template or structured output example showing exactly what an alignment checkpoint looks like (goal extraction, assumptions, success criteria, options, approval gate) with sample content.
Define what 'schema-bound outputs' and 'schema_version' mean by either inlining a brief JSON schema or linking to a schema file in the bundle.
Replace abstract workflow steps like 'Inspect 2-3 focused surfaces' with specific, actionable instructions (e.g., 'Read the user's message, identify the primary goal, list 1-3 unstated assumptions, and present options before proceeding').
Add a complete worked example showing a user request, the checkpoint Claude would produce, and the approval/proceed flow.
| Dimension | Reasoning | Score |
|---|---|---|
Conciseness | The skill is reasonably concise but includes some sections that are somewhat abstract and could be tightened. The Philosophy, Anti-Patterns, and some Workflow steps are general principles Claude already understands rather than novel, task-specific instructions. However, it avoids egregious verbosity. | 2 / 3 |
Actionability | The content is almost entirely abstract guidance with no concrete commands, code, executable examples, or specific templates. Phrases like 'Inspect 2-3 focused surfaces' and 'Take the smallest action that advances the confirmed goal' are vague directives rather than actionable instructions. The examples section shows quoted user messages but not what Claude should actually produce (e.g., no checkpoint template, no structured output format). | 1 / 3 |
Workflow Clarity | There is a numbered workflow with validation steps and a fail-fast gate, which is good. However, the steps are abstract ('Classify the requested mode,' 'Inspect 2-3 focused surfaces') without concrete definitions of what these mean in practice. The validation section mentions stopping at failed gates but doesn't specify what a gate looks like or how to verify one passed. | 2 / 3 |
Progressive Disclosure | The skill references a deferred context directory under Infrastructure/references/ and instructs to load only what's needed, which is good structure. However, no bundle files are provided to verify the references exist, and the main content itself could benefit from splitting—the Outputs section mentions 'schema_version' and schema-bound outputs without linking to any schema definition file. | 2 / 3 |
Total | 7 / 12 Passed |
Validation
90%Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
metadata_version | 'metadata.version' is missing | Warning |
Total | 10 / 11 Passed | |
4c78f98
Table of Contents
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.