Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always
80
72%
Does it follow best practices?
Impact
92%
1.22x
Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/verification-before-completion/SKILL.md
Quality
Discovery
75%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description effectively communicates when to use the skill with an explicit 'Use when' clause and a clear behavioral mandate ('evidence before assertions always'). Its main weakness is that the specific actions are vague — 'running verification commands' could be more concrete with examples like running tests, linting, or building. The trigger terms lean toward Claude's internal decision-making rather than natural user language, which is appropriate for this self-governance type skill but limits trigger term quality.
Suggestions
Add specific examples of verification commands, e.g., 'running tests, linting, type-checking, and build commands' to improve specificity.
Include more natural trigger variations users might say, such as 'done', 'finished', 'ready to merge', 'all tests pass', 'looks good' to improve trigger term coverage.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names a domain (verification before claiming completion) and some actions ('running verification commands and confirming output'), but doesn't list specific concrete actions like 'run tests', 'check linting', 'build project', or 'run type checks'. | 2 / 3 |
| Completeness | Clearly answers both 'what' (requires running verification commands and confirming output before making success claims) and 'when' (when about to claim work is complete, fixed, or passing, before committing or creating PRs). The 'Use when' clause is explicit. | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'complete', 'fixed', 'passing', 'committing', 'creating PRs', but misses common natural user phrases like 'done', 'finished', 'ready to merge', 'tests pass', 'ship it'. The terms are more about when Claude should self-trigger rather than what users would say. | 2 / 3 |
| Distinctiveness / Conflict Risk | This skill occupies a clear niche: it's a behavioral guardrail about verification before claiming completion, which is distinct from skills that actually perform testing, committing, or PR creation. Unlikely to conflict with other skills. | 3 / 3 |
| Total | Passed | 10 / 12 |
Implementation
70%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured behavioral/process skill that clearly communicates a verification-before-claiming workflow with good examples and multiple reinforcing tables. Its main weakness is some redundancy between sections (Red Flags and Rationalization Prevention overlap significantly) and the lack of truly executable, technology-specific commands — though this is somewhat justified by the skill's domain-agnostic nature. The motivational/emotional framing adds tokens without adding actionable information.
Suggestions
Merge the 'Red Flags' and 'Rationalization Prevention' sections into a single table to eliminate redundancy and save tokens.
Consider adding one fully concrete end-to-end example with actual shell commands (e.g., running pytest, checking exit code, then making the claim) to increase actionability.
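One way the suggested end-to-end example could look is sketched below in Python. This is a minimal sketch under stated assumptions, not part of the reviewed skill: the helper names `verify` and `completion_claim` are hypothetical, and the command used in the demo is a stand-in for whatever verification command a project actually uses (for example `pytest`). It runs the command, checks the exit code, and only then emits the success claim; on failure it reports the actual status together with the captured output.

```python
import subprocess
import sys


def verify(command):
    """Run a verification command; return (passed, combined output).

    `command` is a list of arguments. Hypothetical helper for illustration.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    output = result.stdout + result.stderr
    # Exit code 0 is the only evidence treated as a pass.
    return result.returncode == 0, output


def completion_claim(command, claim):
    """Return the claim only if the command passed; otherwise the real status."""
    passed, output = verify(command)
    ran = " ".join(command)
    if passed:
        return f"{claim} (verified by: {ran})"
    return f"NOT verified: {ran} exited non-zero\n{output}"


if __name__ == "__main__":
    # Stand-in command (assumption); replace with the project's real check,
    # e.g. ["pytest", "-q"] if pytest is the test runner.
    demo = [sys.executable, "-c", "print('all checks ran')"]
    print(completion_claim(demo, "Tests pass"))
```

The claim string is produced only inside the branch that has seen a zero exit code, which mirrors the "evidence before assertions" rule in the skill's description.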
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably efficient but has some redundancy: the 'Rationalization Prevention' table largely restates the 'Red Flags' section, and the motivational framing ('dishonesty, not efficiency', 'lying, not verifying') adds emotional weight but not actionable information. The tables are well-structured but could be tighter. | 2 / 3 |
| Actionability | The Gate Function provides a clear 5-step process and the Key Patterns section shows good/bad examples, but the examples use placeholder notation rather than actual executable commands. The skill is more of a behavioral checklist than concrete executable guidance: it tells Claude what to do conceptually but doesn't provide specific verification commands for any particular technology. | 2 / 3 |
| Workflow Clarity | The Gate Function is a clear, sequenced workflow with explicit validation checkpoints and a feedback loop (step 4: if NO, state actual status; if YES, proceed). The Key Patterns section provides clear pass/fail examples for multiple scenarios including the TDD red-green cycle with explicit revert-and-verify steps. The 'Common Failures' table clearly maps claims to required evidence. | 3 / 3 |
| Progressive Disclosure | For a standalone behavioral skill with no bundle files, the content is well-organized into logical sections with clear headers: overview → core rule → workflow → failure patterns → red flags → rationalization prevention → key patterns → motivation → when to apply. The length is appropriate for inline content and doesn't need external references. | 3 / 3 |
| Total | Passed | 10 / 12 |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
f2cbfbe