verification-before-completion

Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always

Quality: 72%. Does it follow best practices?

Impact: 92% (1.22x). Average score across 3 eval scenarios.

Security: Passed. No known issues.

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/verification-before-completion/SKILL.md

Quality

Discovery

75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description effectively communicates when to use the skill with an explicit 'Use when' clause and a clear behavioral mandate ('evidence before assertions always'). Its main weakness is that the specific actions are vague — 'running verification commands' could be more concrete with examples like running tests, linting, or building. The trigger terms lean toward Claude's internal decision-making rather than natural user language, which is appropriate for this self-governance type skill but limits trigger term quality.

Suggestions

Add specific examples of verification commands, e.g., 'running tests, linting, type-checking, and build commands' to improve specificity.

Include more natural trigger variations users might say, such as 'done', 'finished', 'ready to merge', 'all tests pass', 'looks good' to improve trigger term coverage.

Dimension	Reasoning	Score
Specificity	The description names a domain (verification before claiming completion) and some actions ('running verification commands and confirming output'), but doesn't list specific concrete actions like 'run tests', 'check linting', 'build project', or 'run type checks'.	2 / 3
Completeness	Clearly answers both 'what' (requires running verification commands and confirming output before making success claims) and 'when' (when about to claim work is complete, fixed, or passing, before committing or creating PRs). The 'Use when' clause is explicit.	3 / 3
Trigger Term Quality	Includes some relevant terms like 'complete', 'fixed', 'passing', 'committing', 'creating PRs', but misses common natural user phrases like 'done', 'finished', 'ready to merge', 'tests pass', 'ship it'. The terms are more about when Claude should self-trigger rather than what users would say.	2 / 3
Distinctiveness Conflict Risk	This skill occupies a clear niche — it's a behavioral guardrail about verification before claiming completion, which is distinct from skills that actually perform testing, committing, or PR creation. Unlikely to conflict with other skills.	3 / 3
	Total	10 / 12 Passed

Implementation

70%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured behavioral/process skill that clearly communicates a verification-before-claiming workflow with good examples and multiple reinforcing tables. Its main weakness is some redundancy between sections (Red Flags and Rationalization Prevention overlap significantly) and the lack of truly executable, technology-specific commands — though this is somewhat justified by the skill's domain-agnostic nature. The motivational/emotional framing adds tokens without adding actionable information.

Suggestions

Merge the 'Red Flags' and 'Rationalization Prevention' sections into a single table to eliminate redundancy and save tokens.

Consider adding one fully concrete end-to-end example with actual shell commands (e.g., running pytest, checking exit code, then making the claim) to increase actionability.
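As a rough illustration of the second suggestion, the sketch below runs a verification command, captures its exit code and output, and only then words the claim. It is not taken from the skill itself: the names `Evidence`, `run_verification`, and `completion_claim` are hypothetical, and the usage note assumes pytest is the project's test runner.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Evidence:
    """Result of one fresh verification run (hypothetical helper type)."""
    command: list[str]
    exit_code: int
    output: str


def run_verification(command: list[str]) -> Evidence:
    """Run the command now and capture its exit code and combined output."""
    result = subprocess.run(command, capture_output=True, text=True)
    return Evidence(command, result.returncode, result.stdout + result.stderr)


def completion_claim(evidence: Evidence) -> str:
    """Word the claim from the evidence, never ahead of it."""
    cmd = " ".join(evidence.command)
    if evidence.exit_code == 0:
        return f"Verified: `{cmd}` exited 0."
    # A non-zero exit code means the actual status is reported instead.
    return f"Not complete: `{cmd}` exited {evidence.exit_code}."
```

Under the pytest assumption, a call might look like `completion_claim(run_verification([sys.executable, "-m", "pytest", "-q"]))`, with `evidence.output` printed alongside the claim so the reader can see what the claim rests on.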

Dimension	Reasoning	Score
Conciseness	The skill is reasonably efficient but has some redundancy — the 'Rationalization Prevention' table largely restates the 'Red Flags' section, and the motivational framing ('dishonesty, not efficiency', 'lying, not verifying') adds emotional weight but not actionable information. The tables are well-structured but could be tighter.	2 / 3
Actionability	The Gate Function provides a clear 5-step process and the Key Patterns section shows good/bad examples, but the examples use placeholder notation rather than actual executable commands. The skill is more of a behavioral checklist than concrete executable guidance — it tells Claude what to do conceptually but doesn't provide specific verification commands for any particular technology.	2 / 3
Workflow Clarity	The Gate Function is a clear, sequenced workflow with explicit validation checkpoints and a feedback loop (step 4: if NO, state actual status; if YES, proceed). The Key Patterns section provides clear pass/fail examples for multiple scenarios including the TDD red-green cycle with explicit revert-and-verify steps. The 'Common Failures' table clearly maps claims to required evidence.	3 / 3
Progressive Disclosure	For a standalone behavioral skill with no bundle files, the content is well-organized into logical sections with clear headers: overview → core rule → workflow → failure patterns → red flags → rationalization prevention → key patterns → motivation → when to apply. The length is appropriate for inline content and doesn't need external references.	3 / 3
	Total	10 / 12 Passed

Validation

100%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: obra/superpowers
Commit: f2cbfbe

Reviewed: 17 days ago
