Name: coding-agent-helpers/skeptic-verifier
Rating: 84.9 (1 review)
Author: coding-agent-helpers


coding-agent-helpers/skeptic-verifier

Use when the user wants an adversarial double-check of a code or config change. Run the strongest checks available, try to break the claim, look for edge cases and hidden regressions, and return PASS, PARTIAL, or FAIL with evidence. Good triggers include "poke holes in this", "stress test this change", "double check this fix", and "try to break it".

Quality

94%

Does it follow best practices?

Impact

81%

1.30x

Average score across 8 eval scenarios

Security

Passed

No known issues

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly communicates its adversarial verification purpose, lists concrete actions it performs, and provides explicit trigger guidance with natural user phrases. It uses proper third-person voice throughout and occupies a distinct niche that would be easy for Claude to differentiate from other skills.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: 'targeted tests, edge-case repros, command runs, or API calls', 'try to break the claim', 'look for hidden regressions', and 'return PASS, PARTIAL, or FAIL with evidence'. These are concrete, actionable capabilities.	3 / 3
Completeness	Clearly answers both 'what' (run checks, try to break claims, look for regressions, return PASS/PARTIAL/FAIL with evidence) and 'when' (explicitly starts with 'Use when the user wants an adversarial double-check' and lists specific trigger phrases).	3 / 3
Trigger Term Quality	Includes excellent natural trigger phrases users would actually say: 'poke holes in this', 'stress test this change', 'double check this fix', 'try to break it', plus domain terms like 'adversarial double-check', 'code or config change', 'hidden regressions'.	3 / 3
Distinctiveness / Conflict Risk	Occupies a clear niche as an adversarial verification/stress-testing skill, distinct from general code review or testing skills. The specific framing around 'breaking' claims and returning PASS/PARTIAL/FAIL verdicts makes it unlikely to conflict with standard code review or testing skills.	3 / 3
	Total	12 / 12 Passed

Implementation

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, concise skill that clearly defines an adversarial verification workflow with explicit verdict criteria and good guardrails (e.g., PARTIAL when environment blocks strong verification). Its main weakness is that actionability stays at the instructional/process level—it tells Claude what to do conceptually but doesn't provide concrete executable examples of verification commands, test snippets, or tool invocations that would make the 'run the strongest checks' step more concrete.

Suggestions

Add 1-2 concrete executable examples showing actual verification commands (e.g., running a specific test suite, crafting an edge-case input, using curl to hit an endpoint) to make step 4 more actionable.

Expand the mini example into a full worked example showing the complete output format populated with realistic evidence and attempts, so Claude has a concrete template to follow.
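
As an illustration of the kind of executable example these suggestions ask for, here is a minimal Python sketch of running one check and keeping its output as evidence. It is not taken from the skill itself: the `Evidence` structure, the `run_check` helper, and the sample probe command are all hypothetical names chosen for this sketch.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """One verification attempt and what it showed (hypothetical structure)."""
    command: list
    exit_code: Optional[int]  # None means the command could not be completed
    output: str


def run_check(command: list, timeout: float = 60.0) -> Evidence:
    """Run one check and capture its result instead of raising."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing tool or a hang is recorded as a blocked check, not a pass.
        return Evidence(command, None, f"could not complete: {exc}")
    return Evidence(command, proc.returncode, (proc.stdout + proc.stderr).strip())


if __name__ == "__main__":
    # Stand-in for a real check such as a test-suite run or an edge-case repro.
    probe = run_check(
        [sys.executable, "-c", "assert int('007') == 7; print('edge case ok')"]
    )
    print(probe.exit_code, probe.output)
```

In a real skill the probe would be replaced by whatever check the repository supports, for example its test runner. Recording blocked checks separately from failing ones keeps the evidence honest when the environment gets in the way.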

Dimension	Reasoning	Score
Conciseness	Every section earns its place. No unnecessary explanations of what adversarial testing is or why it matters. The rules section is tight and each bullet adds a distinct constraint.	3 / 3
Actionability	The workflow provides clear steps and the output format is concrete, but the guidance remains at the instructional level without executable code or specific commands. Steps like 'Run the strongest checks available' are somewhat vague—what tools, what commands? The mini example helps but is brief and not fully fleshed out.	2 / 3
Workflow Clarity	The 5-step workflow is clearly sequenced with a falsification bias explicitly stated. The verdict system (PASS/PARTIAL/FAIL) serves as a validation checkpoint, and the rules include explicit guidance for when verification is blocked or incomplete (return PARTIAL, not PASS), which functions as a feedback loop for uncertain outcomes.	3 / 3
Progressive Disclosure	For a skill under 50 lines with a single purpose, the content is well-organized into clear sections (Goal, Workflow, Output Format, Rules) with no need for external references. The structure supports easy scanning and discovery.	3 / 3
	Total	11 / 12 Passed
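
The verdict rule credited under Workflow Clarity (return PARTIAL, not PASS, when verification is blocked or incomplete) can be sketched as a small decision function. This is a sketch under stated assumptions, not the skill's own logic: the `Verdict` enum and `decide` function are hypothetical, and the choices to return FAIL on any broken check and PARTIAL when no checks ran are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PASS = "PASS"
    PARTIAL = "PARTIAL"
    FAIL = "FAIL"


def decide(results: list) -> Verdict:
    """Combine check outcomes into one verdict.

    Each entry is Optional[bool]: True means the check held, False means
    the claim was broken, None means the check was blocked or could not run.
    """
    # Assumption: a single broken check is enough to fail the claim.
    if any(r is False for r in results):
        return Verdict.FAIL
    # Blocked checks, or no checks at all, never upgrade to PASS.
    if not results or any(r is None for r in results):
        return Verdict.PARTIAL
    return Verdict.PASS


if __name__ == "__main__":
    outcomes: list = [True, None, True]  # one check was blocked
    print(decide(outcomes).value)
```

Checking for a broken claim before checking for blocked runs means a confirmed failure is never softened to PARTIAL by an unrelated environment problem.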

Validation

100%


Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Reviewed

about 2 months ago
