Generate concrete counterexamples when formal verification, assertions, or specifications fail. Use this skill when debugging failed proofs, understanding why verification fails, creating minimal reproducing examples, analyzing assertion violations, investigating invariant breaks, or diagnosing specification mismatches. Produces concrete input values, execution traces, and state information that demonstrate the failure.
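As an illustration of the kind of output described above (concrete input values plus a trace of the failure), here is a minimal hypothetical sketch in Python; the `buggy_distance` function and its spec are invented for demonstration and are not taken from the skill itself:

```python
# Hypothetical spec: a "distance" function should never return a negative value.
def buggy_distance(a, b):
    return a - b  # bug: not symmetric, so the result can be negative

def check_spec(a, b):
    result = buggy_distance(a, b)
    # Invariant asserted by the (hypothetical) specification
    assert result >= 0, f"counterexample: a={a}, b={b} -> distance={result}"

# Minimal concrete counterexample demonstrating the assertion violation
try:
    check_spec(1, 2)
except AssertionError as exc:
    print(exc)  # counterexample: a=1, b=2 -> distance=-1
```

The value of such a report is that it pairs the failing inputs with the violated invariant, so the reader can reproduce the failure in one call.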
Overall score: 81

Quality: 77% (Does it follow best practices?)
Impact: 85% (1.10x average score across 3 eval scenarios)
Validation: Passed, no known issues
Optimize this skill with Tessl: `npx tessl skill review --optimize ./skills/counterexample-generator/SKILL.md`

Quality

Discovery
100% — Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly defines a specialized niche in formal verification debugging. It provides comprehensive trigger terms covering the domain vocabulary, explicitly states both capabilities and usage conditions, and is highly distinctive from other potential skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'Generate concrete counterexamples', 'debugging failed proofs', 'creating minimal reproducing examples', 'analyzing assertion violations', 'Produces concrete input values, execution traces, and state information'. | 3 / 3 |
| Completeness | Clearly answers both what ('Generate concrete counterexamples', 'Produces concrete input values, execution traces, and state information') AND when ('Use this skill when debugging failed proofs, understanding why verification fails...'). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'formal verification', 'assertions', 'specifications fail', 'failed proofs', 'verification fails', 'assertion violations', 'invariant breaks', 'specification mismatches', 'counterexamples'. | 3 / 3 |
| Distinctiveness / Conflict Risk | Very clear niche focused on formal verification and counterexample generation. The specific domain terminology (proofs, invariants, specifications, assertions) makes it highly unlikely to conflict with general debugging or testing skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
55% — Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides highly actionable, well-structured guidance for counterexample generation with excellent workflow clarity and concrete executable examples. However, it is severely bloated—the content is 5-10x longer than necessary, repeating concepts Claude already understands and including exhaustive pattern catalogs that should be in separate reference files. The token inefficiency significantly undermines its utility.
Suggestions:

- Reduce content by 70-80%: keep the 5-step workflow and 2-3 representative patterns, and move the rest to a PATTERNS.md reference file.
- Remove explanatory text about basic concepts (what counterexamples are, how assertions work, what race conditions are); Claude already knows these.
- Create separate reference files: PATTERNS.md for the 7 patterns, TECHNIQUES.md for generation techniques, TOOLS.md for tool references.
- Consolidate the counterexample report template and example into a single concise section rather than showing both separately.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~600+ lines with extensive repetition. Explains basic concepts Claude knows (what counterexamples are, basic Python errors, what assertions do). Many patterns and examples could be consolidated significantly. | 1 / 3 |
| Actionability | Provides fully executable Python code examples throughout, concrete input values, step-by-step execution traces, and copy-paste ready counterexample reports. Every pattern includes specific, runnable examples. | 3 / 3 |
| Workflow Clarity | Clear 5-step workflow (Identify → Analyze → Generate → Execute → Present) with explicit validation checkpoints. Each step has concrete examples and the workflow includes feedback loops for fixing and re-verifying. | 3 / 3 |
| Progressive Disclosure | Monolithic wall of text with no references to external files. All content is inline, including 7 detailed patterns, 4 techniques, tools lists, and common scenarios that should be split into separate reference documents. | 1 / 3 |
| Total | | 8 / 12 Passed |
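The five-step workflow the table credits (Identify → Analyze → Generate → Execute → Present) can be sketched as a small search loop. The `spec` invariant and candidate inputs below are illustrative assumptions for this review, not material from the skill itself:

```python
def find_counterexample(spec, candidates):
    """Run candidate inputs against a spec and report the first violation (sketch)."""
    for inputs in candidates:                 # Generate: enumerate candidate inputs
        try:
            spec(*inputs)                     # Execute the spec against the inputs
        except AssertionError as exc:         # Analyze: the invariant was violated
            return {"inputs": inputs,         # Present: concrete values and message
                    "violation": str(exc)}
    return None                               # no counterexample among the candidates

# Hypothetical invariant: squaring never decreases a number (false for 0 < x < 1)
def spec(x):
    assert x * x >= x, f"x={x}: x*x={x * x} is less than x"

print(find_counterexample(spec, [(2,), (1,), (0.5,)]))
# {'inputs': (0.5,), 'violation': 'x=0.5: x*x=0.25 is less than x'}
```

A real implementation would add the feedback loop the review mentions (shrink the failing input, fix, re-verify), but the loop above captures the control flow the five steps describe.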
Validation
90% — Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (848 lines); consider splitting into references/ and linking | Warning |
| Total | | 10 / 11 Passed |
Version: 0f00a4f