Generate concrete counterexamples when formal verification, assertions, or specifications fail. Use this skill when debugging failed proofs, understanding why verification fails, creating minimal reproducing examples, analyzing assertion violations, investigating invariant breaks, or diagnosing specification mismatches. Produces concrete input values, execution traces, and state information that demonstrate the failure.
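As an illustration of the kind of output described above (concrete input values plus a trace of the failure), here is a minimal hypothetical sketch in Python; the `buggy_distance` function and its spec are invented for demonstration and are not taken from the skill itself:

```python
# Hypothetical spec: a "distance" function should never return a negative value.
def buggy_distance(a, b):
    return a - b  # bug: not symmetric, so the result can be negative

def check_spec(a, b):
    result = buggy_distance(a, b)
    # Invariant asserted by the (hypothetical) specification
    assert result >= 0, f"counterexample: a={a}, b={b} -> distance={result}"

# Minimal concrete counterexample demonstrating the assertion violation
try:
    check_spec(1, 2)
except AssertionError as exc:
    print(exc)  # counterexample: a=1, b=2 -> distance=-1
```

The value of such a report is that it pairs the failing inputs with the violated invariant, so the reader can reproduce the failure in one call.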
Overall score: 81

Quality: 77% (Does it follow best practices?)
Impact: 85% (1.10x average score across 3 eval scenarios)
Validation: Passed, no known issues
Optimize this skill with Tessl: `npx tessl skill review --optimize ./skills/counterexample-generator/SKILL.md`

Quality

Discovery
100% — Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly defines a specialized niche in formal verification debugging. It provides comprehensive trigger terms covering the domain vocabulary, explicitly states both capabilities and usage conditions, and is highly distinctive from other potential skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'Generate concrete counterexamples', 'debugging failed proofs', 'creating minimal reproducing examples', 'analyzing assertion violations', 'Produces concrete input values, execution traces, and state information'. | 3 / 3 |
| Completeness | Clearly answers both what ('Generate concrete counterexamples', 'Produces concrete input values, execution traces, and state information') AND when ('Use this skill when debugging failed proofs, understanding why verification fails...'). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'formal verification', 'assertions', 'specifications fail', 'failed proofs', 'verification fails', 'assertion violations', 'invariant breaks', 'specification mismatches', 'counterexamples'. | 3 / 3 |
| Distinctiveness / Conflict Risk | Very clear niche focused on formal verification and counterexample generation. The specific domain terminology (proofs, invariants, specifications, assertions) makes it highly unlikely to conflict with general debugging or testing skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
55% — Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides highly actionable, well-structured guidance for counterexample generation with excellent workflow clarity and concrete executable examples. However, it is severely bloated—the content is 5-10x longer than necessary, repeating concepts Claude already understands and including exhaustive pattern catalogs that should be in separate reference files. The token inefficiency significantly undermines its utility.
Suggestions:

- Reduce content by 70-80%: keep the 5-step workflow and 2-3 representative patterns, and move the rest to a PATTERNS.md reference file.
- Remove explanatory text about basic concepts (what counterexamples are, how assertions work, what race conditions are); Claude already knows these.
- Create separate reference files: PATTERNS.md for the 7 patterns, TECHNIQUES.md for generation techniques, TOOLS.md for tool references.
- Consolidate the counterexample report template and example into a single concise section rather than showing both separately.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~600+ lines with extensive repetition. Explains basic concepts Claude knows (what counterexamples are, basic Python errors, what assertions do). Many patterns and examples could be consolidated significantly. | 1 / 3 |
| Actionability | Provides fully executable Python code examples throughout, concrete input values, step-by-step execution traces, and copy-paste ready counterexample reports. Every pattern includes specific, runnable examples. | 3 / 3 |
| Workflow Clarity | Clear 5-step workflow (Identify → Analyze → Generate → Execute → Present) with explicit validation checkpoints. Each step has concrete examples and the workflow includes feedback loops for fixing and re-verifying. | 3 / 3 |
| Progressive Disclosure | Monolithic wall of text with no references to external files. All content is inline, including 7 detailed patterns, 4 techniques, tools lists, and common scenarios that should be split into separate reference documents. | 1 / 3 |
| Total | | 8 / 12 Passed |
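The five-step workflow the table credits (Identify → Analyze → Generate → Execute → Present) can be sketched as a small search loop. The `spec` invariant and candidate inputs below are illustrative assumptions for this review, not material from the skill itself:

```python
def find_counterexample(spec, candidates):
    """Run candidate inputs against a spec and report the first violation (sketch)."""
    for inputs in candidates:                 # Generate: enumerate candidate inputs
        try:
            spec(*inputs)                     # Execute the spec against the inputs
        except AssertionError as exc:         # Analyze: the invariant was violated
            return {"inputs": inputs,         # Present: concrete values and message
                    "violation": str(exc)}
    return None                               # no counterexample among the candidates

# Hypothetical invariant: squaring never decreases a number (false for 0 < x < 1)
def spec(x):
    assert x * x >= x, f"x={x}: x*x={x * x} is less than x"

print(find_counterexample(spec, [(2,), (1,), (0.5,)]))
# {'inputs': (0.5,), 'violation': 'x=0.5: x*x=0.25 is less than x'}
```

A real implementation would add the feedback loop the review mentions (shrink the failing input, fix, re-verify), but the loop above captures the control flow the five steps describe.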
Validation
90% — Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (848 lines); consider splitting into references/ and linking | Warning |
| Total | | 10 / 11 Passed |
Version: 0f00a4f