Debug proof failures using counterexamples from Nitpick (Isabelle) or QuickChick (Coq) to identify specification errors, missing preconditions, and proof strategy issues. Use when: (1) A proof attempt fails and you need to understand why, (2) Counterexamples are generated by Nitpick or QuickChick, (3) Specifications may be incorrect or incomplete, (4) Theorems need validation before proving, (5) Missing preconditions or lemmas need identification, or (6) Proof failures need explanation and correction suggestions. Supports both Isabelle/HOL and Coq equally.
Install with Tessl CLI
npx tessl i github:ArabelaTso/Skills-4-SE --skill counterexample-debugger94
Does it follow best practices?
Validation for skill structure
Missing precondition identification
Manual verification
100%
100%
Empty list pattern
75%
100%
Undefined hd behavior
30%
80%
Root cause: missing precondition
100%
100%
Corrected theorem syntax
100%
100%
Impact assessment
100%
100%
Step-by-step explanation
100%
100%
Retest recommendation
100%
100%
No counterexample caveat
75%
75%
Completeness check
100%
100%
Root cause not symptom
100%
100%
Without context: $0.1825 · 1m 7s · 10 turns · 14 in / 3,876 out tokens
With context: $0.4430 · 1m 40s · 22 turns · 30 in / 5,115 out tokens
Duplicate element pattern and comparison operator fix
Manual computation
100%
100%
Duplicate element pattern
100%
100%
Strict vs non-strict inequality
100%
100%
Correct root cause category
100%
100%
Corrected definition
100%
100%
Before/after code
100%
100%
Shrinking interpretation
0%
0%
Retest recommendation
100%
100%
Success does not mean proof
37%
50%
Completeness: permutation missing
0%
100%
Step-by-step violation
100%
100%
Without context: $0.2072 · 1m 9s · 9 turns · 14 in / 3,235 out tokens
With context: $0.5849 · 1m 51s · 24 turns · 3,847 in / 6,617 out tokens
Quantifier order and incomplete specification
Quantifier root cause (A)
100%
100%
∃∀ vs ∀∃ logical difference (A)
80%
100%
Corrected quantifier theorem (A)
100%
100%
Witness for corrected theorem (A)
100%
100%
Theorem A is fundamentally wrong (A)
100%
100%
Manual trace for [1, 2, 0] (B)
100%
100%
sorted [1, 0, 2] is False (B)
100%
100%
Implementation bug root cause (B)
100%
100%
Incomplete spec note (B)
75%
100%
Strengthened spec (B)
62%
100%
Retest recommendation
62%
100%
No counterexample caveat
50%
62%
Without context: $0.3664 · 2m 23s · 11 turns · 14 in / 7,980 out tokens
With context: $0.5136 · 2m 12s · 18 turns · 3,122 in / 6,896 out tokens
Table of Contents
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.