counterexample-debugger

Debug proof failures using counterexamples from Nitpick (Isabelle) or QuickChick (Coq) to identify specification errors, missing preconditions, and proof strategy issues. Use when: (1) A proof attempt fails and you need to understand why, (2) Counterexamples are generated by Nitpick or QuickChick, (3) Specifications may be incorrect or incomplete, (4) Theorems need validation before proving, (5) Missing preconditions or lemmas need identification, or (6) Proof failures need explanation and correction suggestions. Supports both Isabelle/HOL and Coq equally.
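As an illustration of the counterexample-driven workflow described above, here is a minimal Python sketch of a bounded search for a falsifying input. It is not Nitpick or QuickChick, and every name in it (`find_counterexample`, `is_sorted_strict`, `is_sorted`) is hypothetical. It assumes a specification bug of the kind the evaluation criteria mention: a sortedness check written with a strict inequality, which rejects lists containing duplicates.

```python
from itertools import product


def find_counterexample(prop, max_len=3, values=(0, 1, 2)):
    """Return the first small list that falsifies prop, or None if none is found."""
    # Enumerate lists in order of increasing length, so the first hit is small.
    for n in range(max_len + 1):
        for xs in product(values, repeat=n):
            if not prop(list(xs)):
                return list(xs)
    return None


def is_sorted_strict(xs):
    # Hypothetical buggy specification: strict '<' rejects equal neighbours.
    return all(a < b for a, b in zip(xs, xs[1:]))


def is_sorted(xs):
    # Corrected specification: non-strict '<=' accepts duplicates.
    return all(a <= b for a, b in zip(xs, xs[1:]))


buggy = find_counterexample(lambda xs: is_sorted_strict(sorted(xs)))
fixed = find_counterexample(lambda xs: is_sorted(sorted(xs)))
print(buggy)  # [0, 0]
print(fixed)  # None
```

The counterexample `[0, 0]` points at the duplicate-element pattern, which suggests the definition rather than the sort is at fault. A result of `None` only means no counterexample exists within the searched bounds; it is not a proof.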

Quality: 92% (Does it follow best practices?)

Impact: 93%, 1.10x (average score across 3 eval scenarios)

Security: Passed. No known issues.

Evaluation results

96%

Debug Report: Isabelle/HOL Theorem Failure

Missing precondition identification

Criteria

Manual verification (100%)
Empty list pattern (75% without context, 100% with context)
Undefined hd behavior (30% without context, 80% with context)
Root cause: missing precondition (100%)
Corrected theorem syntax (100%)
Impact assessment (100%)
Step-by-step explanation (100%)
Retest recommendation (100%)
No counterexample caveat (75%)
Completeness check (100%)
Root cause not symptom (100%)

88%

Debug Report: QuickChick Property Failure in Coq

Duplicate element pattern and comparison operator fix

Criteria

Manual computation (100%)
Duplicate element pattern (100%)
Strict vs non-strict inequality (100%)
Correct root cause category (100%)
Corrected definition (100%)
Before/after code (100%)
Shrinking interpretation
Retest recommendation (100%)
Success does not mean proof (37% without context, 50% with context)
Completeness: permutation missing (100%)
Step-by-step violation (100%)

97%

11%

Debug Report: Two Failing Isabelle/HOL Theorems

Quantifier order and incomplete specification

Criteria

Quantifier root cause (A) (100%)
∃∀ vs ∀∃ logical difference (A) (80% without context, 100% with context)
Corrected quantifier theorem (A) (100%)
Witness for corrected theorem (A) (100%)
Theorem A is fundamentally wrong (A) (100%)
Manual trace for [1, 2, 0] (B) (100%)
sorted [1, 0, 2] is False (B) (100%)
Implementation bug root cause (B) (100%)
Incomplete spec note (B) (75% without context, 100% with context)
Strengthened spec (B) (62% without context, 100% with context)
Retest recommendation (62% without context, 100% with context)
No counterexample caveat (50% without context, 62% with context)

Repository: ArabelaTso/Skills-4-SE
Commit: 0f00a4f

Evaluated: about 2 months ago
Agent: Claude Code
Model: Claude Sonnet 4.6

Table of Contents

Debug Report: Isabelle/HOL Theorem Failure
Debug Report: QuickChick Property Failure in Coq
Debug Report: Two Failing Isabelle/HOL Theorems
