counterexample-debugger

Debug proof failures using counterexamples from Nitpick (Isabelle) or QuickChick (Coq) to identify specification errors, missing preconditions, and proof strategy issues. Use when: (1) A proof attempt fails and you need to understand why, (2) Counterexamples are generated by Nitpick or QuickChick, (3) Specifications may be incorrect or incomplete, (4) Theorems need validation before proving, (5) Missing preconditions or lemmas need identification, or (6) Proof failures need explanation and correction suggestions. Supports both Isabelle/HOL and Coq equally.

1.10x

Quality

92%

Does it follow best practices?

Impact

93%

1.10x

Average score across 3 eval scenarios

Securityby

Passed

No known issues

Counterexample-Guided Proof Debugger

Analyze counterexamples from Nitpick or QuickChick to explain proof failures and suggest corrections to specifications or proofs.

Workflow

1. Receive Counterexample Information

Identify what information is provided:

Counterexample output: From Nitpick or QuickChick
Failed theorem: The statement that couldn't be proven
Proof attempt: Any partial proof or tactics tried
Context: Definitions and lemmas involved

2. Choose Target System

Determine which proof assistant is being used:

Isabelle/HOL with Nitpick: Finite model finder
Coq with QuickChick: Property-based testing
Both: Provide analysis for both systems

3. Analyze the Counterexample

Examine the counterexample systematically:

Verify the counterexample:

Manually compute the result for the given values
Confirm it actually violates the theorem
Check if it's a genuine counterexample or tool limitation

Identify the violation:

Which part of the theorem fails?
What values cause the failure?
Is it an edge case or fundamental issue?

Determine the root cause:

Missing precondition?
Incorrect specification?
Wrong quantifier order?
Implementation bug?
Off-by-one error?
Type constraint issue?

4. Explain the Failure

Provide clear explanation:

What went wrong:

Describe why the counterexample violates the theorem
Show the computation step-by-step
Highlight the specific point of failure

Why it happened:

Explain the underlying cause
Identify the conceptual error
Note any common patterns (empty list, boundary values, etc.)

Impact assessment:

Is the theorem fundamentally wrong?
Does it need preconditions?
Is the specification incomplete?

5. Suggest Corrections

Provide actionable fixes based on the root cause:

For missing preconditions:

(* Before *)
lemma "hd xs ∈ set xs"

(* After *)
lemma "xs ≠ [] ⟹ hd xs ∈ set xs"

For incorrect specifications:

(* Before: uses < instead of <= *)
x < y && is_sorted (y :: ys)

(* After *)
x <= y && is_sorted (y :: ys)

For quantifier order:

(* Before *)
"∃y. ∀x. P x y"

(* After *)
"∀x. ∃y. P x y"

For incomplete specifications:

(* Before: only checks sortedness *)
is_sorted (sort l)

(* After: also checks permutation *)
is_sorted (sort l) && permutation l (sort l)

6. Recommend Next Steps

Guide the user on what to do:

Retest with fix:

Run Nitpick/QuickChick again
Verify no counterexample found
Check if fix is sufficient

Identify additional issues:

Are there other edge cases?
Do other lemmas need fixing?
Is the specification now complete?

Proceed with proof:

If no counterexample, attempt proof
Suggest proof strategy
Identify needed helper lemmas

Counterexample Analysis Patterns

Pattern 1: Empty Structures

Symptom: Counterexample is [], {}, or None

Common causes:

Missing non-empty precondition
Undefined behavior on empty input
Base case not handled

Fix: Add precondition or handle empty case explicitly

Pattern 2: Boundary Values

Symptom: Counterexample is 0, 1, or type limits

Common causes:

Off-by-one errors
Boundary condition not considered
Edge case in arithmetic

Fix: Adjust bounds or add special case handling

Pattern 3: Duplicate Elements

Symptom: Counterexample has repeated values like [0, 0]

Common causes:

Using < instead of ≤
Assuming distinctness
Set vs. multiset confusion

Fix: Use appropriate comparison or add distinctness assumption

Pattern 4: Small Counterexamples

Symptom: Very small counterexample (1-2 elements)

Common causes:

Fundamental specification error
Wrong base case
Incorrect inductive step

Fix: Review base definitions and inductive structure

Pattern 5: Type-Specific Values

Symptom: Counterexample at type boundaries

Common causes:

Type constraints not considered
Overflow/underflow issues
Finite vs. infinite types

Fix: Add type constraints or adjust specification

Tool-Specific Guidance

Nitpick (Isabelle/HOL)

For detailed Nitpick usage and interpretation:

See nitpick_guide.md

Key points:

Searches for finite models
Configurable cardinality bounds
May miss counterexamples beyond bounds
"No counterexample" ≠ proof

QuickChick (Coq)

For detailed QuickChick usage and interpretation:

See quickchick_guide.md

Key points:

Random testing with shrinking
Configurable test count
May miss rare counterexamples
"Success" ≠ proof

Common Root Causes

Specification Errors

Symptoms:

Counterexample shows spec doesn't match intent
Multiple unrelated counterexamples
Spec too weak or too strong

Fixes:

Strengthen postconditions
Add completeness requirements
Review specification against intent

Missing Preconditions

Symptoms:

Counterexample is edge case
Empty structures or boundary values
Undefined behavior

Fixes:

Add non-empty constraints
Add type bounds
Add well-formedness conditions

Quantifier Issues

Symptoms:

Counterexample shows wrong order
Existential/universal confusion
Skolem constant issues

Fixes:

Swap quantifier order
Review logical structure
Check variable dependencies

Implementation Bugs

Symptoms:

Counterexample shows function doesn't work
Output doesn't match specification
Logic error in definition

Fixes:

Fix the implementation
Review algorithm correctness
Test implementation separately

Debugging Checklist

When analyzing a counterexample:

Verify manually: Compute the result for the counterexample values
Identify violation: Which part of the theorem fails?
Find root cause: Why does it fail? (precondition, spec, impl, quantifiers)
Suggest fix: What needs to change?
Check completeness: Are there other issues?
Retest: Run tool again after fix
Proceed: If no counterexample, attempt proof

Examples

For complete debugging examples including:

Incorrect sortedness definition
Missing preconditions
Wrong quantifier order
Off-by-one errors
Incomplete specifications

See examples.md

Tips

Trust the counterexample: If tool finds one, investigate thoroughly
Verify manually: Always check the counterexample by hand
Look for patterns: Empty, boundary, duplicates are common
Fix root cause: Don't just patch symptoms
Retest after fixing: Ensure fix is complete
Use tools early: Run Nitpick/QuickChick before proving
Iterate: May need multiple rounds of debugging
Document assumptions: Make preconditions explicit
Check completeness: Ensure specification is sufficient

Repository: ArabelaTso/Skills-4-SE
Commit: 0f00a4f

Last updated: 4 months ago
Created: 4 months ago

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.