Counterexample-Guided Proof Debugger
Analyze counterexamples from Nitpick or QuickChick to explain proof failures and suggest corrections to specifications or proofs.
Workflow
1. Receive Counterexample Information
Identify what information is provided:
- Counterexample output: From Nitpick or QuickChick
- Failed theorem: The statement that couldn't be proven
- Proof attempt: Any partial proof or tactics tried
- Context: Definitions and lemmas involved
2. Choose Target System
Determine which proof assistant is being used:
- Isabelle/HOL with Nitpick: Finite model finder
- Coq with QuickChick: Property-based testing
- Both: Provide analysis for both systems
3. Analyze the Counterexample
Examine the counterexample systematically:
Verify the counterexample:
- Manually compute the result for the given values
- Confirm it actually violates the theorem
- Check if it's a genuine counterexample or tool limitation
Identify the violation:
- Which part of the theorem fails?
- What values cause the failure?
- Is it an edge case or fundamental issue?
Determine the root cause:
- Missing precondition?
- Incorrect specification?
- Wrong quantifier order?
- Implementation bug?
- Off-by-one error?
- Type constraint issue?
4. Explain the Failure
Provide clear explanation:
What went wrong:
- Describe why the counterexample violates the theorem
- Show the computation step-by-step
- Highlight the specific point of failure
Why it happened:
- Explain the underlying cause
- Identify the conceptual error
- Note any common patterns (empty list, boundary values, etc.)
Impact assessment:
- Is the theorem fundamentally wrong?
- Does it need preconditions?
- Is the specification incomplete?
5. Suggest Corrections
Provide actionable fixes based on the root cause:
For missing preconditions:
(* Before *)
lemma "hd xs ∈ set xs"
(* After *)
lemma "xs ≠ [] ⟹ hd xs ∈ set xs"
For incorrect specifications:
(* Before: uses < instead of <= *)
x < y && is_sorted (y :: ys)
(* After *)
x <= y && is_sorted (y :: ys)
For quantifier order:
(* Before *)
"∃y. ∀x. P x y"
(* After *)
"∀x. ∃y. P x y"
For incomplete specifications:
(* Before: only checks sortedness *)
is_sorted (sort l)
(* After: also checks permutation *)
is_sorted (sort l) && permutation l (sort l)
6. Recommend Next Steps
Guide the user on what to do:
Retest with fix:
- Run Nitpick/QuickChick again
- Verify no counterexample found
- Check if fix is sufficient
Identify additional issues:
- Are there other edge cases?
- Do other lemmas need fixing?
- Is the specification now complete?
Proceed with proof:
- If no counterexample, attempt proof
- Suggest proof strategy
- Identify needed helper lemmas
Counterexample Analysis Patterns
Pattern 1: Empty Structures
Symptom: Counterexample is [], {}, or None
Common causes:
- Missing non-empty precondition
- Undefined behavior on empty input
- Base case not handled
Fix: Add precondition or handle empty case explicitly
Pattern 2: Boundary Values
Symptom: Counterexample is 0, 1, or type limits
Common causes:
- Off-by-one errors
- Boundary condition not considered
- Edge case in arithmetic
Fix: Adjust bounds or add special case handling
Pattern 3: Duplicate Elements
Symptom: Counterexample has repeated values like [0, 0]
Common causes:
- Using
< instead of ≤
- Assuming distinctness
- Set vs. multiset confusion
Fix: Use appropriate comparison or add distinctness assumption
Pattern 4: Small Counterexamples
Symptom: Very small counterexample (1-2 elements)
Common causes:
- Fundamental specification error
- Wrong base case
- Incorrect inductive step
Fix: Review base definitions and inductive structure
Pattern 5: Type-Specific Values
Symptom: Counterexample at type boundaries
Common causes:
- Type constraints not considered
- Overflow/underflow issues
- Finite vs. infinite types
Fix: Add type constraints or adjust specification
Tool-Specific Guidance
Nitpick (Isabelle/HOL)
For detailed Nitpick usage and interpretation:
Key points:
- Searches for finite models
- Configurable cardinality bounds
- May miss counterexamples beyond bounds
- "No counterexample" ≠ proof
QuickChick (Coq)
For detailed QuickChick usage and interpretation:
Key points:
- Random testing with shrinking
- Configurable test count
- May miss rare counterexamples
- "Success" ≠ proof
Common Root Causes
Specification Errors
Symptoms:
- Counterexample shows spec doesn't match intent
- Multiple unrelated counterexamples
- Spec too weak or too strong
Fixes:
- Strengthen postconditions
- Add completeness requirements
- Review specification against intent
Missing Preconditions
Symptoms:
- Counterexample is edge case
- Empty structures or boundary values
- Undefined behavior
Fixes:
- Add non-empty constraints
- Add type bounds
- Add well-formedness conditions
Quantifier Issues
Symptoms:
- Counterexample shows wrong order
- Existential/universal confusion
- Skolem constant issues
Fixes:
- Swap quantifier order
- Review logical structure
- Check variable dependencies
Implementation Bugs
Symptoms:
- Counterexample shows function doesn't work
- Output doesn't match specification
- Logic error in definition
Fixes:
- Fix the implementation
- Review algorithm correctness
- Test implementation separately
Debugging Checklist
When analyzing a counterexample:
- Verify manually: Compute the result for the counterexample values
- Identify violation: Which part of the theorem fails?
- Find root cause: Why does it fail? (precondition, spec, impl, quantifiers)
- Suggest fix: What needs to change?
- Check completeness: Are there other issues?
- Retest: Run tool again after fix
- Proceed: If no counterexample, attempt proof
Examples
For complete debugging examples including:
- Incorrect sortedness definition
- Missing preconditions
- Wrong quantifier order
- Off-by-one errors
- Incomplete specifications
See examples.md
Tips
- Trust the counterexample: If tool finds one, investigate thoroughly
- Verify manually: Always check the counterexample by hand
- Look for patterns: Empty, boundary, duplicates are common
- Fix root cause: Don't just patch symptoms
- Retest after fixing: Ensure fix is complete
- Use tools early: Run Nitpick/QuickChick before proving
- Iterate: May need multiple rounds of debugging
- Document assumptions: Make preconditions explicit
- Check completeness: Ensure specification is sufficient