CtrlK
BlogDocsLog inGet started
Tessl Logo

counterexample-debugger

Debug proof failures using counterexamples from Nitpick (Isabelle) or QuickChick (Coq) to identify specification errors, missing preconditions, and proof strategy issues. Use when: (1) A proof attempt fails and you need to understand why, (2) Counterexamples are generated by Nitpick or QuickChick, (3) Specifications may be incorrect or incomplete, (4) Theorems need validation before proving, (5) Missing preconditions or lemmas need identification, or (6) Proof failures need explanation and correction suggestions. Supports both Isabelle/HOL and Coq equally.

Install with Tessl CLI

npx tessl i github:ArabelaTso/Skills-4-SE --skill counterexample-debugger
What are skills?

94

Does it follow best practices?

Validation for skill structure

SKILL.md
Review
Evals

Counterexample-Guided Proof Debugger

Analyze counterexamples from Nitpick or QuickChick to explain proof failures and suggest corrections to specifications or proofs.

Workflow

1. Receive Counterexample Information

Identify what information is provided:

  • Counterexample output: From Nitpick or QuickChick
  • Failed theorem: The statement that couldn't be proven
  • Proof attempt: Any partial proof or tactics tried
  • Context: Definitions and lemmas involved

2. Choose Target System

Determine which proof assistant is being used:

  • Isabelle/HOL with Nitpick: Finite model finder
  • Coq with QuickChick: Property-based testing
  • Both: Provide analysis for both systems

3. Analyze the Counterexample

Examine the counterexample systematically:

Verify the counterexample:

  • Manually compute the result for the given values
  • Confirm it actually violates the theorem
  • Check if it's a genuine counterexample or tool limitation

Identify the violation:

  • Which part of the theorem fails?
  • What values cause the failure?
  • Is it an edge case or fundamental issue?

Determine the root cause:

  • Missing precondition?
  • Incorrect specification?
  • Wrong quantifier order?
  • Implementation bug?
  • Off-by-one error?
  • Type constraint issue?

4. Explain the Failure

Provide clear explanation:

What went wrong:

  • Describe why the counterexample violates the theorem
  • Show the computation step-by-step
  • Highlight the specific point of failure

Why it happened:

  • Explain the underlying cause
  • Identify the conceptual error
  • Note any common patterns (empty list, boundary values, etc.)

Impact assessment:

  • Is the theorem fundamentally wrong?
  • Does it need preconditions?
  • Is the specification incomplete?

5. Suggest Corrections

Provide actionable fixes based on the root cause:

For missing preconditions:

(* Before *)
lemma "hd xs ∈ set xs"

(* After *)
lemma "xs ≠ [] ⟹ hd xs ∈ set xs"

For incorrect specifications:

(* Before: uses < instead of <= *)
x < y && is_sorted (y :: ys)

(* After *)
x <= y && is_sorted (y :: ys)

For quantifier order:

(* Before *)
"∃y. ∀x. P x y"

(* After *)
"∀x. ∃y. P x y"

For incomplete specifications:

(* Before: only checks sortedness *)
is_sorted (sort l)

(* After: also checks permutation *)
is_sorted (sort l) && permutation l (sort l)

6. Recommend Next Steps

Guide the user on what to do:

Retest with fix:

  • Run Nitpick/QuickChick again
  • Verify no counterexample found
  • Check if fix is sufficient

Identify additional issues:

  • Are there other edge cases?
  • Do other lemmas need fixing?
  • Is the specification now complete?

Proceed with proof:

  • If no counterexample, attempt proof
  • Suggest proof strategy
  • Identify needed helper lemmas

Counterexample Analysis Patterns

Pattern 1: Empty Structures

Symptom: Counterexample is [], {}, or None

Common causes:

  • Missing non-empty precondition
  • Undefined behavior on empty input
  • Base case not handled

Fix: Add precondition or handle empty case explicitly

Pattern 2: Boundary Values

Symptom: Counterexample is 0, 1, or type limits

Common causes:

  • Off-by-one errors
  • Boundary condition not considered
  • Edge case in arithmetic

Fix: Adjust bounds or add special case handling

Pattern 3: Duplicate Elements

Symptom: Counterexample has repeated values like [0, 0]

Common causes:

  • Using < instead of
  • Assuming distinctness
  • Set vs. multiset confusion

Fix: Use appropriate comparison or add distinctness assumption

Pattern 4: Small Counterexamples

Symptom: Very small counterexample (1-2 elements)

Common causes:

  • Fundamental specification error
  • Wrong base case
  • Incorrect inductive step

Fix: Review base definitions and inductive structure

Pattern 5: Type-Specific Values

Symptom: Counterexample at type boundaries

Common causes:

  • Type constraints not considered
  • Overflow/underflow issues
  • Finite vs. infinite types

Fix: Add type constraints or adjust specification

Tool-Specific Guidance

Nitpick (Isabelle/HOL)

For detailed Nitpick usage and interpretation:

  • See nitpick_guide.md

Key points:

  • Searches for finite models
  • Configurable cardinality bounds
  • May miss counterexamples beyond bounds
  • "No counterexample" ≠ proof

QuickChick (Coq)

For detailed QuickChick usage and interpretation:

Key points:

  • Random testing with shrinking
  • Configurable test count
  • May miss rare counterexamples
  • "Success" ≠ proof

Common Root Causes

Specification Errors

Symptoms:

  • Counterexample shows spec doesn't match intent
  • Multiple unrelated counterexamples
  • Spec too weak or too strong

Fixes:

  • Strengthen postconditions
  • Add completeness requirements
  • Review specification against intent

Missing Preconditions

Symptoms:

  • Counterexample is edge case
  • Empty structures or boundary values
  • Undefined behavior

Fixes:

  • Add non-empty constraints
  • Add type bounds
  • Add well-formedness conditions

Quantifier Issues

Symptoms:

  • Counterexample shows wrong order
  • Existential/universal confusion
  • Skolem constant issues

Fixes:

  • Swap quantifier order
  • Review logical structure
  • Check variable dependencies

Implementation Bugs

Symptoms:

  • Counterexample shows function doesn't work
  • Output doesn't match specification
  • Logic error in definition

Fixes:

  • Fix the implementation
  • Review algorithm correctness
  • Test implementation separately

Debugging Checklist

When analyzing a counterexample:

  1. Verify manually: Compute the result for the counterexample values
  2. Identify violation: Which part of the theorem fails?
  3. Find root cause: Why does it fail? (precondition, spec, impl, quantifiers)
  4. Suggest fix: What needs to change?
  5. Check completeness: Are there other issues?
  6. Retest: Run tool again after fix
  7. Proceed: If no counterexample, attempt proof

Examples

For complete debugging examples including:

  • Incorrect sortedness definition
  • Missing preconditions
  • Wrong quantifier order
  • Off-by-one errors
  • Incomplete specifications

See examples.md

Tips

  • Trust the counterexample: If tool finds one, investigate thoroughly
  • Verify manually: Always check the counterexample by hand
  • Look for patterns: Empty, boundary, duplicates are common
  • Fix root cause: Don't just patch symptoms
  • Retest after fixing: Ensure fix is complete
  • Use tools early: Run Nitpick/QuickChick before proving
  • Iterate: May need multiple rounds of debugging
  • Document assumptions: Make preconditions explicit
  • Check completeness: Ensure specification is sufficient
Repository
ArabelaTso/Skills-4-SE
Last updated
Created

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.