Name: markusdowne/detectability-contract
Rating: 0.964 (1 review)
Author: markusdowne


Creates boundary-point validation contracts, defines invariant-based success criteria, and sets up automated verification probes so reliability workflows trigger on objective evidence rather than intuition. Use when designing robust handoff, memory-persistence, or tool-call reliability workflows; when you need to verify handoffs work, check memory persistence, validate tool calls succeeded, or convert vague reliability goals into concrete, testable checks at each boundary point with explicit failure-class mapping (operational vs. critical); or when you want to test your workflow end-to-end, make sure it works, or verify your automation runs correctly using read-back probes and escalation triggers rather than agent confidence. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
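As an illustration of the read-back probe idea described above, here is a minimal Python sketch for a file handoff boundary. It is not taken from the skill itself: the function names `write_with_readback` and `handoff_is_valid`, and the choice of a SHA-256 digest as the invariant, are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def write_with_readback(path: Path, payload: bytes) -> bool:
    """Write payload, then read it back and compare digests.

    Hypothetical helper: returns True only when the bytes on disk
    match what the writer intended to hand off.
    """
    expected = hashlib.sha256(payload).hexdigest()
    path.write_bytes(payload)
    # Read-back probe: rely on the artifact on disk, not on the writer's confidence.
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    return actual == expected


def handoff_is_valid(path: Path, expected_digest: str) -> bool:
    """Receiver-side invariant check (assumed invariants for this sketch):
    the file exists, is non-empty, and its digest matches the expected one.
    """
    if not path.is_file() or path.stat().st_size == 0:
        return False
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected_digest
```

In this sketch a failed check is objective evidence that the boundary was not crossed cleanly, which is what the escalation trigger would key on.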

Quality: 90% (Does it follow best practices?)

Impact: 98%, 1.25x (average score across 9 eval scenarios)

{
  "context": "Tests whether the agent specifies precise, differentiated escalation triggers for different boundary types, including retry-then-halt for file handoffs, force re-computation for memory resume, and consecutive-failure thresholds for API calls, using the skill's prescribed escalation patterns.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "File handoff escalation",
      "description": "The contract specifies 'retry once, then halt and report' (or semantically equivalent) as the escalation for a file/artifact handoff boundary",
      "max_score": 12
    },
    {
      "name": "Memory resume escalation",
      "description": "The contract specifies 'force re-computation before proceeding' (or semantically equivalent: regenerate, rebuild cache) as the escalation for a memory/state-resume boundary",
      "max_score": 12
    },
    {
      "name": "API call escalation",
      "description": "The contract specifies escalation after 2 consecutive failures (or equivalent threshold) for an API/tool-call boundary",
      "max_score": 12
    },
    {
      "name": "Five-column table",
      "description": "contract.md contains a markdown table with columns: Boundary, Required Invariants, Verification Probes, Failure Class, Escalation Trigger",
      "max_score": 10
    },
    {
      "name": "Critical vs operational",
      "description": "The contract distinguishes between 'critical' and 'operational' failure classes across the different boundaries",
      "max_score": 10
    },
    {
      "name": "Critical triggers halt",
      "description": "The contract maps at least one 'critical' failure class to a halt or stop-pipeline escalation (not just retry)",
      "max_score": 10
    },
    {
      "name": "Operational triggers retry",
      "description": "The contract maps at least one 'operational' failure class to a retry or recovery escalation (not halt)",
      "max_score": 10
    },
    {
      "name": "Missing evidence = operational",
      "description": "The contract treats a missing or unverifiable state (e.g. no evidence of completion, unverifiable result) as at least operational-level risk",
      "max_score": 10
    },
    {
      "name": "Invariants in table",
      "description": "Every boundary row includes at least one invariant in the Required Invariants column",
      "max_score": 7
    },
    {
      "name": "Probes in table",
      "description": "Every boundary row includes at least one verification probe",
      "max_score": 7
    }
  ]
}
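The three escalation patterns this rubric checks for (retry once then halt for file handoffs, forced re-computation for memory resume, escalation after 2 consecutive failures for API calls) could be encoded roughly as follows. This is a sketch under stated assumptions, not the skill's implementation: the `Action` enum, the `BoundaryState` class, the boundary kind strings, and `next_action` are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    RETRY = "retry"
    HALT_AND_REPORT = "halt_and_report"
    FORCE_RECOMPUTE = "force_recompute"
    ESCALATE = "escalate"


KNOWN_KINDS = {"file_handoff", "memory_resume", "api_call"}


@dataclass
class BoundaryState:
    kind: str                      # one of KNOWN_KINDS (assumed labels)
    consecutive_failures: int = 0  # probe failures observed in a row


def next_action(state: BoundaryState, probe_passed: bool) -> Action:
    """Map a verification probe result to an escalation action."""
    if state.kind not in KNOWN_KINDS:
        raise ValueError(f"unknown boundary kind: {state.kind}")
    if probe_passed:
        state.consecutive_failures = 0
        return Action.PROCEED

    state.consecutive_failures += 1
    if state.kind == "file_handoff":
        # Retry once, then halt and report.
        if state.consecutive_failures == 1:
            return Action.RETRY
        return Action.HALT_AND_REPORT
    if state.kind == "memory_resume":
        # Force re-computation before proceeding.
        return Action.FORCE_RECOMPUTE
    # api_call: escalate after 2 consecutive failures.
    if state.consecutive_failures >= 2:
        return Action.ESCALATE
    return Action.RETRY
```

For example, two failed probes in a row on a `file_handoff` boundary yield `Action.RETRY` and then `Action.HALT_AND_REPORT`, while a passing probe resets the failure counter.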

Install with Tessl CLI

npx tessl i markusdowne/detectability-contract


evals/scenario-6/rubric.json