CtrlK
BlogDocsLog inGet started
Tessl Logo

markusdowne/detectability-contract

Creates boundary-point validation contracts, defines invariant-based success criteria, and sets up automated verification probes so reliability workflows trigger on objective evidence rather than intuition. Use when designing robust handoff, memory-persistence, or tool-call reliability workflows; when you need to verify handoffs work, check memory persistence, validate tool calls succeeded, or convert vague reliability goals into concrete, testable checks at each boundary point with explicit failure-class mapping (operational vs. critical); or when you want to test your workflow end-to-end, make sure it works, or verify your automation runs correctly using read-back probes and escalation triggers rather than agent confidence. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.

96

1.25x

Quality

90%

Does it follow best practices?

Impact

98%

1.25x

Average score across 9 eval scenarios

Overview
Skills
Evals
Files

rubric.jsonevals/scenario-4/

{
  "context": "Tests whether the agent correctly models an external tool call boundary with HTTP status invariants, required field checks, verification probes including re-fetching and key validation, proper failure class mapping between non-2xx and missing fields, and two-consecutive-failure escalation policy.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Tool call boundary named",
      "description": "contract.md identifies the external API call (payment processor call) as a named boundary point",
      "max_score": 7
    },
    {
      "name": "HTTP status invariant",
      "description": "The contract includes an invariant checking HTTP status is 2xx (e.g. 'HTTP status 2xx', 'status == 200')",
      "max_score": 9
    },
    {
      "name": "Required fields invariant",
      "description": "The contract includes an invariant that the response contains required fields (e.g. 'response has required fields', 'transaction_id present')",
      "max_score": 9
    },
    {
      "name": "Re-fetch probe",
      "description": "The contract specifies a verification probe involving re-fetching or re-validating the result (e.g. 're-fetch result', 'query transaction status')",
      "max_score": 8
    },
    {
      "name": "Required keys validation probe",
      "description": "The contract specifies a probe that validates required keys are present in the response",
      "max_score": 8
    },
    {
      "name": "Non-2xx as operational",
      "description": "The contract classifies a non-2xx HTTP status as 'operational' (recoverable) failure class",
      "max_score": 9
    },
    {
      "name": "Missing fields as critical",
      "description": "The contract classifies missing required response fields as 'critical' failure class",
      "max_score": 9
    },
    {
      "name": "Two-failure escalation",
      "description": "The contract specifies escalation after 2 consecutive failures (or equivalent: 'escalate after 2 consecutive failures', 'retry once then escalate')",
      "max_score": 9
    },
    {
      "name": "Table format correct",
      "description": "contract.md contains a markdown table with columns: Boundary, Required Invariants, Verification Probes, Failure Class, Escalation Trigger",
      "max_score": 8
    },
    {
      "name": "Script field validation",
      "description": "validate_response.py checks each required field is present in the parsed response (not just checks file exists)",
      "max_score": 9
    },
    {
      "name": "Unknown state as operational",
      "description": "The contract treats a response that cannot be verified (e.g. timeout, unparseable body) as at least operational risk, not assumed success",
      "max_score": 8
    },
    {
      "name": "Non-zero exit on failure",
      "description": "validate_response.py exits with a non-zero exit code when any invariant fails (not just prints a warning)",
      "max_score": 7
    }
  ]
}

Install with Tessl CLI

npx tessl i markusdowne/detectability-contract@0.1.2

evals

SKILL.md

tile.json