
markusdowne/detectability-contract

Creates boundary-point validation contracts, defines invariant-based success criteria, and sets up automated verification probes so reliability workflows trigger on objective evidence rather than intuition. Use when designing robust handoff, memory-persistence, or tool-call reliability workflows; when you need to verify handoffs work, check memory persistence, validate tool calls succeeded, or convert vague reliability goals into concrete, testable checks at each boundary point with explicit failure-class mapping (operational vs. critical); or when you want to test your workflow end-to-end, make sure it works, or verify your automation runs correctly using read-back probes and escalation triggers rather than agent confidence. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
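As an illustration of the read-back probes mentioned above, the following is a minimal sketch for a state-write boundary. The function name `read_back_probe`, its parameters, and the violation labels are hypothetical and not taken from the skill itself; it only shows how artifact-existence, timestamp-freshness, and checksum invariants could be checked against evidence on disk rather than agent confidence.

```python
import hashlib
import os
import time


def read_back_probe(path, expected_sha256, max_age_seconds, now=None):
    """Return the violated invariants for one artifact (empty list means pass).

    All names here are illustrative assumptions, not part of the skill.
    """
    # Invariant 1: the artifact exists. Nothing else can be checked without it.
    if not os.path.isfile(path):
        return ["artifact_missing"]

    violations = []

    # Invariant 2: the artifact was written recently enough.
    now = time.time() if now is None else now
    if now - os.path.getmtime(path) > max_age_seconds:
        violations.append("stale_timestamp")

    # Invariant 3: the bytes read back match the digest recorded at write time.
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    if digest != expected_sha256:
        violations.append("checksum_mismatch")

    return violations
```

A caller could then map each returned label to a failure class (operational or critical) and fire the escalation trigger defined for that boundary.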

96 (1.25x)

Quality: 90% (Does it follow best practices?)

Impact: 98%, 1.25x (average score across 9 eval scenarios)


evals/scenario-5/rubric.json

{
  "context": "Tests whether the agent produces a complete multi-boundary contract covering all five boundary types from the skill workflow, uses the correct table format with all five required columns, and maps failures to operational/critical classes with escalation triggers for each boundary.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Multiple boundary types",
      "description": "contract.md includes at least 3 distinct boundary types from the set: state write, handoff, resume, external tool call, final report",
      "max_score": 10
    },
    {
      "name": "Five-column table",
      "description": "contract.md contains a markdown table with exactly these columns: Boundary, Required Invariants, Verification Probes, Failure Class, Escalation Trigger",
      "max_score": 10
    },
    {
      "name": "Invariants for each boundary",
      "description": "Every boundary row has at least one invariant listed (no empty cells)",
      "max_score": 8
    },
    {
      "name": "Probes for each boundary",
      "description": "Every boundary row has at least one verification probe listed",
      "max_score": 8
    },
    {
      "name": "Failure class for each boundary",
      "description": "Every boundary row's Failure Class cell maps at least one failure to either 'operational' or 'critical'",
      "max_score": 8
    },
    {
      "name": "Escalation trigger for each boundary",
      "description": "Every boundary row has a non-empty escalation trigger",
      "max_score": 8
    },
    {
      "name": "Artifact exists invariant used",
      "description": "At least one row includes an artifact-existence invariant (e.g. file exists, image digest present, commit SHA exists)",
      "max_score": 8
    },
    {
      "name": "Timestamp freshness invariant used",
      "description": "At least one row includes a timestamp freshness or max_age invariant",
      "max_score": 8
    },
    {
      "name": "Checksum or hash invariant used",
      "description": "At least one row includes a checksum, hash, or digest-matching invariant",
      "max_score": 8
    },
    {
      "name": "Critical vs operational distinction",
      "description": "The contract uses both 'critical' and 'operational' classifications (or equivalent severity labels) across different boundary rows",
      "max_score": 8
    },
    {
      "name": "Final report boundary included",
      "description": "The contract includes a boundary point corresponding to the final report stage (smoke test result or deployment report)",
      "max_score": 8
    },
    {
      "name": "Resume/readiness boundary included",
      "description": "The contract includes a boundary point corresponding to a resume or readiness check (deployment rollout or cluster state)",
      "max_score": 8
    }
  ]
}
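The `max_score` values in this rubric sum to 100. The source does not say how a `weighted_checklist` is aggregated, so the following is only a sketch under the assumption that the total is the sum of per-criterion points, each clamped to its `max_score`. The function `score_checklist` and the `awarded` mapping are hypothetical names introduced for illustration.

```python
def score_checklist(rubric, awarded):
    """Return (total, maximum) for a weighted checklist.

    `rubric` is the parsed rubric.json; `awarded` maps a criterion name to
    the points a grader gave it. The aggregation rule is an assumption.
    """
    total = 0
    maximum = 0
    for item in rubric["checklist"]:
        cap = item["max_score"]
        maximum += cap
        # Missing criteria score zero; points are clamped to [0, max_score].
        points = awarded.get(item["name"], 0)
        total += max(0, min(points, cap))
    return total, maximum


# Usage with a two-item excerpt of the rubric above.
excerpt = {
    "type": "weighted_checklist",
    "checklist": [
        {"name": "Multiple boundary types", "max_score": 10},
        {"name": "Five-column table", "max_score": 10},
    ],
}
print(score_checklist(excerpt, {"Multiple boundary types": 10, "Five-column table": 6}))
```

Clamping keeps a single over-scored criterion from masking failures elsewhere in the checklist.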

Install with Tessl CLI

npx tessl i markusdowne/detectability-contract@0.1.2
