
markusdowne/detectability-contract

Creates boundary-point validation contracts, defines invariant-based success criteria, and sets up automated verification probes so that reliability workflows trigger on objective evidence rather than intuition. Use it when designing robust handoff, memory-persistence, or tool-call reliability workflows; when you need to verify that handoffs work, memory persists, or tool calls succeeded; when converting vague reliability goals into concrete, testable checks at each boundary point, with explicit failure-class mapping (operational vs. critical); or when you want to test a workflow end to end, make sure it works, or verify that your automation runs correctly, using read-back probes and escalation triggers rather than agent confidence. Includes explicit untrusted-content and prompt-injection guardrails for third-party inputs.
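As a rough illustration of what a read-back probe with failure-class mapping might look like, here is a minimal Python sketch. It is not taken from the skill itself: the names `ProbeResult`, `probe_file_write`, and `probe_cache_freshness`, and the particular mapping of outcomes to operational or critical, are assumptions made for the example.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    boundary: str
    passed: bool
    failure_class: str  # "none", "operational", or "critical" (assumed labels)
    evidence: str       # the observable fact the decision was based on


def probe_file_write(path: str, expected_sha256: str) -> ProbeResult:
    """Read the file back and compare checksums instead of trusting the writer."""
    try:
        with open(path, "rb") as fh:
            data = fh.read()
    except OSError as exc:
        # State cannot be verified: treat as at least an operational risk.
        return ProbeResult("file_write", False, "operational", f"unreadable: {exc}")
    actual = hashlib.sha256(data).hexdigest()
    if len(data) == 0 or actual != expected_sha256:
        # Assumption for this sketch: wrong or empty content is critical.
        return ProbeResult("file_write", False, "critical",
                           f"size={len(data)}, sha256={actual}")
    return ProbeResult("file_write", True, "none", f"size={len(data)}, sha256 matches")


def probe_cache_freshness(fetched_at: Optional[float], max_age_s: float,
                          now: Optional[float] = None) -> ProbeResult:
    """Check the cache timestamp against a max-age threshold."""
    if fetched_at is None:
        # No timestamp means unknown state, which is not assumed safe.
        return ProbeResult("cache_freshness", False, "operational", "no timestamp")
    current = time.time() if now is None else now
    age = current - fetched_at
    if age > max_age_s:
        return ProbeResult("cache_freshness", False, "operational",
                           f"age={age:.0f}s > max_age={max_age_s:.0f}s")
    return ProbeResult("cache_freshness", True, "none", f"age={age:.0f}s")
```

Each probe returns the evidence it observed, so an escalation trigger can fire on `passed` and `failure_class` alone, without consulting any confidence estimate.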

96

1.25x

Quality: 90% (Does it follow best practices?)

Impact: 98%, 1.25x (Average score across 9 eval scenarios)


evals/scenario-8/rubric.json

{
  "context": "Tests whether the agent applies the skill's guardrails: using objective, measurable checks rather than agent confidence as triggers, preferring observable evidence over narrative assessment, and treating unverifiable state as at least operational risk.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Objective triggers only",
      "description": "Every alert trigger in contract.md is a concrete, measurable check (e.g. file exists, HTTP status, timestamp age) — none are described as 'agent feels uncertain', 'model confidence below X', or qualitative assessments",
      "max_score": 15
    },
    {
      "name": "No confidence-based trigger",
      "description": "contract.md does NOT use any of these phrases or equivalents: 'uncertain', 'feels like', 'seems', 'confidence score', 'agent decides' as a trigger condition",
      "max_score": 12
    },
    {
      "name": "Unverifiable state classified",
      "description": "The contract or design_notes explicitly states that unverifiable or unknown state is treated as at least an operational risk (not assumed safe)",
      "max_score": 12
    },
    {
      "name": "File write boundary trigger",
      "description": "The contract includes an objective check for the failed file write scenario (e.g. file exists, size > 0, checksum, timestamp)",
      "max_score": 10
    },
    {
      "name": "API integration boundary trigger",
      "description": "The contract includes an objective check for the broken API integration scenario (e.g. HTTP status, required fields, response time)",
      "max_score": 10
    },
    {
      "name": "Cache freshness boundary trigger",
      "description": "The contract includes an objective check for the cached configuration currency scenario (e.g. timestamp age, max_age threshold)",
      "max_score": 10
    },
    {
      "name": "Failure classification present",
      "description": "The contract classifies each scenario's failure as either operational or critical",
      "max_score": 8
    },
    {
      "name": "Design principle documented",
      "description": "design_notes.md explicitly states the preference for objective checks over confidence or narrative assessment as a design principle",
      "max_score": 10
    },
    {
      "name": "Unknown state principle documented",
      "description": "design_notes.md explicitly addresses how to handle cases where state cannot be verified (treating it as risk, not success)",
      "max_score": 8
    },
    {
      "name": "Five-column table",
      "description": "contract.md contains a table with at minimum columns for: Boundary, Invariants/Trigger, Failure Class, and Action",
      "max_score": 5
    }
  ]
}
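The `max_score` values in this checklist sum to 100. How the eval harness actually aggregates them is not shown on this page; the following Python sketch assumes a simple scheme in which each item is awarded between 0 and its `max_score` and the results are summed. The function name `score_checklist` and the `awarded` mapping are hypothetical, and the rubric excerpt includes only two of the ten items.

```python
def score_checklist(rubric: dict, awarded: dict) -> tuple:
    """Sum per-item points, clamping each to the range [0, max_score].

    Returns (total, max_total, percent). Items missing from `awarded`
    score 0, so an unchecked item is never counted as a pass.
    """
    if rubric.get("type") != "weighted_checklist":
        raise ValueError("expected a weighted_checklist rubric")
    total = 0
    max_total = 0
    for item in rubric["checklist"]:
        cap = item["max_score"]
        points = awarded.get(item["name"], 0)
        total += max(0, min(points, cap))
        max_total += cap
    return total, max_total, 100 * total / max_total


# Two items copied from the rubric above (descriptions omitted for brevity).
RUBRIC_EXCERPT = {
    "type": "weighted_checklist",
    "checklist": [
        {"name": "Objective triggers only", "max_score": 15},
        {"name": "Five-column table", "max_score": 5},
    ],
}

if __name__ == "__main__":
    awarded = {"Objective triggers only": 15, "Five-column table": 3}
    print(score_checklist(RUBRIC_EXCERPT, awarded))
```

Defaulting missing items to 0 mirrors the rubric's own stance that unverified state should not be treated as success.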

Install with Tessl CLI

npx tessl i markusdowne/detectability-contract
