markusdowne/detectability-contract

Creates boundary-point validation contracts, defines invariant-based success criteria, and sets up automated verification probes so reliability workflows trigger on objective evidence rather than intuition. Use when designing robust handoff, memory-persistence, or tool-call reliability workflows; when you need to verify handoffs work, check memory persistence, validate tool calls succeeded, or convert vague reliability goals into concrete, testable checks at each boundary point with explicit failure-class mapping (operational vs. critical); or when you want to test your workflow end-to-end, make sure it works, or verify your automation runs correctly using read-back probes and escalation triggers rather than agent confidence. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.

96

1.25x

Quality: 90% (Does it follow best practices?)

Impact: 98%, 1.25x (average score across 9 eval scenarios)

evals/scenario-3/rubric.json

{
  "context": "Tests whether the agent correctly handles the memory-resume boundary, including key-exists invariants, timestamp-freshness checks, deserialization validation, and appropriate failure classification and verification probes for memory persistence.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Memory resume boundary",
      "description": "contract.md identifies the memory/session resume as a named boundary point",
      "max_score": 8
    },
    {
      "name": "Key exists invariant",
      "description": "The contract includes an invariant that checks the session key exists in the store (e.g. 'key exists in store', 'store.get(key) returns non-null')",
      "max_score": 10
    },
    {
      "name": "Timestamp freshness invariant",
      "description": "The contract includes a 'timestamp fresh' or 'timestamp < max_age' invariant for the memory boundary, i.e. the entry's age (current time minus stored timestamp) is below max_age",
      "max_score": 10
    },
    {
      "name": "Value deserialises invariant",
      "description": "The contract includes an invariant verifying the stored value can be deserialised (e.g. JSON parse, schema check)",
      "max_score": 8
    },
    {
      "name": "Table columns present",
      "description": "contract.md contains a markdown table with columns: Boundary, Required Invariants, Verification Probes, Failure Class, Escalation Trigger",
      "max_score": 8
    },
    {
      "name": "Stale entry as operational",
      "description": "The contract classifies a stale/expired entry as 'operational' (not critical) failure class",
      "max_score": 10
    },
    {
      "name": "Missing key as critical",
      "description": "The contract classifies a missing key as 'critical' failure class",
      "max_score": 10
    },
    {
      "name": "Re-computation escalation",
      "description": "The contract specifies 'force re-computation' (or equivalent: regenerate context, restart session fresh) as the escalation action",
      "max_score": 8
    },
    {
      "name": "Timestamp check in script",
      "description": "resume_check.py checks that (current_time - timestamp) is within a max_age threshold (e.g. 300 seconds), using time.time() or datetime",
      "max_score": 10
    },
    {
      "name": "Objective checks only",
      "description": "Neither contract.md nor resume_check.py uses agent confidence, gut-feel, or qualitative assessment as a trigger — all checks are deterministic comparisons",
      "max_score": 10
    },
    {
      "name": "Non-null probe",
      "description": "The contract or script includes an explicit check that a store.get or file load returns a non-null/non-empty result",
      "max_score": 8
    }
  ]
}
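
The checks this rubric looks for in resume_check.py can be illustrated with a minimal sketch. It is not the skill's actual script: the function name `resume_check`, the dict-like `store`, the JSON entry layout with a `timestamp` field, and the 300-second threshold are all assumptions made for illustration (300 seconds is only the rubric's example value). Classifying a deserialisation failure as critical is also an assumption, since the rubric fixes the class only for missing keys and stale entries.

```python
import json
import time

MAX_AGE_SECONDS = 300  # assumed threshold; the rubric gives 300 s only as an example


def resume_check(store, key, now=None, max_age=MAX_AGE_SECONDS):
    """Return (ok, failure_class, reason) for a memory-resume boundary.

    `store` is any object with a dict-style .get(key) returning a JSON string
    (hypothetical interface). Every check is a deterministic comparison.
    """
    now = time.time() if now is None else now

    # Key-exists / non-null probe: a missing or empty value is critical.
    raw = store.get(key)
    if raw is None or raw == "":
        return (False, "critical", "missing key")

    # Deserialisation check: the stored value must parse as JSON.
    try:
        entry = json.loads(raw)
    except (TypeError, ValueError):
        # Assumed class: the rubric does not say how to classify this case.
        return (False, "critical", "value does not deserialise")

    # Minimal schema check: a dict carrying a numeric timestamp.
    if not isinstance(entry, dict) or not isinstance(
        entry.get("timestamp"), (int, float)
    ):
        return (False, "critical", "schema check failed")

    # Freshness check: age = current time minus stored timestamp.
    age = now - entry["timestamp"]
    if age > max_age:
        # Stale entries are operational; escalation is to force re-computation.
        return (False, "operational", "stale entry")

    return (True, None, "ok")
```

A caller would branch on the returned failure class: an operational result triggers the escalation action (force re-computation), while a critical result stops the resume. Passing `now` explicitly keeps the freshness check reproducible in tests.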

Install with Tessl CLI

npx tessl i markusdowne/detectability-contract@0.1.2
