markusdowne/detectability-contract

Creates boundary-point validation contracts, defines invariant-based success criteria, and sets up automated verification probes so that reliability workflows trigger on objective evidence rather than intuition. Use it in these situations:

- designing robust handoff, memory-persistence, or tool-call reliability workflows;
- verifying that handoffs work, that memory persists, or that tool calls succeeded;
- converting vague reliability goals into concrete, testable checks at each boundary point, with explicit failure-class mapping (operational vs. critical);
- testing a workflow end-to-end and confirming the automation runs correctly, using read-back probes and escalation triggers rather than agent confidence.

Includes explicit guardrails against untrusted content and prompt injection from third-party inputs.

96 (1.25x)

Quality: 90% (Does it follow best practices?)

Impact: 98% (1.25x)

Average score across 9 eval scenarios

evals/scenario-7/rubric.json

{
  "context": "Tests whether the agent implements the specific Python invariant check patterns from the skill: Path().exists() for artifact existence, json.loads() for schema validation, time.time()-based timestamp freshness check within 300 seconds, and hashlib.sha256 for checksum matching.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Path exists check",
      "description": "stage_gate.py uses Path(...).exists() or os.path.exists() to check the artifact exists",
      "max_score": 10
    },
    {
      "name": "Non-empty check",
      "description": "stage_gate.py checks the file size is greater than 0 (e.g. os.path.getsize, Path.stat().st_size, or len(content) > 0)",
      "max_score": 8
    },
    {
      "name": "JSON parse check",
      "description": "stage_gate.py uses json.loads() or json.load() to verify the file parses as valid JSON",
      "max_score": 10
    },
    {
      "name": "Timestamp freshness check",
      "description": "stage_gate.py checks (time.time() - data['timestamp']) < 300 (or equivalent: within 5 minutes using datetime)",
      "max_score": 12
    },
    {
      "name": "SHA-256 checksum",
      "description": "stage_gate.py uses hashlib.sha256(...).hexdigest() to compute and compare the checksum",
      "max_score": 12
    },
    {
      "name": "Exit code on failure",
      "description": "stage_gate.py exits with a non-zero code (sys.exit(1) or equivalent) when any check fails",
      "max_score": 8
    },
    {
      "name": "Per-check output",
      "description": "stage_gate.py prints a per-check result to stdout (pass/fail for each individual check, not just a summary)",
      "max_score": 8
    },
    {
      "name": "Conditional timestamp check",
      "description": "stage_gate.py only checks timestamp if the 'timestamp' field exists in the JSON (not unconditional)",
      "max_score": 8
    },
    {
      "name": "Contract table present",
      "description": "contract.md contains a table with boundary, invariants, and failure action columns (at minimum)",
      "max_score": 8
    },
    {
      "name": "Failure message specificity",
      "description": "stage_gate.py prints a specific failure message identifying WHICH check failed (not just 'error' or 'failed')",
      "max_score": 8
    },
    {
      "name": "Assert pattern or equivalent",
      "description": "stage_gate.py uses assert statements or explicit if/raise patterns for invariant checking (not just print warnings without exit)",
      "max_score": 8
    }
  ]
}
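For orientation, here is a minimal sketch of a `stage_gate.py` that would exercise every `stage_gate.py` item in the checklist above. It uses `Path(...).exists()`, a non-empty size check, `json.loads()`, and a conditional 300-second `time.time()` freshness check. It also uses `hashlib.sha256(...).hexdigest()`, per-check PASS/FAIL lines, and a non-zero exit code. This is not the skill's reference solution. Several details are assumptions: the CLI shape (an artifact path plus a hypothetical `--expected-sha256` option), the helper name `run_checks`, and the premise that `timestamp` is Unix epoch seconds.

```python
"""stage_gate.py: minimal boundary-point gate for one JSON artifact (sketch).

Hypothetical CLI: python stage_gate.py ARTIFACT [--expected-sha256 HEX]
Assumes 'timestamp', when present, is Unix epoch seconds.
"""
import argparse
import hashlib
import json
import sys
import time
from pathlib import Path

MAX_AGE_S = 300  # freshness window from the rubric: 5 minutes


def run_checks(path, expected_sha256=None, now=None):
    """Run invariants in order; return a list of (check_name, passed, detail).

    Stops early when later checks cannot run (missing, empty, or unparseable file).
    """
    results = []

    def record(name, ok, detail):
        results.append((name, ok, detail))
        return ok

    p = Path(path)
    found = p.exists() and p.is_file()
    if not record("exists", found, str(p) if found else f"artifact not found: {p}"):
        return results

    raw = p.read_bytes()
    if not record("non_empty", len(raw) > 0, f"size={len(raw)} bytes"):
        return results

    try:
        data = json.loads(raw)  # accepts bytes; bad input raises a ValueError subclass
    except ValueError as exc:
        record("json_parse", False, f"invalid JSON: {exc}")
        return results
    record("json_parse", True, "parsed as JSON")

    # Conditional: only check freshness when the artifact carries a timestamp.
    if isinstance(data, dict) and "timestamp" in data:
        now = time.time() if now is None else now
        try:
            age = now - float(data["timestamp"])
            record("timestamp_fresh", age < MAX_AGE_S,
                   f"age={age:.1f}s, limit={MAX_AGE_S}s")
        except (TypeError, ValueError):
            record("timestamp_fresh", False,
                   f"timestamp is not numeric: {data['timestamp']!r}")

    # Checksum is compared against an externally supplied value (assumption).
    if expected_sha256 is not None:
        actual = hashlib.sha256(raw).hexdigest()
        record("sha256_match", actual == expected_sha256.lower(),
               f"expected={expected_sha256}, actual={actual}")

    return results


def main(argv=None):
    parser = argparse.ArgumentParser(description="Stage gate for a JSON artifact")
    parser.add_argument("artifact")
    parser.add_argument("--expected-sha256", default=None)
    args = parser.parse_args(argv)

    results = run_checks(args.artifact, args.expected_sha256)
    for name, ok, detail in results:
        print(f"[{'PASS' if ok else 'FAIL'}] {name}: {detail}")  # per-check output

    failed = [name for name, ok, _ in results if not ok]
    if failed:
        print(f"GATE FAILED: {', '.join(failed)}")  # names the failing checks
        return 1  # non-zero exit code on any failed invariant
    print("GATE PASSED")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Recording each result instead of raising at the first failure keeps one PASS/FAIL line per invariant. The early returns skip checks that cannot run meaningfully, such as parsing a missing file. A caller can treat exit code 1 as "at least one boundary invariant failed".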
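The rubric's `type` is `weighted_checklist`, and its eleven `max_score` values sum to 100. The freshness and SHA-256 checks carry the most weight (12 each). This page does not state how the grader aggregates item scores. The sketch below assumes each item earns between 0 and its `max_score` points and that the result is reported as a percentage. The function name `weighted_score` is hypothetical.

```python
# Weights copied from evals/scenario-7/rubric.json (descriptions omitted).
RUBRIC = {
    "type": "weighted_checklist",
    "checklist": [
        {"name": "Path exists check", "max_score": 10},
        {"name": "Non-empty check", "max_score": 8},
        {"name": "JSON parse check", "max_score": 10},
        {"name": "Timestamp freshness check", "max_score": 12},
        {"name": "SHA-256 checksum", "max_score": 12},
        {"name": "Exit code on failure", "max_score": 8},
        {"name": "Per-check output", "max_score": 8},
        {"name": "Conditional timestamp check", "max_score": 8},
        {"name": "Contract table present", "max_score": 8},
        {"name": "Failure message specificity", "max_score": 8},
        {"name": "Assert pattern or equivalent", "max_score": 8},
    ],
}


def weighted_score(rubric, awarded):
    """Sum awarded points against max_score; return (points, max_points, percent).

    Assumption: each item earns 0..max_score points; items not in `awarded` earn 0.
    """
    items = {item["name"]: item["max_score"] for item in rubric["checklist"]}
    unknown = set(awarded) - set(items)
    if unknown:
        raise ValueError(f"unknown checklist items: {sorted(unknown)}")
    points = 0
    for name, max_score in items.items():
        got = awarded.get(name, 0)
        if not 0 <= got <= max_score:
            raise ValueError(f"{name}: {got} is outside 0..{max_score}")
        points += got
    total = sum(items.values())
    return points, total, 100.0 * points / total


# Example: everything passes except the checksum and conditional-timestamp items.
awarded = {i["name"]: i["max_score"] for i in RUBRIC["checklist"]}
awarded["SHA-256 checksum"] = 0
awarded["Conditional timestamp check"] = 0
print(weighted_score(RUBRIC, awarded))  # (80, 100, 80.0)
```

Under this assumption, skipping the `timestamp` guard and the checksum together costs 20 of the 100 points.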

Install with Tessl CLI

npx tessl i markusdowne/detectability-contract
