
markusdowne/detectability-contract

Creates boundary-point validation contracts, defines invariant-based success criteria, and sets up automated verification probes so that reliability workflows trigger on objective evidence rather than intuition. Use it when designing robust handoff, memory-persistence, or tool-call reliability workflows; when you need to verify that handoffs work, that memory persists, or that tool calls succeeded; when converting vague reliability goals into concrete, testable checks at each boundary point with explicit failure-class mapping (operational vs. critical); or when you want to test a workflow end to end and confirm that your automation runs correctly using read-back probes and escalation triggers rather than agent confidence. Includes explicit untrusted-content and prompt-injection guardrails for third-party inputs.
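
To make the idea of a contract row concrete, here is a minimal Python sketch. The five field names are an assumption inferred from the evaluation criteria on this page (boundary, invariant, probe, failure class, escalation trigger); the skill's actual column names and wording may differ, and `ContractRow`, `FailureClass`, `CONTRACT`, and `render` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FailureClass(Enum):
    OPERATIONAL = "operational"
    CRITICAL = "critical"


@dataclass(frozen=True)
class ContractRow:
    """One row of a hypothetical detectability contract table."""
    boundary: str             # where data or control crosses between steps
    invariant: str            # objective condition that must hold
    probe: str                # read-back check that gathers the evidence
    failure_class: FailureClass
    escalation_trigger: str   # what happens when the invariant fails


# Example rows; the wording is illustrative, not taken from the skill itself.
CONTRACT: List[ContractRow] = [
    ContractRow("file handoff", "artifact exists at the expected path",
                "assert the path exists after the write",
                FailureClass.CRITICAL, "halt"),
    ContractRow("memory resume", "stored timestamp is fresh",
                "read the key back and compare timestamps",
                FailureClass.OPERATIONAL, "recompute the value"),
    ContractRow("tool call", "HTTP status is 2xx",
                "re-fetch the resource",
                FailureClass.OPERATIONAL, "escalate after two failures"),
]


def render(rows: List[ContractRow]) -> str:
    """Render the rows as a plain-text, pipe-separated five-column table."""
    lines = ["boundary | invariant | probe | failure class | escalation trigger"]
    for r in rows:
        lines.append(" | ".join([r.boundary, r.invariant, r.probe,
                                 r.failure_class.value, r.escalation_trigger]))
    return "\n".join(lines)
```

Calling `print(render(CONTRACT))` prints one header line followed by one line per boundary.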

96

1.25x

Quality: 90% (Does it follow best practices?)

Impact: 98%, 1.25x (Average score across 9 eval scenarios)


Evaluation results

100%

Build a Reliability Detectability Contract

boundary contract generation

| Criteria | Without context | With context |
| --- | --- | --- |
| Contract table exists | 100% | 100% |
| Invariant specificity | 100% | 100% |
| Failure mapping | 100% | 100% |
| Unknown state handling | 100% | 100% |

Without context: $0.2281 · 1m 11s · 10 turns · 13 in / 3,923 out tokens

With context: $0.2553 · 1m 6s · 14 turns · 552 in / 3,450 out tokens

90%

36%

Data Pipeline Reliability Contract

File handoff contract

| Criteria | Without context | With context |
| --- | --- | --- |
| Boundary identification | 100% | 100% |
| Artifact exists invariant | 100% | 100% |
| Schema valid invariant | 100% | 100% |
| Table format | 0% | 100% |
| Failure class mapping | 0% | 100% |
| Escalation trigger defined | 0% | 100% |
| Verification probe defined | 0% | 100% |
| Path exists assert | 100% | 100% |
| Schema parse check | 100% | 100% |
| Missing file as critical | 100% | 100% |
| Retry-then-halt escalation | 0% | 0% |

Without context: $0.3016 · 2m 11s · 12 turns · 15 in / 5,504 out tokens

With context: $0.5407 · 2m 30s · 25 turns · 408 in / 7,773 out tokens

100%

28%

Conversation Context Persistence Contract

Memory resume verification

| Criteria | Without context | With context |
| --- | --- | --- |
| Memory resume boundary | 100% | 100% |
| Key exists invariant | 100% | 100% |
| Timestamp freshness invariant | 100% | 100% |
| Value deserialises invariant | 100% | 100% |
| Table columns present | 0% | 100% |
| Stale entry as operational | 0% | 100% |
| Missing key as critical | 0% | 100% |
| Re-computation escalation | 100% | 100% |
| Timestamp check in script | 100% | 100% |
| Objective checks only | 100% | 100% |
| Non-null probe | 100% | 100% |

Without context: $0.4224 · 1m 56s · 22 turns · 28 in / 6,305 out tokens

With context: $0.3769 · 1m 34s · 21 turns · 403 in / 5,268 out tokens
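
The memory-resume criteria above (key exists, timestamp fresh, value deserialises, non-null) can be sketched as a single read-back probe. This is an illustration under stated assumptions, not the skill's own script: the store is assumed to be a dict whose entries look like `{"value": <JSON string>, "written_at": <epoch seconds>}`, and `probe_memory_entry` is a hypothetical name. Missing key as critical and stale entry as operational follow the criteria names; classifying a value that fails to deserialise as operational is an assumption.

```python
import json
import time
from typing import Optional, Tuple


def probe_memory_entry(store: dict, key: str, max_age_s: float,
                       now: Optional[float] = None
                       ) -> Tuple[bool, Optional[str], str]:
    """Return (ok, failure_class, reason) for one stored entry.

    Assumed entry shape: {"value": <JSON string>, "written_at": <epoch s>}.
    """
    now = time.time() if now is None else now
    entry = store.get(key)
    if entry is None:
        # Missing key: classified as critical.
        return False, "critical", f"key {key!r} missing or null"
    age = now - entry["written_at"]
    if age > max_age_s:
        # Stale entry: classified as operational.
        return False, "operational", f"entry is {age:.0f}s old, limit {max_age_s:.0f}s"
    try:
        value = json.loads(entry["value"])
    except (TypeError, ValueError) as exc:
        # Assumption: a value that will not deserialise is operational.
        return False, "operational", f"value does not deserialise: {exc}"
    if value is None:
        return False, "operational", "value deserialised to null"
    return True, None, "ok"
```

The probe decides only from what it reads back from the store, which is the point of the "objective checks only" criterion.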

100%

32%

Payment API Integration Reliability Contract

Tool call reliability contract

| Criteria | Without context | With context |
| --- | --- | --- |
| Tool call boundary named | 100% | 100% |
| HTTP status invariant | 100% | 100% |
| Required fields invariant | 100% | 100% |
| Re-fetch probe | 0% | 100% |
| Required keys validation probe | 100% | 100% |
| Non-2xx as operational | 44% | 100% |
| Missing fields as critical | 44% | 100% |
| Two-failure escalation | 0% | 100% |
| Table format correct | 50% | 100% |
| Script field validation | 100% | 100% |
| Unknown state as operational | 87% | 100% |
| Non-zero exit on failure | 100% | 100% |

Without context: $0.2546 · 1m 13s · 14 turns · 19 in / 4,087 out tokens

With context: $0.5270 · 2m 2s · 26 turns · 409 in / 6,940 out tokens
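
As a rough sketch of how the tool-call criteria above could fit together, the snippet below classifies a response by HTTP status and required fields, then counts consecutive failures. No real payment API is called. `classify_response` and `FailureCounter` are hypothetical names, and reading "two-failure escalation" as "escalate after two consecutive operational failures" is an assumption.

```python
from typing import Iterable, Mapping, Optional


def classify_response(status: int, body: Mapping,
                      required: Iterable[str]) -> Optional[str]:
    """Return a failure class, or None when both invariants hold."""
    if not 200 <= status < 300:
        return "operational"          # non-2xx status
    missing = [k for k in required if k not in body]
    if missing:
        return "critical"             # required fields absent
    return None


class FailureCounter:
    """Tracks consecutive failures and picks the next action."""

    def __init__(self, limit: int = 2):
        self.limit = limit            # assumed threshold: two failures
        self.consecutive = 0

    def record(self, failure_class: Optional[str]) -> str:
        if failure_class is None:
            self.consecutive = 0
            return "continue"
        if failure_class == "critical":
            return "halt"
        self.consecutive += 1
        return "escalate" if self.consecutive >= self.limit else "retry"
```

A caller would feed each response through `classify_response` and pass the result to `record`, acting on the returned string rather than on any confidence estimate.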

100%

26%

Automated Deployment Pipeline Contract

Multi-boundary workflow contract

| Criteria | Without context | With context |
| --- | --- | --- |
| Multiple boundary types | 100% | 100% |
| Five-column table | 0% | 100% |
| Invariants for each boundary | 100% | 100% |
| Probes for each boundary | 100% | 100% |
| Failure class for each boundary | 0% | 100% |
| Escalation trigger for each boundary | 100% | 100% |
| Artifact exists invariant used | 100% | 100% |
| Timestamp freshness invariant used | 100% | 100% |
| Checksum or hash invariant used | 100% | 100% |
| Critical vs operational distinction | 0% | 100% |
| Final report boundary included | 100% | 100% |
| Resume/readiness boundary included | 100% | 100% |

Without context: $0.2956 · 1m 38s · 13 turns · 19 in / 5,486 out tokens

With context: $0.4596 · 2m 23s · 21 turns · 29 in / 8,105 out tokens

93%

36%

Workflow Orchestrator Escalation Policy

Escalation trigger design

| Criteria | Without context | With context |
| --- | --- | --- |
| File handoff escalation | 75% | 83% |
| Memory resume escalation | 100% | 100% |
| API call escalation | 33% | 58% |
| Five-column table | 0% | 100% |
| Critical vs operational | 20% | 100% |
| Critical triggers halt | 80% | 100% |
| Operational triggers retry | 80% | 100% |
| Missing evidence = operational | 90% | 100% |
| Invariants in table | 57% | 100% |
| Probes in table | 28% | 100% |

Without context: $0.1977 · 1m 23s · 8 turns · 12 in / 3,772 out tokens

With context: $0.4700 · 2m 16s · 22 turns · 407 in / 6,794 out tokens

100%

Automated Stage-Gate Verification Script

Invariant check implementation

| Criteria | Without context | With context |
| --- | --- | --- |
| Path exists check | 100% | 100% |
| Non-empty check | 100% | 100% |
| JSON parse check | 100% | 100% |
| Timestamp freshness check | 100% | 100% |
| SHA-256 checksum | 100% | 100% |
| Exit code on failure | 100% | 100% |
| Per-check output | 100% | 100% |
| Conditional timestamp check | 100% | 100% |
| Contract table present | 100% | 100% |
| Failure message specificity | 100% | 100% |
| Assert pattern or equivalent | 100% | 100% |

Without context: $0.2722 · 1m 13s · 14 turns · 17 in / 4,086 out tokens

With context: $0.4734 · 2m 16s · 26 turns · 31 in / 6,012 out tokens
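
The checks named in the criteria above could be combined into a stage-gate script along the following lines. This is a sketch, not the script produced in the eval: `run_checks` and `main` are hypothetical names, file modification time stands in for the timestamp (an assumption), and the freshness and checksum checks run only when their parameters are supplied.

```python
import hashlib
import json
import os
import time
from typing import List, Optional, Tuple

Check = Tuple[str, bool, str]  # (check name, passed, detail message)


def run_checks(path: str, expected_sha256: Optional[str] = None,
               max_age_s: Optional[float] = None) -> List[Check]:
    """Run each invariant check and return one result per check."""
    results: List[Check] = []
    exists = os.path.isfile(path)
    results.append(("path_exists", exists, path if exists else f"{path} not found"))
    if not exists:
        return results  # nothing else can be verified without the file
    size = os.path.getsize(path)
    results.append(("non_empty", size > 0, f"{size} bytes"))
    with open(path, "rb") as fh:
        data = fh.read()
    try:
        json.loads(data)
        results.append(("json_parses", True, "valid JSON"))
    except ValueError as exc:
        results.append(("json_parses", False, f"invalid JSON: {exc}"))
    if max_age_s is not None:  # conditional timestamp check
        age = time.time() - os.path.getmtime(path)
        results.append(("fresh", age <= max_age_s,
                        f"age {age:.0f}s, limit {max_age_s:.0f}s"))
    if expected_sha256 is not None:
        digest = hashlib.sha256(data).hexdigest()
        results.append(("sha256", digest == expected_sha256, f"got {digest}"))
    return results


def main(argv: List[str]) -> int:
    """Print one line per check; return non-zero if any check failed."""
    results = run_checks(argv[0])
    for name, ok, detail in results:
        print(f"{'PASS' if ok else 'FAIL'} {name}: {detail}")
    return 0 if all(ok for _, ok, _ in results) else 1
```

When used as a script, the entry point would be `sys.exit(main(sys.argv[1:]))`, so a failed check surfaces as a non-zero exit code that the calling pipeline can act on.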

100%

5%

Monitoring Alert Modernisation

Objective vs confidence triggers

| Criteria | Without context | With context |
| --- | --- | --- |
| Objective triggers only | 100% | 100% |
| No confidence-based trigger | 100% | 100% |
| Unverifiable state classified | 100% | 100% |
| File write boundary trigger | 100% | 100% |
| API integration boundary trigger | 100% | 100% |
| Cache freshness boundary trigger | 100% | 100% |
| Failure classification present | 100% | 100% |
| Design principle documented | 100% | 100% |
| Unknown state principle documented | 100% | 100% |
| Five-column table | 0% | 100% |

Without context: $0.2218 · 1m 28s · 12 turns · 19 in / 3,691 out tokens

With context: $0.3468 · 1m 40s · 18 turns · 431 in / 5,187 out tokens

100%

14%

State Machine Failure Classification Scheme

Failure classification mapping

| Criteria | Without context | With context |
| --- | --- | --- |
| Missing artifact as critical | 100% | 100% |
| Bad schema as operational | 0% | 100% |
| Stale timestamp as operational | 100% | 100% |
| Non-2xx as operational | 100% | 100% |
| Missing fields as critical | 100% | 100% |
| Unknown state as operational minimum | 100% | 100% |
| Five-column table | 50% | 100% |
| Four boundary types | 100% | 100% |
| Critical halt escalation | 100% | 100% |
| Operational retry escalation | 100% | 100% |
| Taxonomy completeness | 100% | 100% |

Without context: $0.2052 · 1m 17s · 9 turns · 12 in / 3,396 out tokens

With context: $0.3473 · 1m 44s · 17 turns · 398 in / 5,153 out tokens
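
The classification criteria above amount to a lookup from failure condition to failure class, with unknown states treated as at least operational. A minimal sketch follows; the condition keys and the names `FAILURE_CLASSES`, `ESCALATION`, `classify`, and `escalation_for` are hypothetical, while the class assignments mirror the criteria names in this scenario.

```python
# Condition -> failure class, mirroring the criteria listed for this scenario.
FAILURE_CLASSES = {
    "missing_artifact": "critical",
    "bad_schema": "operational",
    "stale_timestamp": "operational",
    "non_2xx_status": "operational",
    "missing_fields": "critical",
}

# Failure class -> escalation action (critical halts, operational retries).
ESCALATION = {"critical": "halt", "operational": "retry"}


def classify(condition: str) -> str:
    """Unknown or unverifiable conditions are treated as at least operational."""
    return FAILURE_CLASSES.get(condition, "operational")


def escalation_for(condition: str) -> str:
    """Look up the escalation action for a failure condition."""
    return ESCALATION[classify(condition)]
```

For example, `escalation_for("missing_artifact")` returns `"halt"`, and a condition that is not in the table falls back to `"retry"`.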

Install with Tessl CLI

npx tessl i markusdowne/detectability-contract@0.1.2
Evaluated
Agent: Claude Code
Model: Claude Sonnet 4.6
