
markusdowne/handoff-integrity-check

Validate agent handoff packets and resume readiness using schema, freshness, and replay checks. Use when tasks pause/resume across sessions, agents, or humans — including when a user wants to continue where they left off, hand off to another agent, resume a previous task, or pick up an interrupted workflow. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.

Quality: 100% (Does it follow best practices?)

Impact: 96% (1.50x), average score across 9 eval scenarios


Evaluation results

Create a Handoff Integrity Procedure
Focus: handoff validation and replay readiness
Score with context: 100%

Criterion | Without context | With context
Required fields | 100% | 100%
Freshness and token checks | 100% | 100%
Replay test | 100% | 100%
Classification mapping | 100% | 100%
Example outputs | 100% | 100%

Without context: $0.2221 · 1m 14s · 11 turns · 18 in / 3,572 out tokens

With context: $0.4120 · 1m 37s · 21 turns · 336 in / 5,248 out tokens
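The classification mapping graded in this scenario can be sketched as a small function over per-check results. This is a minimal sketch, not the skill's own code: the `CheckResult` type, its `critical` flag, and the rule "any critical failure gives CRITICAL, any other failure gives OPERATIONAL, otherwise CLEAN" are assumptions inferred from the scenarios on this page.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    CLEAN = "CLEAN"
    OPERATIONAL = "OPERATIONAL"
    CRITICAL = "CRITICAL"


@dataclass
class CheckResult:
    name: str               # e.g. "schema", "freshness", "token", "replay"
    passed: bool
    critical: bool = False  # assumption: a failure here blocks resumption outright
    detail: str = ""


def classify(results: list[CheckResult]) -> Classification:
    """Map per-check results to one label (hypothetical mapping)."""
    failed = [r for r in results if not r.passed]
    if any(r.critical for r in failed):
        return Classification.CRITICAL
    if failed:
        return Classification.OPERATIONAL
    return Classification.CLEAN


def summarize(results: list[CheckResult]) -> str:
    """Render the per-check pass/fail lines, then the explicit label."""
    lines = [
        f"{r.name}: {'PASS' if r.passed else 'FAIL'} {r.detail}".rstrip()
        for r in results
    ]
    lines.append(f"Classification: {classify(results).value}")
    return "\n".join(lines)
```

Keeping the label computed from the same results that feed the summary means the explicit label and the per-check breakdown cannot disagree.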

Pre-Rotation Handoff Review
Focus: schema validation and output format
Score with context: 100% (+25% over without context)

Criterion | Without context | With context
All 8 fields checked | 60% | 100%
Empty next_action flagged | 100% | 100%
Empty assumptions flagged | 100% | 100%
Per-check pass/fail summary | 33% | 100%
Non-clean classification | 100% | 100%
Explicit classification label | 0% | 100%
Recovery steps listed | 100% | 100%
Escalation recommendation | 80% | 100%
Freshness check performed | 87% | 100%

Without context: $0.2223 · 1m 17s · 9 turns · 14 in / 3,399 out tokens

With context: $0.3062 · 1m 34s · 12 turns · 477 in / 4,948 out tokens
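A schema check that covers all 8 fields and flags empty values such as a blank next_action or an empty assumptions list might look like the following. Only next_action and assumptions are named on this page; the other six field names are hypothetical placeholders for whatever the skill's schema defines.

```python
from typing import Any

# Hypothetical field list: the eval states there are 8 required fields and
# names next_action and assumptions; the rest are illustrative placeholders.
REQUIRED_FIELDS = (
    "task_id", "goal", "status", "next_action",
    "assumptions", "artifacts", "updated_at", "resume_token",
)


def is_empty(value: Any) -> bool:
    """Treat None, whitespace-only strings, and empty containers as empty."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False


def check_schema(packet: dict) -> list[str]:
    """Return one problem per missing or empty field; [] means the check passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        if name not in packet:
            problems.append(f"{name}: missing")
        elif is_empty(packet[name]):
            problems.append(f"{name}: empty")
    return problems
```

Reporting "missing" and "empty" separately keeps the recovery step specific: a missing key is a schema error, while an empty value usually means the previous agent never wrote it down.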

Resuming a Long-Paused Data Pipeline Task
Focus: freshness check with 48-hour threshold
Score with context: 100% (+49% over without context)

Criterion | Without context | With context
Freshness check performed | 100% | 100%
Age quantified | 100% | 100%
Freshness marked as failed | 73% | 100%
48-hour threshold referenced | 0% | 100%
OPERATIONAL classification | 0% | 100%
Recovery includes timestamp update | 100% | 100%
Per-check summary | 50% | 100%
Escalation present | 0% | 100%

Without context: $0.1592 · 54s · 9 turns · 14 in / 2,453 out tokens

With context: $0.3621 · 1m 46s · 15 turns · 483 in / 5,474 out tokens
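The freshness check with a 48-hour threshold, timezone-aware parsing, and a UTC "now" comparison (criteria also graded in the library scenario) can be sketched as below. The ISO 8601 timestamp format and treating exactly 48 hours as still fresh are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

MAX_AGE = timedelta(hours=48)  # freshness threshold named in the eval criteria


def check_freshness(updated_at: str, now: Optional[datetime] = None) -> Tuple[bool, float]:
    """Return (fresh, age_in_hours) for an ISO 8601 timestamp.

    A trailing "Z" is normalized so Python versions before 3.11 can parse it,
    and naive timestamps are rejected instead of being guessed as local time.
    """
    if updated_at.endswith("Z"):
        updated_at = updated_at[:-1] + "+00:00"
    ts = datetime.fromisoformat(updated_at)
    if ts.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {updated_at!r}")
    now = now or datetime.now(timezone.utc)
    age = now - ts
    return age <= MAX_AGE, age.total_seconds() / 3600
```

With `now` fixed at 2025-01-03T12:00Z, `check_freshness("2025-01-01T11:00:00Z", now)` returns `(False, 49.0)`, which gives both the failed status and the quantified age the criteria ask for.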

Agent Handoff Gate Review
Focus: resume token format validation
Score with context: 100% (+12% over without context)

Criterion | Without context | With context
Token check performed | 100% | 100%
Packet A token rejected | 100% | 100%
Packet B token rejected | 100% | 100%
Format requirements stated | 66% | 100%
Non-clean classification for both | 66% | 100%
Recovery includes new token | 100% | 100%
Per-check summary for both packets | 100% | 100%
Escalation for both packets | 66% | 100%

Without context: $0.2280 · 1m 23s · 8 turns · 13 in / 4,178 out tokens

With context: $0.5150 · 2m 21s · 22 turns · 368 in / 7,912 out tokens
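Resume token format validation reduces to an anchored regular-expression match. The page does not give the token format, so the pattern below (`RT-`, an eight-digit date, and eight lowercase hex characters) is purely hypothetical; substitute the pattern the skill actually defines.

```python
import re

# Hypothetical resume-token format, e.g. "RT-20250103-9f86d081".
TOKEN_PATTERN = re.compile(r"RT-\d{8}-[0-9a-f]{8}")


def check_token_format(token: str) -> bool:
    """True only when the entire token matches, not merely a prefix of it."""
    return TOKEN_PATTERN.fullmatch(token or "") is not None
```

Using `fullmatch` rather than `match` matters here: `match` would accept a valid-looking prefix followed by arbitrary trailing text.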

Resuming a Critical Security Audit Task
Focus: missing artifact critical classification
Score with context: 100% (+23% over without context)

Criterion | Without context | With context
Artifact absence noted | 100% | 100%
CRITICAL classification | 25% | 100%
Not classified as OPERATIONAL | 100% | 100%
Not classified as CLEAN | 100% | 100%
Escalation to task owner | 100% | 100%
Recovery steps present | 100% | 100%
Does not proceed without handoff | 100% | 100%
Per-check summary present | 75% | 100%
Explicit classification label | 25% | 100%

100%

Without context: $0.2450 · 1m 24s · 15 turns · 20 in / 3,670 out tokens

With context: $0.4368 · 1m 30s · 18 turns · 436 in / 4,278 out tokens
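The missing-artifact case can be sketched as an existence check over the paths a packet references. This assumes artifacts are file paths relative to a working directory; per the scenario, any missing artifact should lead to a CRITICAL label and escalation to the task owner rather than a resume.

```python
from pathlib import Path


def check_artifacts(artifact_paths: list[str], base_dir: str = ".") -> tuple[bool, list[str]]:
    """Return (passed, missing) for the artifacts a packet references.

    Assumption: artifacts are file paths relative to base_dir. A non-empty
    `missing` list should be treated as blocking (CRITICAL), not retried.
    """
    base = Path(base_dir)
    missing = [p for p in artifact_paths if not (base / p).exists()]
    return not missing, missing
```

Returning the missing paths, not just a boolean, gives the escalation message something concrete to report.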

Evaluating a Handoff Before Automated Resume
Focus: replay test failure guardrail
Score with context: 100% (+43% over without context)

Criterion | Without context | With context
Replay test attempted | 58% | 100%
Contradiction identified | 100% | 100%
Replay test marked failed | 41% | 100%
Not classified as CLEAN | 100% | 100%
OPERATIONAL classification | 0% | 100%
Does not mark handoff successful | 100% | 100%
Per-check summary | 0% | 100%
Replay failure in summary | 0% | 100%
Escalation present | 50% | 100%

100%

Without context: $0.1457 · 53s · 8 turns · 13 in / 2,408 out tokens

With context: $0.4680 · 2m 12s · 19 turns · 329 in / 7,778 out tokens
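One way to record a replay test is to answer a fixed set of questions using only the packet and to log any contradictions found along the way. The questions and the pass rule (every question answered, no contradictions) are assumptions for illustration, not the skill's published replay procedure.

```python
from dataclasses import dataclass, field

# Hypothetical replay questions, each answered using only the handoff packet.
REPLAY_QUESTIONS = (
    "What is the task trying to achieve?",
    "What was the last completed step?",
    "What exactly is the next action?",
    "Which assumptions must still hold?",
)


@dataclass
class ReplayResult:
    answers: dict[str, str]  # question -> answer derived from the packet alone
    contradictions: list[str] = field(default_factory=list)

    @property
    def unanswered(self) -> list[str]:
        return [q for q in REPLAY_QUESTIONS if not self.answers.get(q, "").strip()]

    @property
    def passed(self) -> bool:
        """Replay passes only with every question answered and no contradictions."""
        return not self.unanswered and not self.contradictions
```

A single recorded contradiction, for example a next action that repeats a step the status already marks as done, is enough to fail the replay even when every question has an answer.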

Build a Shared Handoff Validation Library
Focus: Python validator implementation
Score with context: 92% (+49% over without context)

Criterion | Without context | With context
48-hour constant | 0% | 100%
Timezone-aware datetime parsing | 100% | 100%
UTC now comparison | 100% | 100%
Token regex pattern | 0% | 100%
All 8 fields validated | 100% | 100%
Non-empty validation | 70% | 80%
Replay test questions | 0% | 75%
Classification logic | 20% | 70%
Consumed token check | 0% | 100%
Demo runs cleanly | 100% | 100%

Without context: $0.2450 · 1m 17s · 14 turns · 18 in / 4,817 out tokens

With context: $0.6546 · 2m 45s · 30 turns · 341 in / 9,306 out tokens
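The consumed-token check graded in the library scenario implies that a resume token is single-use. A minimal in-memory sketch follows; the `TokenRegistry` class is hypothetical, and a shared library would need persistent storage so that a token consumed in one session stays consumed in the next.

```python
import threading


class TokenRegistry:
    """Hypothetical in-memory record of consumed resume tokens."""

    def __init__(self) -> None:
        self._consumed: set[str] = set()
        self._lock = threading.Lock()  # makes check-and-add atomic across threads

    def consume(self, token: str) -> bool:
        """Mark the token consumed; return False if it had already been used."""
        with self._lock:
            if token in self._consumed:
                return False
            self._consumed.add(token)
            return True
```

A False return is the signal to reject the resume and issue a new token, matching the "recovery includes new token" step in the gate review scenario.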

Quarterly Handoff Audit Report
Focus: multi-packet classification audit
Score with context: 72% (+52% over without context)

Criterion | Without context | With context
Alpha classified CLEAN | 0% | 0%
Beta classified OPERATIONAL | 0% | 100%
Gamma classified OPERATIONAL | 0% | 0%
Per-check breakdown for all three | 0% | 100%
Recovery steps for Beta and Gamma | 58% | 66%
Escalation for all three | 30% | 100%
CLEAN/OPERATIONAL/CRITICAL labels used | 0% | 100%
Freshness check on all packets | 66% | 100%
Summary table or section | 100% | 100%

Without context: $0.1735 · 1m 9s · 8 turns · 13 in / 3,145 out tokens

With context: $0.4463 · 2m 16s · 18 turns · 328 in / 7,452 out tokens
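For a multi-packet audit, the per-check breakdown and the labels can be rendered as one plain-text table. This sketch assumes the per-check results and the classification labels are computed elsewhere; the function name and input shapes are hypothetical.

```python
def audit_table(results: dict[str, dict[str, bool]], labels: dict[str, str]) -> str:
    """Render one row per packet: each check as PASS/FAIL/n/a, then its label.

    `results` maps packet name -> {check name: passed}; `labels` maps
    packet name -> CLEAN, OPERATIONAL, or CRITICAL.
    """
    checks = sorted({c for r in results.values() for c in r})
    lines = [" | ".join(["Packet", *checks, "Classification"])]
    for packet, r in results.items():
        # A check that was never run is shown as n/a rather than FAIL.
        cells = ["n/a" if c not in r else ("PASS" if r[c] else "FAIL") for c in checks]
        lines.append(" | ".join([packet, *cells, labels[packet]]))
    return "\n".join(lines)
```

Marking unrun checks as "n/a" makes gaps visible, such as a packet that skipped the freshness check.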

Assessing a Vague Team Handoff
Focus: uncertainty operational classification
Score with context: 100% (+34% over without context)

Criterion | Without context | With context
Vagueness identified | 100% | 100%
Replay test attempted | 70% | 100%
Replay test not confirmed | 73% | 100%
Not classified as CLEAN | 100% | 100%
OPERATIONAL classification | 0% | 100%
Recovery steps present | 100% | 100%
Escalation recommendation | 75% | 100%
Per-check summary | 25% | 100%
Schema check passes | 0% | 100%
Does not recommend immediate resumption | 100% | 100%

Without context: $0.1679 · 57s · 8 turns · 13 in / 2,370 out tokens

With context: $0.3293 · 1m 35s · 14 turns · 325 in / 5,003 out tokens
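This scenario separates two things: every field can be present and non-empty, so the schema check passes, while the content is still too vague for the replay test to be confirmed. A crude heuristic can surface that vagueness for a reviewer. The phrase list is entirely hypothetical and is meant as a prompt for human judgment, not a substitute for the replay test.

```python
import re

# Hypothetical phrases that tend to signal an under-specified handoff.
VAGUE_PATTERNS = [
    r"\betc\.?",
    r"\btbd\b",
    r"\bsome (?:stuff|things|changes)\b",
    r"\bas discussed\b",
    r"\bfinish up\b",
    r"\bthe usual\b",
]
_VAGUE_RE = re.compile("|".join(VAGUE_PATTERNS), re.IGNORECASE)


def vague_phrases(text: str) -> list[str]:
    """Return the vague phrases found in a free-text field, in order of appearance."""
    return [m.group(0) for m in _VAGUE_RE.finditer(text)]
```

For example, `vague_phrases("Finish up the migration, fix some stuff, etc.")` returns `["Finish up", "some stuff", "etc."]`, even though that next_action would pass a non-empty check.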

Install with Tessl CLI

npx tessl i markusdowne/handoff-integrity-check@0.1.2

Evaluated with agent Claude Code, model Claude Sonnet 4.6.
