Validate agent handoff packets and resume readiness using schema, freshness, and replay checks. Use when tasks pause/resume across sessions, agents, or humans — including when a user wants to continue where they left off, hand off to another agent, resume a previous task, or pick up an interrupted workflow. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
96
Quality: 100% (Does it follow best practices?)
Impact: 96% (1.50x)
Average score across 9 eval scenarios
Handoff validation and replay readiness
Scores (without context / with context):
Required fields: 100% / 100%
Freshness and token checks: 100% / 100%
Replay test: 100% / 100%
Classification mapping: 100% / 100%
Example outputs: 100% / 100%
Without context: $0.2221 · 1m 14s · 11 turns · 18 in / 3,572 out tokens
With context: $0.4120 · 1m 37s · 21 turns · 336 in / 5,248 out tokens
Schema validation and output format
Scores (without context / with context):
All 8 fields checked: 60% / 100%
Empty next_action flagged: 100% / 100%
Empty assumptions flagged: 100% / 100%
Per-check pass/fail summary: 33% / 100%
Non-clean classification: 100% / 100%
Explicit classification label: 0% / 100%
Recovery steps listed: 100% / 100%
Escalation recommendation: 80% / 100%
Freshness check performed: 87% / 100%
Without context: $0.2223 · 1m 17s · 9 turns · 14 in / 3,399 out tokens
With context: $0.3062 · 1m 34s · 12 turns · 477 in / 4,948 out tokens
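The schema criteria above expect all 8 fields to be checked and empty values to be flagged, not just missing keys. Below is a minimal sketch of that check. Only `next_action` and `assumptions` are named on this page. The other six field names are hypothetical placeholders, so substitute the skill's real schema.

```python
# Hypothetical field list: only next_action and assumptions appear in the eval
# criteria; the remaining six names are placeholders for illustration.
REQUIRED_FIELDS = (
    "task_id", "goal", "state_summary", "next_action",
    "assumptions", "artifacts", "updated_at", "resume_token",
)


def _is_empty(value) -> bool:
    """Treat None, blank strings, and empty containers as empty."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False


def check_schema(packet: dict) -> list[str]:
    """Return one problem per missing or empty field; [] means the check passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in packet:
            problems.append(f"missing field: {field}")
        elif _is_empty(packet[field]):
            problems.append(f"empty field: {field}")
    return problems
```

Returning a list of problems instead of a bare boolean makes it easy to produce the per-check pass/fail summary the scenario asks for.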
Freshness check with 48-hour threshold
Scores (without context / with context):
Freshness check performed: 100% / 100%
Age quantified: 100% / 100%
Freshness marked as failed: 73% / 100%
48-hour threshold referenced: 0% / 100%
OPERATIONAL classification: 0% / 100%
Recovery includes timestamp update: 100% / 100%
Per-check summary: 50% / 100%
Escalation present: 0% / 100%
Without context: $0.1592 · 54s · 9 turns · 14 in / 2,453 out tokens
With context: $0.3621 · 1m 46s · 15 turns · 483 in / 5,474 out tokens
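The freshness scenario rewards an explicit 48-hour threshold, a quantified age, and timezone-aware comparison against UTC now. A minimal sketch follows. It assumes the packet timestamp is an ISO 8601 string with a UTC offset; the function name is illustrative, not the skill's own.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FRESHNESS_LIMIT = timedelta(hours=48)  # threshold taken from the eval scenario


def check_freshness(updated_at: str, now: Optional[datetime] = None) -> tuple[bool, timedelta]:
    """Return (is_fresh, age) for an ISO 8601 timestamp that carries a UTC offset."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit +00:00 offset first.
    ts = datetime.fromisoformat(updated_at.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        # Refuse to guess the zone of a naive timestamp.
        raise ValueError(f"timestamp has no UTC offset: {updated_at!r}")
    now = now or datetime.now(timezone.utc)
    age = now - ts
    return age <= FRESHNESS_LIMIT, age
```

For example, a packet stamped `2024-01-01T11:00:00Z` and checked at 12:00 UTC on 2024-01-03 is 49 hours old, so the check fails. Reporting the age alongside the verdict covers the "Age quantified" criterion.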
Resume token format validation
Scores (without context / with context):
Token check performed: 100% / 100%
Packet A token rejected: 100% / 100%
Packet B token rejected: 100% / 100%
Format requirements stated: 66% / 100%
Non-clean classification for both: 66% / 100%
Recovery includes new token: 100% / 100%
Per-check summary for both packets: 100% / 100%
Escalation for both packets: 66% / 100%
Without context: $0.2280 · 1m 23s · 8 turns · 13 in / 4,178 out tokens
With context: $0.5150 · 2m 21s · 22 turns · 368 in / 7,912 out tokens
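The token scenarios, together with the "Token regex pattern" and "Consumed token check" criteria further down, suggest a resume token is rejected for two reasons: it has the wrong shape, or it has already been used. The sketch below covers both. The `rt_` plus 16-hex-digit format is a hypothetical stand-in, because the skill's real pattern is not given here. The set of consumed tokens is likewise assumed to come from some external record.

```python
import re

# Hypothetical token format; replace with the pattern the skill actually defines.
TOKEN_PATTERN = re.compile(r"rt_[0-9a-f]{16}")


def check_token(token: str, consumed: set[str]) -> list[str]:
    """Return problems with a resume token; [] means the token check passes."""
    problems = []
    if not TOKEN_PATTERN.fullmatch(token):
        problems.append("resume token does not match expected format")
    if token in consumed:
        # A token that was already redeemed must not resume the task a second time.
        problems.append("resume token already consumed")
    return problems
```

Using `fullmatch` instead of `search` means a valid token embedded in a longer string is still rejected.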
Missing artifact critical classification
Scores (without context / with context):
Artifact absence noted: 100% / 100%
CRITICAL classification: 25% / 100%
Not classified as OPERATIONAL: 100% / 100%
Not classified as CLEAN: 100% / 100%
Escalation to task owner: 100% / 100%
Recovery steps present: 100% / 100%
Does not proceed without handoff: 100% / 100%
Per-check summary present: 75% / 100%
Explicit classification label: 25% / 100%
Without context: $0.2450 · 1m 24s · 15 turns · 20 in / 3,670 out tokens
With context: $0.4368 · 1m 30s · 18 turns · 436 in / 4,278 out tokens
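The scenarios imply a three-level mapping:
- A missing artifact is CRITICAL.
- A stale timestamp or a failed replay is OPERATIONAL.
- A packet that passes every check is CLEAN.

A minimal sketch of that mapping follows. The check names are illustrative. Treating every failure other than the artifact check as OPERATIONAL is an assumption, not a rule stated on this page.

```python
from enum import Enum


class Classification(str, Enum):
    CLEAN = "CLEAN"
    OPERATIONAL = "OPERATIONAL"
    CRITICAL = "CRITICAL"


# Assumption: only the artifact check escalates to CRITICAL; add others as needed.
CRITICAL_CHECKS = {"artifacts"}


def classify(results: dict[str, bool]) -> Classification:
    """Map per-check pass/fail results to a single classification label."""
    failed = {name for name, passed in results.items() if not passed}
    if failed & CRITICAL_CHECKS:
        return Classification.CRITICAL
    if failed:
        return Classification.OPERATIONAL
    return Classification.CLEAN
```

The most severe failure wins, so a packet with both a stale timestamp and a missing artifact is still reported as CRITICAL.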
Replay test failure guardrail
Scores (without context / with context):
Replay test attempted: 58% / 100%
Contradiction identified: 100% / 100%
Replay test marked failed: 41% / 100%
Not classified as CLEAN: 100% / 100%
OPERATIONAL classification: 0% / 100%
Does not mark handoff successful: 100% / 100%
Per-check summary: 0% / 100%
Replay failure in summary: 0% / 100%
Escalation present: 50% / 100%
Without context: $0.1457 · 53s · 8 turns · 13 in / 2,408 out tokens
With context: $0.4680 · 2m 12s · 19 turns · 329 in / 7,778 out tokens
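The replay scenarios reward two behaviors. A contradiction must fail the replay test, and a vague packet must leave it unconfirmed rather than passed. One way to model this is for the resuming reader to answer a fixed set of questions from the packet alone, with the results then aggregated. The question list and the `ReplayAnswer` shape below are hypothetical. This is a sketch of the aggregation step only; it does not show how the skill generates its answers.

```python
from dataclasses import dataclass

# Hypothetical replay questions; the skill's actual question set is not listed here.
REPLAY_QUESTIONS = (
    "What is the task goal?",
    "What is the very next action?",
    "Which assumptions must still hold?",
)


@dataclass
class ReplayAnswer:
    question: str
    answer: str                       # reconstructed from the packet alone
    confident: bool                   # False if the packet was too vague to answer
    contradicts_packet: bool = False  # True if the answer conflicts with another field


def replay_passes(answers: list[ReplayAnswer]) -> bool:
    """Pass only if every question is answered confidently and without contradiction."""
    if {a.question for a in answers} != set(REPLAY_QUESTIONS):
        return False  # an unanswered question counts as an unconfirmed replay
    return all(a.answer.strip() and a.confident and not a.contradicts_packet
               for a in answers)
```

A single contradicted or low-confidence answer fails the whole replay. This matches the guardrail that a handoff is never marked successful on a partial replay.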
Python validator implementation
Scores (without context / with context):
48-hour constant: 0% / 100%
Timezone-aware datetime parsing: 100% / 100%
UTC now comparison: 100% / 100%
Token regex pattern: 0% / 100%
All 8 fields validated: 100% / 100%
Non-empty validation: 70% / 80%
Replay test questions: 0% / 75%
Classification logic: 20% / 70%
Consumed token check: 0% / 100%
Demo runs cleanly: 100% / 100%
Without context: $0.2450 · 1m 17s · 14 turns · 18 in / 4,817 out tokens
With context: $0.6546 · 2m 45s · 30 turns · 341 in / 9,306 out tokens
Multi-packet classification audit
Scores (without context / with context):
Alpha classified CLEAN: 0% / 0%
Beta classified OPERATIONAL: 0% / 100%
Gamma classified OPERATIONAL: 0% / 0%
Per-check breakdown for all three: 0% / 100%
Recovery steps for Beta and Gamma: 58% / 66%
Escalation for all three: 30% / 100%
CLEAN/OPERATIONAL/CRITICAL labels used: 0% / 100%
Freshness check on all packets: 66% / 100%
Summary table or section: 100% / 100%
Without context: $0.1735 · 1m 9s · 8 turns · 13 in / 3,145 out tokens
With context: $0.4463 · 2m 16s · 18 turns · 328 in / 7,452 out tokens
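The audit scenario asks for a per-check breakdown for every packet, plus a summary table that carries the classification labels. A minimal rendering sketch follows. The check names are hypothetical, and the labels are passed in rather than computed, so any classification rule can feed it.

```python
# Hypothetical check names; align them with the checks the skill actually runs.
CHECKS = ("schema", "freshness", "token", "replay")


def summary_table(audit: dict[str, dict[str, bool]], labels: dict[str, str]) -> str:
    """Render one row per packet: PASS/FAIL per check, then its classification."""
    rows = ["packet | " + " | ".join(CHECKS) + " | classification"]
    for packet, results in audit.items():
        # A check with no recorded result is shown as FAIL: not run means not passed.
        cells = ["PASS" if results.get(check, False) else "FAIL" for check in CHECKS]
        rows.append(f"{packet} | " + " | ".join(cells) + f" | {labels[packet]}")
    return "\n".join(rows)
```

Showing a missing result as FAIL keeps an incomplete audit from looking clean.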
Uncertainty operational classification
Scores (without context / with context):
Vagueness identified: 100% / 100%
Replay test attempted: 70% / 100%
Replay test not confirmed: 73% / 100%
Not classified as CLEAN: 100% / 100%
OPERATIONAL classification: 0% / 100%
Recovery steps present: 100% / 100%
Escalation recommendation: 75% / 100%
Per-check summary: 25% / 100%
Schema check passes: 0% / 100%
Does not recommend immediate resumption: 100% / 100%
Without context: $0.1679 · 57s · 8 turns · 13 in / 2,370 out tokens
With context: $0.3293 · 1m 35s · 14 turns · 325 in / 5,003 out tokens
Install with Tessl CLI
npx tessl i markusdowne/handoff-integrity-check@0.1.2