Validate agent handoff packets and resume readiness using schema, freshness, and replay checks. Use when tasks pause/resume across sessions, agents, or humans — including when a user wants to continue where they left off, hand off to another agent, resume a previous task, or pick up an interrupted workflow. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
96
Quality
100%
Does it follow best practices?
Impact
96%
1.50xAverage score across 9 eval scenarios
{
"context": "Tests whether the agent correctly performs a freshness check on the handoff packet using a 48-hour threshold, reports the check result with age information, classifies the result as OPERATIONAL, and provides appropriate recovery steps.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Freshness check performed",
"description": "Output includes an explicit freshness or staleness check on the updated_at timestamp",
"max_score": 10
},
{
"name": "Age quantified",
"description": "The assessment reports the age of the handoff in hours, days, or a comparable time unit (not just 'stale')",
"max_score": 12
},
{
"name": "Freshness marked as failed",
"description": "The freshness check result is explicitly marked as failed, not passing, or otherwise indicates the packet is too old",
"max_score": 15
},
{
"name": "48-hour threshold referenced",
"description": "The assessment mentions a 48-hour (or 2-day) threshold as the basis for the freshness decision",
"max_score": 15
},
{
"name": "OPERATIONAL classification",
"description": "The overall classification is OPERATIONAL (not CLEAN, and not CRITICAL)",
"max_score": 15
},
{
"name": "Recovery includes timestamp update",
"description": "Recovery steps include an action to update or refresh the updated_at timestamp or re-validate current state",
"max_score": 13
},
{
"name": "Per-check summary",
"description": "Output includes a summary section listing results for each check performed (schema, freshness, etc.) with pass/fail indicators",
"max_score": 10
},
{
"name": "Escalation present",
"description": "Output includes an escalation recommendation section",
"max_score": 10
}
]
}