Validate agent handoff packets and resume readiness using schema, freshness, and replay checks. Use when tasks pause/resume across sessions, agents, or humans — including when a user wants to continue where they left off, hand off to another agent, resume a previous task, or pick up an interrupted workflow. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
96
Quality: 100% (Does it follow best practices?)
Impact: 96%, 1.50x (Average score across 9 eval scenarios)
{
"context": "Tests whether the agent correctly handles the absence of a handoff artifact by classifying the situation as CRITICAL by default, rather than treating it as a softer OPERATIONAL issue, and produces structured output with appropriate escalation.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Artifact absence noted",
"description": "Output explicitly notes that no handoff file, handoff document, or handoff packet was found or is available",
"max_score": 10
},
{
"name": "CRITICAL classification",
"description": "The overall classification is explicitly CRITICAL (not OPERATIONAL, not CLEAN)",
"max_score": 20
},
{
"name": "Not classified as OPERATIONAL",
"description": "Output does NOT classify the situation as OPERATIONAL — missing artifact must be treated more severely",
"max_score": 12
},
{
"name": "Not classified as CLEAN",
"description": "Output does NOT classify the situation as CLEAN or 'safe to proceed'",
"max_score": 8
},
{
"name": "Escalation to task owner",
"description": "Escalation section recommends notifying, contacting, or involving the original task owner or a responsible party",
"max_score": 12
},
{
"name": "Recovery steps present",
"description": "Output includes recovery steps such as locating the original engineer, creating a new handoff, or establishing current state before proceeding",
"max_score": 12
},
{
"name": "Does not proceed without handoff",
"description": "Output does NOT recommend or attempt to start the audit work — it focuses on resolving the missing handoff first",
"max_score": 10
},
{
"name": "Per-check summary present",
"description": "Output includes a structured check summary section (even if it only shows the artifact check as failed)",
"max_score": 8
},
{
"name": "Explicit classification label",
"description": "Output uses the explicit label CRITICAL (case-insensitive) as the classification — not just described in prose",
"max_score": 8
}
]
}
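To make the `weighted_checklist` format concrete, here is a minimal sketch of how a rubric like the one above could be tallied. It assumes a grader hands back a per-item score keyed by the item's `name`; the function `score_checklist`, the `awarded` mapping, and the clamping behaviour are illustrative assumptions, not the evaluator's actual implementation. Only the item names and `max_score` values are taken from the rubric.

```python
import json

# Abbreviated copy of the rubric above: item names and max_score values only.
RUBRIC = json.loads("""
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Artifact absence noted", "max_score": 10},
    {"name": "CRITICAL classification", "max_score": 20},
    {"name": "Not classified as OPERATIONAL", "max_score": 12},
    {"name": "Not classified as CLEAN", "max_score": 8},
    {"name": "Escalation to task owner", "max_score": 12},
    {"name": "Recovery steps present", "max_score": 12},
    {"name": "Does not proceed without handoff", "max_score": 10},
    {"name": "Per-check summary present", "max_score": 8},
    {"name": "Explicit classification label", "max_score": 8}
  ]
}
""")


def score_checklist(rubric, awarded):
    """Tally a weighted checklist (hypothetical helper).

    `awarded` maps item name -> points given by a grader. Items missing
    from `awarded` count as 0, and each value is clamped to [0, max_score].
    Returns (earned, possible, percent).
    """
    earned = 0
    possible = 0
    for item in rubric["checklist"]:
        cap = item["max_score"]
        got = awarded.get(item["name"], 0)
        earned += max(0, min(got, cap))
        possible += cap
    percent = 100.0 * earned / possible if possible else 0.0
    return earned, possible, percent


# Hypothetical grader result: full marks everywhere except the explicit label.
awarded = {item["name"]: item["max_score"] for item in RUBRIC["checklist"]}
awarded["Explicit classification label"] = 0

earned, possible, percent = score_checklist(RUBRIC, awarded)
print(f"{earned}/{possible} = {percent:.1f}%")  # prints "92/100 = 92.0%"
```

The `max_score` values in this rubric sum to 100, so the earned points read directly as a percentage. The heaviest single item is "CRITICAL classification" at 20 points, which matches the stated focus of the scenario.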