Validate agent handoff packets and resume readiness using schema, freshness, and replay checks. Use when tasks pause/resume across sessions, agents, or humans — including when a user wants to continue where they left off, hand off to another agent, resume a previous task, or pick up an interrupted workflow. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
96
Quality
100%
Does it follow best practices?
Impact
96%
1.50xAverage score across 9 eval scenarios
{
"context": "Tests whether the agent validates all required handoff packet fields for presence and non-emptiness, and produces structured output including a per-check summary, classification, recovery steps, and escalation recommendation.",
"type": "weighted_checklist",
"checklist": [
{
"name": "All 8 fields checked",
"description": "The review explicitly references all 8 required field names: objective, completed, unresolved, assumptions, next_action, risks, updated_at, resume_token",
"max_score": 10
},
{
"name": "Empty next_action flagged",
"description": "The review identifies next_action as empty/blank and marks it as a failure or problem",
"max_score": 12
},
{
"name": "Empty assumptions flagged",
"description": "The review identifies assumptions as empty (empty array) and marks it as a failure or problem",
"max_score": 12
},
{
"name": "Per-check pass/fail summary",
"description": "Output includes a section that lists each check (schema, freshness, token, replay) with an explicit pass or fail result for each",
"max_score": 12
},
{
"name": "Non-clean classification",
"description": "The overall status/classification is NOT 'clean' or 'safe to resume without action' — some issue is surfaced",
"max_score": 12
},
{
"name": "Explicit classification label",
"description": "Output explicitly uses one of the labels: CLEAN, OPERATIONAL, or CRITICAL (case-insensitive) as the classification",
"max_score": 10
},
{
"name": "Recovery steps listed",
"description": "Output includes a Recovery Steps section with at least one specific action item addressing the identified problems",
"max_score": 14
},
{
"name": "Escalation recommendation",
"description": "Output includes an Escalation section with a recommendation (e.g., notify task owner, hold the handoff, or similar)",
"max_score": 10
},
{
"name": "Freshness check performed",
"description": "Output includes result of a freshness or timestamp check on updated_at (pass or fail with reasoning)",
"max_score": 8
}
]
}