Use when work needs to be handed off to another agent or another human. Produce a continuation-ready brief with the objective, completed work, assumptions, unresolved issues, and next action instead of a generic summary. Good triggers include "prepare a handoff", "make this resumable", and "summarize this for another agent".
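As a rough illustration of what "continuation-ready" could mean in practice, the sketch below models the five items named above as a small data structure and renders them under plain headings. The class name `HandoffBrief`, its field names, and the `render` method are hypothetical and chosen for illustration only; the skill itself does not prescribe this code.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffBrief:
    """Hypothetical container for the items a handoff brief should carry."""
    objective: str
    completed: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    unresolved: list[str] = field(default_factory=list)
    next_action: str = ""

    def render(self) -> str:
        """Render each item under its own heading, as short bullets rather than a story."""
        def bullets(items: list[str]) -> str:
            # An empty section is shown explicitly so the receiver knows it was considered.
            return "\n".join(f"- {item}" for item in items) or "- (none)"

        parts = [
            ("Objective", self.objective),
            ("Completed", bullets(self.completed)),
            ("Assumptions", bullets(self.assumptions)),
            ("Unresolved", bullets(self.unresolved)),
            ("Next Action", self.next_action or "(not specified)"),
        ]
        return "\n\n".join(f"{title}\n{body}" for title, body in parts)
```

Keeping assumptions and unresolved items as separate lists, rather than free prose, makes it harder for them to be buried in a summary paragraph.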
92
100%
Does it follow best practices?
Impact
89%
1.41x
Average score across 8 eval scenarios
Passed
No known issues
{
  "context": "The agent must produce a handoff for a live incident investigation. This scenario specifically tests whether the agent surfaces assumptions explicitly (rather than burying them in prose) and whether it includes warnings about known dead ends to prevent the receiver from repeating mistakes.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Assumptions section present",
      "description": "The document has a section explicitly labelled 'Assumptions' (or equivalent heading).",
      "max_score": 8
    },
    {
      "name": "Assumptions are non-trivial",
      "description": "The Assumptions section contains at least one concrete, decision-relevant assumption (e.g. a hypothesis about root cause, a condition treated as true without confirmation).",
      "max_score": 10
    },
    {
      "name": "Warning about restart dead end",
      "description": "The document explicitly warns the receiver that restarting the Redis client does not resolve the issue.",
      "max_score": 12
    },
    {
      "name": "Warning is distinct from tasks",
      "description": "The warning about the dead end appears as a warning/caution, not just buried in a list of completed steps.",
      "max_score": 8
    },
    {
      "name": "Unresolved section present",
      "description": "The document has a section labelled 'Unresolved' (or equivalent) listing open questions.",
      "max_score": 7
    },
    {
      "name": "Unresolved contains open questions",
      "description": "The Unresolved section includes at least one genuine question (e.g. whether Redis lag is root cause or symptom).",
      "max_score": 10
    },
    {
      "name": "Next Action section present",
      "description": "The document has a section labelled 'Next Action' (or equivalent).",
      "max_score": 7
    },
    {
      "name": "Next action is specific",
      "description": "The Next Action names a specific node, command, or version to act on, not just 'continue investigating'.",
      "max_score": 10
    },
    {
      "name": "Critical References present",
      "description": "The document has a Critical References section with at least one concrete reference (node name, log path, or command).",
      "max_score": 8
    },
    {
      "name": "No narrative recap",
      "description": "The document does NOT retell the investigation as a story; it uses bullet points or short statements under section headings.",
      "max_score": 10
    },
    {
      "name": "All six sections present",
      "description": "The document includes all six prescribed sections: Objective, Completed, Assumptions, Unresolved, Next Action, Critical References.",
      "max_score": 10
    }
  ]
}
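To make the rubric concrete, here is a minimal Python sketch (3.9+) of how a checklist of this shape might be applied. It is an assumption, not the grader's actual method: the page does not say how points are aggregated, so `weighted_score` simply sums clamped per-criterion points over the total of `max_score`, and `missing_sections` uses a simple heading heuristic. Both function names are hypothetical.

```python
import re

# Section names taken from the "All six sections present" criterion.
REQUIRED_SECTIONS = [
    "Objective", "Completed", "Assumptions",
    "Unresolved", "Next Action", "Critical References",
]


def missing_sections(document: str) -> list[str]:
    """Return the required section names that never open a line in the document."""
    missing = []
    for name in REQUIRED_SECTIONS:
        # Heuristic (assumption): a heading is a line that begins with
        # optional '#' marks followed by the section name.
        pattern = rf"^\s*#*\s*{re.escape(name)}\b"
        if not re.search(pattern, document, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(name)
    return missing


def weighted_score(checklist: list[dict], awarded: dict[str, float]) -> float:
    """Return earned points as a percentage of the checklist's total max_score.

    Points for each criterion are clamped to the range [0, max_score];
    criteria absent from `awarded` count as zero.
    """
    total = sum(item["max_score"] for item in checklist)
    earned = sum(
        min(max(awarded.get(item["name"], 0), 0), item["max_score"])
        for item in checklist
    )
    return 100.0 * earned / total if total else 0.0
```

For the checklist above, the `max_score` values sum to 100, so under this assumed aggregation the percentage equals the raw points earned.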