Use when work needs to be handed off to another agent or another human. Produce a continuation-ready brief with the objective, completed work, assumptions, unresolved issues, and next action instead of a generic summary. Good triggers include "prepare a handoff", "make this resumable", and "summarize this for another agent".
92
100%
Does it follow best practices?
Impact
89%
1.41x
Average score across 8 eval scenarios
Passed
No known issues
{
"context": "The agent must produce a handoff for an in-progress, partially-applied deployment with known pitfalls. This scenario tests whether the agent explicitly includes warnings about actions that would cause harm and whether the next action section is specific enough for the receiver to act safely.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Warning against kubectl rollout restart",
"description": "The document explicitly warns the receiver NOT to run kubectl rollout restart, explaining it would undo progress.",
"max_score": 12
},
{
"name": "Warning about migration ordering",
"description": "The document warns that the migration must be run before enabling the feature flag (or before removing old replicas), referencing the prior 500-error incident or equivalent caution.",
"max_score": 12
},
{
"name": "Warnings are prominent",
"description": "The warnings appear as explicit warnings or cautions (e.g. labelled 'Warning:', listed in a warnings section, or in the Next Action section) — not buried inside a completed-steps list.",
"max_score": 10
},
{
"name": "Next Action section present",
"description": "The document has a Next Action section.",
"max_score": 6
},
{
"name": "Next action references migration",
"description": "The Next Action explicitly names running the database migration script as a required step before proceeding.",
"max_score": 10
},
{
"name": "Next action includes timing constraint",
"description": "The Next Action or Unresolved section notes the low-traffic window requirement for running the migration.",
"max_score": 8
},
{
"name": "All six sections present",
"description": "The document includes all six prescribed sections: Objective, Completed, Assumptions, Unresolved, Next Action, Critical References.",
"max_score": 8
},
{
"name": "Critical References section present",
"description": "The document has a Critical References section with at least one command, file path, or identifier.",
"max_score": 7
},
{
"name": "Completed section reflects partial state",
"description": "The Completed section accurately reflects that only 3 of 6 replicas have been updated — it does not claim the deployment is complete.",
"max_score": 10
},
{
"name": "Unresolved contains open items",
"description": "The Unresolved section lists at least the feature flag enablement and/or old replica removal as remaining open items.",
"max_score": 8
},
{
"name": "No narrative recap",
"description": "The document does not contain prose narrating the history of the deployment; information is structured under headings.",
"max_score": 9
}
]
}
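The rubric above is a `weighted_checklist`, and its eleven `max_score` values sum to 100. The source does not say how the evaluator aggregates them, so the following is only a minimal sketch under one plausible assumption: the points awarded per criterion are summed and divided by the total available. The `Criterion` class, the `weighted_score` function, and the `awarded` mapping are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    max_score: int


# Names and weights copied from the checklist above.
CRITERIA = [
    Criterion("Warning against kubectl rollout restart", 12),
    Criterion("Warning about migration ordering", 12),
    Criterion("Warnings are prominent", 10),
    Criterion("Next Action section present", 6),
    Criterion("Next action references migration", 10),
    Criterion("Next action includes timing constraint", 8),
    Criterion("All six sections present", 8),
    Criterion("Critical References section present", 7),
    Criterion("Completed section reflects partial state", 10),
    Criterion("Unresolved contains open items", 8),
    Criterion("No narrative recap", 9),
]


def weighted_score(criteria, awarded):
    """Return a percentage score for a weighted checklist.

    Assumption (not confirmed by the source): the score is the sum of
    awarded points divided by the sum of max_score values.
    `awarded` maps criterion name to points given; a missing name counts
    as 0, and each value is clamped to the range [0, max_score].
    """
    total = sum(c.max_score for c in criteria)
    if total == 0:
        raise ValueError("checklist has no points to award")
    earned = sum(
        min(max(awarded.get(c.name, 0), 0), c.max_score) for c in criteria
    )
    return 100.0 * earned / total


# Full marks on every criterion.
full_marks = {c.name: c.max_score for c in CRITERIA}

# Same handoff, but the kubectl rollout restart warning is missing.
missing_warning = dict(full_marks)
missing_warning["Warning against kubectl rollout restart"] = 0
```

Under this assumption, a handoff that satisfies every criterion except the `kubectl rollout restart` warning would lose 12 of the 100 available points, which shows how heavily the rubric weights the two safety warnings relative to the purely structural checks.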