Name: markusdowne/handoff-integrity-check
Rating: 0.968 (1 review)
Author: markusdowne

Validate agent handoff packets and resume readiness using schema, freshness, and replay checks. Use when tasks pause/resume across sessions, agents, or humans — including when a user wants to continue where they left off, hand off to another agent, resume a previous task, or pick up an interrupted workflow. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
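As a rough illustration of the replay check described above, the sketch below tries to answer the three resume questions (current objective, unresolved blockers, next immediate action) from the packet alone and flags a packet whose `next_action` repeats work already listed under `completed`. The field names `completed` and `next_action` are borrowed from this skill's eval rubric; `objective`, `blockers`, the function name `replay_test`, and the substring heuristic are all assumptions made for illustration, not the skill's actual implementation.

```python
def replay_test(packet: dict) -> dict:
    """Hypothetical replay check: can the packet answer the resume questions?"""
    problems = []

    # Assumed field names; only `completed` and `next_action` appear in the rubric.
    for field in ("objective", "blockers", "next_action"):
        if field not in packet:
            problems.append(f"missing {field}")

    # Crude contradiction heuristic: a completed item reappears in next_action.
    next_action = str(packet.get("next_action", "")).lower()
    for item in packet.get("completed", []):
        text = str(item).strip().lower()
        if text and text in next_action:
            problems.append(f"contradiction: '{item}' is completed but next_action repeats it")

    return {"passed": not problems, "problems": problems}


# Example packet with contradictory state (illustrative data only).
packet = {
    "objective": "Ship the billing migration",
    "blockers": [],
    "completed": ["migrate billing tables"],
    "next_action": "Begin to migrate billing tables",
}
result = replay_test(packet)
print(result["passed"])
```

A real check would likely need something sturdier than substring matching, but the shape is the same: a failed replay test should block a CLEAN classification.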

Quality

100%

Does it follow best practices?

Impact

96%

1.50x

Average score across 9 eval scenarios

{
  "context": "Tests whether the agent performs a replay test on the handoff packet, detects that the documented state is contradictory (completed says work is done; next_action says to start it), refuses to mark the handoff as successful, and classifies as OPERATIONAL.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Replay test attempted",
      "description": "Output shows that a replay test was performed — i.e., the agent attempted to answer questions about the current objective, any unresolved blockers, and the next immediate action from the packet",
      "max_score": 12
    },
    {
      "name": "Contradiction identified",
      "description": "Output explicitly identifies the contradiction: completed states work is done, but next_action says to begin the same work",
      "max_score": 18
    },
    {
      "name": "Replay test marked failed",
      "description": "The replay test result is marked as failed or inconclusive — not passed",
      "max_score": 12
    },
    {
      "name": "Not classified as CLEAN",
      "description": "The overall classification is NOT CLEAN — the handoff is not marked as safe to resume without action",
      "max_score": 12
    },
    {
      "name": "OPERATIONAL classification",
      "description": "The classification is specifically OPERATIONAL (reflecting that the packet exists but has integrity issues)",
      "max_score": 10
    },
    {
      "name": "Does not mark handoff successful",
      "description": "Output does NOT state the handoff is successful, valid, or safe to proceed — it requires resolution first",
      "max_score": 12
    },
    {
      "name": "Per-check summary",
      "description": "Output includes a per-check breakdown covering at least schema, freshness, token, and replay checks with pass/fail for each",
      "max_score": 8
    },
    {
      "name": "Replay failure in summary",
      "description": "The check summary explicitly shows the replay test as failed (not just passing it by omission)",
      "max_score": 8
    },
    {
      "name": "Escalation present",
      "description": "Output includes an escalation recommendation",
      "max_score": 8
    }
  ]
}
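The rubric above is a `weighted_checklist` whose nine `max_score` values sum to 100. The page does not say how per-item scores are aggregated, so the following is only a minimal sketch that assumes the overall score is the sum of awarded points divided by the sum of `max_score`. The function name `score_checklist` and the `awarded` mapping are hypothetical.

```python
def score_checklist(checklist: list[dict], awarded: dict[str, float]) -> float:
    """Return awarded points as a fraction of the maximum (assumed aggregation)."""
    total_max = sum(item["max_score"] for item in checklist)
    total = 0.0
    for item in checklist:
        points = awarded.get(item["name"], 0)
        # Clamp each item to the range [0, max_score].
        total += max(0, min(points, item["max_score"]))
    return total / total_max if total_max else 0.0


# Three items taken from the rubric; the awarded points are made up.
checklist = [
    {"name": "Replay test attempted", "max_score": 12},
    {"name": "Contradiction identified", "max_score": 18},
    {"name": "OPERATIONAL classification", "max_score": 10},
]
awarded = {"Replay test attempted": 12, "Contradiction identified": 18}
score = score_checklist(checklist, awarded)
print(score)  # 30 of 40 points -> 0.75
```

Under this assumption, missing the OPERATIONAL classification alone costs 10 of the rubric's 100 points, while missing the contradiction costs 18, the largest single item.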

Install with Tessl CLI

npx tessl i markusdowne/handoff-integrity-check

evals/scenario-6/rubric.json