CtrlK
BlogDocsLog inGet started
Tessl Logo

markusdowne/handoff-integrity-check

Validate agent handoff packets and resume readiness using schema, freshness, and replay checks. Use when tasks pause/resume across sessions, agents, or humans — including when a user wants to continue where they left off, hand off to another agent, resume a previous task, or pick up an interrupted workflow. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.

96

1.50x

Quality

100%

Does it follow best practices?

Impact

96%

1.50x

Average score across 9 eval scenarios

Overview
Skills
Evals
Files

rubric.jsonevals/scenario-2/

{
  "context": "Tests whether the agent validates all required handoff packet fields for presence and non-emptiness, and produces structured output including a per-check summary, classification, recovery steps, and escalation recommendation.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "All 8 fields checked",
      "description": "The review explicitly references all 8 required field names: objective, completed, unresolved, assumptions, next_action, risks, updated_at, resume_token",
      "max_score": 10
    },
    {
      "name": "Empty next_action flagged",
      "description": "The review identifies next_action as empty/blank and marks it as a failure or problem",
      "max_score": 12
    },
    {
      "name": "Empty assumptions flagged",
      "description": "The review identifies assumptions as empty (empty array) and marks it as a failure or problem",
      "max_score": 12
    },
    {
      "name": "Per-check pass/fail summary",
      "description": "Output includes a section that lists each check (schema, freshness, token, replay) with an explicit pass or fail result for each",
      "max_score": 12
    },
    {
      "name": "Non-clean classification",
      "description": "The overall status/classification is NOT 'clean' or 'safe to resume without action' — some issue is surfaced",
      "max_score": 12
    },
    {
      "name": "Explicit classification label",
      "description": "Output explicitly uses one of the labels: CLEAN, OPERATIONAL, or CRITICAL (case-insensitive) as the classification",
      "max_score": 10
    },
    {
      "name": "Recovery steps listed",
      "description": "Output includes a Recovery Steps section with at least one specific action item addressing the identified problems",
      "max_score": 14
    },
    {
      "name": "Escalation recommendation",
      "description": "Output includes an Escalation section with a recommendation (e.g., notify task owner, hold the handoff, or similar)",
      "max_score": 10
    },
    {
      "name": "Freshness check performed",
      "description": "Output includes result of a freshness or timestamp check on updated_at (pass or fail with reasoning)",
      "max_score": 8
    }
  ]
}

Install with Tessl CLI

npx tessl i markusdowne/handoff-integrity-check

evals

SKILL.md

tile.json