markusdowne/handoff-integrity-check

Validate agent handoff packets and resume readiness using schema, freshness, and replay checks. Use when tasks pause/resume across sessions, agents, or humans — including when a user wants to continue where they left off, hand off to another agent, resume a previous task, or pick up an interrupted workflow. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.

Quality: 100% (Does it follow best practices?)

Impact: 100%, 1.31x (average score across 3 eval scenarios)

Security (by Snyk): Advisory. Suggest reviewing before use.

evals/scenario-1/criteria.json

{
  "context": "Tests whether the agent correctly validates a well-formed handoff packet, applies the freshness threshold correctly, validates the resume_token as a plain continuity marker, runs a replay check, and produces the required output format with the correct classification.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "All required fields checked",
      "description": "The validation report confirms that all 8 required fields (objective, completed, unresolved, assumptions, next_action, risks, updated_at, resume_token) are present and non-empty",
      "max_score": 10
    },
    {
      "name": "Freshness threshold 48h",
      "description": "The freshness check uses 48 hours as the staleness threshold (not 24h, 72h, or any other value)",
      "max_score": 10
    },
    {
      "name": "Freshness pass reported",
      "description": "The output explicitly states the handoff passed the freshness check (e.g. reports the hours since update and confirms it is within the 48-hour limit)",
      "max_score": 8
    },
    {
      "name": "Resume token as continuity marker",
      "description": "The validation report describes the resume_token as a plain continuity ID or continuity marker — NOT as an authentication token, API key, session token, or credential",
      "max_score": 10
    },
    {
      "name": "Resume token validation pass",
      "description": "The output confirms that the resume_token is present, non-empty, and looks like a stable handoff ID (not a secret or signed token)",
      "max_score": 8
    },
    {
      "name": "Replay check performed",
      "description": "The output includes a replay check confirming at least two of: (a) objective matches the task, (b) unresolved items and risks make sense, (c) next_action is specific enough to execute",
      "max_score": 10
    },
    {
      "name": "Check summary format",
      "description": "The output contains a check summary section that lists pass/fail status per individual check (schema, freshness, resume token, replay)",
      "max_score": 10
    },
    {
      "name": "CLEAN classification",
      "description": "The output explicitly classifies the handoff as CLEAN (not operational or critical)",
      "max_score": 12
    },
    {
      "name": "Recovery steps section",
      "description": "The output includes a recovery steps section (can state 'None required' or equivalent for a clean result)",
      "max_score": 8
    },
    {
      "name": "Escalation recommendation",
      "description": "The output includes an escalation recommendation section (can state no escalation needed for a clean result)",
      "max_score": 8
    },
    {
      "name": "No credential misinterpretation",
      "description": "The output does NOT treat the resume_token value as an authentication credential, API key, or security token",
      "max_score": 6
    }
  ]
}
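
The mechanical checks named in this checklist (required fields, the 48-hour freshness threshold, and the resume token as a plain continuity marker) can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the function name `check_handoff`, the result keys, the assumption that `updated_at` is a timezone-aware ISO 8601 string, and the choice to treat exactly 48 hours as still fresh are all hypothetical. The replay check is a judgment call about the packet's content and is left out.

```python
from datetime import datetime, timedelta, timezone

# The 8 required fields listed in the checklist.
REQUIRED_FIELDS = (
    "objective", "completed", "unresolved", "assumptions",
    "next_action", "risks", "updated_at", "resume_token",
)
# Staleness threshold from the checklist: 48 hours.
STALENESS_LIMIT = timedelta(hours=48)


def check_handoff(packet: dict, now: datetime) -> dict:
    """Return per-check results for a handoff packet (illustrative only)."""
    # Schema: every required field must be present and non-empty.
    missing = [f for f in REQUIRED_FIELDS if not packet.get(f)]

    # Freshness: assumes updated_at is a timezone-aware ISO 8601 string.
    age_hours = None
    freshness_ok = False
    if packet.get("updated_at"):
        updated = datetime.fromisoformat(packet["updated_at"])
        age = now - updated
        age_hours = age.total_seconds() / 3600
        # Assumption: exactly 48 hours old still counts as fresh.
        freshness_ok = timedelta(0) <= age <= STALENESS_LIMIT

    # Resume token: treated as an opaque continuity ID, never as a credential.
    token = packet.get("resume_token")
    token_ok = isinstance(token, str) and bool(token.strip())

    return {
        "schema": not missing,
        "missing": missing,
        "freshness": freshness_ok,
        "age_hours": age_hours,
        "resume_token": token_ok,
    }
```

A caller would pass the parsed packet and the current time, for example `check_handoff(packet, datetime.now(timezone.utc))`, and then report pass or fail per check in the summary section the checklist asks for.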
