Name: markusdowne/handoff-integrity-check
Rating: 0.968 (1 review)
Author: markusdowne

Validate agent handoff packets and resume readiness using schema, freshness, and replay checks. Use when tasks pause/resume across sessions, agents, or humans — including when a user wants to continue where they left off, hand off to another agent, resume a previous task, or pick up an interrupted workflow. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.

Quality: 100%
Does it follow best practices?

Impact: 96% (1.50x)
Average score across 9 eval scenarios

{
  "context": "Tests whether the agent implements handoff validation using the correct constants, patterns, and logic: 48-hour freshness threshold, timezone-aware datetime, specific token regex pattern, all 8 required fields, non-empty checks, replay test questions, and token consumption check.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "48-hour constant",
      "description": "Code defines or uses a constant of 48 (hours) as the maximum staleness threshold for freshness checking",
      "max_score": 10
    },
    {
      "name": "Timezone-aware datetime parsing",
      "description": "Code parses updated_at using datetime.fromisoformat (or equivalent) and replaces 'Z' with '+00:00' or otherwise handles UTC timezone correctly",
      "max_score": 10
    },
    {
      "name": "UTC now comparison",
      "description": "Code uses datetime.now(timezone.utc) or datetime.utcnow().replace(tzinfo=timezone.utc) (or equivalent) for the current time comparison — not a naive datetime",
      "max_score": 10
    },
    {
      "name": "Token regex pattern",
      "description": "Code uses a regex that matches alphanumeric characters, underscores, and hyphens with a minimum of 8 and maximum of 128 characters for token validation",
      "max_score": 12
    },
    {
      "name": "All 8 fields validated",
      "description": "Code explicitly checks for all 8 field names: objective, completed, unresolved, assumptions, next_action, risks, updated_at, resume_token",
      "max_score": 12
    },
    {
      "name": "Non-empty validation",
      "description": "Code checks that fields are non-empty (not just present) — distinguishes between a field existing and it having a meaningful value",
      "max_score": 10
    },
    {
      "name": "Replay test questions",
      "description": "Code or documentation includes the three replay test questions: current objective, unresolved blocker, and next immediate action",
      "max_score": 12
    },
    {
      "name": "Classification logic",
      "description": "Code implements classification with at least 3 distinct levels or outcomes (e.g., clean/operational/critical or pass/warn/fail)",
      "max_score": 10
    },
    {
      "name": "Consumed token check",
      "description": "Code checks whether the resume_token appears in a set or list of previously consumed/used tokens",
      "max_score": 12
    },
    {
      "name": "Demo runs cleanly",
      "description": "demo.py can be executed with Python (no import errors, no unhandled exceptions) — both sample packets are validated and results printed",
      "max_score": 2
    }
  ]
}
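
The checklist above can be read as a small validator. The following is a minimal sketch under stated assumptions, not the skill's actual implementation: the names `MAX_AGE_HOURS`, `TOKEN_RE`, `is_empty`, `parse_utc`, `validate_packet`, and `replay_answers` are hypothetical, the exact regex is one pattern that fits the rubric's description, and the mapping of findings to the `clean`/`operational`/`critical` levels is an illustrative choice (the rubric only asks for at least three outcomes).

```python
import re
from datetime import datetime, timedelta, timezone

MAX_AGE_HOURS = 48  # staleness threshold named in the rubric
# Assumed pattern: letters, digits, underscores, hyphens; 8 to 128 characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]{8,128}")
REQUIRED_FIELDS = (
    "objective", "completed", "unresolved", "assumptions",
    "next_action", "risks", "updated_at", "resume_token",
)
# Replay test: each question is answered from one packet field (assumed mapping).
REPLAY_QUESTIONS = {
    "What is the current objective?": "objective",
    "What is the unresolved blocker?": "unresolved",
    "What is the next immediate action?": "next_action",
}


def is_empty(value):
    """True for None, blank strings, and empty containers."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def parse_utc(stamp):
    """Parse an ISO 8601 string into a timezone-aware datetime."""
    # Older Python versions reject a trailing 'Z', so normalise it first.
    parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed


def validate_packet(packet, consumed_tokens=frozenset(), now=None):
    """Return (status, findings); status is clean, operational, or critical."""
    now = now or datetime.now(timezone.utc)
    critical, warnings = [], []

    # Schema check: every field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if name not in packet:
            critical.append(f"missing field: {name}")
        elif is_empty(packet[name]):
            critical.append(f"empty field: {name}")

    # Token check: format first, then reuse against consumed tokens.
    token = packet.get("resume_token")
    if not is_empty(token):
        if not isinstance(token, str) or not TOKEN_RE.fullmatch(token):
            critical.append("resume_token has an invalid format")
        elif token in consumed_tokens:
            critical.append("resume_token was already consumed")

    # Freshness check: compare two timezone-aware datetimes.
    stamp = packet.get("updated_at")
    if not is_empty(stamp):
        try:
            age = now - parse_utc(stamp)
        except (AttributeError, TypeError, ValueError):
            critical.append("updated_at is not a valid ISO 8601 timestamp")
        else:
            if age > timedelta(hours=MAX_AGE_HOURS):
                warnings.append(f"packet is older than {MAX_AGE_HOURS} hours")

    status = "critical" if critical else "operational" if warnings else "clean"
    return status, critical + warnings


def replay_answers(packet):
    """Answer the three replay questions straight from the packet."""
    return {q: packet.get(field) for q, field in REPLAY_QUESTIONS.items()}
```

In this sketch a stale packet is only downgraded to `operational`, while schema and token failures are `critical`; a stricter policy could treat staleness as blocking. Passing `now` explicitly keeps the freshness check deterministic in tests.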

Install with Tessl CLI

npx tessl i markusdowne/handoff-integrity-check@0.1.2

evals/scenario-7/rubric.json