
markusdowne/memory-roundtrip-guard

Tests memory writes, confirms read-back accuracy, and validates retrieval success to ensure saved information can actually be recovered. Use when you need to verify memory was saved correctly, check if stored data can be retrieved, confirm a memory entry is discoverable, or escalate when saved information appears lost or corrupted. Covers write confirmation, read-back comparison, retrieval smoke testing, and failure escalation. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.



evals/scenario-5/task.md

Memory Reliability Audit Tool

Problem/Feature Description

An AI-powered customer support platform stores conversation context in a memory layer so agents can pick up where previous sessions left off. Over the past week, the operations team has received complaints from support agents that context saved in one session is sometimes missing in the next — leaving agents without critical background on a customer's issue. The pattern is inconsistent: some entries seem fine, others appear to vanish, and a few have been reported missing repeatedly across multiple shifts.

The team lead has asked for an audit tool that runs a batch of memory write-verify cycles against a set of test entries and produces a report the team can review each morning. The report needs to distinguish between isolated blips and more serious recurring problems, and it must clearly surface anything that hasn't been resolved so it doesn't get lost. The team will use this report in their daily stand-up to decide whether an incident needs to be escalated to the infrastructure team.

Output Specification

Produce two files:

  1. memory_audit.py — A Python script that simulates multiple rounds of write-verify cycles for the test entries below. For each entry, perform write and verification steps. Some entries may be configured to simulate failures (e.g., by corrupting the stored value or simulating a retrieval miss). The script should print a JSON report to stdout.

  2. audit_report.json — The report produced by running the script. It must include per-entry results with a severity classification and a recommended action, plus an end-of-day summary at the end that the team can use to track what still needs to be resolved.
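One way the per-entry classification and the end-of-day summary might fit together is sketched below. The severity labels (`ok`, `warning`, `critical`), the action strings, and the report field names are assumptions chosen for illustration; the task only requires that isolated and repeated failures be distinguished and that unresolved items be surfaced.

```python
import json


def classify(failure_count):
    """Map the number of failed write-verify rounds for one entry to a severity.

    The labels and actions here are hypothetical, not prescribed by the task.
    """
    if failure_count == 0:
        return {"severity": "ok", "recommended_action": "none"}
    if failure_count == 1:
        return {"severity": "warning", "recommended_action": "retry and monitor"}
    return {
        "severity": "critical",
        "recommended_action": "escalate to infrastructure team",
    }


def build_report(failures_by_key):
    """Build per-entry results followed by an end-of-day summary."""
    results = []
    for key, count in failures_by_key.items():
        result = {"key": key, "failures": count}
        result.update(classify(count))
        results.append(result)

    # Anything that is not "ok" stays on the unresolved list for stand-up.
    unresolved = [r["key"] for r in results if r["severity"] != "ok"]
    summary = {
        "total_entries": len(results),
        "unresolved": unresolved,
        "escalation_needed": any(r["severity"] == "critical" for r in results),
    }
    return {"results": results, "end_of_day_summary": summary}


if __name__ == "__main__":
    # Failure counts are made up here to exercise all three severities.
    demo = {"ctx_session_A1": 0, "ctx_session_B2": 1, "ctx_session_C3": 3}
    print(json.dumps(build_report(demo), indent=2))
```

Keeping the summary as the last key in the report means it appears at the end of the printed JSON, which matches the requirement that the summary come after the per-entry results.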

The script must be runnable with python memory_audit.py using only Python standard library modules. Simulate at least one passing entry, one single-failure entry, and one repeated-failure entry to exercise the full classification range.

Input Files

The following entries represent conversation context that should be stored and verified:

[
  {
    "key": "ctx_session_A1",
    "content": "Customer reported billing error on invoice #8821. Agreed to credit of $45. Follow up in 3 days.",
    "source": "support-agent",
    "simulate_failure": false
  },
  {
    "key": "ctx_session_B2",
    "content": "Customer wants to upgrade to enterprise plan. Sent pricing sheet. Awaiting sign-off from their procurement team.",
    "source": "sales-agent",
    "simulate_failure": "retrieval_miss"
  },
  {
    "key": "ctx_session_C3",
    "content": "Recurring complaint: login failures on mobile app. Third report from this customer this month. Escalate to mobile engineering.",
    "source": "support-agent",
    "simulate_failure": "repeated_mismatch"
  }
]
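The `simulate_failure` values above could drive the write-verify loop roughly as follows. This is a minimal sketch, not the skill's actual method: the in-memory dict standing in for the memory layer, the three-round count, the checksum comparison, and the interpretation of each failure mode (a one-off miss for `retrieval_miss`, truncation on every round for `repeated_mismatch`) are all assumptions.

```python
import hashlib

ROUNDS = 3  # assumed number of write-verify rounds per entry


def checksum(text):
    """Hash content so the read-back comparison does not depend on string length."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_round(store, entry, round_no):
    """Write one entry, read it back, and return True if the round-trip matched."""
    key = entry["key"]
    content = entry["content"]
    mode = entry["simulate_failure"]

    store[key] = content  # write step

    # Inject the simulated fault after the write and before verification.
    if mode == "retrieval_miss" and round_no == 0:
        del store[key]  # assumed: the entry vanishes once, in the first round
    elif mode == "repeated_mismatch":
        store[key] = content[: len(content) // 2]  # assumed: truncated every round

    read_back = store.get(key)  # read-back step
    return read_back is not None and checksum(read_back) == checksum(content)


def count_failures(entries, rounds=ROUNDS):
    """Return the number of failed rounds for each entry key."""
    store = {}  # stand-in for the real memory layer
    return {
        e["key"]: sum(not run_round(store, e, n) for n in range(rounds))
        for e in entries
    }


if __name__ == "__main__":
    # Content strings are shortened placeholders for the entries listed above.
    entries = [
        {"key": "ctx_session_A1", "content": "billing error", "simulate_failure": False},
        {"key": "ctx_session_B2", "content": "enterprise upgrade", "simulate_failure": "retrieval_miss"},
        {"key": "ctx_session_C3", "content": "mobile login failures", "simulate_failure": "repeated_mismatch"},
    ]
    print(count_failures(entries))
```

Under these assumptions the three entries produce zero, one, and three failed rounds respectively, which covers the passing, single-failure, and repeated-failure cases the task asks for.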

Install with Tessl CLI

npx tessl i markusdowne/memory-roundtrip-guard@0.1.2
