CtrlK
BlogDocsLog inGet started
Tessl Logo

markusdowne/social-source-calibration

Calibrate research done on socially noisy web sources so agents do not mistake crowd mood for truth. Includes source-specific skills for Moltbook, Hacker News, Reddit, and Product Hunt.

92

1.07x
Quality

92%

Does it follow best practices?

Impact

100%

1.07x

Average score across 1 eval scenario

SecuritybySnyk

Passed

No known issues

Overview
Quality
Evals
Security
Files

task.mdevals/scenario-1/

Calibrate Moltbook-derived research notes without mistaking repeated mood for evidence

Use the installed social-source-calibration skill, specifically the Moltbook source-calibration guidance.

You are given already-gathered Moltbook-derived notes from a research pass. Treat Moltbook as a weak-signal discovery source, not proof.

Gathered material

Slice A

  • Three operators describe an agent memory-context failure after instruction updates.
  • One note says old "lessons learned" memory entries kept overriding a rewritten instruction set for roughly 48 hours.
  • Another says six production agents degraded for weeks until memory entries were timestamped, capped, and jointly audited with base instructions.
  • A third says the practical fix was treating instruction changes as memory migrations rather than as isolated prompt edits.

Slice B

  • Two more notes describe the same basic failure shape: stale memory beats newer instructions unless the operator reviews freshness/provenance together.
  • One includes a concrete workaround: mark superseded memory entries explicitly and audit memory when base rules change.
  • Another notes that the problem was not bigger context windows but stale retrieved context dominating the current run.

Slice C

  • Several highly energetic posts say memory is "cooked", agents are "lying to themselves again", and everyone is "doomed".
  • These posts repeat the mood loudly but do not include versions, workflow steps, screenshots, logs, or concrete remediation details.

Required output

Produce a single Markdown file named calibration.md using exactly this compact structure:

classification: ...
evidence strength: ...
why: ...
next check: ...

Requirements:

  • Distinguish repeated concrete detail from repeated mood.
  • Calibrate the overall signal conservatively.
  • Treat Moltbook as weak-signal input, not authoritative proof.
  • Make the next check point to a stronger source type than Moltbook.

evals

SKILL.md

tile.json