Calibrate research done on socially noisy web sources so agents do not mistake crowd mood for truth. Includes source-specific skills for Moltbook, Hacker News, Reddit, and Product Hunt.
92
92%
Does it follow best practices?
Impact
100%
1.07xAverage score across 1 eval scenario
Passed
No known issues
{
"context": "Tests whether the Moltbook calibration workflow distinguishes repeated concrete detail from repeated social heat, keeps Moltbook at weak-signal status, classifies the repeated memory-context failure theme as follow-up-worthy rather than proven, and points the next step toward stronger evidence.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Classification treats the useful pattern as a concrete report",
"description": "The answer classifies the repeated A+B pattern as a concrete report (or clearly equivalent wording), not just generic chatter or proof.",
"max_score": 20
},
{
"name": "Evidence strength stays conservative",
"description": "The answer labels the signal as follow-up-worthy weak signal (or very close equivalent), rather than authoritative proof or broad product truth.",
"max_score": 20
},
{
"name": "Repeated concrete detail distinguished from repeated mood",
"description": "The answer explains that A+B share repeated workflow details or remediation patterns, while Slice C only repeats social heat or hype without specifics.",
"max_score": 20
},
{
"name": "Noise slice is down-ranked",
"description": "The answer explicitly down-ranks Slice C as noise, low weight, or vibe-heavy chatter rather than letting it strengthen the evidence.",
"max_score": 15
},
{
"name": "Why section is compact and evidence-focused",
"description": "The why field stays short and focused on repeated concrete detail, uncertainty, and Moltbook's non-authoritative role.",
"max_score": 10
},
{
"name": "Next check points to stronger evidence",
"description": "The next check names a stronger source type such as issue trackers, changelogs, docs, direct reproduction, or detailed user writeups with specifics.",
"max_score": 10
},
{
"name": "Required four-field output shape respected",
"description": "The answer follows the requested compact structure using classification, evidence strength, why, and next check.",
"max_score": 5
}
]
}