Name: markusdowne/social-source-calibration
Rating: 92.3913043478261 (1 reviews)
Author: markusdowne

markusdowne/social-source-calibration

Calibrate research done on socially noisy web sources so agents do not mistake crowd mood for truth. Includes source-specific skills for Moltbook, Hacker News, Reddit, and Product Hunt.

1.07x

Quality

92%

Does it follow best practices?

Impact

100%

1.07x

Average score across 1 eval scenario

Securityby

Passed

No known issues

{
  "context": "Tests whether the Moltbook calibration workflow distinguishes repeated concrete detail from repeated social heat, keeps Moltbook at weak-signal status, classifies the repeated memory-context failure theme as follow-up-worthy rather than proven, and points the next step toward stronger evidence.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Classification treats the useful pattern as a concrete report",
      "description": "The answer classifies the repeated A+B pattern as a concrete report (or clearly equivalent wording), not just generic chatter or proof.",
      "max_score": 20
    },
    {
      "name": "Evidence strength stays conservative",
      "description": "The answer labels the signal as follow-up-worthy weak signal (or very close equivalent), rather than authoritative proof or broad product truth.",
      "max_score": 20
    },
    {
      "name": "Repeated concrete detail distinguished from repeated mood",
      "description": "The answer explains that A+B share repeated workflow details or remediation patterns, while Slice C only repeats social heat or hype without specifics.",
      "max_score": 20
    },
    {
      "name": "Noise slice is down-ranked",
      "description": "The answer explicitly down-ranks Slice C as noise, low weight, or vibe-heavy chatter rather than letting it strengthen the evidence.",
      "max_score": 15
    },
    {
      "name": "Why section is compact and evidence-focused",
      "description": "The why field stays short and focused on repeated concrete detail, uncertainty, and Moltbook's non-authoritative role.",
      "max_score": 10
    },
    {
      "name": "Next check points to stronger evidence",
      "description": "The next check names a stronger source type such as issue trackers, changelogs, docs, direct reproduction, or detailed user writeups with specifics.",
      "max_score": 10
    },
    {
      "name": "Required four-field output shape respected",
      "description": "The answer follows the requested compact structure using classification, evidence strength, why, and next check.",
      "max_score": 5
    }
  ]
}

docs

evals

scenario-1

skills

markusdowne/social-source-calibration

criteria.json.css-3qkkll{font-size:var(--chakra-font-sizes-sm);font-weight:var(--chakra-font-weights-normal);color:var(--chakra-colors-gray-300);}evals/scenario-1/

criteria.jsonevals/scenario-1/