Name: emerge/challenge-assumptions
Rating: 97.6 (1 review)
Author: emerge

Adversarial reviewer personality for architecture discussions. Use when a user requests a design review, architecture review, system design critique, tech stack decision, RFC review, or devil's advocate perspective on trade-offs. Makes Claude challenge assumptions instead of agreeing — questioning scalability assumptions, identifying single points of failure, challenging technology choices, and probing for edge cases rather than validating decisions.

Quality: 100% (Does it follow best practices?)

Impact: 94%, 1.25x (Average score across 5 eval scenarios)

Security: Passed (No known issues)

{
  "context": "Tests whether the agent maintains high tension during a Phase 2 deep-dive architecture review — challenging scalability assumptions, identifying single points of failure, probing technology dependencies, and asking specific hard questions rather than producing a superficial summary.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Challenges scalability assumption",
      "description": "review.md explicitly challenges at least one scalability assumption in the document — e.g. questions whether a stated capacity claim is backed by evidence or testing",
      "max_score": 20
    },
    {
      "name": "Identifies single point of failure",
      "description": "review.md identifies at least one single point of failure in the proposed architecture",
      "max_score": 20
    },
    {
      "name": "Challenges a dependency",
      "description": "review.md challenges at least one dependency in the design by asking what happens if it fails, is unavailable, or changes its API/contract",
      "max_score": 15
    },
    {
      "name": "Technology choice challenged",
      "description": "review.md challenges at least one technology choice by naming a specific concern or alternative",
      "max_score": 15
    },
    {
      "name": "Specific probing questions",
      "description": "review.md asks at least 2 specific, answerable questions — not general observations like 'consider security' but targeted questions like 'what is your RTO if the cache layer fails?'",
      "max_score": 15
    },
    {
      "name": "Does not approve overall",
      "description": "review.md does NOT conclude that the architecture is sound or ready without unresolved concerns",
      "max_score": 15
    }
  ]
}
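
The listing does not say how a `weighted_checklist` is turned into a scenario score. One plausible reading, sketched below, is that the grader awards each item between 0 and its `max_score` points and reports the sum as a percentage of the available total (100 points for this checklist). The `score_checklist` function, the clamping behaviour, and the awarded values are all assumptions made for illustration, not the registry's actual grader or real eval results.

```python
import json

# Trimmed copy of the checklist above: item names and max_score values only.
CRITERIA_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Challenges scalability assumption", "max_score": 20},
    {"name": "Identifies single point of failure", "max_score": 20},
    {"name": "Challenges a dependency", "max_score": 15},
    {"name": "Technology choice challenged", "max_score": 15},
    {"name": "Specific probing questions", "max_score": 15},
    {"name": "Does not approve overall", "max_score": 15}
  ]
}
"""


def score_checklist(criteria: dict, awarded: dict) -> float:
    """Return the percentage of available points earned.

    Assumption (not stated in the listing): the scenario score is
    sum(awarded points) / sum(max_score). Items missing from `awarded`
    count as 0, and each award is clamped to the range [0, max_score].
    """
    total_max = sum(item["max_score"] for item in criteria["checklist"])
    total = 0.0
    for item in criteria["checklist"]:
        points = awarded.get(item["name"], 0)
        total += min(max(points, 0), item["max_score"])
    return 100.0 * total / total_max


criteria = json.loads(CRITERIA_JSON)

# Hypothetical grader output, invented for illustration only:
# full marks everywhere except a partial award on one item.
example_awarded = {
    "Challenges scalability assumption": 20,
    "Identifies single point of failure": 20,
    "Challenges a dependency": 15,
    "Technology choice challenged": 10,
    "Specific probing questions": 15,
    "Does not approve overall": 15,
}

example_score = score_checklist(criteria, example_awarded)
print(f"{example_score:.1f}%")  # prints "95.0%"
```

Under this reading, the two 20-point items (scalability and single point of failure) carry 40% of the score between them, so a review that skips both cannot exceed 60% however well it does on the rest.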

evals

scenario-1

scenario-2

scenario-3

scenario-4

scenario-5

evals/scenario-5/criteria.json