Interactive discovery interview → structured product spec. Triggers: spec, PRD, requirements, scoping, brainstorming, new project. Uses AskUserQuestion; WebSearch/WebFetch when user wants research. Outputs user stories, acceptance criteria, constraints.
99
100%
Does it follow best practices?
Impact
96%
2.28x
Average score across 3 eval scenarios
Advisory
Suggest reviewing before use
{
  "context": "Tests whether the agent correctly surfaces and resolves conflicting requirements using AskUserQuestion, covers multiple CATEGORIES.md categories, and does not write the spec until the completeness check passes.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Conflict surfaced explicitly",
      "description": "The interview_log.md shows at least one question that explicitly names a tension between two competing requirements (e.g., real-time vs cost, or security vs frictionless login) rather than silently picking one side",
      "max_score": 15
    },
    {
      "name": "Conflict resolution options",
      "description": "The conflict question(s) in the log include options that let the user favor one side, favor the other side, AND an option to explore alternatives, not just a binary choice",
      "max_score": 10
    },
    {
      "name": "Multiple categories covered",
      "description": "The interview_log.md shows questions covering at least 4 distinct categories from CATEGORIES.md: Problem & Goals, UX/Journey, Data & State, Technical Landscape, Scale & Performance, Integrations, Security, Deployment",
      "max_score": 10
    },
    {
      "name": "Minimum question count",
      "description": "The interview_log.md contains at least 10 labeled question exchanges",
      "max_score": 10
    },
    {
      "name": "Uncertainty option in every question",
      "description": "Every question shown in the interview_log.md includes at least one answer option for uncertainty (e.g. 'I'm not sure', 'Not certain', 'Research this')",
      "max_score": 15
    },
    {
      "name": "Completeness check gate",
      "description": "The interview_log.md shows a completeness check that verifies: problem statement, user journey, technical decisions, and tradeoffs, before generating the spec",
      "max_score": 10
    },
    {
      "name": "No silent TBDs in spec",
      "description": "The final spec file does NOT contain unresolved 'TBD', '[TBD]', or 'To Be Determined' placeholders in the key sections (Problem, UX, Technical Architecture, Decisions)",
      "max_score": 10
    },
    {
      "name": "Spec file naming convention",
      "description": "The spec file is placed in docs/specs/ and the filename starts with a date in YYYY-MM-DD format",
      "max_score": 10
    },
    {
      "name": "Spec uses template structure",
      "description": "The spec file includes Functional Requirements broken into P0 (Must Have), P1 (Should Have), and P2 (Nice to Have) sub-sections",
      "max_score": 10
    }
  ]
}
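
The listing does not say how the grader combines the checklist into a single score. The sketch below assumes the simplest reading of "weighted_checklist": points awarded per criterion are summed and divided by the sum of the max_score values, which total 100 for this rubric. It also shows how two of the more mechanical criteria (file naming and leftover TBD placeholders) might be checked. The function names, the example paths, and the whole-text TBD scan are illustrative assumptions, not part of the skill or its eval harness.

```python
import re
from pathlib import PurePosixPath

# max_score values copied from the checklist above.
MAX_SCORES = {
    "Conflict surfaced explicitly": 15,
    "Conflict resolution options": 10,
    "Multiple categories covered": 10,
    "Minimum question count": 10,
    "Uncertainty option in every question": 15,
    "Completeness check gate": 10,
    "No silent TBDs in spec": 10,
    "Spec file naming convention": 10,
    "Spec uses template structure": 10,
}


def weighted_score(awarded: dict[str, float]) -> float:
    """Return the awarded share of total max_score as a percentage.

    Assumption: criteria missing from `awarded` count as 0, and each
    award is clamped to the range [0, max_score].
    """
    total_max = sum(MAX_SCORES.values())
    total = sum(
        min(max(awarded.get(name, 0), 0), cap)
        for name, cap in MAX_SCORES.items()
    )
    return 100 * total / total_max


# Checks the shape YYYY-MM-DD only; it does not validate the calendar date.
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")


def follows_naming_convention(path: str) -> bool:
    """True if the file sits directly in docs/specs/ and its name starts with a date."""
    p = PurePosixPath(path)
    in_specs_dir = p.parts[-3:-1] == ("docs", "specs")
    return in_specs_dir and bool(DATE_PREFIX.match(p.name))


# Matches 'TBD', '[TBD]', and 'To Be Determined', case-insensitively.
TBD_PATTERN = re.compile(r"\bTBD\b|to be determined", re.IGNORECASE)


def has_unresolved_tbd(section_text: str) -> bool:
    """True if the given text still contains a TBD-style placeholder.

    The rubric limits this check to key sections; this helper scans
    whatever text it is handed, so the caller chooses the sections.
    """
    return bool(TBD_PATTERN.search(section_text))
```

Under these assumptions, a run that earns full marks on every criterion scores 100, and one that only surfaces a conflict explicitly scores 15. The more judgment-based criteria, such as whether a question really names a tension between requirements, would still need a human or model grader.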