Spec-driven workflow covering requirement gathering, spec authoring, implementation review, and verification — with skills, rules, and evaluation scenarios.
96
90%
Does it follow best practices?
Impact
98%
1.19xAverage score across 9 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent follows the requirement-gathering skill when preparing for a stakeholder interview: reviewing existing specs to identify what's already covered, identifying genuine gaps, formulating one-at-a-time questions based on those gaps, and producing a structured requirements summary.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Output file created",
"description": "A file named requirements-prep.md is produced",
"max_score": 4
},
{
"name": "Existing coverage section",
"description": "requirements-prep.md contains a section summarizing what the existing specs (user-management.spec.md, auth.spec.md) already document that is relevant to the export feature",
"max_score": 10
},
{
"name": "Cites list_users as existing",
"description": "The coverage summary specifically notes that list_users with role/status filtering already exists in the spec",
"max_score": 8
},
{
"name": "Gap analysis present",
"description": "requirements-prep.md contains a section identifying specific ambiguous or undocumented areas for the export feature (not just generic categories)",
"max_score": 10
},
{
"name": "Questions are separate items",
"description": "Interview questions are listed as individual numbered items — NOT bundled together as a multi-part question or a comma-separated list",
"max_score": 12
},
{
"name": "No redundant questions",
"description": "Questions do NOT ask about things already documented in the existing specs (e.g. does NOT ask what roles exist, does NOT ask about pagination behavior already defined)",
"max_score": 12
},
{
"name": "Performance/scale addressed",
"description": "Gap analysis or questions address the large-org scale requirement (50k+ users) — e.g. async job vs. synchronous, streaming, timeouts",
"max_score": 10
},
{
"name": "Preliminary scope section",
"description": "requirements-prep.md contains a scope section distinguishing what is likely in scope versus out of scope",
"max_score": 8
},
{
"name": "Non-functional gaps identified",
"description": "Gap analysis or questions address at least one non-functional requirement (performance, security, file size limits, rate limiting, or similar)",
"max_score": 8
},
{
"name": "Gap-driven questions",
"description": "Each question clearly addresses a gap identified in the gap analysis — not generic process questions or things answerable from existing specs",
"max_score": 8
},
{
"name": "Questions are specific and bounded",
"description": "Questions are specific and bounded (e.g. 'Should the export be synchronous or generated as a background job?' not 'How should the export work?')",
"max_score": 10
}
]
}