Closing the intent-to-code chasm - specification-driven development with BDD verification chain
86
92%
Does it follow best practices?
Impact
86%
1.82x
Average score across 14 eval scenarios
Advisory
Suggest reviewing before use
{
  "context": "Tests whether the agent writes a feature spec that is technology-agnostic (WHAT not HOW), uses FR-XXX and SC-XXX numbered requirements, includes Given/When/Then acceptance scenarios, uses a 2-4 word action-noun branch name, and avoids leaving many [NEEDS CLARIFICATION] placeholders by making reasonable assumptions.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "No technology stack in spec",
      "description": "spec.md does NOT mention specific technologies, frameworks, databases, languages, or architectural patterns (e.g., no mention of REST, GraphQL, WebSocket, PostgreSQL, Redis, React, microservices)",
      "max_score": 15
    },
    {
      "name": "FR-XXX numbered requirements",
      "description": "spec.md contains at least 4 functional requirements numbered with the FR-XXX pattern (e.g., FR-001, FR-002)",
      "max_score": 10
    },
    {
      "name": "SC-XXX success criteria",
      "description": "spec.md contains at least 2 success criteria numbered with the SC-XXX pattern (e.g., SC-001, SC-002)",
      "max_score": 8
    },
    {
      "name": "Given/When/Then scenarios",
      "description": "spec.md contains at least 4 acceptance scenarios in Given/When/Then format covering the sharing permission use cases",
      "max_score": 10
    },
    {
      "name": "User stories present",
      "description": "spec.md contains explicitly labeled user stories (US-1, US-2, or similar) with role-based framing ('As a [role], I want to...')",
      "max_score": 8
    },
    {
      "name": "Measurable success criteria",
      "description": "At least one SC-XXX success criterion includes a measurable/quantifiable element (a number, percentage, time measurement, or explicit condition)",
      "max_score": 8
    },
    {
      "name": "Max 3 NEEDS CLARIFICATION",
      "description": "spec.md contains at most 3 [NEEDS CLARIFICATION] markers (agent makes reasonable assumptions rather than leaving many unresolved questions)",
      "max_score": 10
    },
    {
      "name": "No implementation details",
      "description": "spec.md does NOT describe HOW the system will work internally (no database schemas, API endpoints, service names, file structures, or deployment configurations)",
      "max_score": 12
    },
    {
      "name": "2-4 word branch name",
      "description": "spec-report.md mentions a feature branch name that is 2-4 hyphenated words in action-noun format (e.g., 'doc-sharing', 'document-permissions', 'share-document-access')",
      "max_score": 10
    },
    {
      "name": "Requirements.md checklist created",
      "description": "specs/004-doc-sharing/checklists/requirements.md exists and contains checklist items that evaluate requirement quality (not implementation correctness)",
      "max_score": 9
    }
  ]
}
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
rules
skills
iikit-00-constitution
scripts
dashboard
iikit-01-specify
iikit-02-plan
iikit-03-checklist
scripts
bash
dashboard
iikit-04-testify
iikit-05-tasks
iikit-06-analyze
iikit-07-implement
iikit-08-taskstoissues
iikit-bugfix
scripts
dashboard
iikit-clarify
iikit-core
references
scripts
bash
dashboard
powershell
templates
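Several items in the scenario checklist above (FR-XXX and SC-XXX counts, Given/When/Then scenarios, the [NEEDS CLARIFICATION] limit, the branch-name shape) are mechanical enough to pre-check with pattern matching. The following is a minimal sketch of such a pre-check, not the evaluator the scenarios actually use. The function names, the three-digit ID assumption, and the all-or-nothing scoring are hypothetical; the thresholds and weights are copied from the checklist. Items such as "No technology stack in spec" or the action-noun wording of a branch name need human or model judgment and are not covered.

```python
import re

# Assumption: IDs use exactly three digits, as in the checklist examples (FR-001, SC-002).
FR_PATTERN = re.compile(r"\bFR-\d{3}\b")
SC_PATTERN = re.compile(r"\bSC-\d{3}\b")
# Assumption: a scenario is a "Given" later followed by "When" and then "Then".
SCENARIO_PATTERN = re.compile(r"\bGiven\b.*?\bWhen\b.*?\bThen\b", re.DOTALL)
# Prefix match so that both "[NEEDS CLARIFICATION]" and "[NEEDS CLARIFICATION: ...]" count.
CLARIFY_MARKER = "[NEEDS CLARIFICATION"
# 2-4 lowercase words joined by hyphens; the action-noun wording is not checked.
BRANCH_PATTERN = re.compile(r"[a-z]+(?:-[a-z]+){1,3}")

# max_score values copied from the checklist for the items checked here.
WEIGHTS = {
    "FR-XXX numbered requirements": 10,
    "SC-XXX success criteria": 8,
    "Given/When/Then scenarios": 10,
    "Max 3 NEEDS CLARIFICATION": 10,
}


def check_spec(spec_text: str) -> dict[str, bool]:
    """Return pass/fail for the checklist items that pattern matching can decide."""
    distinct_frs = set(FR_PATTERN.findall(spec_text))
    distinct_scs = set(SC_PATTERN.findall(spec_text))
    scenarios = SCENARIO_PATTERN.findall(spec_text)
    return {
        "FR-XXX numbered requirements": len(distinct_frs) >= 4,
        "SC-XXX success criteria": len(distinct_scs) >= 2,
        "Given/When/Then scenarios": len(scenarios) >= 4,
        "Max 3 NEEDS CLARIFICATION": spec_text.count(CLARIFY_MARKER) <= 3,
    }


def weighted_score(results: dict[str, bool]) -> int:
    """Sum max_score for passed items (all-or-nothing; the real scoring may give partial credit)."""
    return sum(WEIGHTS[name] for name, passed in results.items() if passed)


def is_valid_branch_name(name: str) -> bool:
    """Check only the 2-4 hyphenated words shape of a branch name."""
    return BRANCH_PATTERN.fullmatch(name) is not None


print(is_valid_branch_name("doc-sharing"))  # True
```

A spec that passes all four checks scores 38 of the 100 points available in the checklist; the remaining 62 points come from items this sketch does not attempt to judge.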