Closing the intent-to-code chasm - specification-driven development with BDD verification chain
86
92%
Does it follow best practices?
Impact
86%
1.82x
Average score across 14 eval scenarios
Advisory
Suggest reviewing before use
{
  "context": "Tests whether the agent generates Gherkin .feature files that follow IIKit's mandatory tagging conventions (@TS-XXX, @FR-XXX, @SC-XXX, @US-XXX, priority, test type), covers the majority of success criteria, and includes the required DO NOT MODIFY header on each file.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "TS-XXX tags present",
      "description": "Every Scenario has a @TS-XXX tag (sequential numeric ID, e.g. @TS-001, @TS-002) on the line immediately before the Scenario keyword",
      "max_score": 10
    },
    {
      "name": "FR-XXX tags present",
      "description": "Every Scenario has at least one @FR-XXX tag referencing a functional requirement from spec.md",
      "max_score": 10
    },
    {
      "name": "SC-XXX tags present",
      "description": "At least some scenarios include @SC-XXX tags referencing success criteria from spec.md",
      "max_score": 8
    },
    {
      "name": "SC-XXX majority coverage",
      "description": "At least 3 of the 5 SC-XXX success criteria (SC-001 through SC-005) have at least one scenario tagged with them across the generated .feature files. System-level criteria that cannot be expressed as Gherkin scenarios may be excluded.",
      "max_score": 10
    },
    {
      "name": "US-XXX tags present",
      "description": "Each Scenario has a @US-XXX tag referencing the parent user story",
      "max_score": 8
    },
    {
      "name": "Priority tags present",
      "description": "Every Scenario includes a priority tag (@P1, @P2, or @P3) matching the priority of its parent user story",
      "max_score": 8
    },
    {
      "name": "Test type tags present",
      "description": "Every Scenario includes a test-type tag (one of @acceptance, @contract, or @validation)",
      "max_score": 8
    },
    {
      "name": "DO NOT MODIFY header",
      "description": "Each .feature file starts with lines containing 'DO NOT MODIFY SCENARIOS' and instructions to write step definitions and not modify .feature files",
      "max_score": 12
    },
    {
      "name": "Feature-level US tag",
      "description": "Feature-level lines include @US-XXX tags for the parent user story",
      "max_score": 8
    },
    {
      "name": "All acceptance scenarios covered",
      "description": "All 8 Given/When/Then acceptance scenarios from the spec.md are represented as Gherkin Scenarios in the .feature files",
      "max_score": 10
    },
    {
      "name": "TS-XXX uniqueness",
      "description": "All @TS-XXX tags are unique across all .feature files (no two scenarios share the same TS-XXX number)",
      "max_score": 8
    }
  ]
}
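To make the checklist above more concrete, here is a minimal Python sketch of how a few of its items (per-scenario tags, the DO NOT MODIFY header, and TS-XXX uniqueness) might be checked mechanically. It is not IIKit's own tooling: the function names, the sample feature text, and the exact header wording in the sample are all hypothetical, and the sketch assumes IDs are plain digit runs after the prefix, as in the @TS-001 example.

```python
import re

# Hypothetical lint for the tagging conventions in the checklist above.
# Assumption: IDs look like @TS-001 (prefix, hyphen, digits).
REQUIRED_PATTERNS = {
    "TS": r"@TS-\d+",
    "FR": r"@FR-\d+",
    "US": r"@US-\d+",
    "priority": r"@P[123]",
    "test-type": r"@(acceptance|contract|validation)",
}


def has_header(text):
    """True if the leading comment block contains 'DO NOT MODIFY SCENARIOS'."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("#"):
            return False  # reached non-comment content without seeing the header
        if "DO NOT MODIFY SCENARIOS" in line:
            return True
    return False


def scenario_tags(text):
    """Return (title, tags) for each Scenario, using the tag lines directly above it."""
    pending, result = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("@"):
            pending.extend(line.split())
        elif line.startswith(("Scenario:", "Scenario Outline:")):
            result.append((line.split(":", 1)[1].strip(), set(pending)))
            pending = []
        elif line and not line.startswith("#"):
            pending = []  # tags above Feature: or stray text do not carry over
    return result


def missing_tag_kinds(tags):
    """List the required tag kinds that no tag in `tags` satisfies."""
    return [
        kind
        for kind, pattern in REQUIRED_PATTERNS.items()
        if not any(re.fullmatch(pattern, tag) for tag in tags)
    ]


def lint_feature(text):
    """Collect human-readable problems for one .feature file's text."""
    problems = []
    if not has_header(text):
        problems.append("missing DO NOT MODIFY SCENARIOS header")
    seen_ts = set()
    for title, tags in scenario_tags(text):
        for kind in missing_tag_kinds(tags):
            problems.append(f"{title}: missing {kind} tag")
        for tag in sorted(t for t in tags if re.fullmatch(r"@TS-\d+", t)):
            if tag in seen_ts:
                problems.append(f"{title}: duplicate {tag}")
            seen_ts.add(tag)
    return problems


# Invented sample; the header wording here is illustrative only.
SAMPLE = """\
# DO NOT MODIFY SCENARIOS
# Write step definitions instead; do not modify .feature files.
@US-001
Feature: Login

  @TS-001 @FR-001 @SC-001 @US-001 @P1 @acceptance
  Scenario: Valid login
    Given a registered user
    When they sign in
    Then they see the dashboard

  @TS-001 @FR-002 @US-001 @P1
  Scenario: Invalid password
    Given a registered user
    When they enter a wrong password
    Then they see an error
"""

for problem in lint_feature(SAMPLE):
    print(problem)
```

On the sample, the sketch flags the second scenario twice: once for lacking a test-type tag and once for reusing @TS-001. Checking uniqueness across several files, as the checklist requires, would mean sharing the set of seen TS tags between calls; SC-XXX coverage and priority matching against spec.md are left out because they need the spec as input.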