Closing the intent-to-code chasm - specification-driven development with BDD verification chain
86
92%
Does it follow best practices?
Impact
86%
1.82x average score across 14 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent generates tasks.md following IIKit's strict task format: sequential T-prefixed IDs, [P] markers only for parallelizable tasks, [USn] labels only for user story tasks, comma-separated TS-XXX references (not prose ranges), and the correct phase structure (Setup, Foundational, User Stories, Polish).",
"type": "weighted_checklist",
"checklist": [
{
"name": "Sequential T-prefixed IDs",
"description": "All tasks have sequential IDs in the format T001, T002, T003... (three-digit zero-padded numbers with T prefix)",
"max_score": 12
},
{
"name": "[P] marker usage",
"description": "Tasks that work on different files with no mutual dependencies are marked [P], and tasks with dependencies or shared file access are NOT marked [P]",
"max_score": 10
},
{
"name": "[USn] label on story tasks",
"description": "Tasks that implement user story functionality include a [US1], [US2], or [US3] label, while Setup and Foundational tasks do NOT include a [USn] label",
"max_score": 10
},
{
"name": "Comma-separated TS references",
"description": "When multiple test spec IDs are referenced, they are listed as a comma-separated list (e.g., [TS-001, TS-002]) NOT as prose ranges like 'TS-001 through TS-002'",
"max_score": 15
},
{
"name": "Phase 1 Setup section",
"description": "tasks.md contains a Phase 1 (or equivalent) section for project initialization/setup tasks (e.g., project structure, pyproject.toml, database setup)",
"max_score": 8
},
{
"name": "Phase 2 Foundational section",
"description": "tasks.md contains a Phase 2 (or equivalent) Foundational section for shared prerequisites that must complete before user story tasks (e.g., base models, database connection)",
"max_score": 8
},
{
"name": "User Story phases ordered by priority",
"description": "US-1 (P1) and US-2 (P1) tasks appear in earlier phases than US-3 (P2) tasks, reflecting priority ordering from the spec",
"max_score": 8
},
{
"name": "File paths in descriptions",
"description": "Task descriptions include specific file paths (e.g., 'src/models/item.py', 'tests/step_definitions/') rather than vague descriptions without paths",
"max_score": 10
},
{
"name": "Checkbox format",
"description": "All tasks use the markdown checkbox format `- [ ]` (with space inside the brackets)",
"max_score": 9
},
{
"name": "Polish/Final phase",
"description": "tasks.md contains a final Polish or Cross-Cutting Concerns phase for integration testing, documentation, or cleanup tasks",
"max_score": 10
}
]
}
evals
  scenario-1
  scenario-2
  scenario-3
  scenario-4
  scenario-5
  scenario-6
  scenario-7
  scenario-8
  scenario-9
  scenario-10
  scenario-11
  scenario-12
  scenario-13
  scenario-14
rules
skills
  iikit-00-constitution
    scripts
      dashboard
  iikit-01-specify
  iikit-02-plan
  iikit-03-checklist
    scripts
      bash
      dashboard
  iikit-04-testify
  iikit-05-tasks
  iikit-06-analyze
  iikit-07-implement
  iikit-08-taskstoissues
  iikit-bugfix
    scripts
      dashboard
  iikit-clarify
  iikit-core
    references
    scripts
      bash
      dashboard
      powershell
    templates
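Several of the format rules in the weighted checklist above are mechanical enough to lint automatically: the `- [ ]` checkbox, sequential zero-padded T-prefixed IDs, no [USn] label in Setup or Foundational phases, comma-separated TS references instead of prose ranges, and a file path in each description. The following is a minimal sketch under stated assumptions, not IIKit's own validator. It assumes tasks are written as `- [ ] T001 [P] [US1] description [TS-001, TS-002]` and that phases use headings such as `## Phase 1: Setup`. The function `check_tasks`, the regexes, and the file-path heuristic are all hypothetical.

```python
import re

# Hypothetical task-line shape (an assumption, not IIKit's documented grammar):
#   - [ ] T001 [P] [US1] Description mentioning src/models/item.py [TS-001, TS-002]
TASK_RE = re.compile(r"^- \[ \] (T\d{3})\b(.*)$")
# Anything that looks like a task (list item starting with a T-number), well-formed or not.
TASK_LIKE_RE = re.compile(r"^\s*[-*]\s*(?:\[[ xX]?\]\s*)?T\d+\b")
# Assumed phase headings such as "## Phase 1: Setup" or "## Phase 2: Foundational".
PHASE_RE = re.compile(r"^#+\s*Phase\s+\d+\s*:?\s*(.*)$", re.IGNORECASE)
STORY_RE = re.compile(r"\[US\d+\]")
PROSE_RANGE_RE = re.compile(r"TS-\d{3}\s+(?:through|thru|to)\s+TS-\d{3}", re.IGNORECASE)
# Rough heuristic for "mentions a file path": a slash, or a name with a common extension.
PATH_RE = re.compile(r"[\w.-]+/[\w./-]*|\b[\w-]+\.(?:py|md|toml|feature|ts|js)\b")


def check_tasks(markdown: str) -> list[str]:
    """Return human-readable problems found in a tasks.md document."""
    problems: list[str] = []
    expected = 1  # next task number we expect to see
    phase = ""    # lower-cased title of the current phase heading
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        heading = PHASE_RE.match(line)
        if heading:
            phase = heading.group(1).strip().lower()
            continue
        if not TASK_LIKE_RE.match(line):
            continue  # prose, blank lines, other headings
        task = TASK_RE.match(line)
        if not task:
            problems.append(f"line {lineno}: task not in '- [ ] T###' checkbox format")
            continue
        task_id, rest = task.groups()
        number = int(task_id[1:])
        if number != expected:
            problems.append(f"line {lineno}: expected T{expected:03d}, found {task_id}")
        expected = number + 1  # resync so a single gap is reported once
        if STORY_RE.search(rest) and phase.startswith(("setup", "foundational")):
            problems.append(f"line {lineno}: [USn] label on a {phase} task")
        if PROSE_RANGE_RE.search(rest):
            problems.append(f"line {lineno}: prose TS range; list IDs as [TS-001, TS-002]")
        if not PATH_RE.search(rest):
            problems.append(f"line {lineno}: no file path in description")
    return problems


SAMPLE = """## Phase 1: Setup
- [ ] T001 Create project skeleton and pyproject.toml
- [ ] T002 [P] [US1] Configure linting in pyproject.toml
## Phase 3: User Story 1 (P1)
- [ ] T004 [US1] Add Item model in src/models/item.py [TS-001 through TS-003]
- [] T005 [US1] Write steps in tests/step_definitions/test_items.py
"""

if __name__ == "__main__":
    for problem in check_tasks(SAMPLE):
        print(problem)
```

On `SAMPLE`, the sketch reports the [US1] label on a Setup task, the jump from T002 to T004, the prose `TS-001 through TS-003` range, and the malformed `- []` checkbox. It deliberately does not check whether [P] markers are correct or whether user-story phases follow spec priority. Those rules depend on file dependencies and on priorities in the spec, which a line-by-line regex pass cannot see.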