Closing the intent-to-code chasm - specification-driven development with BDD verification chain
86
92%
Does it follow best practices?
Impact
86%
1.82x
Average score across 14 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent generates tasks that are fully traceable to the plan and spec: file paths match the plan's project structure, user story tags match spec stories, TS-XXX references are comma-separated (not ranges), and no tasks reference technologies or files not defined in the plan.",
"type": "weighted_checklist",
"checklist": [
{
"name": "File paths match plan structure",
"description": "Task descriptions reference only file paths that exist in the plan's Project Structure section (e.g., src/services/event-service.ts, src/routes/events.ts, prisma/schema.prisma). No invented paths, such as src/controllers/ or src/utils/, that are not in the plan.",
"max_score": 15
},
{
"name": "Every user story has tagged tasks",
"description": "Each of the three user stories (US-1, US-2, US-3) has at least one task tagged [US1], [US2], or [US3], respectively",
"max_score": 10
},
{
"name": "Setup/Foundational tasks have no story tags",
"description": "Tasks in Setup and Foundational phases do NOT have [USn] labels — these are shared infrastructure, not story-specific",
"max_score": 8
},
{
"name": "TS references are comma-separated",
"description": "When tasks reference multiple test specs, they use comma-separated format like [TS-003, TS-004, TS-005] — NOT prose ranges like 'TS-003 through TS-005' or 'TS-003 to TS-005'",
"max_score": 12
},
{
"name": "TS references match provided .feature files",
"description": "TS-XXX references in tasks correspond to actual @TS-XXX tags in the provided .feature files (TS-001 through TS-007). No references to TS-008 or higher that don't exist",
"max_score": 10
},
{
"name": "Priority ordering respected",
"description": "P1 user story tasks (US-1 Create Event, US-2 Purchase Ticket) appear before P2 tasks (US-3 View Listings) in the phase structure",
"max_score": 8
},
{
"name": "Phase structure complete",
"description": "tasks.md has all required phases: Setup (project init, schema), Foundational (shared models, database), User Story phases (by priority), and a Polish/Final phase",
"max_score": 8
},
{
"name": "[P] markers only on parallelizable tasks",
"description": "[P] markers appear only on tasks that can genuinely run in parallel (different files, no mutual dependencies). Tasks that depend on each other (e.g., model before service) are NOT marked [P]",
"max_score": 8
},
{
"name": "No technologies beyond the plan",
"description": "Tasks do not introduce frameworks, libraries, or tools not mentioned in the plan (e.g., no Redis, no GraphQL, no React — the plan specifies Express, Prisma, Vitest, Resend)",
"max_score": 10
},
{
"name": "Checkbox format used",
"description": "All tasks use the markdown checkbox format: - [ ] TNNN description",
"max_score": 5
},
{
"name": "Sequential T-prefixed IDs",
"description": "Tasks use sequential zero-padded IDs: T001, T002, T003, etc. with no gaps or duplicates",
"max_score": 6
}
]
}
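
Several of the checklist items above are mechanical enough to check with a script. The following is a minimal sketch, not the evaluator's actual implementation: the function name `lint_tasks`, the sample content, and the exact placement of `[P]`, `[USn]`, and `[TS-XXX]` tags within a task line are assumptions inferred from the criterion descriptions. It covers only the checkbox format, sequential IDs, comma-separated TS references, known TS tags, and story coverage.

```python
import re

# Assumed task line shape (inferred from the rubric, not from a real template):
#   - [ ] T001 [P] [US1] description [TS-001, TS-002]
TASK_RE = re.compile(r"^- \[[ xX]\] (T\d{3})\b(.*)$")
TS_GROUP_RE = re.compile(r"\[(TS-\d{3}(?:\s*,\s*TS-\d{3})*)\]")
TS_RANGE_RE = re.compile(r"TS-\d{3}\s+(?:through|to)\s+TS-\d{3}")
STORY_RE = re.compile(r"\[US(\d+)\]")


def lint_tasks(markdown, known_ts, story_count):
    """Return a list of human-readable problems found in a tasks.md body."""
    problems = []
    ids = []
    stories_seen = set()
    for line in markdown.splitlines():
        if not line.startswith("- ["):
            continue  # headings and prose are ignored in this sketch
        match = TASK_RE.match(line)
        if match is None:
            problems.append(f"bad checkbox format: {line!r}")
            continue
        task_id, rest = match.groups()
        ids.append(task_id)
        # Prose ranges such as "TS-003 through TS-005" are disallowed.
        if TS_RANGE_RE.search(rest):
            problems.append(f"{task_id}: TS range in prose, use a comma-separated list")
        # Every bracketed TS reference must exist in the .feature files.
        for group in TS_GROUP_RE.findall(rest):
            for ref in re.split(r"\s*,\s*", group):
                if ref not in known_ts:
                    problems.append(f"{task_id}: unknown test spec {ref}")
        stories_seen.update(int(n) for n in STORY_RE.findall(rest))
    # IDs must run T001, T002, ... with no gaps or duplicates.
    expected = [f"T{i:03d}" for i in range(1, len(ids) + 1)]
    if ids != expected:
        problems.append("task IDs are not sequential from T001")
    for n in range(1, story_count + 1):
        if n not in stories_seen:
            problems.append(f"no task tagged [US{n}]")
    return problems


# Hypothetical tasks.md fragment with deliberate violations.
SAMPLE = """\
## Phase 1: Setup
- [ ] T001 Initialize project and prisma/schema.prisma
## Phase 3: User Story 1
- [ ] T002 [US1] Implement src/services/event-service.ts [TS-001, TS-002]
- [ ] T003 [P] [US2] Add purchase route in src/routes/events.ts TS-003 through TS-005
- [ ] T005 [US2] Wire email via Resend [TS-008]
"""
KNOWN = {f"TS-{i:03d}" for i in range(1, 8)}  # TS-001 .. TS-007

for problem in lint_tasks(SAMPLE, KNOWN, story_count=3):
    print(problem)
```

On the sample, the sketch reports the prose range on T003, the unknown TS-008 on T005, the gap in the ID sequence, and the missing [US3] task. Criteria that need judgment or the plan itself (file paths, technologies, priority ordering, valid use of [P]) are left out.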