Closing the intent-to-code chasm - specification-driven development with BDD verification chain
86
92%
Does it follow best practices?
Impact
86%
1.82x Average score across 14 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent maintains strict phase separation across the specify→plan boundary: spec.md must be technology-agnostic (WHAT not HOW) while plan.md must contain only technical decisions (no governance). Also validates that every functional requirement in the spec traces to a plan decision and no phantom requirements appear in the plan.",
"type": "weighted_checklist",
"checklist": [
{
"name": "No technology in spec.md",
"description": "spec.md does NOT mention specific technologies, frameworks, databases, protocols, or languages (e.g., no Kafka, PostgreSQL, MQTT, TimescaleDB, Python, React, WebSocket, REST, GraphQL)",
"max_score": 15
},
{
"name": "No governance in plan.md",
"description": "plan.md does NOT restate constitutional principles (no 'at-least-once delivery', 'degrade gracefully', 'auditability' rules). It may reference the constitution but must not duplicate governance content",
"max_score": 12
},
{
"name": "FR-XXX requirements in spec",
"description": "spec.md contains at least 5 functional requirements using the FR-XXX pattern, covering: telemetry ingestion, health scoring, alerting, custom thresholds, and connection-lost detection",
"max_score": 8
},
{
"name": "SC-XXX success criteria in spec",
"description": "spec.md contains at least 3 measurable success criteria using the SC-XXX pattern with quantifiable elements (numbers, percentages, time measurements)",
"max_score": 6
},
{
"name": "User stories in spec",
"description": "spec.md contains at least 3 user stories covering dispatcher monitoring, maintenance coordinator investigation, and threshold configuration",
"max_score": 6
},
{
"name": "Given/When/Then scenarios in spec",
"description": "spec.md contains at least 4 acceptance scenarios in Given/When/Then format covering the key use cases from the PM description",
"max_score": 6
},
{
"name": "Plan references spec FRs",
"description": "plan.md or research.md references specific FR-XXX identifiers from spec.md when justifying technical decisions (e.g., choosing a time-series database to satisfy FR-XXX about telemetry storage)",
"max_score": 10
},
{
"name": "Every spec FR traceable to plan",
"description": "Every FR-XXX in spec.md has a corresponding technical decision, data model entity, or API contract in the plan artifacts. No orphan requirements that the plan ignores",
"max_score": 12
},
{
"name": "No phantom requirements in plan",
"description": "plan.md does not introduce features or capabilities not described in spec.md (e.g., no route optimization, driver scoring, fuel tracking, or other out-of-scope features that the PM did not request)",
"max_score": 12
},
{
"name": "data-model.md traces to spec entities",
"description": "data-model.md defines entities that correspond to concepts in the spec (vehicles, telemetry readings, health scores, alerts, thresholds) — not entities invented by the plan without spec basis",
"max_score": 8
},
{
"name": "Connection-lost requirement survives to plan",
"description": "The PM's specific requirement about 5-minute data absence showing 'connection lost' status appears in spec.md as a formal requirement AND is addressed in the plan's technical design",
"max_score": 5
}
]
}
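The traceability and phase-separation checks in this rubric can be approximated mechanically. The following is a minimal sketch, not the evaluator's actual implementation: it assumes FR identifiers look like `FR-` followed by digits, takes its technology terms from the examples in the "No technology in spec.md" description, and uses hypothetical helper names (`technology_mentions`, `trace_report`, `total_max_score`). A text scan like this cannot detect out-of-scope features described in prose, so it only covers the identifier-level part of the "phantom requirements" check.

```python
import re

# Assumed identifier shape: "FR-" followed by digits (e.g. FR-001).
FR_PATTERN = re.compile(r"\bFR-\d+\b")

# Example terms copied from the "No technology in spec.md" description.
TECH_TERMS = ["Kafka", "PostgreSQL", "MQTT", "TimescaleDB", "Python",
              "React", "WebSocket", "REST", "GraphQL"]


def technology_mentions(spec_text):
    """Return the listed technology terms found in the spec (case-sensitive)."""
    found = []
    for term in TECH_TERMS:
        if re.search(r"\b" + re.escape(term) + r"\b", spec_text):
            found.append(term)
    return found


def fr_ids(text):
    """Collect the distinct FR identifiers mentioned in a document."""
    return set(FR_PATTERN.findall(text))


def trace_report(spec_text, plan_text):
    """Compare FR identifiers between spec and plan.

    orphans:      FRs defined in the spec that the plan never references.
    unknown_refs: FRs referenced in the plan that the spec never defines.
    """
    spec_ids = fr_ids(spec_text)
    plan_ids = fr_ids(plan_text)
    return {
        "orphans": sorted(spec_ids - plan_ids),
        "unknown_refs": sorted(plan_ids - spec_ids),
    }


def total_max_score(rubric):
    """Sum max_score over every checklist item in a parsed rubric dict."""
    return sum(item["max_score"] for item in rubric["checklist"])
```

For example, calling `trace_report` on a spec that defines FR-001 through FR-003 and a plan that cites only FR-001, FR-002, and FR-009 would report FR-003 as an orphan and FR-009 as an unknown reference. Applied to the rubric above, `total_max_score` returns 100, since the eleven `max_score` values sum to 100.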