Closing the intent-to-code chasm - specification-driven development with BDD verification chain
86
92%
Does it follow best practices?
Impact
86%
1.82x
Average score across 14 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent respects a constitutional offline-first constraint through spec and plan phases while producing properly structured artifacts. The PM explicitly pushes for specific network-dependent services (Google Vision, Firebase, SendGrid, exchange rate API), but the constitution mandates core operations work offline.",
"type": "weighted_checklist",
"checklist": [
{
"name": "No external API in core data path",
"description": "plan.md does NOT place external API calls (Google Vision, currency APIs, Firebase, SendGrid) in the critical path of expense submission or approval. These may appear as async/optional enhancements only",
"max_score": 12
},
{
"name": "PM's specific service suggestions not adopted blindly",
"description": "The PM explicitly suggested Google Vision API, Firebase Cloud Messaging, and SendGrid. plan.md does NOT adopt these as core dependencies — it either rejects them, defers them to online-only mode, or proposes offline alternatives",
"max_score": 10
},
{
"name": "Expense submission works offline in spec",
"description": "spec.md explicitly states or implies that expense submission (photo, category, amount) works without network connectivity",
"max_score": 8
},
{
"name": "Currency conversion handled offline",
"description": "plan.md addresses the currency requirement without requiring real-time API calls — e.g., cached exchange rates, manual entry, or periodic sync of rates when online",
"max_score": 8
},
{
"name": "Local storage as primary in plan",
"description": "plan.md designs local/on-device storage as the primary data store (SQLite, Realm, or similar)",
"max_score": 7
},
{
"name": "Sync designed as optional enhancement",
"description": "plan.md treats data synchronization as a background process that runs when connectivity is available — not as a prerequisite for any core operation",
"max_score": 7
},
{
"name": "Conflict resolution addressed",
"description": "plan.md or data-model.md addresses what happens when offline edits conflict with server state during sync",
"max_score": 6
},
{
"name": "Spec uses numbered requirements",
"description": "spec.md uses numbered functional requirements (FR-001, FR-002, etc.) and success criteria (SC-001, etc.) rather than unstructured prose",
"max_score": 8
},
{
"name": "Spec has acceptance scenarios",
"description": "spec.md includes Given/When/Then acceptance scenarios for the core user stories",
"max_score": 8
},
{
"name": "Spec is technology-agnostic",
"description": "spec.md describes WHAT the system does without prescribing specific technologies — no Google Vision, Firebase, SQLite, or other implementation choices in the specification",
"max_score": 8
},
{
"name": "Plan references spec requirements",
"description": "plan.md or research.md references specific FR-XXX identifiers from the spec when justifying technical decisions",
"max_score": 7
},
{
"name": "No governance restated in plan",
"description": "plan.md references the constitution but does NOT copy/paste the constitutional principles into the plan",
"max_score": 5
},
{
"name": "Spec does not promise network-dependent core features",
"description": "spec.md does NOT describe receipt OCR, currency conversion, or push notifications as guaranteed core features. They may appear as optional enhancements",
"max_score": 6
}
]
}
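The checklist above is declared as a "weighted_checklist", and its max_score values sum to 100. The source does not say how a grader aggregates the items, so the following is only a minimal sketch under one assumption: a scenario's score is the points awarded across criteria divided by the total points available. The names MAX_SCORES and weighted_percentage are hypothetical and are not part of the eval tooling.

```python
# Criterion names and max_score values copied from the checklist above.
MAX_SCORES = {
    "No external API in core data path": 12,
    "PM's specific service suggestions not adopted blindly": 10,
    "Expense submission works offline in spec": 8,
    "Currency conversion handled offline": 8,
    "Local storage as primary in plan": 7,
    "Sync designed as optional enhancement": 7,
    "Conflict resolution addressed": 6,
    "Spec uses numbered requirements": 8,
    "Spec has acceptance scenarios": 8,
    "Spec is technology-agnostic": 8,
    "Plan references spec requirements": 7,
    "No governance restated in plan": 5,
    "Spec does not promise network-dependent core features": 6,
}


def weighted_percentage(awarded):
    """Return awarded points as a percentage of all available points.

    ASSUMPTION: aggregation is a plain sum of points; the real grader
    may combine criteria differently.
    """
    total = sum(MAX_SCORES.values())
    earned = 0.0
    for name, max_score in MAX_SCORES.items():
        points = awarded.get(name, 0)
        # Clamp each criterion to the range [0, max_score].
        earned += min(max(points, 0), max_score)
    return 100.0 * earned / total


if __name__ == "__main__":
    # Hypothetical grading result: two criteria fully met, one half met.
    example = {
        "No external API in core data path": 12,
        "Spec uses numbered requirements": 8,
        "Conflict resolution addressed": 3,
    }
    print(weighted_percentage(example))
```

Criteria missing from the awarded mapping count as zero, so a partially graded run is scored conservatively instead of raising an error.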