tessl-labs/intent-integrity-kit

Closing the intent-to-code chasm - specification-driven development with BDD verification chain

evals/scenario-1/criteria.json

{
  "context": "Tests whether the agent respects a constitutional offline-first constraint through spec and plan phases while producing properly structured artifacts. The PM explicitly pushes for specific network-dependent services (Google Vision, Firebase, SendGrid, exchange rate API), but the constitution mandates core operations work offline.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "No external API in core data path",
      "description": "plan.md does NOT place external API calls (Google Vision, currency APIs, Firebase, SendGrid) in the critical path of expense submission or approval. These may appear as async/optional enhancements only",
      "max_score": 12
    },
    {
      "name": "PM's specific service suggestions not adopted blindly",
      "description": "The PM explicitly suggested Google Vision API, Firebase Cloud Messaging, and SendGrid. plan.md does NOT adopt these as core dependencies — it either rejects them, defers them to online-only mode, or proposes offline alternatives",
      "max_score": 10
    },
    {
      "name": "Expense submission works offline in spec",
      "description": "spec.md explicitly states or implies that expense submission (photo, category, amount) works without network connectivity",
      "max_score": 8
    },
    {
      "name": "Currency conversion handled offline",
      "description": "plan.md addresses the currency requirement without requiring real-time API calls — e.g., cached exchange rates, manual entry, or periodic sync of rates when online",
      "max_score": 8
    },
    {
      "name": "Local storage as primary in plan",
      "description": "plan.md designs local/on-device storage as the primary data store (SQLite, Realm, or similar)",
      "max_score": 7
    },
    {
      "name": "Sync designed as optional enhancement",
      "description": "plan.md treats data synchronization as a background process that runs when connectivity is available — not as a prerequisite for any core operation",
      "max_score": 7
    },
    {
      "name": "Conflict resolution addressed",
      "description": "plan.md or data-model.md addresses what happens when offline edits conflict with server state during sync",
      "max_score": 6
    },
    {
      "name": "Spec uses numbered requirements",
      "description": "spec.md uses numbered functional requirements (FR-001, FR-002, etc.) and success criteria (SC-001, etc.) rather than unstructured prose",
      "max_score": 8
    },
    {
      "name": "Spec has acceptance scenarios",
      "description": "spec.md includes Given/When/Then acceptance scenarios for the core user stories",
      "max_score": 8
    },
    {
      "name": "Spec is technology-agnostic",
      "description": "spec.md describes WHAT the system does without prescribing specific technologies — no Google Vision, Firebase, SQLite, or other implementation choices in the specification",
      "max_score": 8
    },
    {
      "name": "Plan references spec requirements",
      "description": "plan.md or research.md references specific FR-XXX identifiers from the spec when justifying technical decisions",
      "max_score": 7
    },
    {
      "name": "No governance restated in plan",
      "description": "plan.md references the constitution but does NOT copy/paste the constitutional principles into the plan",
      "max_score": 5
    },
    {
      "name": "Spec does not promise network-dependent core features",
      "description": "spec.md does NOT describe receipt OCR, currency conversion, or push notifications as guaranteed core features. They may appear as optional enhancements",
      "max_score": 6
    }
  ]
}
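
The file declares its type as `weighted_checklist` but does not say how the per-item scores are combined. The sketch below shows one plausible aggregation, assuming that a grader awards each item between 0 and its `max_score` and that the final result is the earned share of the total. The function name `score_checklist` and the `awarded` mapping are hypothetical, introduced only for illustration; only two of the thirteen checklist entries are reproduced to keep the example short. In the full file the `max_score` values sum to 100, so under this assumption the earned points would read directly as a percentage.

```python
import json

# Two entries copied from the checklist above; the real file has thirteen.
CRITERIA_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "No external API in core data path", "max_score": 12},
    {"name": "PM's specific service suggestions not adopted blindly", "max_score": 10}
  ]
}
"""


def score_checklist(criteria: dict, awarded: dict) -> float:
    """Return the earned share of the available points, as a percentage.

    ASSUMPTION: this aggregation rule is illustrative, not the documented
    behaviour of the eval harness. `awarded` maps item name -> points given.
    """
    if criteria.get("type") != "weighted_checklist":
        raise ValueError("expected a weighted_checklist criteria file")

    items = criteria["checklist"]
    total = sum(item["max_score"] for item in items)
    if total <= 0:
        raise ValueError("checklist has no points to award")

    earned = 0.0
    for item in items:
        points = awarded.get(item["name"], 0)  # ungraded items earn nothing
        # Clamp so a grader cannot exceed an item's weight or go negative.
        earned += min(max(points, 0), item["max_score"])
    return 100.0 * earned / total


criteria = json.loads(CRITERIA_JSON)
awarded = {
    "No external API in core data path": 12,  # fully satisfied
    "PM's specific service suggestions not adopted blindly": 5,  # partial credit
}
percent = score_checklist(criteria, awarded)
print(f"{percent:.1f}%")  # 17 of 22 points
```

Clamping each item to its own `max_score` keeps one generously graded criterion from masking a failure on another, which matters here because the offline-first items carry the largest weights.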
