tessl-labs/intent-integrity-kit

Closing the intent-to-code chasm - specification-driven development with BDD verification chain

86

1.82x
Quality

92%

Does it follow best practices?

Impact

86%

1.82x

Average score across 14 eval scenarios

Security by Snyk

Advisory

Suggest reviewing before use

Evaluation results

91%

15%

Constraint Survival: Offline-First Expense Tracker

| Criteria | Without context | With context |
| --- | --- | --- |
| No external API in core data path | 100% | 100% |
| PM's specific service suggestions not adopted blindly | 90% | 100% |
| Expense submission works offline in spec | 100% | 100% |
| Currency conversion handled offline | 100% | 100% |
| Local storage as primary in plan | 100% | 100% |
| Sync designed as optional enhancement | 100% | 100% |
| Conflict resolution addressed | 100% | 100% |
| Spec uses numbered requirements | 50% | 100% |
| Spec has acceptance scenarios | 0% | 100% |
| Spec is technology-agnostic | 50% | 50% |
| Plan references spec requirements | 0% | 28% |
| No governance restated in plan | 100% | 100% |
| Spec does not promise network-dependent core features | 100% | 100% |

91%

91%

Technical Design for Notification Service Feature

| Criteria | Without context | With context |
| --- | --- | --- |
| plan.md exists | 0% | 100% |
| Technical Context fields | 0% | 100% |
| No bare Option labels | 0% | 100% |
| research.md with rationale | 0% | 100% |
| data-model.md entities | 0% | 100% |
| State transitions in data-model | 0% | 100% |
| contracts/ directory | 0% | 100% |
| Contracts reference spec requirements | 0% | 100% |
| quickstart.md exists with scenarios | 0% | 100% |
| No governance content in plan | 0% | 100% |
| Spec quality assessment performed | 0% | 25% |
| Plan decisions trace to spec FRs | 0% | 100% |
| context.json updated | 0% | 100% |

88%

15%

Scope Creep Detection: Simple Bookmark Manager

| Criteria | Without context | With context |
| --- | --- | --- |
| Exactly 3 user stories in spec | 100% | 100% |
| No mentioned-but-deferred features in spec | 100% | 100% |
| No excluded features in plan | 100% | 100% |
| No excluded features in tasks | 100% | 100% |
| Spec uses numbered requirements | 0% | 100% |
| Spec has acceptance scenarios | 37% | 100% |
| FR count proportional to scope | 0% | 50% |
| Task count proportional to scope | 75% | 0% |
| Data model matches scope | 100% | 100% |
| Plan tech stack appropriate for scope | 100% | 100% |
| Tasks use structured format | 50% | 100% |
| Accessibility addressed | 100% | 100% |

87%

28%

Bug Report: Payment Processing Failure on Retry

| Criteria | Without context | With context |
| --- | --- | --- |
| T-BNNN task ID format | 0% | 100% |
| At least 1 fix task with TS ref | 100% | 100% |
| TDD task references test spec | 0% | 100% |
| bugs.md BUG-NNN entry | 100% | 100% |
| bugs.md required fields | 70% | 100% |
| bugs.md date format | 100% | 100% |
| Existing tasks unmodified | 100% | 100% |
| Bug ID in task descriptions | 100% | 100% |
| New .feature file created | 0% | 0% |

92%

82%

Acceptance Test Suite for a User Notifications Feature

| Criteria | Without context | With context |
| --- | --- | --- |
| TS-XXX tags present | 0% | 100% |
| FR-XXX tags present | 0% | 100% |
| SC-XXX tags present | 0% | 100% |
| SC-XXX majority coverage | 0% | 100% |
| US-XXX tags present | 0% | 0% |
| Priority tags present | 0% | 100% |
| Test type tags present | 0% | 100% |
| DO NOT MODIFY header | 0% | 100% |
| Feature-level US tag | 0% | 100% |
| All acceptance scenarios covered | 100% | 100% |
| TS-XXX uniqueness | 0% | 100% |

90%

49%

Greenfield Full Pipeline: Team Standup Bot

| Criteria | Without context | With context |
| --- | --- | --- |
| Spec is technology-agnostic | 100% | 87% |
| Plan has no governance content | 100% | 33% |
| FR-XXX to TS-XXX coverage | 0% | 100% |
| TS-XXX to task coverage | 0% | 75% |
| .feature files have DO NOT MODIFY headers | 0% | 100% |
| .feature files have required tags | 0% | 75% |
| Tasks ordered: Setup → Foundational → Stories → Polish | 83% | 100% |
| TDD task ordering within story phases | 40% | 100% |
| No phantom requirements | 100% | 100% |
| Privacy constraint in .feature files | 100% | 100% |
| No FR orphans in either direction | 37% | 100% |
| TS-XXX IDs are unique across files | 0% | 100% |

95%

28%

Plan-to-Tasks Traceability: Event Ticketing Platform

| Criteria | Without context | With context |
| --- | --- | --- |
| File paths match plan structure | 86% | 80% |
| Every user story has tagged tasks | 20% | 100% |
| Setup/Foundational tasks have no story tags | 100% | 100% |
| TS references are comma-separated | 75% | 100% |
| TS references match provided .feature files | 100% | 100% |
| Priority ordering respected | 50% | 100% |
| Phase structure complete | 62% | 100% |
| [P] markers only on parallelizable tasks | 37% | 75% |
| No technologies beyond the plan | 100% | 100% |
| Checkbox format used | 0% | 100% |
| Sequential T-prefixed IDs | 50% | 100% |

100%

49%

Update Technical Design: File Upload Feature

| Criteria | Without context | With context |
| --- | --- | --- |
| NEEDS CLARIFICATION flagged | 100% | 100% |
| FR count assessed | 0% | 100% |
| Measurable criteria warning | 0% | 100% |
| Quality score reported | 0% | 100% |
| Semantic diff present | 80% | 100% |
| Semantic diff format | 0% | 100% |
| Downstream impact flagged | 12% | 100% |
| Updated plan has new dependencies | 100% | 100% |
| Updated architecture diagram | 100% | 100% |
| No governance in plan | 100% | 100% |
| Clarification assumptions documented | 60% | 100% |

83%

35%

Spec-to-Plan Phase Separation: IoT Fleet Management

| Criteria | Without context | With context |
| --- | --- | --- |
| No technology in spec.md | 100% | 100% |
| No governance in plan.md | 16% | 41% |
| FR-XXX requirements in spec | 0% | 100% |
| SC-XXX success criteria in spec | 0% | 100% |
| User stories in spec | 0% | 100% |
| Given/When/Then scenarios in spec | 0% | 100% |
| Plan references spec FRs | 0% | 0% |
| Every spec FR traceable to plan | 50% | 100% |
| No phantom requirements in plan | 100% | 100% |
| data-model.md traces to spec entities | 100% | 100% |
| Connection-lost requirement survives to plan | 100% | 100% |

100%

63%

Feature Specification: Team Document Collaboration

| Criteria | Without context | With context |
| --- | --- | --- |
| No technology stack in spec | 0% | 100% |
| FR-XXX numbered requirements | 0% | 100% |
| SC-XXX success criteria | 0% | 100% |
| Given/When/Then scenarios | 0% | 100% |
| User stories present | 100% | 100% |
| Measurable success criteria | 0% | 100% |
| Max 3 NEEDS CLARIFICATION | 100% | 100% |
| No implementation details | 0% | 100% |
| 2-4 word branch name | 100% | 100% |
| Requirements.md checklist created | 100% | 100% |

98%

43%

Task Breakdown for an Inventory Management API

| Criteria | Without context | With context |
| --- | --- | --- |
| Sequential T-prefixed IDs | 50% | 100% |
| [P] marker usage | 20% | 80% |
| [USn] label on story tasks | 0% | 100% |
| Comma-separated TS references | 53% | 100% |
| Phase 1 Setup section | 100% | 100% |
| Phase 2 Foundational section | 87% | 100% |
| User Story phases ordered by priority | 75% | 100% |
| File paths in descriptions | 100% | 100% |
| Checkbox format | 0% | 100% |
| Polish/Final phase | 80% | 100% |

95%

20%

TDD Pipeline with Constitution Enforcement: Appointment Scheduling API

| Criteria | Without context | With context |
| --- | --- | --- |
| .feature files cover all FRs | 50% | 100% |
| DO NOT MODIFY headers present | 0% | 100% |
| All required tags on every Scenario | 0% | 75% |
| TS-XXX IDs unique across all files | 100% | 100% |
| Scenarios match spec acceptance criteria | 90% | 100% |
| Privacy/auth scenario present | 100% | 100% |
| Every story task references TS-XXX | 80% | 80% |
| TS references are comma-separated | 100% | 100% |
| Test tasks before production tasks per story | 100% | 100% |
| Task file paths match plan structure | 100% | 100% |
| Phase structure and sequential IDs | 80% | 100% |
| Concurrency test scenario present | 100% | 100% |

22%

Technical Design for Marketplace Search Feature

| Criteria | Without context | With context |
| --- | --- | --- |
| ASCII architecture diagram | 100% | 100% |
| Named components in diagram | 100% | 100% |
| context.json exists | 0% | 0% |
| planview.nodeClassifications key | 0% | 0% |
| Existing data preserved | 0% | 0% |
| Client node classified | 0% | 0% |
| Server node classified | 0% | 0% |
| Storage node classified | 0% | 0% |
| External node classified | 0% | 0% |

80%

24%

Project Governance Document for a Healthcare Data Platform

| Criteria | Without context | With context |
| --- | --- | --- |
| No technology stack in constitution | 25% | 100% |
| No implementation details | 62% | 100% |
| At least 3 principles | 100% | 100% |
| Principles are declarative | 100% | 100% |
| Semver version present | 42% | 100% |
| Amendment procedure present | 100% | 100% |
| .specify/context.json created | 41% | 0% |
| TDD determination value valid | 0% | 50% |
| Principles are domain-agnostic | 40% | 70% |
| Testing philosophy explicitly stated | 50% | 100% |
| Dates in ISO format | 100% | 100% |
| No feature-specific content | 100% | 100% |
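
The headline "Average score across 14 eval scenarios" and the 1.82x figure can be roughly cross-checked against the per-scenario headers. The sketch below rests on assumptions the page does not state: that the first figure above each scenario title is the with-context score, that the second is the gain in points over the without-context baseline, that a scenario showing a single figure had no gain, and that the multiplier is the ratio of the two averages. The variable names are illustrative only.

```python
# (with-context score %, assumed gain over baseline in points), one pair per
# scenario, copied from the figures printed above each scenario title.
scenarios = [
    (91, 15), (91, 91), (88, 15), (87, 28), (92, 82), (90, 49), (95, 28),
    (100, 49), (83, 35), (100, 63), (98, 43), (95, 20),
    (22, 0),   # Marketplace Search shows one figure only; gain assumed to be 0
    (80, 24),
]

# Average with-context score across all scenarios.
with_avg = sum(score for score, _ in scenarios) / len(scenarios)

# Average baseline score, reconstructed as score minus gain (an assumption).
base_avg = sum(score - gain for score, gain in scenarios) / len(scenarios)

# Assumed definition of the impact multiplier: ratio of the two averages.
multiplier = with_avg / base_avg

print(f"with context: {with_avg:.1f}%")   # with context: 86.6%
print(f"baseline:     {base_avg:.1f}%")   # baseline:     47.9%
print(f"multiplier:   {multiplier:.2f}x") # multiplier:   1.81x
```

Under these assumptions the result lands close to, but not exactly on, the reported 86% and 1.82x. The gap is consistent with the per-scenario figures being rounded before display, though the page does not confirm how either headline number is derived.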

Evaluated
Agent: Claude Code
Model: Claude Sonnet 4.6