tessl-labs/intent-integrity-kit

Closing the intent-to-code chasm - specification-driven development with BDD verification chain

93

1.84x
Quality: 93%
Does it follow best practices?

Impact: 94% (1.84x)
Average score across 14 eval scenarios

Security by Snyk

Advisory

Suggest reviewing before use


Evaluation results

100%

85%

Technical Design for Marketplace Search Feature

| Criteria | Without context | With context |
| --- | --- | --- |
| ASCII architecture diagram | 41% | 100% |
| Named components in diagram | 100% | 100% |
| context.json exists | 0% | 100% |
| planview.nodeClassifications key | 0% | 100% |
| Existing data preserved | 0% | 100% |
| Client node classified | 0% | 100% |
| Server node classified | 0% | 100% |
| Storage node classified | 0% | 100% |
| External node classified | 0% | 100% |

100%

56%

Bug Report: Payment Processing Failure on Retry

| Criteria | Without context | With context |
| --- | --- | --- |
| T-BNNN task ID format | 0% | 100% |
| At least 1 fix task with TS ref | 0% | 100% |
| TDD task references test spec | 0% | 100% |
| bugs.md BUG-NNN entry | 100% | 100% |
| bugs.md required fields | 70% | 100% |
| bugs.md date format | 100% | 100% |
| Existing tasks unmodified | 100% | 100% |
| Bug ID in task descriptions | 100% | 100% |
| New .feature file created | 0% | 100% |

96%

17%

Constraint Survival: Offline-First Expense Tracker

| Criteria | Without context | With context |
| --- | --- | --- |
| No external API in core data path | 100% | 100% |
| PM's specific service suggestions not adopted blindly | 100% | 90% |
| Expense submission works offline in spec | 100% | 100% |
| Currency conversion handled offline | 100% | 100% |
| Local storage as primary in plan | 100% | 100% |
| Sync designed as optional enhancement | 100% | 100% |
| Conflict resolution addressed | 100% | 100% |
| Spec uses numbered requirements | 37% | 100% |
| Spec has acceptance scenarios | 37% | 100% |
| Spec is technology-agnostic | 50% | 100% |
| Plan references spec requirements | 0% | 57% |
| No governance restated in plan | 100% | 100% |
| Spec does not promise network-dependent core features | 100% | 100% |

93%

83%

Acceptance Test Suite for a User Notifications Feature

| Criteria | Without context | With context |
| --- | --- | --- |
| TS-XXX tags present | 0% | 100% |
| FR-XXX tags present | 0% | 100% |
| SC-XXX tags present | 0% | 100% |
| SC-XXX majority coverage | 0% | 100% |
| US-XXX tags present | 0% | 62% |
| Priority tags present | 0% | 100% |
| Test type tags present | 0% | 100% |
| DO NOT MODIFY header | 0% | 83% |
| Feature-level US tag | 0% | 75% |
| All acceptance scenarios covered | 100% | 100% |
| TS-XXX uniqueness | 0% | 100% |
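
To make the "TS-XXX uniqueness" criterion concrete, here is a minimal sketch of how such a check could be scripted. It is not the kit's actual implementation. It assumes that test-spec IDs appear in .feature files as Gherkin tags of the form `@TS-` followed by digits; the function name `find_duplicate_ts_ids` and the sample file names are hypothetical.

```python
import re
from collections import defaultdict

# ASSUMPTION: IDs are written as Gherkin tags such as "@TS-001".
# The page only shows the placeholder "TS-XXX", so the digit format is a guess.
TS_TAG = re.compile(r"@(TS-\d+)\b")


def find_duplicate_ts_ids(feature_files):
    """Map each TS ID that occurs more than once to the files it occurs in.

    feature_files: dict of file name -> file contents (both hypothetical).
    """
    seen = defaultdict(list)
    for name, text in feature_files.items():
        for ts_id in TS_TAG.findall(text):
            seen[ts_id].append(name)
    # An ID is a duplicate if it was found two or more times overall,
    # whether in different files or twice in the same file.
    return {ts_id: names for ts_id, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    sample = {
        "notify.feature": "@TS-001 @P1\nScenario: a\n\n@TS-002\nScenario: b\n",
        "digest.feature": "@TS-002\nScenario: c\n",
    }
    print(find_duplicate_ts_ids(sample))
```

An empty result means every ID found is unique. The same scan could be pointed at all .feature files in a project to cover the cross-file variant of the criterion.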

96%

36%

Project Governance Document for a Healthcare Data Platform

| Criteria | Without context | With context |
| --- | --- | --- |
| No technology stack in constitution | 66% | 100% |
| No implementation details | 87% | 75% |
| At least 3 principles | 100% | 100% |
| Principles are declarative | 100% | 100% |
| Semver version present | 42% | 100% |
| Amendment procedure present | 100% | 100% |
| .specify/context.json created | 0% | 100% |
| TDD determination value valid | 0% | 100% |
| Principles are domain-agnostic | 40% | 80% |
| Testing philosophy explicitly stated | 62% | 100% |
| Dates in ISO format | 100% | 100% |
| No feature-specific content | 100% | 100% |

95%

50%

Greenfield Full Pipeline: Team Standup Bot

| Criteria | Without context | With context |
| --- | --- | --- |
| Spec is technology-agnostic | 87% | 100% |
| Plan has no governance content | 100% | 66% |
| FR-XXX to TS-XXX coverage | 0% | 86% |
| TS-XXX to task coverage | 0% | 100% |
| .feature files have DO NOT MODIFY headers | 0% | 100% |
| .feature files have required tags | 0% | 100% |
| Tasks ordered: Setup → Foundational → Stories → Polish | 83% | 100% |
| TDD task ordering within story phases | 70% | 100% |
| No phantom requirements | 100% | 100% |
| Privacy constraint in .feature files | 100% | 100% |
| No FR orphans in either direction | 62% | 87% |
| TS-XXX IDs are unique across files | 0% | 100% |

89%

87%

Technical Design for Notification Service Feature

| Criteria | Without context | With context |
| --- | --- | --- |
| plan.md exists | 0% | 100% |
| Technical Context fields | 0% | 100% |
| No bare Option labels | 0% | 100% |
| research.md with rationale | 0% | 100% |
| data-model.md entities | 0% | 100% |
| State transitions in data-model | 0% | 100% |
| contracts/ directory | 0% | 100% |
| Contracts reference spec requirements | 0% | 100% |
| quickstart.md exists with scenarios | 0% | 100% |
| No governance content in plan | 0% | 87% |
| Spec quality assessment performed | 0% | 16% |
| Plan decisions trace to spec FRs | 25% | 100% |
| context.json updated | 0% | 100% |

95%

21%

Plan-to-Tasks Traceability: Event Ticketing Platform

| Criteria | Without context | With context |
| --- | --- | --- |
| File paths match plan structure | 100% | 93% |
| Every user story has tagged tasks | 50% | 100% |
| Setup/Foundational tasks have no story tags | 100% | 100% |
| TS references are comma-separated | 50% | 100% |
| TS references match provided .feature files | 100% | 100% |
| Priority ordering respected | 87% | 100% |
| Phase structure complete | 62% | 100% |
| [P] markers only on parallelizable tasks | 37% | 75% |
| No technologies beyond the plan | 100% | 80% |
| Checkbox format used | 0% | 100% |
| Sequential T-prefixed IDs | 83% | 100% |

86%

3%

Scope Creep Detection: Simple Bookmark Manager

| Criteria | Without context | With context |
| --- | --- | --- |
| Exactly 3 user stories in spec | 100% | 100% |
| No mentioned-but-deferred features in spec | 100% | 100% |
| No excluded features in plan | 100% | 100% |
| No excluded features in tasks | 100% | 100% |
| Spec uses numbered requirements | 37% | 100% |
| Spec has acceptance scenarios | 62% | 25% |
| FR count proportional to scope | 50% | 100% |
| Task count proportional to scope | 100% | 0% |
| Data model matches scope | 100% | 100% |
| Plan tech stack appropriate for scope | 100% | 100% |
| Tasks use structured format | 37% | 100% |
| Accessibility addressed | 100% | 100% |

97%

41%

Update Technical Design: File Upload Feature

| Criteria | Without context | With context |
| --- | --- | --- |
| NEEDS CLARIFICATION flagged | 100% | 100% |
| FR count assessed | 0% | 100% |
| Measurable criteria warning | 0% | 100% |
| Quality score reported | 0% | 100% |
| Semantic diff present | 80% | 100% |
| Semantic diff format | 0% | 100% |
| Downstream impact flagged | 25% | 100% |
| Updated plan has new dependencies | 100% | 100% |
| Updated architecture diagram | 100% | 100% |
| No governance in plan | 100% | 62% |
| Clarification assumptions documented | 100% | 100% |

92%

14%

Spec-to-Plan Phase Separation: IoT Fleet Management

| Criteria | Without context | With context |
| --- | --- | --- |
| No technology in spec.md | 100% | 100% |
| No governance in plan.md | 83% | 41% |
| FR-XXX requirements in spec | 100% | 100% |
| SC-XXX success criteria in spec | 33% | 100% |
| User stories in spec | 0% | 100% |
| Given/When/Then scenarios in spec | 50% | 100% |
| Plan references spec FRs | 30% | 100% |
| Every spec FR traceable to plan | 100% | 100% |
| No phantom requirements in plan | 100% | 91% |
| data-model.md traces to spec entities | 100% | 100% |
| Connection-lost requirement survives to plan | 100% | 100% |

96%

50%

Feature Specification: Team Document Collaboration

| Criteria | Without context | With context |
| --- | --- | --- |
| No technology stack in spec | 13% | 100% |
| FR-XXX numbered requirements | 100% | 100% |
| SC-XXX success criteria | 0% | 100% |
| Given/When/Then scenarios | 0% | 100% |
| User stories present | 100% | 75% |
| Measurable success criteria | 0% | 100% |
| Max 3 NEEDS CLARIFICATION | 100% | 100% |
| No implementation details | 8% | 100% |
| 2-4 word branch name | 80% | 80% |
| Requirements.md checklist created | 77% | 100% |

96%

40%

Task Breakdown for an Inventory Management API

| Criteria | Without context | With context |
| --- | --- | --- |
| Sequential T-prefixed IDs | 58% | 100% |
| [P] marker usage | 20% | 80% |
| [USn] label on story tasks | 10% | 100% |
| Comma-separated TS references | 66% | 86% |
| Phase 1 Setup section | 100% | 100% |
| Phase 2 Foundational section | 75% | 100% |
| User Story phases ordered by priority | 50% | 100% |
| File paths in descriptions | 100% | 100% |
| Checkbox format | 0% | 100% |
| Polish/Final phase | 90% | 100% |

96%

20%

TDD Pipeline with Constitution Enforcement: Appointment Scheduling API

| Criteria | Without context | With context |
| --- | --- | --- |
| .feature files cover all FRs | 100% | 100% |
| DO NOT MODIFY headers present | 0% | 100% |
| All required tags on every Scenario | 25% | 100% |
| TS-XXX IDs unique across all files | 100% | 100% |
| Scenarios match spec acceptance criteria | 90% | 100% |
| Privacy/auth scenario present | 100% | 100% |
| Every story task references TS-XXX | 86% | 80% |
| TS references are comma-separated | 100% | 100% |
| Test tasks before production tasks per story | 20% | 100% |
| Task file paths match plan structure | 100% | 87% |
| Phase structure and sequential IDs | 80% | 100% |
| Concurrency test scenario present | 100% | 100% |
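
The headline figures (94% averaged across 14 eval scenarios, 1.84x) can be roughly cross-checked against the two numbers shown above each scenario title. The sketch below rests on assumptions the page does not confirm: that the first number is the scenario's with-context score, that the second is the gain in percentage points over the without-context baseline, and that the multiplier is the ratio of the two averages. The variable names are illustrative only.

```python
# First figure shown above each scenario title, in page order.
# ASSUMPTION: this is the scenario's overall "with context" score.
with_context = [100, 100, 96, 93, 96, 95, 89, 95, 86, 97, 92, 96, 96, 96]

# Second figure shown above each scenario title, in page order.
# ASSUMPTION: this is the gain in percentage points over the baseline.
assumed_gain = [85, 56, 17, 83, 36, 50, 87, 21, 3, 41, 14, 50, 40, 20]


def mean(values):
    """Plain arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Baseline implied by the assumptions above: score minus gain.
implied_baseline = [w - g for w, g in zip(with_context, assumed_gain)]

avg_with = mean(with_context)
avg_baseline = mean(implied_baseline)
multiplier = avg_with / avg_baseline

if __name__ == "__main__":
    print(f"scenarios:        {len(with_context)}")
    print(f"avg with context: {avg_with:.1f}%")
    print(f"avg baseline:     {avg_baseline:.1f}%")
    print(f"multiplier:       {multiplier:.2f}x")
```

Under these assumptions the average comes out near 94.8% and the multiplier near 1.83x. That is close to the 94% and 1.84x shown, though not identical. The gap may come from rounding of the displayed scores or from a weighting the page does not describe.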

Evaluated
Agent: Claude
Model: Claude Sonnet 4.6