pantheon-ai/test-driven-development

Master Test-Driven Development with deterministic red-green-refactor workflows, test-first feature delivery, bug reproduction through failing tests, behavior-focused assertions, and refactoring safety; use when implementing new functions, changing APIs, fixing regressions, or restructuring code under test.

94

1.04x
Quality: 100% (Does it follow best practices?)

Impact: 86%, 1.04x (average score across 5 eval scenarios)

Security by Snyk: Passed, no known issues

Evaluation results

48%

15%

User Authentication Service

Red-Green-Refactor Cycle

| Criteria | Without context | With context |
| --- | --- | --- |
| Test-first implementation | 0% | 0% |
| Verified test failures | 0% | 0% |
| Minimal implementation | 30% | 80% |
| Refactoring evidence | 0% | 20% |
| Red-green cycle discipline | 0% | 10% |
| Test structure follows AAA | 70% | 100% |
| One Act per test | 70% | 80% |
| Single behavior per test | 70% | 90% |
| Descriptive test names | 90% | 100% |

93%

7%

Shopping Cart Bug Fix

Behavioral Testing and Assertions

| Criteria | Without context | With context |
| --- | --- | --- |
| Bug reproduction test | 80% | 86% |
| Behavior-focused tests | 86% | 100% |
| Descriptive test naming | 80% | 80% |
| One assertion per test | 90% | 90% |
| Specific assertions used | 90% | 100% |
| Edge case coverage | 80% | 100% |
| Clear test structure | 90% | 90% |
| Isolated test cases | 100% | 100% |
| Bug fix verification | 80% | 90% |

100%

Multi-Channel Notification Service

Dependency Injection and Mocking

| Criteria | Without context | With context |
| --- | --- | --- |
| Constructor dependency injection | 100% | 100% |
| External dependencies mocked | 100% | 100% |
| Interface-based design | 100% | 100% |
| Test data visibility | 100% | 100% |
| Independent test execution | 100% | 100% |
| Mock verification | 100% | 100% |
| Error scenario testing | 100% | 100% |
| Clear test boundaries | 100% | 100% |
| Deterministic test data | 100% | 100% |

96%

-1%

File Processing Queue System

Asynchronous Code Testing

| Criteria | Without context | With context |
| --- | --- | --- |
| No arbitrary sleeps | 100% | 100% |
| Proper async assertions | 86% | 86% |
| Deterministic timing | 100% | 100% |
| AAA structure in async | 90% | 100% |
| Isolated async tests | 100% | 90% |
| Error case handling | 100% | 90% |
| Stable test execution | 100% | 100% |
| Mock async dependencies | 100% | 100% |
| Clear async test names | 100% | 100% |

95%

2%

Multi-Step Order Processing System

Complex Workflow Testing

| Criteria | Without context | With context |
| --- | --- | --- |
| Single behavior per test | 93% | 93% |
| No behavior combination | 60% | 86% |
| Visible test data | 100% | 100% |
| Independent test cases | 100% | 100% |
| Logical test organization | 100% | 100% |
| Clear workflow step tests | 100% | 80% |
| Error scenario isolation | 100% | 100% |
| Descriptive test structure | 100% | 100% |
| Clean test setup | 100% | 100% |

Evaluated
Agent: Claude Code
Model: Claude Sonnet 4.6
