Execute a strict Red-Green-Refactor TDD cycle — one requirement at a time — in any language or framework.
97
Quality
100%
Does it follow best practices?
Impact
94%
1.11xAverage score across 5 eval scenarios
{
"context": "The agent is executing Phase 2 (GREEN) of a behavioral TDD cycle. A failing TypeScript/React test for a ContactForm already exists. The agent must write the minimum implementation to pass it and stop before refactoring.",
"type": "weighted_checklist",
"checklist": [
{
"name": "ContactForm component created",
"description": "A ContactForm React component is implemented with at least an email input field and a submit button",
"max_score": 2
},
{
"name": "Error message renders on empty submit",
"description": "The component renders the text 'Email is required' when the form is submitted with an empty email field",
"max_score": 2
},
{
"name": "Implementation is minimal",
"description": "The implementation is minimal — no additional form fields, validation rules, or features beyond what the test requires",
"max_score": 2
},
{
"name": "Test pass confirmed",
"description": "The agent confirms the provided test would pass with this implementation",
"max_score": 2
},
{
"name": "Phase gating respected",
"description": "The agent stops after Phase 2 and waits for user confirmation before proceeding to Phase 3",
"max_score": 2
}
]
}