Execute a strict Red-Green-Refactor TDD cycle — one requirement at a time — in any language or framework.
97
Quality
100%
Does it follow best practices?
Impact
94%
1.11xAverage score across 5 eval scenarios
{
"context": "The agent is executing Phase 3 (REFACTOR) of a behavioral TDD cycle. A Shameless Green Python implementation uses hard-coded return values. The agent must refactor it to use the real formula while keeping the behavioral test green.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Formula replaces hard-coded values",
"description": "The Shameless Green hard-coded return values are replaced with the correct Celsius to Fahrenheit formula (C * 9/5 + 32)",
"max_score": 2
},
{
"name": "Code is more idiomatic",
"description": "The refactored code is more idiomatic (e.g. type hints, clear naming, single return path)",
"max_score": 2
},
{
"name": "Behavioral test still passes",
"description": "The agent explicitly confirms the behavioral test still passes after the refactor",
"max_score": 2
},
{
"name": "Public signature unchanged",
"description": "The public method signature (to_fahrenheit(celsius)) is unchanged",
"max_score": 2
},
{
"name": "Change summary provided",
"description": "The agent provides a summary of what was changed and why",
"max_score": 1
}
]
}