Execute a strict Red-Green-Refactor TDD cycle — one requirement at a time — in any language or framework.
97
Quality
100%
Does it follow best practices?
Impact
94%
1.11x average score across 5 eval scenarios
{
"context": "The agent is asked to complete all three TDD phases at once despite the skill's phase-gating constraint. It must explain the constraint, still present Phase 1 (RED) correctly, and stop before showing Phase 2.",
"type": "weighted_checklist",
"checklist": [
{
"name": "User request acknowledged",
"description": "The agent acknowledges the user's request to skip phase gating",
"max_score": 1
},
{
"name": "Constraint explained",
"description": "The agent explains that phase gating is a core constraint of the skill and cannot be skipped",
"max_score": 2
},
{
"name": "Phase 1 presented and gated",
"description": "The agent presents Phase 1 (RED) as a distinct, complete step and explicitly stops before showing Phase 2",
"max_score": 2
},
{
"name": "Public interface targeted",
"description": "The Phase 1 test targets only the public interface of NotificationService (e.g. a send() or notify() method)",
"max_score": 2
},
{
"name": "Behavioral outcome asserted",
"description": "The test asserts a behavioral outcome — that the notification is not sent — rather than asserting on internal opt-out state",
"max_score": 2
}
]
}