Execute a strict Red-Green-Refactor TDD cycle — one requirement at a time — in any language or framework.
97
Quality
100%
Does it follow best practices?
Impact
94%
1.11xAverage score across 5 eval scenarios
Execute one complete TDD cycle per invocation, in any language or framework. Never skip a phase. Never test private internals — only observable behavior through the public interface.
Input: The requirement or business rule provided by the user.
After presenting the test above, do not write any implementation code. Close your response and wait for the user to confirm before moving to Phase 2.
Input: The failing test from Phase 1 (confirmed by the user).
After presenting the implementation above, close your response and wait for the user to confirm before moving to Phase 3.
Input: The passing code and behavioral test from Phases 1–2.
Mock at system boundaries only — external APIs, databases, time, file system. Never mock your own classes, internal collaborators, or anything you control. Use dependency injection to make boundaries explicit and mockable.
| Allowed | Not Allowed |
|---|---|
Mock an IEmailService injected via constructor | Mock a private _sendSmtp() method |
| Stub an HTTP client passed as a parameter | Assert on internal variable values |
| Use a fake repository implementing a public interface | Spy on private class internals |
| Intercept a React prop callback | Assert on React component internal state |
Design interfaces that return results rather than produce side effects — functions that return values are easier to assert on than functions that mutate state. See mocking.md for patterns and examples.
| Rule | Rationale |
|---|---|
| Public interface only | Tests survive internal rewrites |
| One requirement per cycle | Fast feedback, clear failure signal |
| No mocking private methods | Avoids coupling tests to implementation |
| No assertions on internal state | Preserves behavioral integrity |
| Shameless Green in Phase 2 | Establishes feedback loop before optimizing |
When writing tests for UI components, query by the highest-level user-facing role available. Prefer what a user or assistive technology sees over what the DOM implements.
| Prefer | Over | Reason |
|---|---|---|
getByRole('button', { name: /submit/i }) | getByTestId('submit-btn') | Role + accessible name is user-facing |
getByRole('textbox', { name: /email/i }) | getByPlaceholderText('email') | Accessible name survives placeholder changes |
getByRole('heading', { name: /confirm/i }) | querySelector('h2') | Semantic role, not element type |
getByRole('checkbox', { name: /agree/i }) | getByAttribute('type', 'checkbox') | Role is stable across implementations |
The name option on getByRole matches the accessible name (label text,
aria-label, or aria-labelledby). Using it requires the component to expose
a proper accessible name — which is itself a behavioral requirement worth
enforcing in the test.
See tests.md for good vs bad test examples and red flags.
Requirement: A ShoppingCart returns a total with 10% discount when 3+ items are added.
RED
def test_discount_applied_for_three_or_more_items():
cart = ShoppingCart()
cart.add_item(price=10.00)
cart.add_item(price=10.00)
cart.add_item(price=10.00)
assert cart.total() == 27.00 # 10% off 30.00
# Fails: ShoppingCart is not definedGREEN
class ShoppingCart:
def __init__(self):
self._items = []
def add_item(self, price):
self._items.append(price)
def total(self):
subtotal = sum(self._items)
if len(self._items) >= 3:
return subtotal * 0.90
return subtotalREFACTOR
The behavioral test test_discount_applied_for_three_or_more_items still passes.
BULK_DISCOUNT_RATE = 0.10
BULK_DISCOUNT_THRESHOLD = 3
class ShoppingCart:
def __init__(self):
self._items: list[float] = []
def add_item(self, price: float) -> None:
self._items.append(price)
def total(self) -> float:
subtotal = sum(self._items)
if len(self._items) >= BULK_DISCOUNT_THRESHOLD:
return subtotal * (1 - BULK_DISCOUNT_RATE)
return subtotalRequirement: OrderService.PlaceOrder() returns OrderResult.Success when stock is available.
RED
[Fact]
public void PlaceOrder_ReturnsSuccess_WhenStockIsAvailable()
{
var stockChecker = Substitute.For<IStockChecker>(); // injected dependency
stockChecker.IsAvailable("SKU-001", 1).Returns(true);
var service = new OrderService(stockChecker);
var result = service.PlaceOrder("SKU-001", quantity: 1);
Assert.Equal(OrderResult.Success, result);
}
// Fails: OrderService does not existGREEN
public class OrderService
{
private readonly IStockChecker _stockChecker;
public OrderService(IStockChecker stockChecker) => _stockChecker = stockChecker;
public OrderResult PlaceOrder(string sku, int quantity)
{
if (_stockChecker.IsAvailable(sku, quantity))
return OrderResult.Success;
return OrderResult.OutOfStock;
}
}REFACTOR — no structural changes needed; code is already clear. Confirm: behavioral test still passes.
Requirement: A SubmitButton renders as disabled when a loading prop is true.
RED
it('is disabled when loading', () => {
render(<SubmitButton loading={true}>Save</SubmitButton>);
expect(screen.getByRole('button', { name: /save/i })).toBeDisabled();
});
// Fails: SubmitButton is not definedGREEN
export function SubmitButton({
loading,
children,
}: {
loading: boolean;
children: React.ReactNode;
}) {
return <button disabled={loading}>{children}</button>;
}REFACTOR
interface SubmitButtonProps {
loading: boolean;
children: React.ReactNode;
}
export function SubmitButton({ loading, children }: SubmitButtonProps) {
return (
<button disabled={loading} type="submit">
{children}
</button>
);
}
// Behavioral test still passes.Install with Tessl CLI
npx tessl i haletothewood/behavioural-tdd@1.8.0