Name: haletothewood/behavioural-tdd
Rating: 0.976 (1 reviews)
Author: haletothewood

Execute a strict Red-Green-Refactor TDD cycle — one requirement at a time — in any language or framework.

Quality: 100% (Does it follow best practices?)

Impact: 94% (1.11x), average score across 5 eval scenarios

{
  "context": "The agent is executing Phase 3 (REFACTOR) of a behavioral TDD cycle. A Shameless Green Python implementation uses hard-coded return values. The agent must refactor it to use the real formula while keeping the behavioral test green.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Formula replaces hard-coded values",
      "description": "The Shameless Green hard-coded return values are replaced with the correct Celsius to Fahrenheit formula (C * 9/5 + 32)",
      "max_score": 2
    },
    {
      "name": "Code is more idiomatic",
      "description": "The refactored code is more idiomatic (e.g. type hints, clear naming, single return path)",
      "max_score": 2
    },
    {
      "name": "Behavioral test still passes",
      "description": "The agent explicitly confirms the behavioral test still passes after the refactor",
      "max_score": 2
    },
    {
      "name": "Public signature unchanged",
      "description": "The public method signature (to_fahrenheit(celsius)) is unchanged",
      "max_score": 2
    },
    {
      "name": "Change summary provided",
      "description": "The agent provides a summary of what was changed and why",
      "max_score": 1
    }
  ]
}
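As an illustration of what this rubric looks for, here is a minimal sketch of the REFACTOR step. The scenario's actual source files are not shown on this page, so the Shameless Green body, the helper name `to_fahrenheit_shameless_green`, and the test values below are assumptions. Only the public signature `to_fahrenheit(celsius)` and the formula `C * 9/5 + 32` come from the rubric.

```python
# Hypothetical "before" state: a Shameless Green implementation that
# hard-codes only the values the behavioral test asks for.
def to_fahrenheit_shameless_green(celsius: float) -> float:
    if celsius == 0:
        return 32.0
    if celsius == 100:
        return 212.0
    raise ValueError("no hard-coded answer for this input")


# "After" state: same public signature, hard-coded values replaced by
# the real formula, with type hints and a single return path.
def to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Assumed behavioral test: it exercises only the public function, so it
# stays green across the refactor without modification.
def test_converts_celsius_to_fahrenheit() -> None:
    assert to_fahrenheit(0) == 32
    assert to_fahrenheit(100) == 212


if __name__ == "__main__":
    test_converts_celsius_to_fahrenheit()
    print("behavioral test still passes")
```

Re-running the unchanged test after the refactor is what the "Behavioral test still passes" item checks. A short note on what changed and why would cover the "Change summary provided" item.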

Install with Tessl CLI

npx tessl i haletothewood/behavioural-tdd@1.8.0

evals

scenario-1

scenario-2

scenario-3

scenario-4

scenario-5

references

evals/scenario-5/rubric.json