Generates **property-based tests** that use randomized input generation to validate invariants and contracts (rather than hand-picked examples). Triggers when the conversation involves: PBT frameworks (Hypothesis for Python, fast-check for TypeScript, proptest for Rust, rapid for Go, RapidCheck for C++); concepts like invariants, contracts, round-trip symmetry, encode/decode, serialize/deserialize, generative testing, or shrinking; or requests to find edge cases that example-based tests miss, e.g., "find edge cases automatically", "test all possible inputs", "verify this property holds". Does NOT trigger for: writing regular example-based unit tests, debugging, CI/CD setup, UI/component testing, or integration/E2E testing. Identifies up to seven property patterns (round-trip, idempotence, invariance, metamorphic, inverse, ordering, no-crash), designs input generators, writes property tests, and extracts regression tests from failures.
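As a rough illustration of two of the patterns named above (round-trip and idempotence), the following is a minimal sketch using Hypothesis. It assumes `hypothesis` is installed; the choice of `json` and `sorted` as the code under test, and the name `json_values`, are illustrative only and do not come from the skill itself.

```python
import json

from hypothesis import given, strategies as st

# Illustrative generator: nested JSON-like values built from simple leaves.
# Floats are left out on purpose, since NaN never compares equal to itself.
json_values = st.recursive(
    st.none() | st.booleans() | st.integers() | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
    max_leaves=10,
)


@given(json_values)
def test_json_round_trip(value):
    # Round-trip pattern: decoding an encoded value gives back the original.
    assert json.loads(json.dumps(value)) == value


@given(st.lists(st.integers()))
def test_sorted_is_idempotent(xs):
    # Idempotence pattern: applying the operation twice equals applying it once.
    once = sorted(xs)
    assert sorted(once) == once
```

Under pytest these run like ordinary tests; when a property fails, Hypothesis reports a shrunk counterexample that can be copied into a deterministic regression test.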
91
90%
Does it follow best practices?
Impact
94%
1.11x average score across 5 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent designs Python property-based tests around meaningful invariants, broad generators, Hypothesis APIs, and regression handling for failures.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Hypothesis dependency",
"description": "Mentions or encodes adding Hypothesis/pytest if the project lacks them.",
"max_score": 8
},
{
"name": "Correct imports",
"description": "Imports Hypothesis given and strategies as st in the test code.",
"max_score": 8
},
{
"name": "Given decorators",
"description": "Uses @given decorators on property tests rather than only example tests.",
"max_score": 8
},
{
"name": "Multiple properties",
"description": "Identifies at least three distinct properties or invariants in notes or test names/assertions.",
"max_score": 10
},
{
"name": "Full input space",
"description": "Defines generated cart items including edge values such as empty collections, zero or negative quantities, duplicate IDs, and unusual strings or missing optional fields.",
"max_score": 12
},
{
"name": "Composed strategies",
"description": "Composes primitive strategies into dictionaries/lists or structured cart item strategies.",
"max_score": 8
},
{
"name": "Generate act assert",
"description": "Each property test has a clear generated input, calls normalize_cart, and asserts an invariant.",
"max_score": 10
},
{
"name": "Meaningful invariants",
"description": "Checks semantic invariants such as positive output quantities, aggregation conservation, unique product IDs, or total equals rounded quantity times price.",
"max_score": 12
},
{
"name": "Avoids weak only",
"description": "Does not rely solely on no-crash assertions when stronger invariants are available.",
"max_score": 8
},
{
"name": "Shrinking noted",
"description": "Explains that failing property tests provide a minimal/shrunk counterexample to inspect.",
"max_score": 8
},
{
"name": "Regression extraction",
"description": "Includes or describes adding a deterministic regression unit test from any failing shrunk case.",
"max_score": 8
}
]
}
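To make the checklist above more concrete, here is a minimal sketch of a test module that would touch most of its items. The rubric names `normalize_cart` but does not specify its behavior, so the implementation below is a hypothetical stand-in (merge duplicate product IDs, drop non-positive quantities). The field names `product_id`, `quantity`, and `note` are assumptions as well. It also assumes `hypothesis` and `pytest` are available in the project (for example via `pip install hypothesis pytest`).

```python
from collections import defaultdict

from hypothesis import given, strategies as st


def normalize_cart(items):
    """Hypothetical stand-in: merge duplicate IDs, drop non-positive quantities."""
    totals = defaultdict(int)
    for item in items:
        if item["quantity"] > 0:
            totals[item["product_id"]] += item["quantity"]
    return [{"product_id": pid, "quantity": qty} for pid, qty in sorted(totals.items())]


# Composed strategy: the tiny ID alphabet forces duplicate (and empty) IDs,
# quantities include zero and negatives, and "note" is an optional field.
cart_items = st.fixed_dictionaries(
    {
        "product_id": st.text(alphabet="ab-é ", max_size=3),
        "quantity": st.integers(min_value=-5, max_value=1000),
    },
    optional={"note": st.text()},
)
carts = st.lists(cart_items, max_size=20)  # the empty cart is included


@given(carts)
def test_quantities_are_positive(cart):
    assert all(item["quantity"] > 0 for item in normalize_cart(cart))


@given(carts)
def test_product_ids_are_unique(cart):
    ids = [item["product_id"] for item in normalize_cart(cart)]
    assert len(ids) == len(set(ids))


@given(carts)
def test_positive_quantity_is_conserved(cart):
    expected = sum(i["quantity"] for i in cart if i["quantity"] > 0)
    assert sum(i["quantity"] for i in normalize_cart(cart)) == expected


@given(carts)
def test_normalize_is_idempotent(cart):
    once = normalize_cart(cart)
    assert normalize_cart(once) == once


def test_regression_zero_quantity_duplicate():
    # Illustrative pinned case, not a recorded failure: this is the shape a
    # shrunk counterexample would take once copied into a deterministic test.
    cart = [{"product_id": "a", "quantity": 0}, {"product_id": "a", "quantity": 2}]
    assert normalize_cart(cart) == [{"product_id": "a", "quantity": 2}]
```

Each property follows the generate, act, assert shape: the `carts` strategy supplies the input, the test calls `normalize_cart`, and the assertion checks a semantic invariant rather than only the absence of a crash.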