Generates **property-based tests** that use randomized input generation to validate invariants and contracts (rather than hand-picked examples). Triggers when the conversation involves: PBT frameworks (Hypothesis library for Python, fast-check for TypeScript, proptest for Rust, rapid for Go, RapidCheck for C++); concepts like invariants, contracts, round-trip symmetry, encode/decode, serialize/deserialize, generative testing, or shrinking; or requests to find edge cases that example-based tests miss — e.g., "find edge cases automatically", "test all possible inputs", "verify this property holds". Does NOT trigger for: writing regular example-based unit tests, debugging, CI/CD setup, UI/component testing, or integration/E2E testing. Identifies up to 7 property patterns (round-trip, idempotence, invariance, metamorphic, inverse, ordering, no-crash), designs input generators, writes property tests, and extracts regression tests from failures.
91
90%
Does it follow best practices?
Impact
94%
1.11xAverage score across 5 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent writes Go property tests using rapid, including custom generators, Draw calls, strong invariants, and failure handling.",
"type": "weighted_checklist",
"checklist": [
{
"name": "rapid dependency",
"description": "Suggests adding github.com/flyingmutant/rapid if needed.",
"max_score": 8
},
{
"name": "rapid import",
"description": "Imports github.com/flyingmutant/rapid in the test file.",
"max_score": 8
},
{
"name": "rapid Check",
"description": "Uses rapid.Check with a function taking *rapid.T.",
"max_score": 10
},
{
"name": "Draw calls",
"description": "Uses rapid generators with Draw(t, name) to obtain generated values.",
"max_score": 10
},
{
"name": "Custom generator",
"description": "Defines a custom or composed generator for Interval values and slices of intervals.",
"max_score": 10
},
{
"name": "Edge coverage",
"description": "Generated intervals include empty slices, duplicates, negative values, zero-length intervals, reversed endpoints, and large/small ints where practical.",
"max_score": 12
},
{
"name": "Multiple properties",
"description": "Implements at least three properties for the merge behavior.",
"max_score": 10
},
{
"name": "Semantic invariants",
"description": "Checks meaningful invariants such as output sortedness, non-overlap, endpoint normalization, idempotence, or coverage preservation.",
"max_score": 12
},
{
"name": "Avoid copy oracle",
"description": "Does not merely duplicate the merge algorithm as the only oracle; uses simpler checks like point coverage, idempotence, or structural invariants.",
"max_score": 8
},
{
"name": "Shrink awareness",
"description": "Notes rapid's shrinking/minimal counterexample output.",
"max_score": 6
},
{
"name": "Regression test",
"description": "Describes or includes a deterministic unit test template for any failing counterexample.",
"max_score": 6
}
]
}