Spec-driven workflow covering requirement gathering, spec authoring, implementation review, and verification — with skills, rules, and evaluation scenarios.
96
90%
Does it follow best practices?
Impact
98%
1.19xAverage score across 9 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent performs a thorough work review following the work-review skill: checking each requirement, running linked tests, capturing discovered requirements, updating the spec, and producing a structured review summary.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Review summary file created",
"description": "A file named review.md is produced containing the review output",
"max_score": 5
},
{
"name": "Pass/fail per requirement",
"description": "review.md contains a requirements section with individual pass/fail status for each spec requirement (using checkboxes or equivalent)",
"max_score": 12
},
{
"name": "File references on passing items",
"description": "Passing requirements in review.md include a reference to the implementing file and approximate location (e.g. src/notifications/sender.py)",
"max_score": 8
},
{
"name": "Missing requirement identified",
"description": "review.md identifies that send_sms silently skips when no phone number is on file (no MissingContactError raised), which is not documented in the spec",
"max_score": 12
},
{
"name": "Discovered requirement documented",
"description": "review.md includes a 'Discovered requirements' section noting the account.reactivated event support added in the implementation (not in the original spec)",
"max_score": 12
},
{
"name": "Spec updated with discovered requirement",
"description": "specs/notifications.spec.md is updated to include the account.reactivated event (or a note about the newly discovered behavior)",
"max_score": 10
},
{
"name": "Test results section",
"description": "review.md contains a test results section listing the linked test files and their outcomes",
"max_score": 8
},
{
"name": "Linked tests referenced",
"description": "review.md explicitly references at least three of the [@test]-linked files from the spec (test_delivery.py, test_events.py, test_retry.py, test_templates.py)",
"max_score": 8
},
{
"name": "Spec updates section",
"description": "review.md contains a spec updates section describing what was changed in the spec",
"max_score": 8
},
{
"name": "Targets still accurate",
"description": "After review, specs/notifications.spec.md targets still include both src/notifications/sender.py and src/notifications/templates.py",
"max_score": 7
},
{
"name": "Structured review format",
"description": "review.md is organized into clear sections covering: requirement status, discovered differences, test outcomes, and spec changes — exact heading names do not matter as long as these four topics are covered in distinct sections",
"max_score": 10
}
]
}