Use when the user wants regression hunting after a change. Identify nearby flows, shared code paths, error states, and configuration edges that may have broken even if the main fix works. Good triggers include "check for regressions", "what else might this have broken", and "test the surrounding area".
96
94%
Does it follow best practices?
Impact
98%
2.72x Average score across 8 eval scenarios
Passed
No known issues
{
  "context": "The agent was asked to produce a regression scout report (report.md) for a Python Flask search endpoint that now sorts results by relevance score. The criteria evaluate whether the report specifically checks error handling, empty-state behavior, and other adjacent regression zones.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Has Change Surface section",
      "description": "The report.md file contains a '### Change Surface' section heading",
      "max_score": 7
    },
    {
      "name": "Has Regression Checks section",
      "description": "The report.md file contains a '### Regression Checks' section heading",
      "max_score": 7
    },
    {
      "name": "Has Findings section",
      "description": "The report.md file contains a '### Findings' section heading",
      "max_score": 7
    },
    {
      "name": "Has Risk Left Open section",
      "description": "The report.md file contains a '### Risk Left Open' section heading",
      "max_score": 7
    },
    {
      "name": "Change Surface identifies views.py or sort change",
      "description": "The Change Surface section identifies views.py, the search endpoint handler, or the addition of the sort/ordering call as the changed component",
      "max_score": 8
    },
    {
      "name": "Regression Checks includes empty result set check",
      "description": "The Regression Checks section includes a specific check on empty result set behavior (e.g. verifying that sorting an empty list does not raise an error or change the [] return value)",
      "max_score": 10
    },
    {
      "name": "Regression Checks includes pagination interaction check",
      "description": "The Regression Checks section includes a check on the interaction between sorting and pagination, specifically whether sort is applied before or after the pagination slice, which could cause incorrect page contents",
      "max_score": 10
    },
    {
      "name": "Regression Checks includes error or edge path check",
      "description": "The Regression Checks section includes a check on at least one error or edge path such as: results with null/missing relevance_score causing serializer errors, invalid query parameters, rate limiting behavior, or auth decorator behavior",
      "max_score": 10
    },
    {
      "name": "Regression Checks lists at least 3 checks with results",
      "description": "The Regression Checks section lists at least 3 separate checks, each with an outcome or result stated",
      "max_score": 8
    },
    {
      "name": "Risk Left Open has concrete specific risk",
      "description": "The Risk Left Open section contains a concrete specific risk relevant to the change (e.g. null relevance_score causing sort errors, pagination returning different items than before, filter interaction with sort order)",
      "max_score": 8
    },
    {
      "name": "Findings includes explicit verdict",
      "description": "The Findings section includes an explicit verdict, either stating no regressions were found or naming specific regressions identified",
      "max_score": 8
    },
    {
      "name": "Report does not primarily re-verify sorted results",
      "description": "The report does NOT dedicate most of its content to confirming that results are now returned sorted by relevance; the primary focus is on adjacent and edge-case breakage",
      "max_score": 10
    }
  ]
}
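The adjacent checks named in the checklist (empty result set, sort-versus-pagination order, and null relevance_score) can be illustrated with a minimal Python sketch. The scenario's actual views.py is not shown here, so everything except the relevance_score field is an assumption made for illustration: the row shape, the sample data, and the helper names sort_by_relevance, search_page, and search_page_slice_first are all hypothetical.

```python
# Hypothetical sketch: only the `relevance_score` field comes from the checklist.
# The helpers below are illustrative stand-ins, not the endpoint's real code.


def sort_by_relevance(results: list[dict]) -> list[dict]:
    """Sort highest score first; rows with a null or missing score go last."""
    def key(row: dict):
        score = row.get("relevance_score")
        # Tuple key avoids comparing None with float, which raises TypeError.
        return (score is None, -(score or 0.0))
    return sorted(results, key=key)


def search_page(results: list[dict], page: int, per_page: int) -> list[dict]:
    """Sort the full result set, then take the pagination slice."""
    ordered = sort_by_relevance(results)
    start = (page - 1) * per_page
    return ordered[start:start + per_page]


def search_page_slice_first(results: list[dict], page: int, per_page: int) -> list[dict]:
    """Slice, then sort: the ordering the pagination check is meant to catch."""
    start = (page - 1) * per_page
    return sort_by_relevance(results[start:start + per_page])


# Hypothetical sample data with one null score.
rows = [
    {"id": 1, "relevance_score": 0.2},
    {"id": 2, "relevance_score": 0.9},
    {"id": 3, "relevance_score": None},
    {"id": 4, "relevance_score": 0.5},
]

# Check 1: the empty result set still comes back as [].
assert search_page([], page=1, per_page=10) == []

# Check 2: sorting before slicing and slicing before sorting disagree on page 1.
assert [r["id"] for r in search_page(rows, 1, 2)] == [2, 4]
assert [r["id"] for r in search_page_slice_first(rows, 1, 2)] == [2, 1]

# Check 3: a null score does not raise and is ordered last.
assert [r["id"] for r in sort_by_relevance(rows)] == [2, 4, 1, 3]
```

A report built on checks like these would state each outcome under Regression Checks and carry anything unverified, such as filter interaction with sort order, into Risk Left Open.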