Use when the user wants regression hunting after a change. Identify nearby flows, shared code paths, error states, and configuration edges that may have broken even if the main fix works. Good triggers include "check for regressions", "what else might this have broken", and "test the surrounding area".
96
94%
Does it follow best practices?
Impact
98%
2.72x
Average score across 8 eval scenarios
Passed
No known issues
Output format structure
Has Change Surface section | 0% | 100%
Has Regression Checks section | 0% | 100%
Has Findings section | 0% | 100%
Has Risk Left Open section | 0% | 100%
Regression Checks lists individual checks with results | 0% | 100%
Findings section gives explicit verdict | 0% | 100%
Risk Left Open has concrete specific risk | 0% | 100%
Change Surface identifies auth middleware change | 0% | 100%
Regression Checks covers auth/session adjacent paths | 0% | 100%
Report does not primarily re-verify JWT auth works | 100% | 100%
Adjacent flow identification
Has Change Surface section | 0% | 100%
Has Regression Checks section | 0% | 100%
Has Findings section | 71% | 100%
Has Risk Left Open section | 0% | 100%
Change Surface identifies pool.js or maxConnections change | 0% | 100%
Change Surface mentions at least 2 affected services | 0% | 100%
Regression Checks covers reporting service batching logic | 0% | 100%
Regression Checks covers at least one other service path | 0% | 100%
Regression Checks lists at least 3 checks with results | 0% | 100%
Risk Left Open has concrete specific risk | 0% | 100%
Report does not primarily re-verify pool size increase | 100% | 100%
Findings includes explicit verdict | 77% | 100%
Error and empty-state regression checks
Has Change Surface section | 0% | 100%
Has Regression Checks section | 0% | 100%
Has Findings section | 0% | 100%
Has Risk Left Open section | 0% | 100%
Change Surface identifies views.py or sort change | 87% | 100%
Regression Checks includes empty result set check | 50% | 100%
Regression Checks includes pagination interaction check | 80% | 100%
Regression Checks includes error or edge path check | 60% | 100%
Regression Checks lists at least 3 checks with results | 75% | 100%
Risk Left Open has concrete specific risk | 50% | 100%
Findings includes explicit verdict | 50% | 100%
Report does not primarily re-verify sorted results | 100% | 100%
Config and cache edge regression checks
Has Change Surface section | 0% | 100%
Has Regression Checks section | 0% | 100%
Has Findings section | 0% | 100%
Has Risk Left Open section | 0% | 100%
Change Surface identifies CACHE_TTL config change | 0% | 100%
Regression Checks includes stale price or inventory check | 0% | 100%
Regression Checks includes cache warmer or persistence check | 0% | 100%
Regression Checks includes auth-related path check | 0% | 100%
Regression Checks lists at least 3 checks with results | 0% | 100%
Risk Left Open has concrete specific risk | 0% | 100%
Findings includes explicit verdict | 0% | 100%
Report does not primarily re-verify CACHE_TTL was applied | 100% | 100%
Bias toward adjacent breakage
Has Change Surface section | 0% | 100%
Has Regression Checks section | 0% | 100%
Has Findings section | 0% | 100%
Has Risk Left Open section | 0% | 100%
Regression Checks covers at least one non-delete adjacent path | 100% | 100%
Regression Checks covers at least two different adjacent components | 100% | 100%
Regression Checks lists at least 3 checks with results | 100% | 100%
Report does not over-verify soft-delete feature | 100% | 100%
Risk Left Open has concrete specific risk | 50% | 100%
Findings includes explicit verdict | 37% | 100%
Change Surface identifies changed files or soft-delete addition | 100% | 100%
Risk documentation when no regressions found
Has Change Surface section | 0% | 100%
Has Regression Checks section | 0% | 100%
Has Findings section | 0% | 100%
Has Risk Left Open section | 0% | 100%
Findings explicitly states none found or equivalent | 0% | 0%
Risk Left Open lists at least one concrete specific risk | 0% | 100%
Risk Left Open does not just say none or no risk | 0% | 100%
Regression Checks lists at least 3 checks with results | 0% | 100%
Change Surface identifies parser change or parser.ts | 0% | 100%
Report covers at least one documented parser behavior difference | 100% | 100%
Report does not primarily re-verify refactor works | 100% | 100%
Full workflow across all zone categories
Has Change Surface section | 0% | 100%
Has Regression Checks section | 0% | 100%
Has Findings section | 0% | 100%
Has Risk Left Open section | 0% | 100%
Change Surface identifies at least 2 changed files or the chi migration | 0% | 100%
Regression Checks includes auth or permission path check | 75% | 100%
Regression Checks includes error or 4xx response path check | 75% | 100%
Regression Checks includes middleware behavior check | 75% | 100%
Regression Checks lists at least 4 checks with results | 60% | 100%
Risk Left Open has concrete specific risk | 62% | 100%
Findings includes explicit verdict | 0% | 100%
Report does not primarily re-verify routing works | 100% | 100%
Persistence and performance edge regression checks
Has Change Surface section | 0% | 100%
Has Regression Checks section | 0% | 100%
Has Findings section | 57% | 100%
Has Risk Left Open section | 0% | 100%
Change Surface identifies process_transactions.py or chunked reading | 75% | 100%
Regression Checks includes data correctness check | 90% | 100%
Regression Checks includes persistence path check | 90% | 100%
Regression Checks includes performance or timing edge check | 100% | 100%
Regression Checks lists at least 3 checks with results | 87% | 100%
Risk Left Open has concrete specific risk | 75% | 100%
Findings includes explicit verdict | 87% | 100%
Report does not primarily re-verify chunked reading works | 100% | 100%