coding-agent-helpers/regression-scout

Use when the user wants regression hunting after a change. Identify nearby flows, shared code paths, error states, and configuration edges that may have broken even if the main fix works. Good triggers include "check for regressions", "what else might this have broken", and "test the surrounding area".

96

2.72x
Quality: 94% (Does it follow best practices?)

Impact: 98%, 2.72x (average score across 8 eval scenarios)

Security by Snyk: Passed (no known issues)

evals/scenario-1/criteria.json

{
  "context": "The agent was asked to produce a regression scout report (report.md) for a Node.js Express REST API that migrated authentication middleware from sessions to JWT tokens. The criteria evaluate whether the report follows the required output format and adequately covers regression risk areas.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Has Change Surface section",
      "description": "The report.md file contains a '### Change Surface' section heading",
      "max_score": 8
    },
    {
      "name": "Has Regression Checks section",
      "description": "The report.md file contains a '### Regression Checks' section heading",
      "max_score": 8
    },
    {
      "name": "Has Findings section",
      "description": "The report.md file contains a '### Findings' section heading",
      "max_score": 8
    },
    {
      "name": "Has Risk Left Open section",
      "description": "The report.md file contains a '### Risk Left Open' section heading",
      "max_score": 8
    },
    {
      "name": "Regression Checks lists individual checks with results",
      "description": "The Regression Checks section lists at least 3 individual checks, each accompanied by a result (pass, fail, concern, N/A, or equivalent outcome statement)",
      "max_score": 10
    },
    {
      "name": "Findings section gives explicit verdict",
      "description": "The Findings section explicitly states either that no regressions were found (e.g. 'none found') OR names at least one specific regression — it does not omit the section or leave it without a verdict",
      "max_score": 10
    },
    {
      "name": "Risk Left Open has concrete specific risk",
      "description": "The Risk Left Open section contains at least one concrete, plausible, specific risk (not just a vague statement like 'further testing recommended')",
      "max_score": 10
    },
    {
      "name": "Change Surface identifies auth middleware change",
      "description": "The Change Surface section identifies auth.js, the authentication middleware, or the JWT/session migration as part of the change surface",
      "max_score": 10
    },
    {
      "name": "Regression Checks covers auth/session adjacent paths",
      "description": "The Regression Checks section includes at least one check specifically targeting an auth or session-adjacent path such as logout behavior, token revocation via tokenBlacklist.js, session expiry vs JWT expiry, or token blacklist lookup",
      "max_score": 14
    },
    {
      "name": "Report does not primarily re-verify JWT auth works",
      "description": "The report does NOT dedicate most of its content to confirming that JWT authentication itself works correctly — the primary focus is on adjacent and downstream breakage rather than re-proving the migration succeeded",
      "max_score": 14
    }
  ]
}
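
The first five checklist items are structural, so they can be approximated mechanically. The sketch below is a hypothetical pre-check, not the grader this registry actually uses: the function names `section_body` and `score_structure`, the keyword pattern for results, and the assumption that each check is a `- ` bullet are all illustrative. Only the heading strings and weights are taken from the checklist above. The remaining items (explicit verdict, concrete risk, focus on adjacent breakage) need judgment and are not covered.

```python
import re

# Headings and weights copied from the four "Has ... section" checklist items.
SECTION_WEIGHTS = {
    "### Change Surface": 8,
    "### Regression Checks": 8,
    "### Findings": 8,
    "### Risk Left Open": 8,
}

# Assumed result vocabulary. The checklist also accepts "equivalent outcome
# statements", which a keyword match like this cannot recognise.
RESULT_PATTERN = re.compile(r"\b(pass|fail|concern|n/a)", re.IGNORECASE)


def section_body(report: str, heading: str) -> str:
    """Return the text under `heading`, up to the next '### ' heading or end of file."""
    body = []
    inside = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped == heading:
            inside = True
            continue
        if inside and stripped.startswith("### "):
            break
        if inside:
            body.append(line)
    return "\n".join(body)


def score_structure(report: str) -> dict:
    """Score the heading items and the 'at least 3 checks with results' item."""
    present = {line.strip() for line in report.splitlines()}
    scores = {h: (w if h in present else 0) for h, w in SECTION_WEIGHTS.items()}
    # Assumption: each check is a "- " bullet that carries its own result keyword.
    checks = [
        line
        for line in section_body(report, "### Regression Checks").splitlines()
        if line.lstrip().startswith("- ") and RESULT_PATTERN.search(line)
    ]
    scores["checks_with_results"] = 10 if len(checks) >= 3 else 0
    return scores


# Made-up report fragment that deliberately omits "### Risk Left Open".
sample = """### Change Surface
- auth.js middleware: sessions replaced by JWT
### Regression Checks
- Logout invalidates token via tokenBlacklist.js: pass
- Expired JWT rejected on protected routes: pass
- Session-cookie clients get a clear 401: concern
### Findings
None found.
"""

result = score_structure(sample)
for name, points in result.items():
    print(f"{name}: {points}")
```

Matching headings as whole stripped lines keeps a passing mention of "Findings" in body text from counting as a section, which mirrors the checklist wording that the report must contain the heading itself.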

evals

scenario-1

criteria.json

task.md

tile.json