
coding-agent-helpers/regression-scout

Use when the user wants regression hunting after a change. Identify nearby flows, shared code paths, error states, and configuration edges that may have broken even if the main fix works. Good triggers include "check for regressions", "what else might this have broken", and "test the surrounding area".

96 · 2.72x

Quality: 94% (Does it follow best practices?)

Impact: 98%, 2.72x (average score across 8 eval scenarios)

Security by Snyk: Passed, no known issues


evals/scenario-2/criteria.json

{
  "context": "The agent was asked to produce a regression scout report (report.md) for a payment service whose database connection pool size was increased from 5 to 20. The criteria evaluate whether the report identifies adjacent flows sharing the pool and investigates them for regressions.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Has Change Surface section",
      "description": "The report.md file contains a '### Change Surface' section heading",
      "max_score": 7
    },
    {
      "name": "Has Regression Checks section",
      "description": "The report.md file contains a '### Regression Checks' section heading",
      "max_score": 7
    },
    {
      "name": "Has Findings section",
      "description": "The report.md file contains a '### Findings' section heading",
      "max_score": 7
    },
    {
      "name": "Has Risk Left Open section",
      "description": "The report.md file contains a '### Risk Left Open' section heading",
      "max_score": 7
    },
    {
      "name": "Change Surface identifies pool.js or maxConnections change",
      "description": "The Change Surface section identifies pool.js, the maxConnections setting, or the connection pool size increase as the changed component",
      "max_score": 8
    },
    {
      "name": "Change Surface mentions at least 2 affected services",
      "description": "The Change Surface section mentions at least 2 of the services that share the pool: payment, refund, reporting, or audit",
      "max_score": 8
    },
    {
      "name": "Regression Checks covers reporting service batching logic",
      "description": "The Regression Checks section includes a check specifically on the reporting service or its query batching logic (which was written assuming pool size <= 5)",
      "max_score": 10
    },
    {
      "name": "Regression Checks covers at least one other service path",
      "description": "The Regression Checks section includes at least one check on any of: refund processing, audit log writes, or payment processing (beyond just the reporting service)",
      "max_score": 10
    },
    {
      "name": "Regression Checks lists at least 3 checks with results",
      "description": "The Regression Checks section lists at least 3 separate checks, each with an outcome or result stated",
      "max_score": 10
    },
    {
      "name": "Risk Left Open has concrete specific risk",
      "description": "The Risk Left Open section contains a concrete specific risk such as race conditions under high concurrency, connection starvation of a specific service, or an identified service with untested behavior at the new pool size",
      "max_score": 8
    },
    {
      "name": "Report does not primarily re-verify pool size increase",
      "description": "The report does NOT dedicate most of its content to confirming that the pool now accepts 20 connections — the primary focus is on impact to adjacent services and flows",
      "max_score": 9
    },
    {
      "name": "Findings includes explicit verdict",
      "description": "The Findings section includes an explicit verdict — either stating no regressions were found or naming specific regressions identified",
      "max_score": 9
    }
  ]
}
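The file does not say how a `weighted_checklist` is turned into a final score. As a rough sketch only, assuming the score is the points awarded divided by the sum of all `max_score` values, the four section-heading items could be checked mechanically as below. The `heading` key, both function names, and the all-or-nothing awarding rule are hypothetical and are not part of the criteria file or of any Tessl API.

```python
import re

# Hypothetical mirror of the four "Has ... section" items from the checklist.
# The "heading" key is added here for illustration; it is not in criteria.json.
HEADING_ITEMS = [
    {"name": "Has Change Surface section", "heading": "Change Surface", "max_score": 7},
    {"name": "Has Regression Checks section", "heading": "Regression Checks", "max_score": 7},
    {"name": "Has Findings section", "heading": "Findings", "max_score": 7},
    {"name": "Has Risk Left Open section", "heading": "Risk Left Open", "max_score": 7},
]


def score_headings(report_md: str, items: list[dict]) -> dict[str, int]:
    """Award full points for each item whose '### <heading>' line is present."""
    found = {
        match.group(1).strip()
        for match in re.finditer(r"^###[ \t]+(.+)$", report_md, re.MULTILINE)
    }
    return {
        item["name"]: item["max_score"] if item["heading"] in found else 0
        for item in items
    }


def weighted_score(awarded: dict[str, int], items: list[dict]) -> float:
    """Return awarded points as a percentage of the total available points.

    Assumption: each item's points are clamped to [0, max_score] and the
    result is normalized by the sum of max_score over all items.
    """
    total = sum(item["max_score"] for item in items)
    earned = sum(
        min(max(awarded.get(item["name"], 0), 0), item["max_score"])
        for item in items
    )
    return 100.0 * earned / total


if __name__ == "__main__":
    sample_report = (
        "### Change Surface\n"
        "pool.js: maxConnections raised from 5 to 20\n"
        "### Regression Checks\n"
        "- reporting batch query: pass\n"
        "### Findings\n"
        "No regressions found.\n"
    )
    awarded = score_headings(sample_report, HEADING_ITEMS)
    print(awarded)
    print(weighted_score(awarded, HEADING_ITEMS))
```

The sample report omits the `### Risk Left Open` heading, so under these assumptions it earns 21 of the 28 heading points. The remaining eight items (for example, whether the Regression Checks section covers the reporting service's batching logic) depend on the content of each section and would need a human or model grader rather than a pattern match.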
