
coding-agent-helpers/regression-scout

Use when the user wants regression hunting after a change. Identify nearby flows, shared code paths, error states, and configuration edges that may have broken even if the main fix works. Good triggers include "check for regressions", "what else might this have broken", and "test the surrounding area".

96 · 2.72x

Quality: 94% (Does it follow best practices?)

Impact: 98%, 2.72x (average score across 8 eval scenarios)

Security by Snyk: Passed, no known issues


evals/scenario-4/criteria.json

{
  "context": "The agent was asked to produce a regression scout report (report.md) for a Redis-backed product catalog service whose CACHE_TTL was increased from 300s to 3600s. The criteria evaluate whether the report checks auth edges, caching/persistence behavior, and performance-adjacent risks rather than just confirming the config was applied.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Has Change Surface section",
      "description": "The report.md file contains a '### Change Surface' section heading",
      "max_score": 7
    },
    {
      "name": "Has Regression Checks section",
      "description": "The report.md file contains a '### Regression Checks' section heading",
      "max_score": 7
    },
    {
      "name": "Has Findings section",
      "description": "The report.md file contains a '### Findings' section heading",
      "max_score": 7
    },
    {
      "name": "Has Risk Left Open section",
      "description": "The report.md file contains a '### Risk Left Open' section heading",
      "max_score": 7
    },
    {
      "name": "Change Surface identifies CACHE_TTL config change",
      "description": "The Change Surface section identifies the CACHE_TTL environment variable change, the TTL increase (300→3600), or the Redis cache configuration as the change surface",
      "max_score": 8
    },
    {
      "name": "Regression Checks includes stale price or inventory check",
      "description": "The Regression Checks section includes a check on whether stale cached prices (up to 1 hour old) could affect checkout flows, or whether inventory data read from the cache could be stale",
      "max_score": 10
    },
    {
      "name": "Regression Checks includes cache warmer or persistence check",
      "description": "The Regression Checks section includes a check on the cache warmer behavior (e.g. whether it correctly respects the new TTL and does not overwrite entries), or a check on another caching/persistence path",
      "max_score": 10
    },
    {
      "name": "Regression Checks includes auth-related path check",
      "description": "The Regression Checks section includes a check on the admin cache-invalidation endpoint, the ADMIN role JWT claim requirement, or another auth-adjacent path",
      "max_score": 10
    },
    {
      "name": "Regression Checks lists at least 3 checks with results",
      "description": "The Regression Checks section lists at least 3 separate checks, each with an outcome or result stated",
      "max_score": 8
    },
    {
      "name": "Risk Left Open has concrete specific risk",
      "description": "The Risk Left Open section contains a concrete specific risk such as: stale prices reaching checkout, price update lag exceeding SLA, cache warmer not refreshing stale data, or a specific scenario where the 1-hour TTL causes observable user-facing problems",
      "max_score": 8
    },
    {
      "name": "Findings includes explicit verdict",
      "description": "The Findings section includes an explicit verdict — either stating no regressions were found or naming specific regressions identified",
      "max_score": 8
    },
    {
      "name": "Report does not primarily re-verify CACHE_TTL was applied",
      "description": "The report does NOT dedicate most of its content to confirming that the CACHE_TTL change took effect — the primary focus is on downstream and adjacent effects of the longer TTL",
      "max_score": 10
    }
  ]
}
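
The checklist above can be applied mechanically for the four heading criteria and for the final tally. The following is a minimal sketch, not the grader this registry actually uses: the function names `missing_headings` and `weighted_score`, the `awarded` mapping of criterion name to points, and the clamping behavior are all assumptions made for illustration. The only facts taken from the file are the four `###` heading names and the `checklist` / `name` / `max_score` keys (the `max_score` values in this file sum to 100).

```python
import re

# Heading names taken from the four "Has ... section" criteria.
REQUIRED_HEADINGS = [
    "Change Surface",
    "Regression Checks",
    "Findings",
    "Risk Left Open",
]


def missing_headings(report_text, headings=REQUIRED_HEADINGS):
    """Return the required '### ' headings that do not appear in the report."""
    missing = []
    for title in headings:
        # Match a level-3 markdown heading on its own line.
        pattern = r"^###\s+" + re.escape(title) + r"\s*$"
        if not re.search(pattern, report_text, flags=re.MULTILINE):
            missing.append(title)
    return missing


def weighted_score(criteria, awarded):
    """Tally awarded points against a weighted_checklist.

    `awarded` is a hypothetical mapping of criterion name -> points given
    by a reviewer. Points are clamped to [0, max_score] per item (an
    assumption; the real grader may behave differently).
    Returns (points, max_points, percent).
    """
    total_max = sum(item["max_score"] for item in criteria["checklist"])
    total = 0
    for item in criteria["checklist"]:
        points = awarded.get(item["name"], 0)
        total += max(0, min(points, item["max_score"]))
    percent = 100.0 * total / total_max if total_max else 0.0
    return total, total_max, percent
```

A typical use would be to load `criteria.json` with `json.load`, read `report.md` as text, call `missing_headings` to settle the four structural criteria, and leave the content criteria (stale prices, cache warmer, admin invalidation endpoint) to a human or model reviewer whose judgments are then passed in as `awarded`.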
