uinaf/review-gang

Review existing code, diffs, branches, or pull requests by spawning mandatory concern-specific reviewer subagents, then synthesize a ship-it / needs-review / blocked verdict.

Overall score: 92
Quality: 97% (Does it follow best practices?)
Impact: 81%, 1.22x (average score across 4 eval scenarios)
Security by Snyk: Passed (no known issues)

evals/scenario-4/criteria.json

{
  "context": "Tests whether the agent spawns the mandatory default reviewer gang for a refactor that touches types, error handling, dead code, and auth, adds relevant conditional personas, avoids unnecessary extra personas, loads the repo's AGENTS.md, and produces a well-structured output with verdict and evidence without a noisy footer.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Default gang spawned",
      "description": "The report mentions spawning all three default reviewer personas: general, tests, and silent-failures",
      "max_score": 8
    },
    {
      "name": "Types persona included",
      "description": "The report mentions using the types reviewer persona (or equivalent type-safety lens) — appropriate given the new AuthToken/runtime schema introduction",
      "max_score": 10
    },
    {
      "name": "Cleanup persona included",
      "description": "The report mentions using the cleanup reviewer persona (or equivalent dead code lens) — appropriate given middleware.old.ts is a dead file",
      "max_score": 8
    },
    {
      "name": "Comments persona omitted",
      "description": "The report does NOT mention using a comments reviewer persona — no substantial comment changes justify it",
      "max_score": 6
    },
    {
      "name": "Repo guidance loaded",
      "description": "The report references or incorporates at least one rule from AGENTS.md (e.g. Valibot preference, logging prohibition, 401 JSON shape, integration test requirement)",
      "max_score": 10
    },
    {
      "name": "Personas listed in output",
      "description": "The review report shows spawned personas in compact metadata or evidence, but does not force persona details into the verdict footer",
      "max_score": 6
    },
    {
      "name": "Verdict present",
      "description": "The review report contains exactly one of: 'ship it', 'needs review', or 'blocked' as a verdict",
      "max_score": 8
    },
    {
      "name": "Scope stated",
      "description": "The report names the reviewed scope only when needed to disambiguate the review; it does not add a noisy scope line after clear findings",
      "max_score": 6
    },
    {
      "name": "Findings ordered by severity",
      "description": "When multiple findings are present, they are listed with severity labels (e.g. high/medium/low) or are otherwise ranked by risk level",
      "max_score": 8
    },
    {
      "name": "Evidence for findings",
      "description": "At least one finding includes a specific file reference (with or without line number) rather than a vague description",
      "max_score": 10
    },
    {
      "name": "Silent failures finding",
      "description": "The report identifies the silent error swallowing in middleware.old.ts (catch blocks that call next() without error) as a concern — OR notes this file is dead/unused",
      "max_score": 8
    },
    {
      "name": "Unverified areas acknowledged",
      "description": "The report includes a section or statement about unverified areas, readiness gaps, or surfaces that could not be confirmed",
      "max_score": 6
    },
    {
      "name": "Recommended follow-up",
      "description": "The report includes a recommended follow-up action such as implementation, runtime verification, readiness setup, documentation cleanup, or none",
      "max_score": 6
    },
    {
      "name": "Compact verdict block",
      "description": "The report keeps verdict metadata compact: no more than 4 labeled lines after findings, no repeated finding details, no scope/persona noise unless needed, and no prose-heavy recap",
      "max_score": 6
    }
  ]
}
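The `weighted_checklist` type implies that each item is graded up to its `max_score`. The sketch below shows one way such a checklist could be aggregated. It assumes each item receives a numeric award between 0 and `max_score` and that the scenario score is the awarded total divided by the maximum possible total. The `ChecklistItem`, `WeightedChecklist`, `Awards`, and `scoreChecklist` names, as well as this normalization, are hypothetical illustrations and not Tessl's documented scoring method.

```typescript
// Hypothetical types mirroring the fields used in criteria.json.
interface ChecklistItem {
  name: string;
  description: string;
  max_score: number;
}

interface WeightedChecklist {
  context: string;
  type: "weighted_checklist";
  checklist: ChecklistItem[];
}

// Assumed grader output: points awarded per checklist item name.
type Awards = Record<string, number>;

// Assumption: score = sum of clamped awards / sum of max_score weights.
function scoreChecklist(criteria: WeightedChecklist, awards: Awards): number {
  let earned = 0;
  let possible = 0;
  for (const item of criteria.checklist) {
    const raw = awards[item.name] ?? 0; // items the grader skipped count as 0
    // Clamp so an award can never fall below 0 or exceed the item's weight.
    earned += Math.min(Math.max(raw, 0), item.max_score);
    possible += item.max_score;
  }
  return possible === 0 ? 0 : earned / possible;
}

// Usage with two items taken from the checklist above (descriptions elided).
const demo: WeightedChecklist = {
  context: "demo",
  type: "weighted_checklist",
  checklist: [
    { name: "Verdict present", description: "...", max_score: 8 },
    { name: "Evidence for findings", description: "...", max_score: 10 },
  ],
};
console.log(scoreChecklist(demo, { "Verdict present": 8, "Evidence for findings": 5 }));
```

The `max_score` weights in this file sum to 106, not 100, so they are relative weights rather than percentages. Normalizing by the total keeps scores comparable across scenarios whose checklists have different totals.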
