CtrlK
BlogDocsLog inGet started
Tessl Logo

uinaf/review

Review existing code, diffs, branches, or pull requests using concern-specific reviewer personas and evidence. Use when auditing someone else's work, triaging risk in a PR, or producing a ship-it / needs-review / blocked verdict. Do not use to verify your own completed change; use `verify` for that.

98

1.20x
Quality

100%

Does it follow best practices?

Impact

96%

1.20x

Average score across 4 eval scenarios

SecuritybySnyk

Passed

No known issues

Overview
Quality
Evals
Security
Files

criteria.jsonevals/scenario-3/

{
  "context": "Tests whether the agent correctly selects reviewer personas for a refactor that touches types, error handling, dead code, and auth. The agent should apply the default personas plus relevant conditional ones, avoid spawning unnecessary personas, load the repo's AGENTS.md, and produce a well-structured output with verdict and evidence without a noisy footer.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Default personas used",
      "description": "The report mentions using at least two of the three default personas: general, tests, silent-failures",
      "max_score": 8
    },
    {
      "name": "Types persona included",
      "description": "The report mentions using the types reviewer persona (or equivalent type-safety lens) — appropriate given the new AuthToken/zod schema introduction",
      "max_score": 10
    },
    {
      "name": "Cleanup persona included",
      "description": "The report mentions using the cleanup reviewer persona (or equivalent dead code lens) — appropriate given middleware.old.ts is a dead file",
      "max_score": 8
    },
    {
      "name": "Comments persona omitted",
      "description": "The report does NOT mention using a comments reviewer persona — no substantial comment changes justify it",
      "max_score": 6
    },
    {
      "name": "Repo guidance loaded",
      "description": "The report references or incorporates at least one rule from AGENTS.md (e.g. zod requirement, logging prohibition, 401 JSON shape, integration test requirement)",
      "max_score": 10
    },
    {
      "name": "Personas listed in output",
      "description": "The review report shows persona choice only when useful for the task or in compact metadata; it does not force persona details into the verdict footer",
      "max_score": 6
    },
    {
      "name": "Verdict present",
      "description": "The review report contains exactly one of: 'ship it', 'needs review', or 'blocked' as a verdict",
      "max_score": 8
    },
    {
      "name": "Scope stated",
      "description": "The report names the reviewed scope only when needed to disambiguate the review; it does not add a noisy scope line after clear findings",
      "max_score": 6
    },
    {
      "name": "Findings ordered by severity",
      "description": "When multiple findings are present, they are listed with severity labels (e.g. high/medium/low) or are otherwise ranked by risk level",
      "max_score": 8
    },
    {
      "name": "Evidence for findings",
      "description": "At least one finding includes a specific file reference (with or without line number) rather than a vague description",
      "max_score": 10
    },
    {
      "name": "Silent failures finding",
      "description": "The report identifies the silent error swallowing in middleware.old.ts (catch blocks that call next() without error) as a concern — OR notes this file is dead/unused",
      "max_score": 8
    },
    {
      "name": "Unverified areas acknowledged",
      "description": "The report includes a section or statement about unverified areas, readiness gaps, or surfaces that could not be confirmed",
      "max_score": 6
    },
    {
      "name": "Recommended follow-up",
      "description": "The report includes a recommended follow-up action from the allowed options: implementation, verify, agent-readiness, or docs",
      "max_score": 6
    },
    {
      "name": "Compact verdict block",
      "description": "The report keeps verdict metadata compact: no more than 4 labeled lines after findings, no repeated finding details, no scope/persona noise unless needed, and no prose-heavy recap",
      "max_score": 6
    }
  ]
}

evals

SKILL.md

tile.json