jbaruch/coding-policy

General-purpose coding policy for Baruch's AI agents

Quality: 91% (Does it follow best practices?)

Impact: 90%, 1.30x (average score across 18 eval scenarios)

evals/scenario-12/criteria.json

{
  "context": "Tests whether the agent, invoking the adopt-fork-pr skill on a same-repo (non-fork) PR, correctly passes through — recognizing the installed reviewer already covers the PR and reporting its status — instead of running the fork-adoption flow. The skill triggers on generic 'check PR N' phrasing, so the guarded behavior under test is that it does NOT create a parallel branch or duplicate PR for a PR that is already reviewable.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Identifies the PR as originating in the repository itself",
      "description": "Determines that PR #12's head branch lives in the repository (not a fork) — e.g. via `gh pr view --json isCrossRepository` or equivalent — and treats that as the deciding fact for how to handle it",
      "max_score": 25
    },
    {
      "name": "Recognizes the reviewer already covers it",
      "description": "States that the installed policy reviewer already runs on this PR, so no extra wiring or branch work is needed to get it reviewed",
      "max_score": 25
    },
    {
      "name": "Creates no branch and pushes nothing",
      "description": "Does NOT create a new branch, check out the PR onto a new branch, or push anything to the remote. No adoption mechanics are invoked",
      "max_score": 20
    },
    {
      "name": "Opens no duplicate PR",
      "description": "Does NOT open a second pull request mirroring #12. The existing PR is the one that proceeds",
      "max_score": 15
    },
    {
      "name": "Reports the PR's status",
      "description": "Reads and reports the PR's current review verdict and/or check status (e.g. `gh pr view --json reviewDecision,statusCheckRollup` or `gh pr checks`) as the actual response to the request",
      "max_score": 15
    }
  ]
}
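
The pass-through decision the checklist describes can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names `plan_for_pr`, `summarize`, and `fetch_pr` are hypothetical, and the only external facts assumed are the `gh pr view --json` fields the criteria themselves mention (`isCrossRepository`, `reviewDecision`, `statusCheckRollup`).

```python
import json
import subprocess


def plan_for_pr(pr: dict) -> str:
    """Hypothetical helper: choose the flow from the PR's origin.

    A same-repo PR (isCrossRepository is false) is passed through;
    only a fork PR would enter the adoption flow.
    """
    return "adopt" if pr.get("isCrossRepository") else "pass-through"


def summarize(pr: dict) -> str:
    """Hypothetical helper: build the status line reported to the user."""
    decision = pr.get("reviewDecision") or "no review decision yet"
    checks = pr.get("statusCheckRollup") or []
    return f"review: {decision}; checks reported: {len(checks)}"


def fetch_pr(number: int) -> dict:
    """Read-only query via the gh CLI; creates no branch and pushes nothing."""
    out = subprocess.run(
        ["gh", "pr", "view", str(number), "--json",
         "isCrossRepository,reviewDecision,statusCheckRollup"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

With an authenticated `gh` in the repository, `plan_for_pr(fetch_pr(12))` would be the deciding call; on a pass-through result the agent would only report `summarize(...)` and stop.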
