CtrlK
BlogDocsLog inGet started
Tessl Logo

jbaruch/coding-policy

General-purpose coding policy for Baruch's AI agents

91

1.24x
Quality

92%

Does it follow best practices?

Impact

91%

1.24x

Average score across 9 eval scenarios

SecuritybySnyk

Advisory

Suggest reviewing before use

Overview
Quality
Evals
Security
Files

criteria.jsonevals/scenario-3/

{
  "context": "Tests whether the agent uses the correct gh CLI commands to poll CI status and retrieve both Copilot review state and inline comments from a pull request, matching the specific API patterns prescribed in the release skill.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "CI watch command",
      "description": "Uses `gh pr checks <N> --watch` (with the `--watch` flag) to poll CI until checks complete — not a manual sleep loop or alternative polling approach",
      "max_score": 20
    },
    {
      "name": "Review state API call",
      "description": "Retrieves the Copilot/automated review state via `gh api repos/<owner>/<repo>/pulls/<N>/reviews` with a `--jq '.[].state'` filter (or equivalent jq expression that extracts the state field)",
      "max_score": 20
    },
    {
      "name": "Inline comments API call",
      "description": "Retrieves inline PR comments via `gh api repos/<owner>/<repo>/pulls/<N>/comments` — uses the pull request review comments endpoint, not the issue comments endpoint",
      "max_score": 20
    },
    {
      "name": "PR number parameterized",
      "description": "The PR number is accepted as a script argument or variable rather than hardcoded, so the script is reusable across different PRs",
      "max_score": 10
    },
    {
      "name": "Review state surfaced",
      "description": "The script prints or summarizes the review state value(s) retrieved so the developer can see whether the review is approved, changes requested, etc.",
      "max_score": 15
    },
    {
      "name": "Inline comments surfaced",
      "description": "The script prints or summarizes the inline comments content (e.g. body text or count) so the developer can see what feedback was left",
      "max_score": 15
    }
  ]
}

evals

README.md

tile.json