
sahildmk/pr-comment-resolver

Review PR comments, address code issues in source files (not generated files), regenerate derived artifacts, run lint/format, commit, push, and reply to the comment thread confirming resolution.

93

1.19x

Quality: 89% (Does it follow best practices?)

Impact: 99%, 1.19x (average score across 5 eval scenarios)


evals/scenario-2/rubric.json

{
  "context": "Tests whether the agent critically assesses each review comment rather than blindly actioning them, correctly categorizes decisions as address/defer/disagree, identifies source files vs generated files, and presents a confirmation plan to the user before proceeding.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Three decision categories",
      "description": "assessment.json uses exactly the categories 'address', 'defer', and 'disagree' (or clear equivalents) for each comment's decision — not just 'fix' and 'skip'",
      "max_score": 10
    },
    {
      "name": "Generated file recognition",
      "description": "For comment 2003 (on generated/types.d.ts), the assessment identifies the source file (src/schemas/user.ts or similar) as the target, NOT the generated file itself",
      "max_score": 10
    },
    {
      "name": "OpenAPI generated file",
      "description": "For comment 2006 (on generated/openapi.json), the assessment targets the source handler or schema file, NOT generated/openapi.json",
      "max_score": 10
    },
    {
      "name": "Defer large refactor",
      "description": "Comment 2005 (migrate entire module to new pattern) is categorized as 'defer' rather than 'address', since it requires a broader refactor beyond the PR scope",
      "max_score": 10
    },
    {
      "name": "Rationale provided",
      "description": "Every entry in assessment.json includes a non-empty rationale field explaining the decision",
      "max_score": 10
    },
    {
      "name": "Defer/disagree reasoning",
      "description": "decisions_log.md provides a specific reason for each deferred or disagreed comment — not just 'won't fix' or 'out of scope' without explanation",
      "max_score": 10
    },
    {
      "name": "User confirmation plan",
      "description": "plan.md explicitly asks the user to confirm or adjust the plan before proceeding with changes",
      "max_score": 10
    },
    {
      "name": "Plan shows all comments",
      "description": "plan.md lists ALL 6 review comments with their proposed action, not just the ones being addressed",
      "max_score": 10
    },
    {
      "name": "Critical assessment",
      "description": "At least one comment is NOT categorized as 'address' — demonstrating critical evaluation rather than blindly fixing everything",
      "max_score": 10
    },
    {
      "name": "Source file targets only",
      "description": "No entry in assessment.json lists a file under 'generated/' as the target_file to edit",
      "max_score": 10
    }
  ]
}
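
Several of the checklist items above are mechanical enough to verify with a script before a human or model grader looks at the output. The following is a minimal sketch, not the tile's actual grader. It assumes `assessment.json` is a list of objects and that each object has a `decision` key (a hypothetical name); `rationale` and `target_file` are the field names the rubric itself mentions. The function `check_assessment` is likewise hypothetical.

```python
import json

# The three categories named in the rubric. The rubric also accepts
# "clear equivalents"; this sketch is stricter and accepts only these.
ALLOWED_DECISIONS = {"address", "defer", "disagree"}


def check_assessment(entries):
    """Map rubric item names to pass/fail for the mechanically checkable items.

    `entries` is assumed to be the parsed contents of assessment.json:
    a list of dicts with "decision", "rationale", and "target_file" keys.
    """
    decisions = [str(e.get("decision") or "").lower() for e in entries]
    rationales = [str(e.get("rationale") or "").strip() for e in entries]
    targets = [str(e.get("target_file") or "") for e in entries]
    return {
        # Every decision is one of address/defer/disagree.
        "Three decision categories": bool(entries)
        and all(d in ALLOWED_DECISIONS for d in decisions),
        # Every entry carries a non-empty rationale.
        "Rationale provided": bool(entries) and all(rationales),
        # At least one comment is not simply actioned.
        "Critical assessment": any(d != "address" for d in decisions),
        # No entry targets a file under generated/.
        "Source file targets only": not any(
            t.startswith("generated/") for t in targets
        ),
    }


if __name__ == "__main__":
    # Illustrative data only; the comment IDs come from the rubric text.
    sample = json.loads(
        """
        [
          {"comment_id": 2003, "decision": "address",
           "rationale": "Fix the type in the schema, then regenerate.",
           "target_file": "src/schemas/user.ts"},
          {"comment_id": 2005, "decision": "defer",
           "rationale": "Module-wide migration is beyond this PR's scope.",
           "target_file": null}
        ]
        """
    )
    for name, passed in check_assessment(sample).items():
        print(f"{name}: {'pass' if passed else 'fail'}")
```

Items that need judgment, such as whether comment 2005 is correctly deferred or whether `plan.md` genuinely asks for confirmation, are left to the grader.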

Install with Tessl CLI

npx tessl i sahildmk/pr-comment-resolver@0.3.1
