Review PR comments, address code issues in source files (not generated files), regenerate derived artifacts, run lint/format, commit, push, and reply to the comment thread confirming resolution.
93
Quality
89%
Does it follow best practices?
Impact
99%
1.19x
Average score across 5 eval scenarios
{
"context": "Tests whether the agent implements the full end-to-end PR comment resolution workflow in the correct order, including all key stages: comment fetching via review API, filtering, critical assessment, user confirmation, source-only editing, regeneration, verification, commit formatting, pushing, and reply posting.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Review comments API",
"description": "resolver.py or config.py references the pulls/{number}/comments endpoint for fetching review comments (not issues comments)",
"max_score": 8
},
{
"name": "Comment filtering stage",
"description": "resolver.py includes a filtering step that excludes already-replied comments and/or old comments from prior runs",
"max_score": 8
},
{
"name": "Assessment stage",
"description": "resolver.py includes a distinct assessment/triage stage where each comment is evaluated with possible outcomes of address, defer, or disagree",
"max_score": 10
},
{
"name": "User confirmation stage",
"description": "resolver.py includes a step that presents a plan and pauses for user confirmation before making any code changes",
"max_score": 10
},
{
"name": "Source-only editing",
"description": "resolver.py, README.md, or workflow_diagram.md states that only source files are edited (not generated/derived files)",
"max_score": 10
},
{
"name": "Regeneration stage",
"description": "The workflow includes a regeneration step for derived artifacts after editing source files",
"max_score": 8
},
{
"name": "Verification stage",
"description": "The workflow includes running formatter/linter/tests after changes, before committing",
"max_score": 8
},
{
"name": "Correct workflow order",
"description": "workflow_diagram.md shows the stages in order: fetch -> filter -> assess -> confirm -> fix -> regenerate -> verify -> commit -> push -> reply",
"max_score": 10
},
{
"name": "Reply via thread endpoint",
"description": "resolver.py or config.py uses the pulls/comments/{id}/replies endpoint for posting replies (not issue comments as default)",
"max_score": 8
},
{
"name": "One commit per comment",
"description": "resolver.py or README.md mentions creating one commit per comment (unless issues share a root cause) rather than a single commit for all changes",
"max_score": 8
},
{
"name": "Commit format",
"description": "resolver.py or config.py defines commit messages using 'fix:' prefix with Co-Authored-By attribution",
"max_score": 8
},
{
"name": "Empty result handling",
"description": "resolver.py handles the case where no unaddressed comments remain — reports this and stops rather than proceeding with empty work",
"max_score": 4
}
]
}
Install with Tessl CLI
npx tessl i sahildmk/pr-comment-resolver
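
Several checklist items above (review-comment endpoint, filtering, workflow order, commit format, empty result handling) can be illustrated with a minimal Python sketch. This is not the skill's actual `resolver.py`. All function names, the `bot_login` parameter, and the co-author string are hypothetical. The endpoint paths follow the GitHub REST API as I understand it, where the reply route includes the pull request number. The `in_reply_to_id` and `user.login` fields are assumed to be present on each review comment object.

```python
# Stage order as listed in the "Correct workflow order" checklist item.
STAGES = ["fetch", "filter", "assess", "confirm", "fix",
          "regenerate", "verify", "commit", "push", "reply"]


def review_comments_path(owner: str, repo: str, number: int) -> str:
    """Path for listing review comments on a PR (not issue comments)."""
    return f"/repos/{owner}/{repo}/pulls/{number}/comments"


def reply_path(owner: str, repo: str, number: int, comment_id: int) -> str:
    """Path for replying inside an existing review comment thread."""
    return f"/repos/{owner}/{repo}/pulls/{number}/comments/{comment_id}/replies"


def unaddressed(comments: list[dict], bot_login: str) -> list[dict]:
    """Keep top-level comments that bot_login (hypothetical) has not replied to."""
    replied_ids = {
        c["in_reply_to_id"]
        for c in comments
        if c.get("in_reply_to_id") is not None
        and c["user"]["login"] == bot_login
    }
    return [
        c for c in comments
        if c.get("in_reply_to_id") is None and c["id"] not in replied_ids
    ]


def commit_message(summary: str, co_author: str) -> str:
    """Build a 'fix:' commit message with a Co-Authored-By trailer."""
    return f"fix: {summary}\n\nCo-Authored-By: {co_author}"


if __name__ == "__main__":
    sample = [
        {"id": 1, "user": {"login": "alice"}, "in_reply_to_id": None},
        {"id": 2, "user": {"login": "resolver-bot"}, "in_reply_to_id": 1},
        {"id": 3, "user": {"login": "alice"}, "in_reply_to_id": None},
    ]
    pending = unaddressed(sample, "resolver-bot")
    if not pending:
        # Empty result handling: report and stop instead of doing empty work.
        print("No unaddressed comments remain.")
    else:
        print([c["id"] for c in pending])
```

Keeping the filter as a pure function over already-fetched comment dictionaries lets it be checked without network access. The HTTP calls, user confirmation, and regeneration steps are deliberately left out of this sketch.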