CtrlK
BlogDocsLog inGet started
Tessl Logo

matthew-a-carr/babysit-pr

Drive a PR to merge: address review comments (human + Copilot), push fixes, wait for CI to go green, then squash-merge. Use when a human says "babysit PR #NNN", "address the comments and merge when green", or "get this PR landed". Pushes and merges — invoking it IS the authorization to do so. Refuses to merge on red CI, unresolved blocking reviews, or conflicts; escalates instead.

90

1.17x
Quality

90%

Does it follow best practices?

Impact

94%

1.17x

Average score across 3 eval scenarios

SecuritybySnyk

Advisory

Suggest reviewing before use

Overview
Quality
Evals
Security
Files

Evaluation results

97%

43%

Prepare a Fix for a Reviewed Bug and Document the Verification Plan

TDD fix plan with full verification gate and Conventional Commit

Criteria
Without context
With context

Test written first

100%

100%

Test assertion described

100%

62%

Surgical change described

100%

100%

Full verification gate

33%

100%

OpenAPI check included

0%

100%

Conventional Commit format

100%

100%

No --no-verify

0%

100%

No test-skipping

0%

100%

No scope expansion

100%

100%

pnpm as package manager

0%

100%

100%

5%

Plan the PR Lifecycle Operations an Agent Runs

PR lifecycle script using mcp__github__* tool surface

Criteria
Without context
With context

mcp__github__ for PR read

100%

100%

mcp__github__ for comment reply

100%

100%

mcp__github__ for check status

100%

100%

mcp__github__ for merge

100%

100%

Squash merge strategy

100%

100%

Conventional Commit merge title

100%

100%

Branch deletion after merge

37%

100%

Merge guarded by green checks

100%

100%

No gh CLI for GitHub ops

100%

100%

87%

-4%

Decide Whether to Merge Three PRs and Document the Reasoning

Merge prerequisite assessment with squash strategy and Dependabot triage

Criteria
Without context
With context

PR 301 merge decision

100%

100%

PR 301 squash strategy

100%

100%

PR 301 Conventional Commit title

100%

100%

PR 301 branch deletion

100%

100%

PR 301 label linkage noted

50%

100%

PR 302 blocked on CI

100%

100%

PR 302 blocker described

100%

100%

PR 303 Dependabot flag

66%

13%

No --admin flag mentioned

100%

100%

All 3 PRs covered

100%

100%

Evaluated
Agent
Claude Code
Model
Claude Sonnet 4.6

Table of Contents