Name: matthew-a-carr/babysit-pr
Rating: 90.8 (1 review)
Author: matthew-a-carr

Drive a PR to merge: address review comments (human + Copilot), push fixes, wait for CI to go green, then squash-merge. Use when a human says "babysit PR #NNN", "address the comments and merge when green", or "get this PR landed". Pushes and merges — invoking it IS the authorization to do so. Refuses to merge on red CI, unresolved blocking reviews, or conflicts; escalates instead.
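
The refusal rule in the description can be sketched as a small gate function. This is only an illustration under stated assumptions: `PrState`, its field names, and the returned strings are hypothetical and are not taken from the skill itself.

```python
from dataclasses import dataclass


@dataclass
class PrState:
    # Hypothetical snapshot of a pull request; field names are illustrative.
    ci_green: bool
    blocking_reviews: int
    has_conflicts: bool


def merge_blockers(pr: PrState) -> list[str]:
    """Return the reasons a merge must be refused; an empty list means none."""
    blockers = []
    if not pr.ci_green:
        blockers.append("red CI")
    if pr.blocking_reviews > 0:
        blockers.append("unresolved blocking reviews")
    if pr.has_conflicts:
        blockers.append("conflicts")
    return blockers


def decide(pr: PrState) -> str:
    """Squash-merge only when nothing blocks; otherwise escalate with reasons."""
    blockers = merge_blockers(pr)
    if not blockers:
        return "squash-merge"
    return "escalate: " + ", ".join(blockers)
```

Collecting every blocker before deciding, rather than returning at the first one, lets an escalation message list all outstanding problems at once.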

1.17x

Quality

90%

Does it follow best practices?

Impact

94%

1.17x

Average score across 3 eval scenarios

Security

Advisory

Suggest reviewing before use

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, production-grade skill with excellent actionability and workflow clarity: it gives Claude precise tools, commands, decision criteria, and a well-bounded retry/escalation model. Its main weakness is moderate verbosity: some sections repeat constraints already embedded in the workflow, and the untrusted-content boilerplate is lengthy. Keeping everything inline in a single file is acceptable given that the skill ships no bundle files, but the content would benefit from slight reorganization.

Suggestions

Trim the 'Untrusted content' section to 2-3 sentences — Claude understands prompt injection defense; the current block is ~6 lines of elaboration on a single concept.

Consolidate the 'Do not' section by removing items already stated as constraints in the workflow steps (e.g., 'do not merge on red' is already enforced in Step 5's preconditions).

Dimension	Reasoning	Score
Conciseness	The skill is mostly efficient and project-specific, but includes some sections that could be tightened — the 'Untrusted content' block is lengthy boilerplate, and some explanations (e.g., the tool conventions rationale citing a specific GitHub issue) add tokens without proportional value. The 'Do not' section partially repeats constraints already stated in the workflow steps.	2 / 3
Actionability	Highly actionable: specifies exact MCP tool names, exact CLI commands (pnpm lint && pnpm db:check:migrations && ...), exact merge strategy (squash), exact commit format (Conventional Commits), and concrete decision criteria (3 fix attempts, all-green required checks). Every step tells Claude precisely what to do and with which tool.	3 / 3
Workflow Clarity	Excellent multi-step workflow with clear sequencing (Steps 1-6), explicit validation checkpoints (run local gate before push, poll CI, verify post-merge linkage), a bounded retry loop (3 fix attempts), and a well-defined escalation/feedback path (Step 6). The validate-fix-retry loop for CI failures is explicitly modeled.	3 / 3
Progressive Disclosure	The content is well-structured with clear headers and numbered steps, and references external documents (CONSTITUTION, AGENTS.md, ADRs, SPECs) appropriately. However, with no bundle files provided, all content is inline in a single file that runs fairly long (~100 lines of dense instruction). Some sections like the 'Do not' list and 'Untrusted content' block could potentially be factored out, but the skill is not egregiously monolithic for its complexity.	2 / 3
	Total	10 / 12 Passed
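
The bounded validate-fix-retry loop praised under Workflow Clarity can be sketched roughly as follows. The limit of 3 fix attempts comes from the review text; the function name, the two callbacks, and the return values are hypothetical stand-ins, not the skill's actual tooling.

```python
from typing import Callable

MAX_FIX_ATTEMPTS = 3  # bound cited in the review; everything else is assumed


def drive_ci_to_green(
    ci_is_green: Callable[[], bool],
    attempt_fix: Callable[[], None],
    max_attempts: int = MAX_FIX_ATTEMPTS,
) -> str:
    """Check CI; on failure try a fix, up to max_attempts, then escalate."""
    attempts = 0
    while True:
        if ci_is_green():
            return "green"
        if attempts >= max_attempts:
            # Retry budget exhausted: hand back to a human instead of looping.
            return "escalate"
        attempt_fix()
        attempts += 1
```

CI is re-checked after every fix, so a successful third attempt still counts as green; escalation happens only when the check fails with no attempts left.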

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly communicates specific capabilities, includes natural trigger phrases, explicitly states both what and when, and occupies a distinct niche. The description also adds valuable context about authorization scope and refusal conditions, which helps Claude understand boundaries without being verbose.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: address review comments (human + Copilot), push fixes, wait for CI to go green, squash-merge. Also specifies refusal conditions (red CI, unresolved blocking reviews, conflicts) and escalation behavior.	3 / 3
Completeness	Clearly answers both what (drive a PR to merge by addressing comments, pushing fixes, waiting for CI, squash-merging) and when (explicit 'Use when' clause with three concrete trigger phrases). Also clarifies authorization scope and refusal conditions.	3 / 3
Trigger Term Quality	Includes highly natural trigger phrases users would actually say: 'babysit PR #NNN', 'address the comments and merge when green', 'get this PR landed'. Also includes terms like 'PR', 'merge', 'CI', 'review comments' that are natural developer vocabulary.	3 / 3
Distinctiveness Conflict Risk	Highly distinctive niche: end-to-end PR babysitting with merge authority. The specific trigger phrases ('babysit PR', 'get this PR landed') and the explicit scope (pushing, merging, CI monitoring) clearly distinguish it from generic code review or git skills.	3 / 3
	Total	12 / 12 Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Reviewed

15 days ago
