Babysit a GitHub pull request after creation by continuously polling review comments, CI checks/workflow runs, and mergeability state until the PR is merged/closed or user help is required. Diagnose failures, retry likely flaky failures up to 3 times, auto-fix/push branch-related issues when appropriate, and keep watching open PRs so fresh review feedback is surfaced promptly. Use when the user asks Codex to monitor a PR, watch CI, handle review comments, or keep an eye on failures and feedback on an open PR.
85
81%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Advisory
Suggest reviewing before use
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly articulates specific capabilities (polling, diagnosing, retrying, auto-fixing), provides explicit trigger guidance via a 'Use when...' clause, and occupies a distinct niche of post-creation PR monitoring. It uses proper third-person voice throughout and balances detail with readability.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: continuously polling review comments, CI checks/workflow runs, mergeability state, diagnosing failures, retrying flaky failures up to 3 times, auto-fixing/pushing branch-related issues, and surfacing review feedback. | 3 / 3 |
| Completeness | Clearly answers both 'what' (babysit a PR by polling comments, CI checks, mergeability, diagnosing failures, retrying flaky tests, auto-fixing branch issues) and 'when' with an explicit 'Use when...' clause listing specific trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms users would say: 'monitor a PR', 'watch CI', 'handle review comments', 'keep an eye on failures', 'feedback on an open PR', 'pull request', 'GitHub'. These cover common variations of how users would phrase such requests. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive niche: post-creation PR monitoring/babysitting is a very specific workflow unlikely to conflict with general GitHub, CI, or code review skills. The focus on continuous polling, flaky retry logic, and mergeability watching clearly distinguishes it. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
62%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill excels at actionability and workflow clarity, providing precise commands, clear decision trees, and well-sequenced multi-step processes with validation checkpoints. However, it suffers significantly from verbosity and repetition—the same behavioral rules (especially around not stopping after pushes, restarting --watch, and review comment handling) are restated across nearly every section, roughly tripling the token cost without adding new information. The progressive disclosure is adequate but the monolithic structure with redundant sections undermines readability.
Suggestions
Consolidate repeated rules into a single authoritative section (e.g., state 'restart --watch after any push' once in Core Workflow or Git Safety Rules, not in 5+ places) to cut token usage by ~40%.
Extract the Monitoring Loop Pattern section entirely—it largely duplicates Core Workflow with minor additions that could be folded into the original numbered list.
Move the detailed Stop Conditions and Polling Cadence into a reference file or collapse them into a compact table/checklist rather than prose paragraphs.
Remove Output Expectations bullet points that merely restate stop conditions and workflow rules already covered elsewhere (e.g., 'A review-fix commit + push is not a completion event' appears in at least 3 other sections).
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~250+ lines with significant repetition. The same instructions about restarting --watch after a push, not stopping after a push, continuing polling, and review comment handling are restated 4-6 times across different sections (Core Workflow, Review Comment Handling, Git Safety Rules, Monitoring Loop Pattern, Stop Conditions, Output Expectations). Many rules could be consolidated into a single authoritative statement rather than repeated in every section. | 1 / 3 |
| Actionability | The skill provides fully executable commands (python3 scripts with specific flags, gh CLI commands with exact arguments and output formats), concrete commit message templates, specific JSON field names to inspect (actions, failed_jobs, job_id, logs_endpoint), and clear decision criteria for classifying CI failures. The guidance is copy-paste ready and leaves no ambiguity about what to run. | 3 / 3 |
| Workflow Clarity | The monitoring loop is clearly sequenced with numbered steps, explicit validation checkpoints (check merged/closed first, then CI, then reviews, then mergeability), feedback loops (push → restart watcher → re-poll), priority ordering (review fixes before flaky reruns), and well-defined stop conditions. The CI failure classification includes a clear decision tree between branch-related and flaky failures with explicit guidance on when to patch vs rerun vs escalate. | 3 / 3 |
| Progressive Disclosure | The skill references two external files (heuristics.md and github-api-notes.md) which is good progressive disclosure, but the main SKILL.md itself is a monolithic wall of text with heavily overlapping sections. Content that could be consolidated or split out (e.g., the detailed monitoring loop pattern largely duplicates the core workflow, and the extensive stop conditions / polling cadence could be a reference) remains inline, making the document harder to navigate. | 2 / 3 |
| Total | | 9 / 12 Passed |
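
The Workflow Clarity row describes a fixed check order (merged/closed first, then CI, then reviews, then mergeability), a priority rule (review fixes before flaky reruns), and a cap of 3 flaky retries. A minimal sketch of that decision step follows, assuming a hypothetical `PrSnapshot` record and hypothetical action names. None of these identifiers come from the skill itself, and escalating to the user on an unmergeable PR is an assumption made for illustration.

```python
from dataclasses import dataclass

MAX_FLAKY_RETRIES = 3  # the skill description caps flaky retries at 3


@dataclass
class PrSnapshot:
    """Hypothetical summary of one poll; field names are illustrative only."""
    state: str                              # "open", "merged", or "closed"
    ci_failed: bool = False
    failure_is_branch_related: bool = False
    has_new_review_comments: bool = False
    mergeable: bool = True


def next_action(snap: PrSnapshot, flaky_retries_used: int) -> str:
    """Pick the next step for one poll result (action names are hypothetical)."""
    # Terminal states are checked first: nothing left to watch.
    if snap.state in ("merged", "closed"):
        return "stop"
    # Branch-related CI failures are patched and pushed.
    if snap.ci_failed and snap.failure_is_branch_related:
        return "fix_and_push"
    # Review fixes take priority over flaky reruns.
    if snap.has_new_review_comments:
        return "address_review"
    # Likely-flaky failures are rerun until the retry budget is spent.
    if snap.ci_failed:
        if flaky_retries_used < MAX_FLAKY_RETRIES:
            return "rerun_flaky"
        return "ask_user"
    # Assumption: an unmergeable PR is escalated rather than auto-resolved.
    if not snap.mergeable:
        return "ask_user"
    return "keep_polling"
```

A caller would invoke `next_action` once per poll and, after any `fix_and_push` or `address_review` step, resume polling rather than treat the push as completion, which matches the push → restart watcher → re-poll loop noted above.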
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
f88701f