Babysit a GitHub pull request after creation by continuously polling CI checks/workflow runs, new review comments, and mergeability state until the PR is ready to merge (or merged/closed). Diagnose failures, retry likely flaky failures up to 3 times, auto-fix/push branch-related issues when appropriate, and stop only when user help is required (for example CI infrastructure issues, exhausted flaky retries, or ambiguous/blocking situations). Use when the user asks Codex to monitor a PR, watch CI, handle review comments, or keep an eye on failures and feedback on an open PR.
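The decision logic in this description can be sketched as a small pure function. This is only an illustration under stated assumptions, not the skill's actual implementation: the `Snapshot` fields, the `Action` names, and the failure categories (`"flaky"`, `"branch"`, `"infra"`) are hypothetical labels chosen for the example, and how a snapshot is gathered from GitHub is deliberately left out. Only the retry limit of 3 and the stop conditions come from the description itself.

```python
from dataclasses import dataclass
from enum import Enum

MAX_FLAKY_RETRIES = 3  # from the description: "retry likely flaky failures up to 3 times"


class Action(Enum):
    # Hypothetical action names, used only for this sketch.
    STOP_DONE = "stop: PR merged or closed"
    STOP_READY = "stop: PR is ready to merge"
    STOP_NEED_USER = "stop: user help required"
    RETRY = "retry failed checks"
    FIX_AND_PUSH = "fix branch-related issue and push"
    HANDLE_REVIEW = "handle new review comments"
    KEEP_POLLING = "keep polling"


@dataclass
class Snapshot:
    # Hypothetical summary of one polling pass; field names are assumptions.
    state: str                   # "open", "merged", or "closed"
    checks: str                  # "pending", "passing", or "failing"
    failure_kind: str = ""       # "flaky", "branch", "infra", or "" if unclassified
    mergeable: bool = True
    new_review_comments: int = 0


def next_action(snap: Snapshot, flaky_retries_used: int) -> Action:
    """Pick the next step for one pass of the monitoring loop."""
    if snap.state in ("merged", "closed"):
        return Action.STOP_DONE
    if snap.checks == "failing":
        if snap.failure_kind == "flaky":
            # Retry only while the retry budget is not exhausted.
            if flaky_retries_used < MAX_FLAKY_RETRIES:
                return Action.RETRY
            return Action.STOP_NEED_USER
        if snap.failure_kind == "branch":
            return Action.FIX_AND_PUSH
        # CI infrastructure issues and unclassified failures need a human.
        return Action.STOP_NEED_USER
    if snap.new_review_comments > 0:
        return Action.HANDLE_REVIEW
    if snap.checks == "passing" and snap.mergeable:
        return Action.STOP_READY
    return Action.KEEP_POLLING
```

A caller would build a fresh `Snapshot` on each poll, call `next_action`, and increment its own retry counter whenever `Action.RETRY` is returned. Keeping the decision separate from the polling makes the stop conditions easy to check in isolation.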
Quality
92%
Does it follow best practices?
Impact
85%
2.12x average score across 3 eval scenarios
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that thoroughly explains the specific capabilities (polling CI, diagnosing failures, retrying flaky tests, auto-fixing branch issues) and includes a comprehensive 'Use when...' clause with natural trigger terms. The description is appropriately detailed without being verbose, uses correct third-person voice, and carves out a distinct niche for PR babysitting that won't conflict with other GitHub-related skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'continuously polling CI checks/workflow runs', 'diagnose failures', 'retry likely flaky failures up to 3 times', 'auto-fix/push branch-related issues'. Very detailed about what the skill does. | 3 / 3 |
| Completeness | Clearly answers both what (babysit PR, poll CI, diagnose failures, retry flaky tests, auto-fix issues) AND when with explicit 'Use when...' clause listing specific trigger scenarios like 'monitor a PR', 'watch CI', 'handle review comments'. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'monitor a PR', 'watch CI', 'handle review comments', 'keep an eye on failures', 'GitHub pull request', 'mergeability', 'workflow runs'. These match how users naturally describe PR monitoring tasks. | 3 / 3 |
| Distinctiveness Conflict Risk | Very clear niche focused specifically on post-creation PR monitoring and CI babysitting. The combination of 'continuously polling', 'flaky retries', and 'mergeability state' creates a distinct profile unlikely to conflict with general GitHub or code review skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
85%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a high-quality skill with excellent actionability and workflow clarity. The multi-step monitoring process is well-documented with explicit stop conditions, validation checkpoints, and feedback loops. Minor verbosity exists with some repeated instructions across sections, but overall the skill effectively teaches a complex autonomous monitoring task.
Suggestions
- Consolidate repeated instructions about restarting --watch after pushes into a single 'Post-Push Protocol' section to reduce redundancy.
- Consider moving the detailed 'Monitoring Loop Pattern' into the referenced heuristics.md, since it largely restates the Core Workflow with additional detail.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is comprehensive but contains some redundancy, particularly in the monitoring loop pattern which repeats concepts from the core workflow. Some sections could be tightened (e.g., review comment handling repeats restart instructions multiple times). | 2 / 3 |
| Actionability | Provides fully executable commands with clear syntax, specific commit message templates, concrete gh CLI commands for diagnosis, and explicit script paths. All guidance is copy-paste ready. | 3 / 3 |
| Workflow Clarity | Excellent multi-step workflow with numbered sequences, explicit validation checkpoints (classify before retry, check mergeability on every loop), clear decision points, and feedback loops (push -> resume watching -> repeat). Stop conditions are explicitly enumerated. | 3 / 3 |
| Progressive Disclosure | Well-structured with clear sections, appropriate inline content for the main workflow, and one-level-deep references to heuristics.md and github-api-notes.md for detailed supplementary information. Navigation is clear and signaled. | 3 / 3 |
| Total | | 11 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.