babysit-pr

Babysit a GitHub pull request after creation by continuously polling review comments, CI checks/workflow runs, and mergeability state until the PR is merged/closed or user help is required. Diagnose failures, retry likely flaky failures up to 3 times, auto-fix/push branch-related issues when appropriate, and keep watching open PRs so fresh review feedback is surfaced promptly. Use when the user asks Codex to monitor a PR, watch CI, handle review comments, or keep an eye on failures and feedback on an open PR.
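The control flow this description implies can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `PrSnapshot`, `babysit`, `poll`, `rerun_ci`, and `interval_s` are hypothetical names, and failure diagnosis, review-comment handling, and auto-fix are left out. Only the stop conditions and the cap of 3 flaky retries come from the description.

```python
import time
from dataclasses import dataclass
from typing import Callable

MAX_FLAKY_RETRIES = 3  # the description caps flaky retries at 3


@dataclass
class PrSnapshot:
    """Hypothetical result of one poll; the field names are illustrative."""
    state: str         # "open", "merged", or "closed"
    ci_failed: bool    # True if a CI check failed in this poll
    looks_flaky: bool  # True if the failure resembles a flaky one


def babysit(
    poll: Callable[[], PrSnapshot],
    rerun_ci: Callable[[], None],
    interval_s: float = 0.0,  # assumed polling interval, not a value from the skill
) -> str:
    """Poll until the PR is merged/closed or user help is required."""
    retries = 0
    while True:
        snap = poll()
        if snap.state in ("merged", "closed"):
            return snap.state
        if snap.ci_failed:
            # Non-flaky failures, or flaky ones past the cap, go back to the user.
            if not snap.looks_flaky or retries >= MAX_FLAKY_RETRIES:
                return "needs-user-help"
            retries += 1
            rerun_ci()
        time.sleep(interval_s)
```

Passing `poll` and `rerun_ci` in as callables keeps the loop independent of how PR state is fetched, so it can be exercised with canned snapshots.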

72

Quality

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Advisory

Suggest reviewing before use

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

A highly actionable, well-sequenced PR-monitoring skill body with explicit validation and stop conditions and verified one-level-deep references. Its main weakness is verbosity: key directives are repeated across overlapping sections and inline content could be pushed into the reference files.

Suggestions

Consolidate the repeated 'restart --watch in the same turn after a push' directive into one canonical location and reference it, rather than restating it across the Core Workflow, Review Comment Handling, Git Safety Rules, Monitoring Loop Pattern, and Output Expectations sections.

Deduplicate the overlapping Core Workflow (15 steps) and Monitoring Loop Pattern (13 steps) into a single sequenced procedure, or clearly differentiate their purposes so the reader does not encounter the same guidance twice.

Format the References section as markdown links (e.g. '[heuristics.md](references/heuristics.md)') and consider moving the GitHub State Mutation Policy and Git Safety Rules detail into a reference file to keep the body as an overview.
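The link-formatting suggestion above could be applied mechanically. The following is a rough sketch, assuming the references appear as bare relative paths such as `references/heuristics.md`; the `linkify` helper and its regular expression are hypothetical and only cover `.md` paths with at least one directory component.

```python
import re

# A bare relative path like references/heuristics.md. The lookbehind and
# lookahead skip paths that already sit inside a markdown link.
RAW_PATH = re.compile(r"(?<![\w\[\(/.-])((?:[\w-]+/)+[\w.-]+\.md)(?![\w\]\)])")


def linkify(line: str) -> str:
    """Rewrite bare .md paths in a line as [file.md](dir/file.md) links."""
    def to_link(match: re.Match) -> str:
        path = match.group(1)
        name = path.rsplit("/", 1)[-1]  # link text is the file name only
        return f"[{name}]({path})"

    return RAW_PATH.sub(to_link, line)


print(linkify("- references/heuristics.md: flaky-failure notes"))
```

Lines that already contain a markdown link to the same path are left unchanged, so the helper can be run repeatedly over the same file.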

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The content is operational rather than explanatory (no concepts Claude already knows are re-taught), but it is repetitive: the 'restart --watch immediately after the push in the same turn' directive appears roughly six times and the Core Workflow, Monitoring Loop Pattern, Review Comment Handling, and Git Safety Rules sections overlap heavily, so it is not as tight as the level-3 anchor. | 2 / 3 |
| Actionability | Provides copy-paste-ready commands (e.g. 'python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --watch', 'gh api repos/<owner>/<repo>/actions/jobs/<job-id>/logs'), concrete commit-message formats, and exact polling cadences: fully executable guidance. | 3 / 3 |
| Workflow Clarity | Presents a clear sequenced Core Workflow and Monitoring Loop Pattern with explicit stop conditions and validation checkpoints ('Before editing, check for unrelated uncommitted changes', 'fetch the PR state yourself instead of relying on the PR watcher script's output'), and feedback loops for the destructive/batch operations, so it is not capped at 2. | 3 / 3 |
| Progressive Disclosure | References are one level deep and all referenced files (heuristics.md, github-api-notes.md, gh_pr_watch.py) exist, but the SKILL.md body is itself long (~220 lines) with substantial inline policy that could be split out, and references are signaled via raw paths rather than markdown links, so it sits between the 2 and 3 anchors. | 2 / 3 |
| Total | | 10 / 12 |

Passed
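The repetition noted under Conciseness can be checked mechanically. Below is a minimal sketch, assuming the skill body uses `#`-style markdown headings; `sections_mentioning` is a hypothetical helper, and the sample text is invented for illustration rather than quoted from the skill.

```python
import re


def sections_mentioning(markdown: str, phrase: str) -> dict[str, int]:
    """Map each heading to how often `phrase` occurs under it (case-insensitive)."""
    counts: dict[str, int] = {}
    heading = "(preamble)"  # text before the first heading
    needle = phrase.lower()
    for line in markdown.splitlines():
        match = re.match(r"#{1,6}\s+(.*)", line)
        if match:
            heading = match.group(1).strip()
            continue
        hits = line.lower().count(needle)
        if hits:
            counts[heading] = counts.get(heading, 0) + hits
    return counts


# Invented sample; the section names mirror those cited in the review.
sample = (
    "## Core Workflow\n"
    "After a push, restart --watch.\n"
    "## Git Safety Rules\n"
    "Restart --watch in the same turn.\n"
    "Then restart --watch again.\n"
    "## References\n"
    "heuristics.md\n"
)
print(sections_mentioning(sample, "restart --watch"))
```

A directive that shows up under several headings is a candidate for consolidation into one canonical location.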

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

A strong, third-person description that names concrete capabilities, provides natural trigger terms, and explicitly states when to use it. It is distinguishable from general PR/CI skills and avoids vague fluff.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple concrete actions ('continuously polling review comments, CI checks/workflow runs, and mergeability state', 'Diagnose failures, retry likely flaky failures up to 3 times, auto-fix/push branch-related issues'), matching the multi-action anchor rather than the domain-only level 2. | 3 / 3 |
| Completeness | Explicitly answers both what it does and when to use it via a 'Use when the user asks Codex to monitor a PR, watch CI...' clause, satisfying the explicit-trigger anchor for a 3. | 3 / 3 |
| Trigger Term Quality | Covers natural phrases users would actually say ('monitor a PR, watch CI, handle review comments, or keep an eye on failures and feedback on an open PR'), exceeding the partial-coverage level 2. | 3 / 3 |
| Distinctiveness Conflict Risk | Occupies a distinct niche (persistent post-creation PR babysitting with retry/auto-fix) with clear triggers unlikely to fire for unrelated skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 16 / 16 passed

Validation for skill structure

No warnings or errors.

Repository
openai/codex
Reviewed
