Babysit a GitHub pull request after creation by continuously polling CI checks/workflow runs, new review comments, and mergeability state until the PR is ready to merge (or merged/closed). Diagnose failures, retry likely flaky failures up to 3 times, auto-fix/push branch-related issues when appropriate, and stop only when user help is required (for example CI infrastructure issues, exhausted flaky retries, or ambiguous/blocking situations). Use when the user asks Codex to monitor a PR, watch CI, handle review comments, or keep an eye on failures and feedback on an open PR.
Babysit a PR persistently until one of these terminal outcomes occurs:
Do not stop merely because a single snapshot returns idle while checks are still pending.
Accept any of the following:
[...] --pr auto)
[...] --watch) unless you are intentionally doing a one-shot diagnostic snapshot.
[...] --watch).
[...] actions list in the JSON response.
If diagnose_ci_failure is present, inspect failed run logs and classify the failure.
If process_review_comment is present, inspect surfaced review items and decide whether to address them.
If retry_failed_checks is present, rerun failed jobs with --retry-failed-now.
[...] retry_failed_checks are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
[...] gh pr view) in addition to CI and review state.
[...] --watch before pausing to patch/commit/push, relaunch --watch yourself in the same turn immediately after the push (do not wait for the user to re-invoke the skill).
[...] stop_pr_closed appears, or a user-help-required blocker is reached.
[...] --watch process running and then end the turn as if monitoring were complete.
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --once
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --watch
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --retry-failed-now
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr <number-or-url> --once
Use gh commands to inspect failed runs before deciding to rerun.
gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha
gh run view <run-id> --log-failed
Prefer treating failures as branch-related when logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).
Prefer treating failures as flaky/unrelated when logs show transient infra/external issues (timeouts, runner provisioning failures, registry/network outages, GitHub Actions infra errors).
If classification is ambiguous, perform one manual diagnosis attempt before choosing rerun.
Read .codex/skills/babysit-pr/references/heuristics.md for a concise checklist.
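The classification rules above can be sketched as a small keyword triage over the output of gh run view <run-id> --log-failed. This is only an illustration: the function name classify_failure and both pattern lists are hypothetical and are not taken from the watcher script or from heuristics.md.

```python
import re

# Hypothetical marker lists for illustration only; the real checklist
# lives in references/heuristics.md and may differ.
FLAKY_PATTERNS = [
    r"timed? ?out",
    r"runner .*(provision|lost|offline)",
    r"(registry|network).*(unavailable|outage|unreachable)",
    r"econnreset|etimedout",
]
BRANCH_PATTERNS = [
    r"compil(e|ation) error",
    r"tests? failed",
    r"lint",
    r"type ?check",
    r"snapshot",
]


def classify_failure(log_text: str) -> str:
    """Return 'branch', 'flaky', or 'ambiguous' for a failed-run log."""
    flaky = any(re.search(p, log_text, re.IGNORECASE) for p in FLAKY_PATTERNS)
    branch = any(re.search(p, log_text, re.IGNORECASE) for p in BRANCH_PATTERNS)
    if branch and not flaky:
        return "branch"
    if flaky and not branch:
        return "flaky"
    # Mixed or missing signals: do one manual diagnosis before rerunning.
    return "ambiguous"
```

An "ambiguous" result corresponds to the case where one manual diagnosis attempt should come before any rerun.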
The watcher surfaces review items from:
It intentionally surfaces Codex reviewer bot feedback (for example comments/reviews from chatgpt-codex-connector[bot]) in addition to human reviewer feedback. Most unrelated bot noise should still be ignored.
For safety, the watcher only auto-surfaces trusted human review authors (for example repo OWNER/MEMBER/COLLABORATOR, plus the authenticated operator) and approved review bots such as Codex.
On a fresh watcher state file, existing pending review feedback may be surfaced immediately (not only comments that arrive after monitoring starts). This is intentional so already-open review comments are not missed.
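A minimal sketch of the trust filter described above, assuming each review item carries the author login and the author_association value that GitHub reports for comments and reviews. The function name is_trusted_author and the constant names are hypothetical; the watcher's actual logic may differ.

```python
# Associations named in the text as trusted human reviewers.
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}
# Approved review bots; only the Codex connector is named in the text.
APPROVED_BOTS = {"chatgpt-codex-connector[bot]"}


def is_trusted_author(login: str, association: str, operator_login: str) -> bool:
    """Decide whether a review item from this author should be surfaced."""
    if login in APPROVED_BOTS:
        return True
    if login.endswith("[bot]"):
        # Unrelated bot noise is ignored.
        return False
    # The authenticated operator is trusted regardless of association.
    return login == operator_login or association in TRUSTED_ASSOCIATIONS
```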
When you agree with a comment and it is actionable:
[...] codex: address PR review feedback (#<n>).
[...] --watch mode, restart --watch immediately after the push in the same turn; do not wait for the user to ask again.
If you disagree or the comment is non-actionable/already addressed, record it as handled by continuing the watcher loop (the script de-duplicates surfaced items via state after surfacing them). If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
[...] git push, then re-run the watcher.
[...] --watch session to make the fix, restart --watch immediately after the push in the same turn.
[...] --watch processes for the same PR/state file; keep one watcher session active and reuse it until it stops or you intentionally restart it.
Commit message defaults:
codex: fix CI failure on PR #<n>
codex: address PR review feedback (#<n>)
Use this loop in a live Codex session:
[...] --once.
[...] actions.
[...] retry_failed_checks is present and you are not about to replace the current SHA with a review/CI fix commit.
[...] --watch) in the same turn unless a strict stop condition has already been reached.
When the user explicitly asks to monitor/watch/babysit a PR, prefer --watch so polling continues autonomously in one command. Use repeated --once snapshots only for debugging, local testing, or when the user explicitly asks for a one-shot check.
Do not stop to ask the user whether to continue polling; continue autonomously until a strict stop condition is met or the user explicitly interrupts.
Do not hand control back to the user after a review-fix push just because a new SHA was created; restarting the watcher and re-entering the poll loop is part of the same babysitting task.
If a --watch process is still running and no strict stop condition has been reached, the babysitting task is still in progress; keep streaming/consuming watcher output instead of ending the turn.
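The action-handling priorities described earlier can be sketched as a pure function over the actions list from the watcher's JSON response. The action names come from the text; the function name plan_steps, the step labels, and the exact ordering are assumptions for illustration, and the sketch deliberately ignores the case where a CI diagnosis itself leads to a fix commit.

```python
def plan_steps(actions: list[str]) -> list[str]:
    """Map watcher actions to an ordered list of next steps (illustrative)."""
    present = set(actions)
    if "stop_pr_closed" in present:
        # Terminal outcome: nothing else should run.
        return ["stop"]
    steps = []
    if "diagnose_ci_failure" in present:
        steps.append("diagnose")
    if "process_review_comment" in present:
        # Review feedback first: a new commit retriggers CI, so a retry
        # on the old SHA is skipped.
        steps.append("review")
    elif "retry_failed_checks" in present:
        steps.append("retry")
    # An idle snapshot is not a stop condition; keep polling.
    return steps or ["keep_polling"]
```

Keeping the decision separate from the side effects (running gh, pushing commits) makes the priority rules easy to check in isolation.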
Use adaptive polling and continue monitoring even after CI turns green:
Stop only when one of the following is true:
Keep polling when:
actions contains only idle but checks are still pending.
[...] REVIEW_REQUIRED / similar); continue polling on the green-state cadence and surface any new review comments without asking for confirmation to keep watching.
Provide concise progress updates while monitoring and a final summary that includes:
During long unchanged monitoring periods, avoid emitting a full update on every poll; summarize only status changes plus occasional heartbeat updates.
Treat push confirmations, intermediate CI snapshots, and review-action updates as progress updates only; do not emit the final summary or end the babysitting session unless a strict stop condition is met.
A user request to "monitor" is not satisfied by a couple of sample polls; remain in the loop until a strict stop condition or an explicit user interruption.
A review-fix commit + push is not a completion event; immediately resume live monitoring (--watch) in the same turn and continue reporting progress updates.
When CI first transitions to all green for the current SHA, emit a one-time celebratory progress update (do not repeat it on every green poll). Preferred style: 🚀 CI is all green! 33/33 passed. Still on watch for review approval.
Do not send the final summary while a watcher terminal is still running unless the watcher has emitted/confirmed a strict stop condition; otherwise continue with progress updates.
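The one-time green announcement can be sketched as a small stateful helper that remembers which SHA it has already celebrated. The class name GreenNotifier and its update method are hypothetical; only the message style comes from the text.

```python
from typing import Optional


class GreenNotifier:
    """Emit the all-green message once per SHA (illustrative sketch)."""

    def __init__(self) -> None:
        self._announced_sha: Optional[str] = None

    def update(self, sha: str, passed: int, total: int) -> Optional[str]:
        all_green = total > 0 and passed == total
        if all_green and sha != self._announced_sha:
            self._announced_sha = sha
            return (
                f"🚀 CI is all green! {passed}/{total} passed. "
                "Still on watch for review approval."
            )
        # Not green yet, or already announced for this SHA: stay quiet.
        return None
```

A new push changes the SHA, so the next transition to green on that commit is announced again, while repeated green polls on the same SHA stay silent.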
Final PR SHA
CI status summary
Mergeability / conflict status
Fixes pushed
Flaky retry cycles used
Remaining unresolved failures or review comments
.codex/skills/babysit-pr/references/heuristics.md
.codex/skills/babysit-pr/references/github-api-notes.md