Babysit a GitHub pull request after creation by continuously polling review comments, CI checks/workflow runs, and mergeability state until the PR is merged/closed or user help is required. Diagnose failures, retry likely flaky failures up to 3 times, auto-fix/push branch-related issues when appropriate, and keep watching open PRs so fresh review feedback is surfaced promptly. Use when the user asks Codex to monitor a PR, watch CI, handle review comments, or keep an eye on failures and feedback on an open PR.
Babysit a PR persistently until one of these terminal outcomes occurs:
Do not stop merely because a single snapshot returns idle while checks are still pending.
Accept any of the following:
… (--pr auto)
Run the watcher continuously (--watch) unless you are intentionally doing a one-shot diagnostic snapshot.
… (--watch).
Inspect the actions list in the JSON response.
If diagnose_ci_failure is present, inspect failed run logs and classify the failure.
If process_review_comment is present, inspect surfaced published review items and decide whether to address them.
If retry_failed_checks is present, rerun failed jobs with --retry-failed-now.
If both review feedback and retry_failed_checks are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
Check mergeability (for example with gh pr view) alongside CI.
If you stopped --watch before pausing to patch/commit/push, relaunch --watch yourself in the same turn immediately after the push (do not wait for the user to re-invoke the skill).
Keep polling until stop_pr_closed appears or a user-help-required blocker is reached. A green + review-clean + mergeable PR is a progress milestone, not a reason to stop the watcher while the PR is still open.
Do not leave a --watch process running and then end the turn as if monitoring were complete.
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --once
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --watch
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --retry-failed-now
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr <number-or-url> --once
Use gh commands to inspect failed runs before deciding to rerun.
gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha
gh api repos/<owner>/<repo>/actions/runs/<run-id>/jobs -X GET -f per_page=100
gh api repos/<owner>/<repo>/actions/jobs/<job-id>/logs > /tmp/codex-gh-job-<job-id>-logs.zip
gh run view <run-id> --log-failed (as a fallback after the overall workflow run is complete)
gh run view --log-failed is workflow-run scoped and may not expose failed-job logs until the overall run finishes. For faster diagnosis, poll the run's jobs first and, as soon as a specific job has failed, fetch that job's logs directly from the Actions job logs endpoint. The watcher includes a failed_jobs list with each failed job's job_id and logs_endpoint when GitHub exposes one.
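A minimal sketch of the "fetch a failed job's logs as soon as it fails" step. It assumes each failed_jobs entry is a dict carrying job_id and an optional logs_endpoint, as described above; the helper name job_log_commands and the exact shape of the entries are illustrative assumptions, not part of the watcher script.

```python
from typing import Any, Dict, List


def job_log_commands(owner: str, repo: str,
                     failed_jobs: List[Dict[str, Any]]) -> List[List[str]]:
    """Build one `gh api` argument list per failed job (hypothetical helper)."""
    commands = []
    for job in failed_jobs:
        # Prefer the endpoint reported by the watcher; otherwise fall back to
        # the Actions job-logs path used in the gh commands listed earlier.
        endpoint = job.get("logs_endpoint") or (
            f"repos/{owner}/{repo}/actions/jobs/{job['job_id']}/logs"
        )
        commands.append(["gh", "api", endpoint])
    return commands
```

Each returned list could be passed to subprocess.run with stdout redirected to a file, which mirrors the shell redirection shown in the gh api command.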
Prefer treating failures as branch-related when failed-job logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).
Prefer treating failures as flaky/unrelated when logs show transient infra/external issues (timeouts, runner provisioning failures, registry/network outages, GitHub Actions infra errors).
Do not attempt to fix flaky/unrelated failures by changing tests, build scripts, CI configuration, dependency pins, or infrastructure-adjacent code unless the logs clearly connect the failure to the PR branch. For flaky/unrelated failures, rerun only when the watcher recommends retry_failed_checks; otherwise wait or stop for user help.
If classification is ambiguous, perform one manual diagnosis attempt before choosing rerun.
Read .codex/skills/babysit-pr/references/heuristics.md for a concise checklist.
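The branch-related versus flaky/unrelated split can be approximated with a keyword pass over the failed-job log before reading it in detail. This is a rough sketch only: the hint lists and the function name classify_failure are assumptions made for illustration, and the checklist in heuristics.md remains the reference.

```python
# Hypothetical keyword lists; tune them against real logs from your repository.
FLAKY_HINTS = ("timed out", "timeout", "runner provisioning",
               "connection reset", "service unavailable")
BRANCH_HINTS = ("compile error", "test failed", "lint",
                "typecheck", "snapshot mismatch")


def classify_failure(log_text: str) -> str:
    """Return 'branch_related', 'flaky_or_unrelated', or 'ambiguous'."""
    text = log_text.lower()
    branch = any(hint in text for hint in BRANCH_HINTS)
    flaky = any(hint in text for hint in FLAKY_HINTS)
    if branch and not flaky:
        return "branch_related"
    if flaky and not branch:
        return "flaky_or_unrelated"
    # Mixed or missing signals: do one manual diagnosis pass before rerunning.
    return "ambiguous"
```

Returning "ambiguous" for mixed signals keeps the sketch aligned with the rule that an unclear failure gets one manual diagnosis attempt before any rerun.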
The watcher surfaces review items from:
Only act on published feedback. Ignore review submissions in GitHub's PENDING state and inline
comments attached to those pending reviews. Do not mark pending review feedback as seen; it should
be eligible to surface after the reviewer submits the review.
It intentionally surfaces Codex reviewer bot feedback (for example comments/reviews from chatgpt-codex-connector[bot]) in addition to human reviewer feedback. Most unrelated bot noise should still be ignored.
For safety, the watcher only auto-surfaces trusted human review authors (for example repo OWNER/MEMBER/COLLABORATOR, plus the authenticated operator) and approved review bots such as Codex.
On a fresh watcher state file, existing unaddressed published review feedback may be surfaced immediately (not only comments that arrive after monitoring starts). This is intentional so already-open review comments are not missed.
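The surfacing rules above can be read as a small filter. The sketch below assumes a flattened item dict with state, author_login, and author_association keys; that shape and the name should_surface are hypothetical, and inline comments are assumed to carry the state of their parent review.

```python
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}
# Example approved bot login taken from the text above.
APPROVED_BOTS = {"chatgpt-codex-connector[bot]"}


def should_surface(item: dict, operator_login: str) -> bool:
    """Decide whether a review item is eligible to be surfaced (sketch)."""
    # Unpublished feedback is skipped, and is not marked as seen elsewhere,
    # so it can surface once the reviewer submits the review.
    if item.get("state") == "PENDING":
        return False
    login = item.get("author_login", "")
    if login in APPROVED_BOTS or login == operator_login:
        return True
    return item.get("author_association") in TRUSTED_ASSOCIATIONS
```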
When you agree with a comment and it is actionable:
codex: address PR review feedback (#<n>).--watch mode, restart --watch immediately after the push in the same turn; do not wait for the user to ask again.Do not post replies to human-authored GitHub review comments/threads automatically. If you disagree with a human comment, believe it is non-actionable/already addressed, or need to answer a question, report the item to the user with a suggested response and wait for explicit confirmation before posting anything on GitHub. If the user approves a response, prefix it with [codex] so it is clear the response is automated and not from the human user.
If the watcher later surfaces your own approved reply because the authenticated operator is treated as a trusted review author, treat that self-authored item as already handled and do not reply again.
If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
You can read any PR state you need for monitoring. Writes must comply with this policy.
You can push to the PR branch to update the code under review or to force CI re-runs as described above.
You can resolve review comment threads from the human who requested babysitting or from the Codex
review bot. When resolving, leave a comment prefixed with [from Codex]: and explain what changes
you made and which commit includes them. Do not touch review threads in which humans other than the
user who requested babysitting have participated.
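One way to express the thread-resolution rule as a check. The function name may_resolve_thread is hypothetical, the default bot login is the example named earlier in this document, and the sketch assumes Codex's own replies are posted under the requesting user's account.

```python
CODEX_BOT = "chatgpt-codex-connector[bot]"  # example login from this document


def may_resolve_thread(thread_author: str, participants: list,
                       requester: str, codex_bot: str = CODEX_BOT) -> bool:
    """Allow resolving only threads owned by the requester or the Codex bot."""
    allowed = {requester, codex_bot}
    if thread_author not in allowed:
        return False
    # Any other human participant makes the thread off-limits.
    return all(person in allowed for person in participants)
```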
Before making any changes, fetch the PR state yourself instead of relying on the PR watcher script's output.
Unless explicitly asked, do not:
In general, never act on GitHub in ways that would make it hard to tell whether you or the user did something visible to other humans. When in doubt, ask the user for clarification in chat.
Push the fix with git push, then re-run the watcher.
If you interrupted a live --watch session to make the fix, restart --watch immediately after the push in the same turn.
Do not run multiple concurrent --watch processes for the same PR/state file; keep one watcher session active and reuse it until it stops or you intentionally restart it.
Commit message defaults:
codex: fix CI failure on PR #<n>
codex: address PR review feedback (#<n>)
Use this loop in a live Codex session:
Run the watcher with --once.
Read actions.
If failed_jobs already includes a failed job, fetch that job's logs and diagnose immediately instead of waiting for the whole workflow run to finish. Patch only when the failure is branch-related.
Rerun failed checks only when retry_failed_checks is present and you are not about to replace the current SHA with a review/CI fix commit. Do not make code changes for unrelated flakes or infrastructure failures just to get CI green.
After a push or rerun, resume monitoring (--watch) in the same turn unless a strict stop condition has already been reached.
When the user explicitly asks to monitor/watch/babysit a PR, prefer --watch so polling continues autonomously in one command. Use repeated --once snapshots only for debugging, local testing, or when the user explicitly asks for a one-shot check.
Do not stop to ask the user whether to continue polling; continue autonomously until a strict stop condition is met or the user explicitly interrupts.
Do not hand control back to the user after a review-fix push just because a new SHA was created; restarting the watcher and re-entering the poll loop is part of the same babysitting task.
If a --watch process is still running and no strict stop condition has been reached, the babysitting task is still in progress; keep streaming/consuming watcher output instead of ending the turn.
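The action handling described in this document can be sketched as a small planner that turns one snapshot's actions into an ordered list of steps. It assumes actions is a list of strings using the names that appear in this document; the function name plan_steps and the returned step labels are hypothetical.

```python
from typing import List


def plan_steps(actions: List[str]) -> List[str]:
    """Map one watcher snapshot's actions to ordered next steps (sketch)."""
    if "stop_pr_closed" in actions:
        return ["stop"]
    steps = []
    if "diagnose_ci_failure" in actions:
        steps.append("diagnose")
    if "process_review_comment" in actions:
        # Review feedback wins over retries: the fix commit retriggers CI,
        # so rerunning flaky checks on the old SHA would be wasted work.
        steps.append("review")
    elif "retry_failed_checks" in actions:
        steps.append("retry")
    # An idle snapshot is never a reason to stop on its own.
    return steps or ["keep_polling"]
```

Note that an idle-only snapshot maps to "keep_polling" rather than "stop", matching the rule that monitoring ends only on a strict stop condition.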
Keep review polling aggressive and continue monitoring even after CI turns green:
Stop only when one of the following is true:
Keep polling when:
actions contains only idle but checks are still pending.
CI is green but the PR is still awaiting review approval (REVIEW_REQUIRED / similar); continue polling at the base cadence and surface any new review comments without asking for confirmation to keep watching.
Provide concise progress updates while monitoring and a final summary that includes:
Final PR SHA
CI status summary
Mergeability / conflict status
Fixes pushed
Flaky retry cycles used
Remaining unresolved failures or review comments
During long unchanged monitoring periods, avoid emitting a full update on every poll; summarize only status changes plus occasional heartbeat updates.
Treat push confirmations, intermediate CI snapshots, ready-to-merge snapshots, and review-action updates as progress updates only; do not emit the final summary or end the babysitting session unless a strict stop condition is met.
A user request to "monitor" is not satisfied by a couple of sample polls; remain in the loop until a strict stop condition or an explicit user interruption.
A review-fix commit + push is not a completion event; immediately resume live monitoring (--watch) in the same turn and continue reporting progress updates.
When CI first transitions to all green for the current SHA, emit a one-time celebratory progress update (do not repeat it on every green poll). Preferred style: 🚀 CI is all green! 33/33 passed. Still on watch for review approval.
Do not send the final summary while a watcher terminal is still running unless the watcher has emitted/confirmed a strict stop condition; otherwise continue with progress updates.
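The final summary fields listed in this section could be carried in a small record and rendered once a strict stop condition is reached. This is an illustrative sketch; the FinalSummary class, its field names, and the output wording are assumptions rather than anything the watcher script defines.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FinalSummary:
    """Hypothetical container for the final babysitting report."""
    final_sha: str
    ci_status: str
    mergeability: str
    fixes_pushed: List[str] = field(default_factory=list)
    flaky_retry_cycles: int = 0
    unresolved: List[str] = field(default_factory=list)

    def render(self) -> str:
        # One line per summary field, in the order the section lists them.
        lines = [
            f"Final PR SHA: {self.final_sha}",
            f"CI status: {self.ci_status}",
            f"Mergeability: {self.mergeability}",
            f"Fixes pushed: {', '.join(self.fixes_pushed) or 'none'}",
            f"Flaky retry cycles used: {self.flaky_retry_cycles}",
            f"Unresolved: {', '.join(self.unresolved) or 'none'}",
        ]
        return "\n".join(lines)
```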
.codex/skills/babysit-pr/references/heuristics.md
.codex/skills/babysit-pr/references/github-api-notes.md