babysit-pr

Babysit a GitHub pull request after creation by continuously polling review comments, CI checks/workflow runs, and mergeability state until the PR is merged/closed or user help is required. Diagnose failures, retry likely flaky failures up to 3 times, auto-fix/push branch-related issues when appropriate, and keep watching open PRs so fresh review feedback is surfaced promptly. Use when the user asks Codex to monitor a PR, watch CI, handle review comments, or keep an eye on failures and feedback on an open PR.

PR Babysitter

Objective

Babysit a PR persistently until one of these terminal outcomes occurs:

The PR is merged or closed.
A situation requires user help (for example CI infrastructure issues, repeated flaky failures after retry budget is exhausted, permission problems, or ambiguity that cannot be resolved safely).
Optional handoff milestone (not a terminal outcome): the PR is currently green + mergeable + review-clean. Treat this as a progress state, not a watcher stop, so late-arriving review comments are still surfaced promptly while the PR remains open.

Do not stop merely because a single snapshot returns idle while checks are still pending.

Inputs

Accept any of the following:

No PR argument: infer the PR from the current branch (--pr auto)
PR number
PR URL

Core Workflow

When the user asks to "monitor"/"watch"/"babysit" a PR, start with the watcher's continuous mode (--watch) unless you are intentionally doing a one-shot diagnostic snapshot.
Run the watcher script to snapshot PR/review/CI state (or consume each streamed snapshot from --watch).
Inspect the actions list in the JSON response.
If diagnose_ci_failure is present, inspect failed run logs and classify the failure.
If the failure is likely caused by the current branch, patch code locally, commit, and push. Do not patch random flaky tests, CI infrastructure, dependency outages, runner issues, or other failures that are unrelated to the branch.
If process_review_comment is present, inspect surfaced review items and decide whether to address them.
If a review item is actionable and correct, patch code locally, commit, push, and then mark the associated review thread/comment as resolved once the fix is on GitHub.
Do not post replies to human-authored review comments/threads unless the user explicitly confirms the exact response. If a human review item is non-actionable, already addressed, or not valid, surface the item and recommended response to the user instead of replying on GitHub.
If the failure is likely flaky/unrelated and retry_failed_checks is present, rerun failed jobs with --retry-failed-now.
If both actionable review feedback and retry_failed_checks are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
On every loop, look for newly surfaced review feedback before acting on CI failures or mergeability state, then verify mergeability / merge-conflict status (for example via gh pr view) alongside CI.
After any push or rerun action, immediately return to the start of this workflow and continue polling on the updated SHA/state.
If you had been using --watch before pausing to patch/commit/push, relaunch --watch yourself in the same turn immediately after the push (do not wait for the user to re-invoke the skill).
Repeat polling until stop_pr_closed appears or a user-help-required blocker is reached. A green + review-clean + mergeable PR is a progress milestone, not a reason to stop the watcher while the PR is still open.
Maintain terminal/session ownership: while babysitting is active, keep consuming watcher output in the same turn; do not leave a detached --watch process running and then end the turn as if monitoring were complete.
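
The priority rules above can be sketched as a small ordering helper. This is a minimal illustration, not the watcher's own logic: the action names come from the workflow text, while plan_actions and its review_fix_pending flag are hypothetical names introduced here.

```python
from typing import Iterable, List

# Action names are taken from the workflow text; the ordering itself is an
# illustrative reading of the rules, not code from the watcher script.
PRIORITY = [
    "process_review_comment",  # review feedback is handled first
    "diagnose_ci_failure",     # then classify any CI failure
    "retry_failed_checks",     # rerun flaky jobs last
]


def plan_actions(actions: Iterable[str], review_fix_pending: bool = False) -> List[str]:
    """Return the surfaced actions in the order the workflow suggests handling them."""
    present = set(actions)
    if "stop_pr_closed" in present:
        # A merged/closed PR is terminal; nothing else matters.
        return ["stop_pr_closed"]
    ordered = [name for name in PRIORITY if name in present]
    if review_fix_pending and "retry_failed_checks" in ordered:
        # A review-fix commit retriggers CI, so a rerun on the old SHA is skipped.
        ordered.remove("retry_failed_checks")
    return ordered or ["idle"]
```

For example, when both review feedback and a retry recommendation are present and a review fix is about to be pushed, only the review item remains in the plan.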

Commands

One-shot snapshot

python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --once

Continuous watch (JSONL)

python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --watch

Trigger flaky retry cycle (only when watcher indicates)

python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --retry-failed-now

Explicit PR target

python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr <number-or-url> --once
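
Because --watch streams JSONL, each output line can be parsed on its own. The sketch below assumes one JSON object per line and silently skips anything else. iter_snapshots is a hypothetical helper, and the exact snapshot schema is not specified here beyond the actions list.

```python
import json
from typing import Iterable, Iterator


def iter_snapshots(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed snapshot per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            snapshot = json.loads(line)
        except json.JSONDecodeError:
            # Assumption: non-JSON lines (if any) are noise and can be skipped.
            continue
        if isinstance(snapshot, dict):
            yield snapshot
```

In a live session the lines would typically come from the watcher's standard output (for instance the stdout of a subprocess opened in text mode), so each snapshot is handled as soon as it is emitted.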

CI Failure Classification

Use gh commands to inspect failed runs before deciding to rerun.

gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha
gh api repos/<owner>/<repo>/actions/runs/<run-id>/jobs -X GET -f per_page=100
gh api repos/<owner>/<repo>/actions/jobs/<job-id>/logs > /tmp/codex-gh-job-<job-id>-logs.log
gh run view <run-id> --log-failed as a fallback after the overall workflow run is complete

gh run view --log-failed is workflow-run scoped and may not expose failed-job logs until the overall run finishes. For faster diagnosis, poll the run's jobs first and, as soon as a specific job has failed, fetch that job's logs directly from the Actions job logs endpoint. The watcher includes a failed_jobs list with each failed job's job_id and logs_endpoint when GitHub exposes one.

Prefer treating failures as branch-related when failed-job logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).

Prefer treating failures as flaky/unrelated when logs show transient infra/external issues (timeouts, runner provisioning failures, registry/network outages, GitHub Actions infra errors).

Do not attempt to fix flaky/unrelated failures by changing tests, build scripts, CI configuration, dependency pins, or infrastructure-adjacent code unless the logs clearly connect the failure to the PR branch. For flaky/unrelated failures, rerun only when the watcher recommends retry_failed_checks; otherwise wait or stop for user help.

If classification is ambiguous, perform one manual diagnosis attempt before choosing rerun.

Read .codex/skills/babysit-pr/references/heuristics.md for a concise checklist.
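
As a rough illustration of the classification rules above, the sketch below labels a failure from its log text. It is not the checklist in heuristics.md: the marker strings, the touched_paths parameter, and the three labels are assumptions chosen for the example.

```python
from typing import Iterable

# Illustrative markers for transient infra/external issues (assumed, not exhaustive).
TRANSIENT_MARKERS = (
    "timed out",
    "timeout",
    "connection reset",
    "service unavailable",
)


def classify_failure(log_text: str, touched_paths: Iterable[str] = ()) -> str:
    """Classify a failed job as branch-related, flaky-or-unrelated, or ambiguous."""
    text = log_text.lower()
    transient = any(marker in text for marker in TRANSIENT_MARKERS)
    touches_branch = any(path.lower() in text for path in touched_paths)
    if touches_branch and not transient:
        return "branch-related"      # logs point at files changed on the branch
    if transient and not touches_branch:
        return "flaky-or-unrelated"  # logs show only transient infra symptoms
    return "ambiguous"               # do one manual diagnosis before rerunning
```

An ambiguous result maps to the rule above: perform one manual diagnosis attempt before choosing a rerun.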

Review Comment Handling

The watcher surfaces review items from:

PR issue comments
Inline review comments
Review submissions (COMMENT / APPROVED / CHANGES_REQUESTED)

It intentionally surfaces Codex reviewer bot feedback (for example comments/reviews from chatgpt-codex-connector[bot]) in addition to human reviewer feedback. Most unrelated bot noise should still be ignored. For safety, the watcher only auto-surfaces trusted human review authors (for example repo OWNER/MEMBER/COLLABORATOR, plus the authenticated operator) and approved review bots such as Codex. On a fresh watcher state file, existing pending review feedback may be surfaced immediately (not only comments that arrive after monitoring starts). This is intentional so already-open review comments are not missed.
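
The trust rule described above can be sketched as a simple predicate. The association values are GitHub's author association names and the bot login is the one mentioned in the text. The function itself, its parameter names, and the treatment of "[bot]" logins are assumptions for illustration, not the watcher's actual filter.

```python
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}
APPROVED_BOTS = {"chatgpt-codex-connector[bot]"}


def is_trusted_author(login: str, association: str, operator_login: str) -> bool:
    """Return True if a review item from this author should be surfaced."""
    if login.endswith("[bot]"):
        # Bots are surfaced only when explicitly approved; other bot noise is ignored.
        return login in APPROVED_BOTS
    if login == operator_login:
        # The authenticated operator is treated as a trusted author.
        return True
    return association in TRUSTED_ASSOCIATIONS
```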

When you agree with a comment and it is actionable:

Patch code locally.
Commit with codex: address PR review feedback (#<n>).
Push to the PR head branch.
After the push succeeds, mark the associated GitHub review thread/comment as resolved.
Resume watching on the new SHA immediately (do not stop after reporting the push).
If monitoring was running in --watch mode, restart --watch immediately after the push in the same turn; do not wait for the user to ask again.

Do not post replies to human-authored GitHub review comments/threads automatically. If you disagree with a human comment, believe it is non-actionable/already addressed, or need to answer a question, report the item to the user with a suggested response and wait for explicit confirmation before posting anything on GitHub. If the user approves a response, prefix it with [codex] so it is clear the response is automated and not from the human user. If the watcher later surfaces your own approved reply because the authenticated operator is treated as a trusted review author, treat that self-authored item as already handled and do not reply again. If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.

Git Safety Rules

Work only on the PR head branch.
Avoid destructive git commands.
Do not switch branches unless necessary to recover context.
Before editing, check for unrelated uncommitted changes. If present, stop and ask the user.
After each successful fix, commit and git push, then re-run the watcher.
If you interrupted a live --watch session to make the fix, restart --watch immediately after the push in the same turn.
Do not run multiple concurrent --watch processes for the same PR/state file; keep one watcher session active and reuse it until it stops or you intentionally restart it.
A push is not a terminal outcome; continue the monitoring loop unless a strict stop condition is met.

Commit message defaults:

codex: fix CI failure on PR #<n>
codex: address PR review feedback (#<n>)

Monitoring Loop Pattern

Use this loop in a live Codex session:

Run --once.
Read actions.
First check whether the PR is now merged or otherwise closed; if so, report that terminal state and stop polling immediately.
Check CI summary, new review items, and mergeability/conflict status.
Diagnose CI failures and classify branch-related vs flaky/unrelated. If the overall run is still pending but failed_jobs already includes a failed job, fetch that job's logs and diagnose immediately instead of waiting for the whole workflow run to finish. Patch only when the failure is branch-related.
For each surfaced review item from another author, patch/commit/push and then resolve it if it is actionable. If it is non-actionable, already addressed, or requires a written answer, surface it to the user with a suggested response instead of posting automatically. If a later snapshot surfaces your own approved reply, treat it as informational and continue without responding again.
Process actionable review comments before flaky reruns when both are present; if a review fix requires a commit, push it and skip rerunning failed checks on the old SHA.
Retry failed checks only when retry_failed_checks is present and you are not about to replace the current SHA with a review/CI fix commit. Do not make code changes for unrelated flakes or infrastructure failures just to get CI green.
If you pushed a commit, resolved a review thread, or triggered a rerun, report the action briefly and continue polling (do not stop). If a human review comment needs a written GitHub response, stop and ask for confirmation before posting.
After a review-fix push, proactively restart continuous monitoring (--watch) in the same turn unless a strict stop condition has already been reached.
If everything is passing, mergeable, not blocked on required review approval, and there are no unaddressed review items, report that the PR is currently ready to merge but keep the watcher running so new review comments are surfaced quickly while the PR remains open.
If blocked on a user-help-required issue (infra outage, exhausted flaky retries, unclear reviewer request, permissions), report the blocker and stop.
Otherwise sleep according to the polling cadence below and repeat.

When the user explicitly asks to monitor/watch/babysit a PR, prefer --watch so polling continues autonomously in one command. Use repeated --once snapshots only for debugging, local testing, or when the user explicitly asks for a one-shot check. Do not stop to ask the user whether to continue polling; continue autonomously until a strict stop condition is met or the user explicitly interrupts. Do not hand control back to the user after a review-fix push just because a new SHA was created; restarting the watcher and re-entering the poll loop is part of the same babysitting task. If a --watch process is still running and no strict stop condition has been reached, the babysitting task is still in progress; keep streaming/consuming watcher output instead of ending the turn.
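
The retry rules in this loop can be summarized as a small decision function. This is a sketch under stated assumptions: the limit of 3 comes from the skill description, but whether the watcher counts retries per SHA or per PR is not stated here, and next_flaky_step and its return labels are hypothetical.

```python
from typing import Iterable

MAX_FLAKY_RETRIES = 3  # from the skill description: retry flaky failures up to 3 times


def next_flaky_step(actions: Iterable[str], retries_used: int,
                    review_fix_pending: bool = False) -> str:
    """Decide what to do about a failure already classified as flaky/unrelated."""
    if review_fix_pending:
        # A new commit will retrigger CI, so do not rerun on the old SHA.
        return "push_review_fix"
    if retries_used >= MAX_FLAKY_RETRIES:
        # Retry budget exhausted: this is a user-help-required blocker.
        return "ask_user"
    if "retry_failed_checks" in set(actions):
        return "rerun_failed"
    # The watcher has not recommended a retry yet; keep polling.
    return "wait"
```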

Polling Cadence

Keep review polling aggressive and continue monitoring even after CI turns green:

While CI is not green (pending/running/queued or failing): poll every 1 minute (the base cadence).
After CI turns green: keep polling at the base cadence while the PR remains open so newly posted review comments are surfaced promptly instead of waiting on a long green-state backoff.
Reset the cadence immediately whenever anything changes (new commit/SHA, check status changes, new review comments, mergeability changes, review decision changes).
If CI stops being green again (new commit, rerun, or regression): stay on the base polling cadence.
If any poll shows the PR is merged or otherwise closed: stop polling immediately and report the terminal state.
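
A minimal sketch of this cadence follows. It assumes the base cadence is the 1-minute interval and that the PR state uses the OPEN/CLOSED/MERGED values reported by GitHub. next_poll_delay is a hypothetical helper, not part of the watcher.

```python
from typing import Optional

BASE_POLL_SECONDS = 60  # assumption: "base cadence" means the 1-minute interval


def next_poll_delay(pr_state: str, ci_green: bool) -> Optional[int]:
    """Return seconds to sleep before the next poll, or None to stop polling."""
    if pr_state.upper() in {"MERGED", "CLOSED"}:
        return None  # terminal state: stop immediately and report it
    # ci_green is deliberately ignored: there is no long backoff once CI is
    # green, so new review comments are still surfaced promptly.
    return BASE_POLL_SECONDS
```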

Stop Conditions (Strict)

Stop only when one of the following is true:

PR merged or closed (stop as soon as a poll/snapshot confirms this).
User intervention is required and Codex cannot safely proceed alone.

Keep polling when:

actions contains only idle but checks are still pending.
CI is still running/queued.
Review state is quiet but CI is not terminal.
CI is green but mergeability is unknown/pending.
CI is green and mergeable, but the PR is still open and you are waiting for possible new review comments or merge-conflict changes.
The PR is green but blocked on review approval (REVIEW_REQUIRED / similar); continue polling at the base cadence and surface any new review comments without asking for confirmation to keep watching.

Output Expectations

Provide concise progress updates while monitoring, and a final summary once a strict stop condition is met.

Progress updates:

During long unchanged monitoring periods, avoid emitting a full update on every poll; summarize only status changes plus occasional heartbeat updates.
Treat push confirmations, intermediate CI snapshots, ready-to-merge snapshots, and review-action updates as progress updates only; do not emit the final summary or end the babysitting session unless a strict stop condition is met.
A user request to "monitor" is not satisfied by a couple of sample polls; remain in the loop until a strict stop condition or an explicit user interruption.
A review-fix commit + push is not a completion event; immediately resume live monitoring (--watch) in the same turn and continue reporting progress updates.
When CI first transitions to all green for the current SHA, emit a one-time celebratory progress update (do not repeat it on every green poll). Preferred style: 🚀 CI is all green! 33/33 passed. Still on watch for review approval.
Do not send the final summary while a watcher terminal is still running unless the watcher has emitted/confirmed a strict stop condition; otherwise continue with progress updates.

The final summary includes:

Final PR SHA
CI status summary
Mergeability / conflict status
Fixes pushed
Flaky retry cycles used
Remaining unresolved failures or review comments
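
One possible way to assemble the final summary fields is sketched below. The field labels follow the items above. The function name, parameter names, and plain-text layout are assumptions for illustration only.

```python
from typing import Sequence


def render_final_summary(final_sha: str, ci_status: str, mergeability: str,
                         fixes_pushed: Sequence[str] = (),
                         flaky_retry_cycles: int = 0,
                         unresolved: Sequence[str] = ()) -> str:
    """Render the final-summary fields as one line per item."""
    lines = [
        f"Final PR SHA: {final_sha}",
        f"CI status: {ci_status}",
        f"Mergeability: {mergeability}",
        f"Fixes pushed: {', '.join(fixes_pushed) or 'none'}",
        f"Flaky retry cycles used: {flaky_retry_cycles}",
        f"Unresolved: {', '.join(unresolved) or 'none'}",
    ]
    return "\n".join(lines)
```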

References

Heuristics and decision tree: .codex/skills/babysit-pr/references/heuristics.md
GitHub CLI/API details used by the watcher: .codex/skills/babysit-pr/references/github-api-notes.md

Repository: openai/codex
Commit: f88701f

Last updated: 4 days ago
Created: 4 days ago