Autonomous multi-round research review loop. Repeatedly reviews using Claude Code via claude-review MCP, implements fixes, and re-reviews until positive assessment or max rounds reached. Use when user says "auto review loop", "review until it passes", or wants autonomous iterative improvement.
Override for Codex users who want Claude Code, not a second Codex agent, to act as the reviewer. Install this package after `skills/skills-codex/*`.
Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.
- **Log:** `AUTO_REVIEW.md` in project root (cumulative log)
- **Reviewer:** `claude-review` — Claude reviewer invoked through the local claude-review MCP bridge. Set `CLAUDE_REVIEW_MODEL` if you need a specific Claude model override.
- **HUMAN_CHECKPOINT:** when `true`, pause after each round's review (Phase B) and present the score + weaknesses to the user. Wait for user input before proceeding to Phase C. The user can: approve the suggested fixes, provide custom modification instructions, skip specific fixes, or stop the loop early. When `false` (default), the loop runs fully autonomously.

💡 Override: `/auto-review-loop "topic" — human checkpoint: true`
Long-running loops may hit the context window limit, triggering automatic compaction. To survive this, persist state to REVIEW_STATE.json after each round:
```json
{
  "round": 2,
  "thread_id": "019cd392-...",
  "status": "in_progress",
  "last_score": 5.0,
  "last_verdict": "not ready",
  "pending_experiments": ["screen_name_1"],
  "timestamp": "2026-03-13T21:00:00"
}
```

Write this file at the end of every Phase E (after documenting the round). Overwrite it each time — only the latest state matters.
On completion (positive assessment or max rounds), set "status": "completed" so future invocations don't accidentally resume a finished loop.
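The persistence step above can be sketched as a small helper; a minimal sketch in Python, assuming the loop keeps these fields in variables (the `save_state` name is illustrative, not part of the skill):

```python
import json
from datetime import datetime, timezone

STATE_FILE = "REVIEW_STATE.json"

def save_state(round_num, thread_id, score, verdict, pending, completed=False):
    """Overwrite REVIEW_STATE.json with the latest round's state.

    Only the latest state matters, so the file is replaced wholesale.
    Setting completed=True marks the loop finished so a future
    invocation does not accidentally resume it.
    """
    state = {
        "round": round_num,
        "thread_id": thread_id,
        "status": "completed" if completed else "in_progress",
        "last_score": score,
        "last_verdict": verdict,
        "pending_experiments": pending,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with open(STATE_FILE, "w") as f:
        json.dump(state, f, indent=2)
```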
On each invocation, check `REVIEW_STATE.json` in the project root:

- `status` is `"completed"`: fresh start (previous loop finished normally)
- `status` is `"in_progress"` and `timestamp` is older than 24 hours: fresh start (stale state from a killed/abandoned run — delete the file and start over)
- `status` is `"in_progress"` and `timestamp` is within 24 hours: resume
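The fresh-vs-resume decision can be sketched as follows; `loop_mode` is a hypothetical helper, and the 24-hour staleness window matches the rules above:

```python
import json
from datetime import datetime, timedelta, timezone

def loop_mode(state_json, now=None, max_age_hours=24):
    """Decide whether to resume a prior loop or start fresh.

    state_json: contents of REVIEW_STATE.json as a string, or None if
    the file is absent.
    """
    if state_json is None:
        return "fresh"
    state = json.loads(state_json)
    if state.get("status") == "completed":
        return "fresh"  # previous loop finished normally
    now = now or datetime.now(timezone.utc)
    ts = datetime.fromisoformat(state["timestamp"])
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    if now - ts > timedelta(hours=max_age_hours):
        return "fresh"  # stale in_progress state: delete the file and start over
    return "resume"
```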
When resuming:

- Restore `round`, `thread_id`, `last_score`, and `pending_experiments`
- Read `AUTO_REVIEW.md` to restore full context of prior rounds
- If `pending_experiments` is non-empty, check whether they have completed (e.g., check screen sessions)

On a fresh start, initialize `AUTO_REVIEW.md` with a header and timestamp.

Send comprehensive context to the external reviewer:
```yaml
mcp__claude-review__review_start:
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]
    [Full research context: claims, methods, results, known weaknesses]
    [Changes since last round, if any]

    Please act as a senior ML reviewer (NeurIPS/ICML level).
    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix (experiment, analysis, or reframing)
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.
```

After this start call, immediately save the returned jobId and poll mcp__claude-review__review_status with a bounded waitSeconds until done=true. Treat the completed status payload's response as the reviewer output, and save the completed threadId for any follow-up round.
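The save-and-poll step can be sketched as a generic polling loop; `check_status` here is a stand-in for the actual `mcp__claude-review__review_status` call, whose exact return shape may differ:

```python
import time

def poll_until_done(check_status, job_id, wait_seconds=30, timeout=1800):
    """Poll a review job until done=True, with a bounded per-call wait.

    check_status stands in for mcp__claude-review__review_status and is
    assumed to return a dict like
    {"done": bool, "response": str, "threadId": str}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = check_status(job_id, wait_seconds)
        if status.get("done"):
            return status  # carries the reviewer response and threadId
    raise TimeoutError(f"review job {job_id} did not finish in {timeout}s")
```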
If this is round 2+, use mcp__claude-review__review_reply_start with the saved completed threadId, then poll mcp__claude-review__review_status with the returned jobId until done=true to maintain continuity.
CRITICAL: Save the FULL raw response from the external reviewer verbatim (store in a variable for Phase E). Do NOT discard or summarize — the raw text is the primary record.
Then extract structured fields from the response: the score (X/10), the verdict (ready / almost / not ready), and the ranked weaknesses with their minimum fixes.
STOP CONDITION: If score >= 6 AND verdict contains "ready" or "almost" → stop loop, document final state.
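Extracting the score and applying the stop condition can be sketched as follows, assuming the reviewer follows the requested output format; the regex and verdict matching are illustrative:

```python
import re

def should_stop(review_text):
    """Parse score and verdict from reviewer output; apply the stop condition.

    Stop when score >= 6 AND the verdict reads positive ("almost" or
    "ready", but not "not ready").
    """
    m = re.search(r"[Ss]core:?\s*(\d+(?:\.\d+)?)\s*/\s*10", review_text)
    score = float(m.group(1)) if m else 0.0
    verdict = review_text.lower()
    positive = "almost" in verdict or ("ready" in verdict and "not ready" not in verdict)
    return score >= 6 and positive, score
```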
Skip this step entirely if HUMAN_CHECKPOINT = false.
When HUMAN_CHECKPOINT = true, present the review results and wait for user input:
```
📋 Round N/MAX_ROUNDS review complete.

Score: X/10 — [verdict]

Top weaknesses:
1. [weakness 1]
2. [weakness 2]
3. [weakness 3]

Suggested fixes:
1. [fix 1]
2. [fix 2]
3. [fix 3]

Options:
- Reply "go" or "continue" → implement all suggested fixes
- Reply with custom instructions → implement your modifications instead
- Reply "skip 2" → skip fix #2, implement the rest
- Reply "stop" → end the loop, document current state
```

Wait for the user's response and parse their input.
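The reply handling can be sketched as follows; the function name and return shape are illustrative:

```python
import re

def parse_checkpoint_reply(reply, num_fixes):
    """Interpret the user's checkpoint reply.

    Returns (action, payload): ("continue", fix numbers to apply),
    ("stop", None), or ("custom", instruction text).
    """
    text = reply.strip().lower()
    if text in ("go", "continue"):
        return "continue", list(range(1, num_fixes + 1))
    if text == "stop":
        return "stop", None
    skipped = [int(n) for n in re.findall(r"skip\s+(\d+)", text)]
    if skipped:
        return "continue", [i for i in range(1, num_fixes + 1) if i not in skipped]
    return "custom", reply.strip()
```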
After parsing the score, check if ~/.codex/feishu.json exists and mode is not "off":
Send a review_scored notification: "Round N: X/10 — [verdict]" with the top 3 weaknesses.

For each action item (highest priority first):
Prioritization rules:
If experiments were launched:
Append to AUTO_REVIEW.md:
```markdown
## Round N (timestamp)

### Assessment (Summary)
- Score: X/10
- Verdict: [ready/almost/not ready]
- Key criticisms: [bullet list]

### Reviewer Raw Response
<details>
<summary>Click to expand full reviewer response</summary>

[Paste the COMPLETE raw response from the external reviewer here — verbatim, unedited.
This is the authoritative record. Do NOT truncate or paraphrase.]

</details>

### Actions Taken
- [what was implemented/changed]

### Results
- [experiment outcomes, if any]

### Status
- [continuing to round N+1 / stopping]
```

Write REVIEW_STATE.json with current round, agent id, score, verdict, and any pending experiments.
Increment round counter → back to Phase A.
When loop ends (positive assessment or max rounds):
- Write REVIEW_STATE.json with "status": "completed"
- Finalize AUTO_REVIEW.md
- Send a pipeline_done notification with the final score progression table

Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission — just do it silently.
- Always ask the Claude reviewer for strict, high-rigor feedback.
- Save the completed threadId from the first mcp__claude-review__review_status result, then use mcp__claude-review__review_reply_start plus mcp__claude-review__review_status for subsequent rounds.
- Be honest — include negative results and failed experiments.
- Do NOT hide weaknesses to game a positive score.
- Implement fixes BEFORE re-reviewing (don't just promise to fix).
- If an experiment takes > 30 minutes, launch it and continue with other fixes while waiting.
- Document EVERYTHING — the review log should be self-contained.
- Update project notes after each round, not just at the end.
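The long-experiment guideline (launch and keep working) can be sketched with a background launcher, assuming each experiment is a shell command; the helpers are illustrative, and the actual loop may use screen sessions instead:

```python
import subprocess

def launch_experiment(name, cmd, log_path):
    """Start a long-running experiment in the background and return at once.

    The returned record (name plus process handle) goes into
    pending_experiments so a later round, or a resumed loop, can poll it.
    """
    log = open(log_path, "w")
    proc = subprocess.Popen(cmd, shell=True, stdout=log, stderr=subprocess.STDOUT)
    return {"name": name, "proc": proc}

def poll_experiments(pending):
    """Split pending experiments into still-running and finished."""
    running, done = [], []
    for exp in pending:
        (running if exp["proc"].poll() is None else done).append(exp)
    return running, done
```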
```yaml
mcp__claude-review__review_reply_start:
  threadId: [saved from round 1]
  prompt: |
    [Round N update]

    Since your last review, we have:
    1. [Action 1]: [result]
    2. [Action 2]: [result]
    3. [Action 3]: [result]

    Updated results table:
    [paste metrics]

    Please re-score and re-assess. Are the remaining concerns addressed?
    Same format: Score, Verdict, Remaining Weaknesses, Minimum Fixes.
```

After this start call, immediately save the returned jobId and poll mcp__claude-review__review_status with a bounded waitSeconds until done=true. Treat the completed status payload's response as the reviewer output, and save the completed threadId for any follow-up round.