Autonomously improve a generated paper via GPT-5.4 xhigh review → implement fixes → recompile, for 2 rounds. Use when user says "改论文", "improve paper", "论文润色循环", "auto improve", or wants to iteratively polish a generated paper.
Score: 83
Quality: 81% (Does it follow best practices?)
Impact: Pending (No eval scenarios have been run)
Status: Passed (No known issues)

Quality

Discovery
100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly communicates a specific multi-step workflow (review → fix → recompile for 2 rounds), includes explicit trigger terms in both English and Chinese, and has a well-defined 'Use when' clause. The description is concise yet comprehensive, and the specificity of the pipeline and tooling makes it highly distinctive from other skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'GPT-5.4 xhigh review → implement fixes → recompile, for 2 rounds.' This describes a clear pipeline with specific steps and iteration count. | 3 / 3 |
| Completeness | Clearly answers both 'what' (autonomously improve a paper via review → fix → recompile for 2 rounds) and 'when' (explicit 'Use when' clause with specific trigger phrases and a conceptual trigger). | 3 / 3 |
| Trigger Term Quality | Includes natural keywords in both English and Chinese that users would actually say: '改论文', 'improve paper', '论文润色循环', 'auto improve', and the conceptual trigger 'iteratively polish a generated paper'. Good multilingual coverage. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: autonomous iterative paper improvement using a specific review tool (GPT-5.4 xhigh) with a defined pipeline. The Chinese trigger terms and specific workflow make it unlikely to conflict with generic writing or paper generation skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a highly actionable and well-structured workflow with excellent validation checkpoints, concrete commands, and thorough error recovery mechanisms. However, it is severely over-long — the nearly duplicated review prompts, extensive empirical motivation paragraphs, and inline reference tables bloat the token cost significantly. The content would benefit greatly from extracting repeated/reference material into separate files while keeping the main skill as a lean orchestration guide.
Suggestions
- Extract the review prompt template into a shared reference file (e.g., `review-prompt-template.md`) and reference it from both Round 1 and Round 2 steps, since they are nearly identical.
- Move the empirical motivation paragraphs ('in our April 2026 NeurIPS run...') into a separate `DESIGN_DECISIONS.md` or append them as footnotes; they explain rationale but are not needed for execution.
- Move the fix pattern tables and format check auto-fix patterns into a separate reference file (e.g., `fix-patterns.md`) to reduce the main skill's token footprint.
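Taken together, the suggestions above amount to splitting the monolithic skill into a lean orchestration guide plus reference files. A minimal sketch of that layout is below; the `skill/references/` directory name is an assumption for illustration, while the three file names come from the suggestions themselves:

```shell
# Hypothetical restructuring sketch; only the three .md file names are
# taken from the review's suggestions, the directory layout is assumed.
mkdir -p skill/references
touch skill/references/review-prompt-template.md   # shared Round 1 / Round 2 prompt
touch skill/references/fix-patterns.md             # fix pattern + format auto-fix tables
touch skill/references/DESIGN_DECISIONS.md         # empirical motivation paragraphs
```

The main SKILL.md would then link to these files from the relevant steps instead of inlining them, which addresses both the Conciseness and Progressive Disclosure findings at once.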
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at roughly 400+ lines. It includes extensive empirical motivation paragraphs ('in our April 2026 NeurIPS run...'), detailed explanations of why the bias guard matters, lengthy inline tables, and repeated review prompt templates that are nearly identical between Round 1 and Round 2. Much of this could be compressed or moved to reference files. | 1 / 3 |
| Actionability | The skill provides fully executable bash commands, complete spawn_agent prompt templates, specific fix pattern tables, concrete JSON schemas for state persistence, and copy-paste ready compilation commands. Every step has concrete, actionable guidance. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced (Steps 0-9) with explicit validation checkpoints: recompile verification (0 undefined references/citations), restatement regression tests after each recompile, format compliance checks with hard block criteria, and human checkpoints. Error recovery via state persistence is well-defined with feedback loops (fix → validate → retry). | 3 / 3 |
| Progressive Disclosure | While the skill references external files like `../shared-references/review-tracing.md` and other skills (`/proof-checker`, `/paper-claim-audit`), the main body is a monolithic wall of text with everything inline. The nearly identical Round 1 and Round 2 review prompts, the extensive fix pattern tables, and the format check details could all be split into separate reference files. The collapsible details blocks in the log template show awareness of disclosure, but it isn't applied to the skill itself. | 2 / 3 |
| Total | | 9 / 12 Passed |
Validation
81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
700fbe2