Autonomously improve a generated paper via GPT-5.4 xhigh review → implement fixes → recompile, for 2 rounds. Use when user says "改论文", "improve paper", "论文润色循环", "auto improve", or wants to iteratively polish a generated paper.
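The review → fix → recompile cycle in the description can be sketched as a small driver loop. This is a minimal illustration only, not the skill's actual implementation: the `review`, `apply_fixes`, and `recompile` callables are hypothetical stand-ins, and stopping early on a failed compile is an assumption rather than documented behavior.

```python
from typing import Callable, Dict, List


def improvement_loop(
    review: Callable[[int], List[str]],        # hypothetical: returns issues found in a round
    apply_fixes: Callable[[List[str]], None],  # hypothetical: edits the paper sources
    recompile: Callable[[], bool],             # hypothetical: True if the paper compiled
    rounds: int = 2,                           # the description specifies 2 rounds
) -> List[Dict[str, object]]:
    """Run review -> implement fixes -> recompile for a fixed number of rounds."""
    history: List[Dict[str, object]] = []
    for round_no in range(1, rounds + 1):
        issues = review(round_no)
        apply_fixes(issues)
        compiled = recompile()
        history.append(
            {"round": round_no, "issues": len(issues), "compiled": compiled}
        )
        if not compiled:
            # Assumption: a broken build ends the loop instead of being reviewed again.
            break
    return history
```

Passing the three steps in as callables keeps the loop itself independent of any particular reviewer model or LaTeX toolchain.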
77% (Does it follow best practices?)
Impact: not yet measured; no eval scenarios have been run
Passed: no known issues
Optimize this skill with Tessl:
npx tessl skill review --optimize ./skills/skills-codex/auto-paper-improvement-loop/SKILL.md

Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly communicates a specific autonomous paper-improvement workflow with well-defined steps and iteration count. It includes an explicit 'Use when' clause with both English and Chinese trigger terms, making it highly discoverable. The only minor concern is the reference to 'GPT-5.4 xhigh' which is somewhat opaque jargon, but overall the description is effective and well-structured.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'GPT-5.4 xhigh review → implement fixes → recompile, for 2 rounds'. The workflow steps are clearly enumerated with a defined iteration count. | 3 / 3 |
| Completeness | Clearly answers both 'what' (autonomously improve a paper via review → fix → recompile for 2 rounds) and 'when' (explicit 'Use when' clause with specific trigger phrases and a general condition about iterative polishing). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms in both English and Chinese: '改论文', 'improve paper', '论文润色循环', 'auto improve', and the natural phrase 'iteratively polish a generated paper'. Good multilingual coverage of terms users would actually say. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive: combines paper improvement with a specific autonomous loop workflow (GPT-5.4 xhigh review, recompilation, 2 rounds). The bilingual trigger terms and specific pipeline make it unlikely to conflict with generic writing or paper-generation skills. | 3 / 3 |
| Total | | 12 / 12 (Passed) |
Implementation
55%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is highly actionable with excellent workflow clarity — every step has concrete commands, validation checkpoints, and clear error recovery paths. However, it is severely bloated: the edit whitelist specification alone could be its own reference document, and numerous rationale/empirical motivation paragraphs explain things Claude doesn't need explained inline. The monolithic structure undermines usability despite the content being technically thorough.
Suggestions
- Extract the entire 'Optional: Edit Whitelist' section (schema, resolution rules, glob semantics, forbidden-operation detectors, behavior descriptions) into a separate EDIT_WHITELIST_SPEC.md and reference it with a single line from the main skill.
- Move empirical motivation paragraphs and rationale blocks into a separate DESIGN_NOTES.md; they justify design decisions but are not needed at execution time and consume significant tokens.
- Remove redundant restatements of the same rule (e.g., edit-whitelist gate behavior is described identically in Steps 3 and 6, and the whitelist rejection logging rule appears in both the whitelist section and Key Rules).
- Cut explanatory prose that Claude already knows (e.g., 'Use bash extglob / Python fnmatch.fnmatch semantics', the full rationale paragraph at the end of the whitelist section) to reduce token consumption by ~30-40%.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at 500+ lines. Massive sections on edit whitelist schema, glob semantics, forbidden-operation detectors, and resolution rules that could be in a separate reference file. Explains concepts Claude already knows (e.g., what YAML is, how fnmatch works). The rationale paragraphs, empirical motivation blocks, and repeated explanations of the same whitelist behavior across multiple steps add significant bloat. | 1 / 3 |
| Actionability | Provides fully executable bash commands, concrete code snippets (Python regex normalization, latexmk invocations, pdfinfo checks), specific prompt templates for spawn_agent, detailed fix-pattern tables with exact before/after language, and precise regex detectors. The guidance is copy-paste ready throughout. | 3 / 3 |
| Workflow Clarity | Clear numbered steps (0-9) with explicit validation checkpoints: recompile verification (0 undefined refs/citations), restatement regression tests after each compile, format checks with hard block criteria, and feedback loops (fix → validate → retry). The stop criteria are explicit, error recovery via state persistence is well-defined, and the human checkpoint protocol provides clear decision points. | 3 / 3 |
| Progressive Disclosure | Monolithic wall of text with no bundle files to offload content to. The edit whitelist schema (~100 lines), forbidden-operation detector table, glob semantics, resolution rules, and kill-argument integration details should all be in separate reference files. The skill tries to be both an overview and a complete reference, resulting in poor organization. References to external skills (kill-argument, proof-checker, paper-claim-audit) exist but the inline content is overwhelming. | 1 / 3 |
| Total | | 8 / 12 (Passed) |
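For readers unfamiliar with the glob semantics the review refers to, the following is a minimal sketch of whitelist matching with Python's `fnmatch.fnmatch`. The function name `is_edit_allowed` and the example patterns are hypothetical; the skill's actual whitelist schema and resolution rules are not reproduced here.

```python
from fnmatch import fnmatch
from typing import Iterable


def is_edit_allowed(path: str, whitelist: Iterable[str]) -> bool:
    """Return True if `path` matches at least one glob pattern in the whitelist."""
    return any(fnmatch(path, pattern) for pattern in whitelist)


# Hypothetical whitelist: section files plus the top-level main.tex.
allowed = ["sections/*.tex", "main.tex"]

print(is_edit_allowed("sections/intro.tex", allowed))  # True
print(is_edit_allowed("figures/plot.pdf", allowed))    # False
```

Note that `fnmatch` treats `*` as matching any characters, including `/`, so `sections/*.tex` also matches `sections/sub/a.tex`; a stricter whitelist would need per-segment matching.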
Validation
72%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 8 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (575 lines); consider splitting into references/ and linking | Warning |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 8 / 11 Passed | |
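A check like `skill_md_line_count` can be approximated locally before submitting a skill for review. The sketch below is an assumption about how such a check might work, not Tessl's validator: the 500-line threshold and the function name `check_line_count` are hypothetical, since the report only states that 575 lines counts as long.

```python
from pathlib import Path
from typing import Tuple

MAX_LINES = 500  # hypothetical threshold; the report does not state the real limit


def check_line_count(skill_path: str, max_lines: int = MAX_LINES) -> Tuple[int, str]:
    """Count the lines in a SKILL.md file and flag it when it exceeds the limit."""
    line_count = len(Path(skill_path).read_text(encoding="utf-8").splitlines())
    result = "Warning" if line_count > max_lines else "Passed"
    return line_count, result
```

Calling `check_line_count("SKILL.md")` returns the line count together with a `"Passed"` or `"Warning"` label, which is enough to decide whether to start splitting content into `references/`.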
a425a71