Autonomously improve a generated paper via GPT-5.4 xhigh review → implement fixes → recompile, for 2 rounds. Use when user says "改论文", "improve paper", "论文润色循环", "auto improve", or wants to iteratively polish a generated paper.
Score: 83
Quality: 81% (Does it follow best practices?)
Impact: Pending (No eval scenarios have been run)
Status: Passed (No known issues)

Quality

Discovery
100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly communicates a specific multi-step workflow (review → fix → recompile for 2 rounds), includes explicit trigger terms in both English and Chinese, and has a well-defined 'Use when' clause. The description is concise yet comprehensive, and the specificity of the pipeline and tooling makes it highly distinctive from other skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'GPT-5.4 xhigh review → implement fixes → recompile, for 2 rounds.' This describes a clear pipeline with specific steps and iteration count. | 3 / 3 |
| Completeness | Clearly answers both 'what' (autonomously improve a paper via review → fix → recompile for 2 rounds) and 'when' (explicit 'Use when' clause with specific trigger phrases and a conceptual trigger). | 3 / 3 |
| Trigger Term Quality | Includes natural keywords in both English and Chinese that users would actually say: '改论文', 'improve paper', '论文润色循环', 'auto improve', and the conceptual trigger 'iteratively polish a generated paper'. Good multilingual coverage. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive niche: autonomous iterative paper improvement using a specific review tool (GPT-5.4 xhigh) with a defined pipeline. The Chinese trigger terms and specific workflow make it unlikely to conflict with generic writing or paper generation skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a highly actionable and well-structured workflow with excellent validation checkpoints, concrete commands, and thorough error recovery mechanisms. However, it is severely over-long — the nearly duplicated review prompts, extensive empirical motivation paragraphs, and inline reference tables bloat the token cost significantly. The content would benefit greatly from extracting repeated/reference material into separate files while keeping the main skill as a lean orchestration guide.
Suggestions
- Extract the review prompt template into a shared reference file (e.g., `review-prompt-template.md`) and reference it from both Round 1 and Round 2 steps, since they are nearly identical.
- Move the empirical motivation paragraphs ('in our April 2026 NeurIPS run...') into a separate `DESIGN_DECISIONS.md` or append them as footnotes; they explain rationale but are not needed for execution.
- Move the fix pattern tables and format check auto-fix patterns into a separate reference file (e.g., `fix-patterns.md`) to reduce the main skill's token footprint.
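Taken together, the suggestions above amount to splitting the monolithic skill into a lean orchestration guide plus reference files. A minimal sketch of that layout is below; the `skill/references/` directory name is an assumption for illustration, while the three file names come from the suggestions themselves:

```shell
# Hypothetical restructuring sketch; only the three .md file names are
# taken from the review's suggestions, the directory layout is assumed.
mkdir -p skill/references
touch skill/references/review-prompt-template.md   # shared Round 1 / Round 2 prompt
touch skill/references/fix-patterns.md             # fix pattern + format auto-fix tables
touch skill/references/DESIGN_DECISIONS.md         # empirical motivation paragraphs
```

The main SKILL.md would then link to these files from the relevant steps instead of inlining them, which addresses both the Conciseness and Progressive Disclosure findings at once.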
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at roughly 400+ lines. It includes extensive empirical motivation paragraphs ('in our April 2026 NeurIPS run...'), detailed explanations of why the bias guard matters, lengthy inline tables, and repeated review prompt templates that are nearly identical between Round 1 and Round 2. Much of this could be compressed or moved to reference files. | 1 / 3 |
| Actionability | The skill provides fully executable bash commands, complete spawn_agent prompt templates, specific fix pattern tables, concrete JSON schemas for state persistence, and copy-paste ready compilation commands. Every step has concrete, actionable guidance. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced (Steps 0-9) with explicit validation checkpoints: recompile verification (0 undefined references/citations), restatement regression tests after each recompile, format compliance checks with hard block criteria, and human checkpoints. Error recovery via state persistence is well-defined with feedback loops (fix → validate → retry). | 3 / 3 |
| Progressive Disclosure | While the skill references external files like `../shared-references/review-tracing.md` and other skills (`/proof-checker`, `/paper-claim-audit`), the main body is a monolithic wall of text with everything inline. The nearly identical Round 1 and Round 2 review prompts, the extensive fix pattern tables, and the format check details could all be split into separate reference files. The collapsible details blocks in the log template show awareness of disclosure, but it isn't applied to the skill itself. | 2 / 3 |
| Total | | 9 / 12 Passed |
Validation
81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
700fbe2