Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".
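The description says the skill is configured via environment variables for any OpenAI-compatible endpoint. A minimal sketch of what that configuration might look like, assuming conventional variable names (`LLM_BASE_URL`, `LLM_API_KEY`, `LLM_MODEL` are illustrative, not taken from the skill itself):

```python
import os

# Hypothetical environment-variable configuration for an
# OpenAI-compatible endpoint; variable names are illustrative.
base_url = os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1")
api_key = os.environ.get("LLM_API_KEY", "")
model = os.environ.get("LLM_MODEL", "gpt-4o-mini")

# OpenAI-compatible APIs expose chat completions at this path.
endpoint = f"{base_url}/chat/completions"
```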
Quality: 51% — does it follow best practices?
Impact: Pending — no eval scenarios have been run.
Critical: do not install without reviewing.
Optimize this skill with Tessl:
`npx tessl skill review --optimize ./skills/skills-codex/auto-review-loop-llm/SKILL.md`

Quality
Discovery — 40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description names a high-level concept ('autonomous research review loop') and provides explicit trigger phrases, but fails to explain what concrete actions the skill performs or what outcomes it produces. The trigger phrases are prescribed commands rather than natural language a user would use, and the core capability remains abstract.
Suggestions
Add specific concrete actions the skill performs, e.g., 'Iteratively reviews research papers/documents by sending content to an LLM for critique, collecting feedback, and refining analysis across multiple passes.'
Include natural trigger terms users would actually say, such as 'review my research', 'iterative feedback loop', 'automated paper review', 'LLM-assisted critique'.
Expand the 'Use when...' clause beyond prescribed command phrases to describe scenarios, e.g., 'Use when the user wants automated multi-pass review of research content using an external LLM API.'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description says 'autonomous research review loop' but never explains what concrete actions are performed — what is being reviewed, what output is produced, or what steps the loop involves. 'Research review loop' is abstract and vague. | 1 / 3 |
| Completeness | It partially answers 'what' (autonomous research review loop using an LLM API) and provides explicit trigger phrases for 'when', but the 'what' is too vague to be meaningful — it doesn't explain what the review loop actually does. The trigger phrases serve as a 'when' clause, but the lack of a substantive 'what' weakens completeness. | 2 / 3 |
| Trigger Term Quality | It includes explicit trigger phrases ('auto review loop llm', 'llm review') and mentions 'OpenAI-compatible LLM API', but these are prescribed command phrases rather than natural keywords a user would organically say. A user is more likely to say 'review my research' or 'run a review loop' than the exact quoted triggers. | 2 / 3 |
| Distinctiveness / Conflict Risk | The specific trigger phrases ('auto review loop llm', 'llm review') and mention of an OpenAI-compatible LLM API provide some distinctiveness, but 'research review' is broad enough to potentially overlap with other review- or research-related skills. | 2 / 3 |
| Total | | 7 / 12 — Passed |
Implementation — 62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill provides an excellent, well-structured autonomous review loop workflow with clear phases, validation checkpoints, and recovery mechanisms. Its main weakness is significant verbosity: repeated prompt templates, an extensive provider table, and redundant curl examples inflate the token cost substantially. The content would benefit from extracting reference material (provider configs, prompt templates) into separate files.
Suggestions
Extract the provider table and MCP configuration examples into a separate CONFIGURATION.md reference file, keeping only a one-line pointer in the main skill.
Consolidate the prompt templates — the Phase A and Round 2+ templates are nearly identical and could be a single template with a note about including previous review context for rounds 2+.
Remove the duplicate curl fallback shown in both the 'API Call Method' section and Phase A — show it once and reference it.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is excessively verbose. The full provider table with 8 Chinese/international LLM providers, repeated curl examples (shown 3 times with minor variations), and the MCP configuration JSON are all padding. The prompt templates are shown twice (Phase A and Round 2+ section) with near-identical content. Claude doesn't need explanations of how curl works or what an OpenAI-compatible API is. | 1 / 3 |
| Actionability | The skill provides fully concrete, executable guidance: exact MCP tool call syntax, complete curl commands, specific JSON schemas for state persistence, exact markdown templates for documentation, and clear threshold values (score >= 6/10). Everything is copy-paste ready. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced through Phases A-E with explicit validation checkpoints (Phase B has a STOP condition, state persistence at the end of Phase E), recovery via REVIEW_STATE.json, and clear termination criteria. The feedback loop (review → fix → re-review) is the core of the skill and is well defined. | 3 / 3 |
| Progressive Disclosure | The skill references shared protocols at the end (output versioning, manifest, language), which is good progressive disclosure. However, the main body is monolithic — the provider table, configuration details, and prompt templates could be split into separate reference files. The inline content is heavy for a SKILL.md overview. | 2 / 3 |
| Total | | 9 / 12 — Passed |
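The review → fix → re-review loop with REVIEW_STATE.json recovery and the 6/10 threshold described above might be sketched like this. The state-file fields, function names, and `max_rounds` cap are assumptions based on the review's description, not the skill's actual code:

```python
import json
from pathlib import Path

STATE_FILE = Path("REVIEW_STATE.json")
PASS_THRESHOLD = 6  # score >= 6/10 terminates the loop, per the review

def load_state() -> dict:
    # Resume from a previous run if the state file exists (recovery).
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"round": 0, "score": 0, "history": []}

def save_state(state: dict) -> None:
    STATE_FILE.write_text(json.dumps(state, indent=2))

def review_loop(document: str, review_fn, fix_fn, max_rounds: int = 5) -> dict:
    state = load_state()
    while state["round"] < max_rounds and state["score"] < PASS_THRESHOLD:
        state["round"] += 1
        score, feedback = review_fn(document)      # LLM critique of the document
        state["score"] = score
        state["history"].append(feedback)
        if score < PASS_THRESHOLD:
            document = fix_fn(document, feedback)  # apply fixes, then re-review
        save_state(state)                          # persist after every round
    return state
```

With stub review/fix functions, the loop stops as soon as the score clears the threshold, and the JSON state file lets an interrupted run pick up where it left off.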
Validation — 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 checks passed.
Skill structure validation: no warnings or errors.
Commit: 700fbe2
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.