Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".
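Since the skill targets any OpenAI-compatible endpoint configured through environment variables, the request it ultimately issues can be sketched as below. This is a minimal illustration, not the skill's code; the variable names (LLM_BASE_URL, LLM_API_KEY, LLM_MODEL) are hypothetical placeholders for whatever the skill's configuration actually uses.

```python
import os

def build_chat_request(prompt, model=None):
    """Build a request for an OpenAI-compatible /v1/chat/completions
    endpoint. Env var names here are assumptions, not the skill's own."""
    base = os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1")
    return {
        "url": base.rstrip("/") + "/chat/completions",
        "headers": {
            "Authorization": "Bearer " + os.environ.get("LLM_API_KEY", ""),
            "Content-Type": "application/json",
        },
        "body": {
            "model": model or os.environ.get("LLM_MODEL", "gpt-4o-mini"),
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_chat_request("Review this draft for clarity and correctness.")
```

Because the endpoint path and payload shape are part of the OpenAI-compatible convention, swapping providers only changes the base URL, key, and model name.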
Does it follow best practices?
Impact: Critical. Do not install without reviewing.
Pending: no eval scenarios have been run.
Optimize this skill with Tessl:

npx tessl skill review --optimize ./skills/skills-codex/auto-review-loop-llm/SKILL.md

Quality
Discovery — 40%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a specific tool integration (OpenAI-compatible LLM API, llm-chat MCP server) and provides explicit trigger phrases, which is helpful. However, it fails to explain what the skill actually does in concrete terms — what is being reviewed, what inputs it takes, what outputs it produces, and what the 'loop' entails. The lack of specificity about capabilities significantly weakens its utility for skill selection.
Suggestions
Add concrete actions describing what the review loop does, e.g., 'Iteratively reviews research papers/code by sending content to an external LLM, collecting feedback, and refining outputs until quality criteria are met.'
Expand the 'when' clause with natural use cases, e.g., 'Use when the user wants automated multi-pass review of documents, research papers, or code using an external LLM for feedback.'
Clarify what 'research review' means in practice — does it review academic papers, code, data analysis, or something else? This would reduce ambiguity and improve distinctiveness.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description says 'autonomous research review loop' but never explains what concrete actions are performed — what is being reviewed, what outputs are produced, or what steps the loop involves. 'Research review loop' is abstract and vague. | 1 / 3 |
| Completeness | It partially answers 'what' (autonomous research review loop using an LLM API) and provides explicit trigger phrases, but the 'what' is too vague to be meaningful — it doesn't explain what the loop actually does. The 'when' is addressed via trigger phrases but lacks context about use cases. | 2 / 3 |
| Trigger Term Quality | It includes some trigger phrases like 'auto review loop llm' and 'llm review', and mentions 'OpenAI-compatible LLM API', but these are somewhat artificial trigger phrases rather than natural terms a user would say. A user might say 'review my research' or 'automated review', but the exact phrases given feel prescribed rather than natural. | 2 / 3 |
| Distinctiveness / Conflict Risk | The mention of 'llm-chat MCP server' and 'OpenAI-compatible LLM API' provides some distinctiveness, and the specific trigger phrases help. However, 'research review' is broad enough to potentially overlap with other review or research-related skills. | 2 / 3 |
| Total | | 7 / 12 (Passed) |
Implementation — 55%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill excels at actionability and workflow clarity with concrete, executable examples and a well-defined multi-phase loop with validation checkpoints. However, it suffers significantly from verbosity — the provider table, repeated prompt templates (shown 3 times), and inline configuration examples bloat the content far beyond what's needed. The lack of progressive disclosure means all reference material is crammed into one file rather than being appropriately split.
Suggestions
Move the supported providers table and MCP configuration examples to a separate PROVIDERS.md or CONFIG.md reference file, linking to it from the main skill.
Consolidate the three nearly-identical review prompt templates into a single parameterized template, noting that Round 2+ should include previous review summary and changes.
Move the curl fallback method to a separate FALLBACK.md file since MCP is the primary method, keeping only a brief mention and link in the main skill.
Remove the provider-specific details (8 providers with URLs and models) — Claude can look these up or the user can configure them; only show one example provider in the config block.
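The consolidation suggested above (one parameterized template instead of three near-duplicates) could look like the following sketch. The function and field names ('summary', 'changes') are hypothetical, not the skill's actual schema:

```python
def build_review_prompt(content, round_no, previous=None):
    """Single parameterized review prompt. Round 2+ appends the previous
    review summary and the changes made since, per the suggestion above."""
    prompt = (
        f"Round {round_no}: review the following content, give concrete "
        f"feedback, and score it from 1 to 10.\n\n{content}"
    )
    if round_no > 1 and previous:
        # Later rounds carry forward context from the prior iteration.
        prompt += (
            "\n\nPrevious review summary:\n" + previous["summary"]
            + "\n\nChanges made since:\n" + previous["changes"]
        )
    return prompt
```

One template with a round parameter keeps the three variants from drifting apart as the skill evolves.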
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is excessively verbose. The supported-providers table with 8 Chinese/international LLM providers is largely unnecessary padding. The curl fallback examples, MCP configuration JSON, and repeated review prompt templates (shown 3 times with minor variations) significantly bloat the content. Much of this (provider URLs, JSON config format) is reference material Claude doesn't need inline. | 1 / 3 |
| Actionability | The skill provides fully concrete, executable guidance: exact MCP tool call syntax, complete curl commands, specific JSON schemas for state persistence, exact prompt templates, and clear threshold values (score >= 6/10). Everything is copy-paste ready. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced through Phases A-E with explicit validation checkpoints (Phase B has a STOP condition, state persistence at the end of Phase E, a recovery check at initialization). The feedback loop of review → implement → re-review is well-defined with clear termination conditions. | 3 / 3 |
| Progressive Disclosure | This is a monolithic wall of text with no references to external files for detailed content. The provider table, prompt templates, curl examples, and MCP configuration could all be split into separate reference files. Everything is inlined in one large document with no navigation structure beyond section headers. | 1 / 3 |
| Total | | 8 / 12 (Passed) |
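The review → implement → re-review feedback loop with its 6/10 acceptance threshold can be sketched as below. The loop shape, state fields, and callback signatures are illustrative assumptions; only the threshold and the terminate-on-pass behavior come from the review itself.

```python
def review_loop(run_review, apply_feedback, draft, threshold=6, max_rounds=5):
    """Iterate review -> implement -> re-review until the score clears the
    threshold (6/10 per the skill) or the round budget is exhausted.
    run_review returns (score, feedback); apply_feedback returns a new draft."""
    state = {"round": 0, "history": []}
    while state["round"] < max_rounds:
        state["round"] += 1
        score, feedback = run_review(draft, state["history"])
        state["history"].append({"round": state["round"], "score": score})
        if score >= threshold:  # termination: quality bar met
            break
        draft = apply_feedback(draft, feedback)
    return draft, state

# Stub reviewer whose score rises 2 -> 4 -> 6, so the loop stops at round 3.
scores = iter([2, 4, 6])
draft, state = review_loop(
    lambda d, h: (next(scores), "tighten the abstract"),
    lambda d, f: d + " [revised]",
    "initial draft",
)
```

Persisting `state` as JSON between rounds would give the recovery-on-restart behavior the phased workflow describes.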
Validation — 100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.