Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".
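Since the skill targets any OpenAI-compatible endpoint configured through environment variables, the request it ultimately issues can be sketched as below. This is a minimal illustration, not the skill's code; the variable names (LLM_BASE_URL, LLM_API_KEY, LLM_MODEL) are hypothetical placeholders for whatever the skill's configuration actually uses.

```python
import os

def build_chat_request(prompt, model=None):
    """Build a request for an OpenAI-compatible /v1/chat/completions
    endpoint. Env var names here are assumptions, not the skill's own."""
    base = os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1")
    return {
        "url": base.rstrip("/") + "/chat/completions",
        "headers": {
            "Authorization": "Bearer " + os.environ.get("LLM_API_KEY", ""),
            "Content-Type": "application/json",
        },
        "body": {
            "model": model or os.environ.get("LLM_MODEL", "gpt-4o-mini"),
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_chat_request("Review this draft for clarity and correctness.")
```

Because the endpoint path and payload shape are part of the OpenAI-compatible convention, swapping providers only changes the base URL, key, and model name.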
Does it follow best practices?
Impact: Critical. Do not install without reviewing.
Pending: no eval scenarios have been run.
Optimize this skill with Tessl:

npx tessl skill review --optimize ./skills/skills-codex/auto-review-loop-llm/SKILL.md

Quality
Discovery — 40%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a specific tool integration (OpenAI-compatible LLM API, llm-chat MCP server) and provides explicit trigger phrases, which is helpful. However, it fails to explain what the skill actually does in concrete terms — what is being reviewed, what inputs it takes, what outputs it produces, and what the 'loop' entails. The lack of specificity about capabilities significantly weakens its utility for skill selection.
Suggestions
Add concrete actions describing what the review loop does, e.g., 'Iteratively reviews research papers/code by sending content to an external LLM, collecting feedback, and refining outputs until quality criteria are met.'
Expand the 'when' clause with natural use cases, e.g., 'Use when the user wants automated multi-pass review of documents, research papers, or code using an external LLM for feedback.'
Clarify what 'research review' means in practice — does it review academic papers, code, data analysis, or something else? This would reduce ambiguity and improve distinctiveness.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description says 'autonomous research review loop' but never explains what concrete actions are performed — what is being reviewed, what outputs are produced, or what steps the loop involves. 'Research review loop' is abstract and vague. | 1 / 3 |
| Completeness | It partially answers 'what' (autonomous research review loop using an LLM API) and provides explicit trigger phrases, but the 'what' is too vague to be meaningful — it doesn't explain what the loop actually does. The 'when' is addressed via trigger phrases but lacks context about use cases. | 2 / 3 |
| Trigger Term Quality | It includes some trigger phrases like 'auto review loop llm' and 'llm review', and mentions 'OpenAI-compatible LLM API', but these are somewhat artificial trigger phrases rather than natural terms a user would say. A user might say 'review my research' or 'automated review', but the exact phrases given feel prescribed rather than natural. | 2 / 3 |
| Distinctiveness / Conflict Risk | The mention of 'llm-chat MCP server' and 'OpenAI-compatible LLM API' provides some distinctiveness, and the specific trigger phrases help. However, 'research review' is broad enough to potentially overlap with other review or research-related skills. | 2 / 3 |
| Total | | 7 / 12 (Passed) |
Implementation — 55%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill excels at actionability and workflow clarity with concrete, executable examples and a well-defined multi-phase loop with validation checkpoints. However, it suffers significantly from verbosity — the provider table, repeated prompt templates (shown 3 times), and inline configuration examples bloat the content far beyond what's needed. The lack of progressive disclosure means all reference material is crammed into one file rather than being appropriately split.
Suggestions
Move the supported providers table and MCP configuration examples to a separate PROVIDERS.md or CONFIG.md reference file, linking to it from the main skill.
Consolidate the three nearly-identical review prompt templates into a single parameterized template, noting that Round 2+ should include previous review summary and changes.
Move the curl fallback method to a separate FALLBACK.md file since MCP is the primary method, keeping only a brief mention and link in the main skill.
Remove the provider-specific details (8 providers with URLs and models) — Claude can look these up or the user can configure them; only show one example provider in the config block.
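The consolidation suggested above (one parameterized template instead of three near-duplicates) could look like the following sketch. The function and field names ('summary', 'changes') are hypothetical, not the skill's actual schema:

```python
def build_review_prompt(content, round_no, previous=None):
    """Single parameterized review prompt. Round 2+ appends the previous
    review summary and the changes made since, per the suggestion above."""
    prompt = (
        f"Round {round_no}: review the following content, give concrete "
        f"feedback, and score it from 1 to 10.\n\n{content}"
    )
    if round_no > 1 and previous:
        # Later rounds carry forward context from the prior iteration.
        prompt += (
            "\n\nPrevious review summary:\n" + previous["summary"]
            + "\n\nChanges made since:\n" + previous["changes"]
        )
    return prompt
```

One template with a round parameter keeps the three variants from drifting apart as the skill evolves.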
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is excessively verbose. The supported-providers table with 8 Chinese/international LLM providers is largely unnecessary padding. The curl fallback examples, MCP configuration JSON, and repeated review prompt templates (shown 3 times with minor variations) significantly bloat the content. Much of this (provider URLs, JSON config format) is reference material Claude doesn't need inline. | 1 / 3 |
| Actionability | The skill provides fully concrete, executable guidance: exact MCP tool call syntax, complete curl commands, specific JSON schemas for state persistence, exact prompt templates, and clear threshold values (score >= 6/10). Everything is copy-paste ready. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced through Phases A-E with explicit validation checkpoints (Phase B has a STOP condition, state persistence at the end of Phase E, a recovery check at initialization). The feedback loop of review → implement → re-review is well-defined with clear termination conditions. | 3 / 3 |
| Progressive Disclosure | This is a monolithic wall of text with no references to external files for detailed content. The provider table, prompt templates, curl examples, and MCP configuration could all be split into separate reference files. Everything is inlined in one large document with no navigation structure beyond section headers. | 1 / 3 |
| Total | | 8 / 12 (Passed) |
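The review → implement → re-review feedback loop with its 6/10 acceptance threshold can be sketched as below. The loop shape, state fields, and callback signatures are illustrative assumptions; only the threshold and the terminate-on-pass behavior come from the review itself.

```python
def review_loop(run_review, apply_feedback, draft, threshold=6, max_rounds=5):
    """Iterate review -> implement -> re-review until the score clears the
    threshold (6/10 per the skill) or the round budget is exhausted.
    run_review returns (score, feedback); apply_feedback returns a new draft."""
    state = {"round": 0, "history": []}
    while state["round"] < max_rounds:
        state["round"] += 1
        score, feedback = run_review(draft, state["history"])
        state["history"].append({"round": state["round"], "score": score})
        if score >= threshold:  # termination: quality bar met
            break
        draft = apply_feedback(draft, feedback)
    return draft, state

# Stub reviewer whose score rises 2 -> 4 -> 6, so the loop stops at round 3.
scores = iter([2, 4, 6])
draft, state = review_loop(
    lambda d, h: (next(scores), "tighten the abstract"),
    lambda d, f: d + " [revised]",
    "initial draft",
)
```

Persisting `state` as JSON between rounds would give the recovery-on-restart behavior the phased workflow describes.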
Validation — 100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.