
auto-review-loop-llm

Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".
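For orientation, a minimal sketch of what environment-variable configuration and a call to an OpenAI-compatible endpoint might look like in practice. The variable names (LLM_API_BASE, LLM_API_KEY, LLM_MODEL) and the prompt text are illustrative assumptions, not the skill's documented interface:

import json, os, urllib.request

# Illustrative only: the skill's actual variable names and prompts are not shown on this page.
base = os.environ["LLM_API_BASE"]      # e.g. https://api.openai.com/v1 or any compatible endpoint
key = os.environ["LLM_API_KEY"]
body = {
    "model": os.environ.get("LLM_MODEL", "gpt-4o-mini"),
    "messages": [{"role": "user", "content": "Review this research draft and score it 1-10: ..."}],
}
req = urllib.request.Request(
    f"{base}/chat/completions",
    data=json.dumps(body).encode(),
    headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])

Any provider exposing the same /chat/completions shape can be targeted by changing only the base URL and model name.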


Quality: 51%
Does it follow best practices?

Impact: Pending
No eval scenarios have been run.

Security (by Snyk): Critical
Do not install without reviewing.

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/skills-codex/auto-review-loop-llm/SKILL.md

Quality

Discovery: 40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description names a high-level concept ('autonomous research review loop') and provides explicit trigger phrases, but fails to explain what concrete actions the skill performs or what outcomes it produces. The trigger phrases are prescribed commands rather than natural language a user would use, and the core capability remains abstract.

Suggestions

Add specific concrete actions the skill performs, e.g., 'Iteratively reviews research papers/documents by sending content to an LLM for critique, collecting feedback, and refining analysis across multiple passes.'

Include natural trigger terms users would actually say, such as 'review my research', 'iterative feedback loop', 'automated paper review', 'LLM-assisted critique'.

Expand the 'Use when...' clause beyond prescribed command phrases to describe scenarios, e.g., 'Use when the user wants automated multi-pass review of research content using an external LLM API.'

Dimension / Reasoning / Score

Specificity

The description says 'autonomous research review loop' but never explains what concrete actions are performed — what is being reviewed, what output is produced, or what steps the loop involves. 'Research review loop' is abstract and vague.

1 / 3

Completeness

It partially answers 'what' (autonomous research review loop using an LLM API) and provides explicit trigger phrases for 'when', but the 'what' is too vague to be meaningful — it doesn't explain what the review loop actually does. The trigger phrases serve as a 'when' clause but the lack of substantive 'what' weakens completeness.

2 / 3

Trigger Term Quality

It includes explicit trigger phrases ('auto review loop llm', 'llm review') and mentions 'OpenAI-compatible LLM API', but these are prescribed command phrases rather than natural keywords a user would organically say. A user is more likely to say 'review my research' or 'run a review loop' than the exact quoted triggers.

2 / 3

Distinctiveness Conflict Risk

The specific trigger phrases ('auto review loop llm', 'llm review') and mention of OpenAI-compatible LLM API provide some distinctiveness, but 'research review' is broad enough to potentially overlap with other review or research-related skills.

2 / 3

Total: 7 / 12

Passed

Implementation: 62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill provides an excellent, well-structured autonomous review loop workflow with clear phases, validation checkpoints, and recovery mechanisms. Its main weakness is significant verbosity: repeated prompt templates, an extensive provider table, and redundant curl examples inflate the token cost substantially. The content would benefit from extracting reference material (provider configs, prompt templates) into separate files.

Suggestions

Extract the provider table and MCP configuration examples into a separate CONFIGURATION.md reference file, keeping only a one-line pointer in the main skill.

Consolidate the prompt templates: the Phase A and Round 2+ templates are nearly identical and could be a single template with a note about including previous review context for rounds 2+ (a sketch of such a merged template follows this list).

Remove the duplicate curl fallback shown in both the 'API Call Method' section and Phase A — show it once and reference it.
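To illustrate the consolidation suggested above, a single template covering both the first round and rounds 2+ could look roughly like the sketch below. The wording is assumed for illustration and is not the skill's actual template text:

def build_review_prompt(content: str, previous_review: str | None = None) -> str:
    # One template for every round; rounds 2+ simply pass in the previous review.
    prompt = (
        "You are reviewing a research draft. Give a score from 1 to 10 "
        "and a numbered list of concrete issues.\n\n"
        "--- DRAFT ---\n" + content + "\n"
    )
    if previous_review:
        prompt += (
            "\n--- PREVIOUS REVIEW (state which points are now resolved) ---\n"
            + previous_review + "\n"
        )
    return prompt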

Dimension / Reasoning / Score

Conciseness

The skill is excessively verbose. The full provider table with 8 Chinese/international LLM providers, repeated curl examples (shown 3 times with minor variations), and the MCP configuration JSON are all padding. The prompt templates are shown twice (Phase A and Round 2+ section) with near-identical content. Claude doesn't need explanations of how curl works or what an OpenAI-compatible API is.

1 / 3

Actionability

The skill provides fully concrete, executable guidance: exact MCP tool call syntax, complete curl commands, specific JSON schemas for state persistence, exact markdown templates for documentation, and clear threshold values (score >= 6/10). Everything is copy-paste ready.

3 / 3

Workflow Clarity

The workflow is clearly sequenced through Phases A-E with explicit validation checkpoints (Phase B has a STOP condition, state persistence at end of Phase E), recovery via REVIEW_STATE.json, and clear termination criteria. The feedback loop (review → fix → re-review) is the core of the skill and is well-defined; a minimal sketch of such a loop appears after the score breakdown.

3 / 3

Progressive Disclosure

The skill references shared protocols at the end (output versioning, manifest, language) which is good progressive disclosure. However, the main body is monolithic — the provider table, configuration details, and prompt templates could be split into separate reference files. The inline content is heavy for a SKILL.md overview.

2 / 3

Total: 9 / 12

Passed
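As a rough illustration of the review → fix → re-review cycle and the REVIEW_STATE.json persistence described above, here is a minimal sketch. The 6/10 pass mark is the threshold cited in this review; the round cap, the state-file fields, and the request_review / apply_feedback callables are assumptions introduced for the example:

import json
from pathlib import Path

STATE_FILE = Path("REVIEW_STATE.json")
PASS_SCORE = 6      # the acceptance threshold cited in this review (score >= 6/10)
MAX_ROUNDS = 5      # assumed cap; the page does not state the skill's actual limit

def run_review_loop(draft, request_review, apply_feedback):
    # request_review(draft, previous) -> (review_text, score) wraps the LLM call;
    # apply_feedback(draft, review_text) -> revised draft. Both are supplied by the caller.
    previous = None
    for round_no in range(1, MAX_ROUNDS + 1):
        review_text, score = request_review(draft, previous)
        STATE_FILE.write_text(json.dumps(
            {"round": round_no, "score": score,
             "status": "passed" if score >= PASS_SCORE else "revise"},
            indent=2))
        if score >= PASS_SCORE:
            break
        draft = apply_feedback(draft, review_text)
        previous = review_text
    return draft

Persisting the round number and score after every pass is what makes the loop resumable if a session is interrupted, which is the recovery property credited in the Workflow Clarity row.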

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: wanshuiyin/Auto-claude-code-research-in-sleep (Reviewed)

